@@ -58,7 +59,7 @@
- 感谢CopyRight@[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)、[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)、[PaddleGAN](https://github.com/PaddlePaddle/PaddleGAN)、[AnimeGAN](https://github.com/TachibanaYoshino/AnimeGANv2)、[openpose](https://github.com/CMU-Perceptual-Computing-Lab/openpose)、[PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg)、[Zhengxia Zou](https://github.com/jiupinjia/SkyAR)、[PaddleClas](https://github.com/PaddlePaddle/PaddleClas) 提供相关预训练模型,训练能力开放,欢迎体验。
-### **文本类(129个)**
+### **[文本类(130个)](./modules/README_ch.md#文本)**
- 包括中文分词、词性标注与命名实体识别、句法分析、AI写诗/对联/情话/藏头诗、中文的评论情感分析、中文色情文本审核等
![](./docs/imgs/Readme_Related/Text_all.gif)
@@ -67,9 +68,37 @@
- 感谢CopyRight@[ERNIE](https://github.com/PaddlePaddle/ERNIE)、[LAC](https://github.com/baidu/LAC)、[DDParser](https://github.com/baidu/DDParser)提供相关预训练模型,训练能力开放,欢迎体验。
-### **语音类(3个)**
+### **[语音类(15个)](./modules/README_ch.md#语音)**
+- ASR语音识别算法,多种算法可选
+- 语音识别效果如下:
+
+|Input Audio|Recognition Result|
+|--|--|
+|![](./docs/imgs/Readme_Related/audio_icon.png)|I knocked at the door on the ancient side of the building.|
+|![](./docs/imgs/Readme_Related/audio_icon.png)|我认为跑步最重要的就是给我带来了身体健康。|
+
- TTS语音合成算法,多种算法可选
-- 感谢CopyRight@[Parakeet](https://github.com/PaddlePaddle/Parakeet)提供预训练模型,训练能力开放,欢迎体验。
- 输入:`Life was like a box of chocolates, you never know what you're gonna get.`
- 合成效果如下:
@@ -100,7 +129,9 @@
-### **视频类(8个)**
+- 感谢CopyRight@[PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech)提供预训练模型,训练能力开放,欢迎体验。
+
+### **[视频类(8个)](./modules/README_ch.md#视频)**
- 包含短视频分类,支持3000+标签种类,可输出TOP-K标签,多种算法可选。
- 感谢CopyRight@[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo)提供预训练模型,训练能力开放,欢迎体验。
- `举例:输入一段游泳的短视频,算法可以输出"游泳"结果`
@@ -247,3 +278,4 @@ print(results)
* 非常感谢[zl1271](https://github.com/zl1271)修复了serving文档中的错别字
* 非常感谢[AK391](https://github.com/AK391)在Hugging Face spaces中添加了UGATIT和deoldify模型的web demo
* 非常感谢[itegel](https://github.com/itegel)修复了快速开始文档中的错别字
+* 非常感谢[AK391](https://github.com/AK391)在Hugging Face spaces中添加了Photo2Cartoon模型的web demo
diff --git a/docs/docs_en/visualization.md b/docs/docs_en/visualization.md
index 43dd60ea6a7ff52c3f912acad1bb4ce6149d8469..363ac7a95121c098c292de2004e636ec78c7646f 100644
--- a/docs/docs_en/visualization.md
+++ b/docs/docs_en/visualization.md
@@ -50,6 +50,8 @@
**UGATIT Selfie2anime Huggingface Web Demo**: Integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/U-GAT-IT-selfie2anime)
+**Photo2Cartoon Huggingface Web Demo**: Integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/photo2cartoon)
+
### Object Detection
- Pedestrian detection, vehicle detection, and more industrial-grade ultra-large-scale pretrained models are provided.
diff --git a/modules/README.md b/modules/README.md
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..bc90c687d9b027681e93c05c2892cc58dbff4719 100644
--- a/modules/README.md
+++ b/modules/README.md
@@ -0,0 +1,547 @@
+English | [简体中文](README_ch.md)
+
+# CONTENTS
+|[Image](#Image) (212)|[Text](#Text) (130)|[Audio](#Audio) (15)|[Video](#Video) (8)|[Industrial Application](#Industrial-Application) (1)|
+|--|--|--|--|--|
+|[Image Classification](#Image-Classification) (108)|[Text Generation](#Text-Generation) (17)| [Voice Cloning](#Voice-Cloning) (2)|[Video Classification](#Video-Classification) (5)| [Meter Detection](#Meter-Detection) (1)|
+|[Image Generation](#Image-Generation) (26)|[Word Embedding](#Word-Embedding) (62)|[Text to Speech](#Text-to-Speech) (5)|[Video Editing](#Video-Editing) (1)|-|
+|[Keypoint Detection](#Keypoint-Detection) (5)|[Machine Translation](#Machine-Translation) (2)|[Automatic Speech Recognition](#Automatic-Speech-Recognition) (5)|[Multiple Object tracking](#Multiple-Object-tracking) (2)|-|
+|[Semantic Segmentation](#Semantic-Segmentation) (25)|[Language Model](#Language-Model) (30)|[Audio Classification](#Audio-Classification) (3)| -|-|
+|[Face Detection](#Face-Detection) (7)|[Sentiment Analysis](#Sentiment-Analysis) (7)|-|-|-|
+|[Text Recognition](#Text-Recognition) (17)|[Syntactic Analysis](#Syntactic-Analysis) (1)|-|-|-|
+|[Image Editing](#Image-Editing) (8)|[Simultaneous Translation](#Simultaneous-Translation) (5)|-|-|-|
+|[Instance Segmentation](#Instance-Segmentation) (1)|[Lexical Analysis](#Lexical-Analysis) (2)|-|-|-|
+|[Object Detection](#Object-Detection) (13)|[Punctuation Restoration](#Punctuation-Restoration) (1)|-|-|-|
+|[Depth Estimation](#Depth-Estimation) (2)|[Text Review](#Text-Review) (3)|-|-|-|
+
+## Image
+ - ### Image Classification
+
+
+<details><summary>expand</summary>
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[DriverStatusRecognition](image/classification/DriverStatusRecognition)|MobileNetV3_small_ssld|Distracted driver detection dataset||
+|[mobilenet_v2_animals](image/classification/mobilenet_v2_animals)|MobileNet_v2|Baidu self-built animal dataset||
+|[repvgg_a1_imagenet](image/classification/repvgg_a1_imagenet)|RepVGG|ImageNet-2012||
+|[repvgg_a0_imagenet](image/classification/repvgg_a0_imagenet)|RepVGG|ImageNet-2012||
+|[resnext152_32x4d_imagenet](image/classification/resnext152_32x4d_imagenet)|ResNeXt|ImageNet-2012||
+|[resnet_v2_152_imagenet](image/classification/resnet_v2_152_imagenet)|ResNet V2|ImageNet-2012||
+|[resnet50_vd_animals](image/classification/resnet50_vd_animals)|ResNet50_vd|Baidu self-built animal dataset||
+|[food_classification](image/classification/food_classification)|ResNet50_vd_ssld|Food dataset||
+|[mobilenet_v3_large_imagenet_ssld](image/classification/mobilenet_v3_large_imagenet_ssld)|Mobilenet_v3_large|ImageNet-2012||
+|[resnext152_vd_32x4d_imagenet](image/classification/resnext152_vd_32x4d_imagenet)||||
+|[ghostnet_x1_3_imagenet_ssld](image/classification/ghostnet_x1_3_imagenet_ssld)|GhostNet|ImageNet-2012||
+|[rexnet_1_5_imagenet](image/classification/rexnet_1_5_imagenet)|ReXNet|ImageNet-2012||
+|[resnext50_64x4d_imagenet](image/classification/resnext50_64x4d_imagenet)|ResNeXt|ImageNet-2012||
+|[resnext101_64x4d_imagenet](image/classification/resnext101_64x4d_imagenet)|ResNeXt|ImageNet-2012||
+|[efficientnetb0_imagenet](image/classification/efficientnetb0_imagenet)|EfficientNet|ImageNet-2012||
+|[efficientnetb1_imagenet](image/classification/efficientnetb1_imagenet)|EfficientNet|ImageNet-2012||
+|[mobilenet_v2_imagenet_ssld](image/classification/mobilenet_v2_imagenet_ssld)|Mobilenet_v2|ImageNet-2012||
+|[resnet50_vd_dishes](image/classification/resnet50_vd_dishes)|ResNet50_vd|Baidu self-built dish dataset||
+|[pnasnet_imagenet](image/classification/pnasnet_imagenet)|PNASNet|ImageNet-2012||
+|[rexnet_2_0_imagenet](image/classification/rexnet_2_0_imagenet)|ReXNet|ImageNet-2012||
+|[SnakeIdentification](image/classification/SnakeIdentification)|ResNet50_vd_ssld|Snake species dataset||
+|[hrnet40_imagenet](image/classification/hrnet40_imagenet)|HRNet|ImageNet-2012||
+|[resnet_v2_34_imagenet](image/classification/resnet_v2_34_imagenet)|ResNet V2|ImageNet-2012||
+|[mobilenet_v2_dishes](image/classification/mobilenet_v2_dishes)|MobileNet_v2|Baidu self-built dish dataset||
+|[resnext101_vd_32x4d_imagenet](image/classification/resnext101_vd_32x4d_imagenet)|ResNeXt|ImageNet-2012||
+|[repvgg_b2g4_imagenet](image/classification/repvgg_b2g4_imagenet)|RepVGG|ImageNet-2012||
+|[fix_resnext101_32x48d_wsl_imagenet](image/classification/fix_resnext101_32x48d_wsl_imagenet)|ResNeXt|ImageNet-2012||
+|[vgg13_imagenet](image/classification/vgg13_imagenet)|VGG|ImageNet-2012||
+|[se_resnext101_32x4d_imagenet](image/classification/se_resnext101_32x4d_imagenet)|SE_ResNeXt|ImageNet-2012||
+|[hrnet30_imagenet](image/classification/hrnet30_imagenet)|HRNet|ImageNet-2012||
+|[ghostnet_x1_3_imagenet](image/classification/ghostnet_x1_3_imagenet)|GhostNet|ImageNet-2012||
+|[dpn107_imagenet](image/classification/dpn107_imagenet)|DPN|ImageNet-2012||
+|[densenet161_imagenet](image/classification/densenet161_imagenet)|DenseNet|ImageNet-2012||
+|[vgg19_imagenet](image/classification/vgg19_imagenet)|vgg19_imagenet|ImageNet-2012||
+|[mobilenet_v2_imagenet](image/classification/mobilenet_v2_imagenet)|Mobilenet_v2|ImageNet-2012||
+|[resnet50_vd_10w](image/classification/resnet50_vd_10w)|ResNet_vd|Baidu self-built dataset||
+|[resnet_v2_101_imagenet](image/classification/resnet_v2_101_imagenet)|ResNet V2 101|ImageNet-2012||
+|[darknet53_imagenet](image/classification/darknet53_imagenet)|DarkNet|ImageNet-2012||
+|[se_resnext50_32x4d_imagenet](image/classification/se_resnext50_32x4d_imagenet)|SE_ResNeXt|ImageNet-2012||
+|[se_hrnet64_imagenet_ssld](image/classification/se_hrnet64_imagenet_ssld)|HRNet|ImageNet-2012||
+|[resnext101_32x16d_wsl](image/classification/resnext101_32x16d_wsl)|ResNeXt_wsl|ImageNet-2012||
+|[hrnet18_imagenet](image/classification/hrnet18_imagenet)|HRNet|ImageNet-2012||
+|[spinalnet_res101_gemstone](image/classification/spinalnet_res101_gemstone)|resnet101|gemstone||
+|[densenet264_imagenet](image/classification/densenet264_imagenet)|DenseNet|ImageNet-2012||
+|[resnext50_vd_32x4d_imagenet](image/classification/resnext50_vd_32x4d_imagenet)|ResNeXt_vd|ImageNet-2012||
+|[SpinalNet_Gemstones](image/classification/SpinalNet_Gemstones)||||
+|[spinalnet_vgg16_gemstone](image/classification/spinalnet_vgg16_gemstone)|vgg16|gemstone||
+|[xception71_imagenet](image/classification/xception71_imagenet)|Xception|ImageNet-2012||
+|[repvgg_b2_imagenet](image/classification/repvgg_b2_imagenet)|RepVGG|ImageNet-2012||
+|[dpn68_imagenet](image/classification/dpn68_imagenet)|DPN|ImageNet-2012||
+|[alexnet_imagenet](image/classification/alexnet_imagenet)|AlexNet|ImageNet-2012||
+|[rexnet_1_3_imagenet](image/classification/rexnet_1_3_imagenet)|ReXNet|ImageNet-2012||
+|[hrnet64_imagenet](image/classification/hrnet64_imagenet)|HRNet|ImageNet-2012||
+|[efficientnetb7_imagenet](image/classification/efficientnetb7_imagenet)|EfficientNet|ImageNet-2012||
+|[efficientnetb0_small_imagenet](image/classification/efficientnetb0_small_imagenet)|EfficientNet|ImageNet-2012||
+|[efficientnetb6_imagenet](image/classification/efficientnetb6_imagenet)|EfficientNet|ImageNet-2012||
+|[hrnet48_imagenet](image/classification/hrnet48_imagenet)|HRNet|ImageNet-2012||
+|[rexnet_3_0_imagenet](image/classification/rexnet_3_0_imagenet)|ReXNet|ImageNet-2012||
+|[shufflenet_v2_imagenet](image/classification/shufflenet_v2_imagenet)|ShuffleNet V2|ImageNet-2012||
+|[ghostnet_x0_5_imagenet](image/classification/ghostnet_x0_5_imagenet)|GhostNet|ImageNet-2012||
+|[inception_v4_imagenet](image/classification/inception_v4_imagenet)|Inception_V4|ImageNet-2012||
+|[resnext101_vd_64x4d_imagenet](image/classification/resnext101_vd_64x4d_imagenet)|ResNeXt_vd|ImageNet-2012||
+|[densenet201_imagenet](image/classification/densenet201_imagenet)|DenseNet|ImageNet-2012||
+|[vgg16_imagenet](image/classification/vgg16_imagenet)|VGG|ImageNet-2012||
+|[mobilenet_v3_small_imagenet_ssld](image/classification/mobilenet_v3_small_imagenet_ssld)|Mobilenet_v3_Small|ImageNet-2012||
+|[hrnet18_imagenet_ssld](image/classification/hrnet18_imagenet_ssld)|HRNet|ImageNet-2012||
+|[resnext152_64x4d_imagenet](image/classification/resnext152_64x4d_imagenet)|ResNeXt|ImageNet-2012||
+|[efficientnetb3_imagenet](image/classification/efficientnetb3_imagenet)|EfficientNet|ImageNet-2012||
+|[efficientnetb2_imagenet](image/classification/efficientnetb2_imagenet)|EfficientNet|ImageNet-2012||
+|[repvgg_b1g4_imagenet](image/classification/repvgg_b1g4_imagenet)|RepVGG|ImageNet-2012||
+|[resnext101_32x4d_imagenet](image/classification/resnext101_32x4d_imagenet)|ResNeXt|ImageNet-2012||
+|[resnext50_32x4d_imagenet](image/classification/resnext50_32x4d_imagenet)|ResNeXt|ImageNet-2012||
+|[repvgg_a2_imagenet](image/classification/repvgg_a2_imagenet)|RepVGG|ImageNet-2012||
+|[resnext152_vd_64x4d_imagenet](image/classification/resnext152_vd_64x4d_imagenet)|ResNeXt_vd|ImageNet-2012||
+|[xception41_imagenet](image/classification/xception41_imagenet)|Xception|ImageNet-2012||
+|[googlenet_imagenet](image/classification/googlenet_imagenet)|GoogleNet|ImageNet-2012||
+|[resnet50_vd_imagenet_ssld](image/classification/resnet50_vd_imagenet_ssld)|ResNet_vd|ImageNet-2012||
+|[repvgg_b1_imagenet](image/classification/repvgg_b1_imagenet)|RepVGG|ImageNet-2012||
+|[repvgg_b0_imagenet](image/classification/repvgg_b0_imagenet)|RepVGG|ImageNet-2012||
+|[resnet_v2_50_imagenet](image/classification/resnet_v2_50_imagenet)|ResNet V2|ImageNet-2012||
+|[rexnet_1_0_imagenet](image/classification/rexnet_1_0_imagenet)|ReXNet|ImageNet-2012||
+|[resnet_v2_18_imagenet](image/classification/resnet_v2_18_imagenet)|ResNet V2|ImageNet-2012||
+|[resnext101_32x8d_wsl](image/classification/resnext101_32x8d_wsl)|ResNeXt_wsl|ImageNet-2012||
+|[efficientnetb4_imagenet](image/classification/efficientnetb4_imagenet)|EfficientNet|ImageNet-2012||
+|[efficientnetb5_imagenet](image/classification/efficientnetb5_imagenet)|EfficientNet|ImageNet-2012||
+|[repvgg_b1g2_imagenet](image/classification/repvgg_b1g2_imagenet)|RepVGG|ImageNet-2012||
+|[resnext101_32x48d_wsl](image/classification/resnext101_32x48d_wsl)|ResNeXt_wsl|ImageNet-2012||
+|[resnet50_vd_wildanimals](image/classification/resnet50_vd_wildanimals)|ResNet_vd|IFAW self-built wild animal dataset||
+|[nasnet_imagenet](image/classification/nasnet_imagenet)|NASNet|ImageNet-2012||
+|[se_resnet18_vd_imagenet](image/classification/se_resnet18_vd_imagenet)||||
+|[spinalnet_res50_gemstone](image/classification/spinalnet_res50_gemstone)|resnet50|gemstone||
+|[resnext50_vd_64x4d_imagenet](image/classification/resnext50_vd_64x4d_imagenet)|ResNeXt_vd|ImageNet-2012||
+|[resnext101_32x32d_wsl](image/classification/resnext101_32x32d_wsl)|ResNeXt_wsl|ImageNet-2012||
+|[dpn131_imagenet](image/classification/dpn131_imagenet)|DPN|ImageNet-2012||
+|[xception65_imagenet](image/classification/xception65_imagenet)|Xception|ImageNet-2012||
+|[repvgg_b3g4_imagenet](image/classification/repvgg_b3g4_imagenet)|RepVGG|ImageNet-2012||
+|[marine_biometrics](image/classification/marine_biometrics)|ResNet50_vd_ssld|Fish4Knowledge||
+|[res2net101_vd_26w_4s_imagenet](image/classification/res2net101_vd_26w_4s_imagenet)|Res2Net|ImageNet-2012||
+|[dpn98_imagenet](image/classification/dpn98_imagenet)|DPN|ImageNet-2012||
+|[resnet18_vd_imagenet](image/classification/resnet18_vd_imagenet)|ResNet_vd|ImageNet-2012||
+|[densenet121_imagenet](image/classification/densenet121_imagenet)|DenseNet|ImageNet-2012||
+|[vgg11_imagenet](image/classification/vgg11_imagenet)|VGG|ImageNet-2012||
+|[hrnet44_imagenet](image/classification/hrnet44_imagenet)|HRNet|ImageNet-2012||
+|[densenet169_imagenet](image/classification/densenet169_imagenet)|DenseNet|ImageNet-2012||
+|[hrnet32_imagenet](image/classification/hrnet32_imagenet)|HRNet|ImageNet-2012||
+|[dpn92_imagenet](image/classification/dpn92_imagenet)|DPN|ImageNet-2012||
+|[ghostnet_x1_0_imagenet](image/classification/ghostnet_x1_0_imagenet)|GhostNet|ImageNet-2012||
+|[hrnet48_imagenet_ssld](image/classification/hrnet48_imagenet_ssld)|HRNet|ImageNet-2012||
+</details>
+
+
+
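+All of the image classification modules above follow the same PaddleHub calling pattern. Below is a minimal, non-authoritative sketch; it assumes the module exposes a `classification` method taking a list of images, as described in the individual module docs, and the image path is hypothetical.
+
+```python
+import cv2
+import paddlehub as hub
+
+# Load one of the classifiers listed above (the pretrained model is downloaded on first use).
+classifier = hub.Module(name="resnet50_vd_imagenet_ssld")
+
+# Run inference on a local image; each result maps labels to probabilities.
+results = classifier.classification(images=[cv2.imread("/PATH/TO/IMAGE.jpg")])
+print(results)
+```
+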
+ - ### Image Generation
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[pixel2style2pixel](image/Image_gan/gan/pixel2style2pixel/)|Pixel2Style2Pixel|-|Face frontalization|
+|[stgan_bald](image/Image_gan/gan/stgan_bald/)|STGAN|CelebA|Bald-head generation|
+|[styleganv2_editing](image/Image_gan/gan/styleganv2_editing)|StyleGAN V2|-|Face editing|
+|[wav2lip](image/Image_gan/gan/wav2lip)|wav2lip|LRS2|Lip-sync generation|
+|[attgan_celeba](image/Image_gan/attgan_celeba/)|AttGAN|Celeba|Face editing|
+|[cyclegan_cityscapes](image/Image_gan/cyclegan_cityscapes)|CycleGAN|Cityscapes|Translation between real street scenes and semantic segmentation maps|
+|[stargan_celeba](image/Image_gan/stargan_celeba)|StarGAN|Celeba|Face editing|
+|[stgan_celeba](image/Image_gan/stgan_celeba/)|STGAN|Celeba|Face editing|
+|[ID_Photo_GEN](image/Image_gan/style_transfer/ID_Photo_GEN)|HRNet_W18|-|ID photo generation|
+|[Photo2Cartoon](image/Image_gan/style_transfer/Photo2Cartoon)|U-GAT-IT|cartoon_data|Face cartoonization|
+|[U2Net_Portrait](image/Image_gan/style_transfer/U2Net_Portrait)|U^2Net|-|Face sketch portrait|
+|[UGATIT_100w](image/Image_gan/style_transfer/UGATIT_100w)|U-GAT-IT|selfie2anime|Face anime-style conversion|
+|[UGATIT_83w](image/Image_gan/style_transfer/UGATIT_83w)|U-GAT-IT|selfie2anime|Face anime-style conversion|
+|[UGATIT_92w](image/Image_gan/style_transfer/UGATIT_92w)|U-GAT-IT|selfie2anime|Face anime-style conversion|
+|[animegan_v1_hayao_60](image/Image_gan/style_transfer/animegan_v1_hayao_60)|AnimeGAN|The Wind Rises|Image style transfer - Hayao Miyazaki|
+|[animegan_v2_hayao_64](image/Image_gan/style_transfer/animegan_v2_hayao_64)|AnimeGAN|The Wind Rises|Image style transfer - Hayao Miyazaki|
+|[animegan_v2_hayao_99](image/Image_gan/style_transfer/animegan_v2_hayao_99)|AnimeGAN|The Wind Rises|Image style transfer - Hayao Miyazaki|
+|[animegan_v2_paprika_54](image/Image_gan/style_transfer/animegan_v2_paprika_54)|AnimeGAN|Paprika|Image style transfer - Satoshi Kon|
+|[animegan_v2_paprika_74](image/Image_gan/style_transfer/animegan_v2_paprika_74)|AnimeGAN|Paprika|Image style transfer - Satoshi Kon|
+|[animegan_v2_paprika_97](image/Image_gan/style_transfer/animegan_v2_paprika_97)|AnimeGAN|Paprika|Image style transfer - Satoshi Kon|
+|[animegan_v2_paprika_98](image/Image_gan/style_transfer/animegan_v2_paprika_98)|AnimeGAN|Paprika|Image style transfer - Satoshi Kon|
+|[animegan_v2_shinkai_33](image/Image_gan/style_transfer/animegan_v2_shinkai_33)|AnimeGAN|Your Name, Weathering with you|Image style transfer - Makoto Shinkai|
+|[animegan_v2_shinkai_53](image/Image_gan/style_transfer/animegan_v2_shinkai_53)|AnimeGAN|Your Name, Weathering with you|Image style transfer - Makoto Shinkai|
+|[msgnet](image/Image_gan/style_transfer/msgnet)|msgnet|COCO2014||
+|[stylepro_artistic](image/Image_gan/style_transfer/stylepro_artistic)|StyleProNet|MS-COCO + WikiArt|Artistic style transfer|
+|stylegan_ffhq|StyleGAN|FFHQ|Image style transfer|
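+
+The style-transfer modules above share a similar interface; a minimal sketch, assuming a `style_transfer` method that accepts a list of images (as in the animegan module docs) and a hypothetical image path:
+
+```python
+import cv2
+import paddlehub as hub
+
+# Anime-style transfer in the style of Hayao Miyazaki.
+model = hub.Module(name="animegan_v2_hayao_99")
+results = model.style_transfer(images=[cv2.imread("/PATH/TO/IMAGE.jpg")])
+```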
+
+ - ### Keypoint Detection
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[face_landmark_localization](image/keypoint_detection/face_landmark_localization)|Face_Landmark|AFW/AFLW|Facial landmark detection|
+|[hand_pose_localization](image/keypoint_detection/hand_pose_localization)|-|MPII, NZSL|Hand keypoint detection|
+|[openpose_body_estimation](image/keypoint_detection/openpose_body_estimation)|two-branch multi-stage CNN|MPII, COCO 2016|Body keypoint detection|
+|[human_pose_estimation_resnet50_mpii](image/keypoint_detection/human_pose_estimation_resnet50_mpii)|Pose_Resnet50|MPII|Human skeletal keypoint detection|
+|[openpose_hands_estimation](image/keypoint_detection/openpose_hands_estimation)|-|MPII, NZSL|Hand keypoint detection|
+
+ - ### Semantic Segmentation
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[deeplabv3p_xception65_humanseg](image/semantic_segmentation/deeplabv3p_xception65_humanseg)|deeplabv3p|Baidu self-built dataset|Portrait segmentation|
+|[humanseg_server](image/semantic_segmentation/humanseg_server)|deeplabv3p|Baidu self-built dataset|Portrait segmentation|
+|[humanseg_mobile](image/semantic_segmentation/humanseg_mobile)|hrnet|Baidu self-built dataset|Portrait segmentation for mobile front cameras|
+|[humanseg_lite](image/semantic_segmentation/humanseg_lite)|shufflenet|Baidu self-built dataset|Lightweight portrait segmentation, real-time on mobile|
+|[ExtremeC3_Portrait_Segmentation](image/semantic_segmentation/ExtremeC3_Portrait_Segmentation)|ExtremeC3|EG1800, Baidu fashion dataset|Lightweight portrait segmentation|
+|[SINet_Portrait_Segmentation](image/semantic_segmentation/SINet_Portrait_Segmentation)|SINet|EG1800, Baidu fashion dataset|Lightweight portrait segmentation|
+|[FCN_HRNet_W18_Face_Seg](image/semantic_segmentation/FCN_HRNet_W18_Face_Seg)|FCN_HRNet_W18|-|Portrait segmentation|
+|[ace2p](image/semantic_segmentation/ace2p)|ACE2P|LIP|Human parsing|
+|[Pneumonia_CT_LKM_PP](image/semantic_segmentation/Pneumonia_CT_LKM_PP)|U-NET+|De-identified dataset authorized by LinkingMed|Pneumonia CT image analysis|
+|[Pneumonia_CT_LKM_PP_lung](image/semantic_segmentation/Pneumonia_CT_LKM_PP_lung)|U-NET+|De-identified dataset authorized by LinkingMed|Pneumonia CT image analysis|
+|[ocrnet_hrnetw18_voc](image/semantic_segmentation/ocrnet_hrnetw18_voc)|ocrnet, hrnet|PascalVoc2012||
+|[U2Net](image/semantic_segmentation/U2Net)|U^2Net|-|Foreground/background segmentation|
+|[U2Netp](image/semantic_segmentation/U2Netp)|U^2Net|-|Foreground/background segmentation|
+|[Extract_Line_Draft](image/semantic_segmentation/Extract_Line_Draft)|UNet|Pixiv|Line-draft extraction|
+|[unet_cityscapes](image/semantic_segmentation/unet_cityscapes)|UNet|cityscapes||
+|[ocrnet_hrnetw18_cityscapes](image/semantic_segmentation/ocrnet_hrnetw18_cityscapes)|ocrnet_hrnetw18|cityscapes||
+|[hardnet_cityscapes](image/semantic_segmentation/hardnet_cityscapes)|hardnet|cityscapes||
+|[fcn_hrnetw48_voc](image/semantic_segmentation/fcn_hrnetw48_voc)|fcn_hrnetw48|PascalVoc2012||
+|[fcn_hrnetw48_cityscapes](image/semantic_segmentation/fcn_hrnetw48_cityscapes)|fcn_hrnetw48|cityscapes||
+|[fcn_hrnetw18_voc](image/semantic_segmentation/fcn_hrnetw18_voc)|fcn_hrnetw18|PascalVoc2012||
+|[fcn_hrnetw18_cityscapes](image/semantic_segmentation/fcn_hrnetw18_cityscapes)|fcn_hrnetw18|cityscapes||
+|[fastscnn_cityscapes](image/semantic_segmentation/fastscnn_cityscapes)|fastscnn|cityscapes||
+|[deeplabv3p_resnet50_voc](image/semantic_segmentation/deeplabv3p_resnet50_voc)|deeplabv3p, resnet50|PascalVoc2012||
+|[deeplabv3p_resnet50_cityscapes](image/semantic_segmentation/deeplabv3p_resnet50_cityscapes)|deeplabv3p, resnet50|cityscapes||
+|[bisenetv2_cityscapes](image/semantic_segmentation/bisenetv2_cityscapes)|bisenetv2|cityscapes||
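+
+A minimal portrait-segmentation sketch, assuming the module exposes a `segmentation` method taking a list of images, as in the humanseg module docs; the image path is hypothetical:
+
+```python
+import cv2
+import paddlehub as hub
+
+human_seg = hub.Module(name="deeplabv3p_xception65_humanseg")
+# Each result contains the portrait mask for the corresponding input image.
+results = human_seg.segmentation(images=[cv2.imread("/PATH/TO/IMAGE.jpg")])
+```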
+
+
+
+ - ### Face Detection
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[pyramidbox_lite_mobile](image/face_detection/pyramidbox_lite_mobile)|PyramidBox|WIDER FACE dataset + Baidu self-collected face dataset|Lightweight face detection for mobile|
+|[pyramidbox_lite_mobile_mask](image/face_detection/pyramidbox_lite_mobile_mask)|PyramidBox|WIDER FACE dataset + Baidu self-collected face dataset|Lightweight face-mask detection for mobile|
+|[pyramidbox_lite_server_mask](image/face_detection/pyramidbox_lite_server_mask)|PyramidBox|WIDER FACE dataset + Baidu self-collected face dataset|Lightweight face-mask detection|
+|[ultra_light_fast_generic_face_detector_1mb_640](image/face_detection/ultra_light_fast_generic_face_detector_1mb_640)|Ultra-Light-Fast-Generic-Face-Detector-1MB|WIDER FACE dataset|Lightweight generic face detection for low-compute devices|
+|[ultra_light_fast_generic_face_detector_1mb_320](image/face_detection/ultra_light_fast_generic_face_detector_1mb_320)|Ultra-Light-Fast-Generic-Face-Detector-1MB|WIDER FACE dataset|Lightweight generic face detection for low-compute devices|
+|[pyramidbox_lite_server](image/face_detection/pyramidbox_lite_server)|PyramidBox|WIDER FACE dataset + Baidu self-collected face dataset|Lightweight face detection|
+|[pyramidbox_face_detection](image/face_detection/pyramidbox_face_detection)|PyramidBox|WIDER FACE dataset|Face detection|
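+
+A minimal face-detection sketch, assuming a `face_detection` method that takes a list of images, as in the pyramidbox module docs; the image path is hypothetical:
+
+```python
+import cv2
+import paddlehub as hub
+
+face_detector = hub.Module(name="pyramidbox_lite_server")
+# Each result contains the detected face bounding boxes and confidences.
+results = face_detector.face_detection(images=[cv2.imread("/PATH/TO/IMAGE.jpg")])
+```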
+
+ - ### Text Recognition
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[chinese_ocr_db_crnn_mobile](image/text_recognition/chinese_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|ICDAR2015 dataset|Chinese text recognition|
+|[chinese_text_detection_db_mobile](image/text_recognition/chinese_text_detection_db_mobile)|Differentiable Binarization|ICDAR2015 dataset|Chinese text detection|
+|[chinese_text_detection_db_server](image/text_recognition/chinese_text_detection_db_server)|Differentiable Binarization|ICDAR2015 dataset|Chinese text detection|
+|[chinese_ocr_db_crnn_server](image/text_recognition/chinese_ocr_db_crnn_server)|Differentiable Binarization+CRNN|ICDAR2015 dataset|Chinese text recognition|
+|[Vehicle_License_Plate_Recognition](image/text_recognition/Vehicle_License_Plate_Recognition)|-|CCPD|License plate recognition|
+|[chinese_cht_ocr_db_crnn_mobile](image/text_recognition/chinese_cht_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|ICDAR2015 dataset|Traditional Chinese text recognition|
+|[japan_ocr_db_crnn_mobile](image/text_recognition/japan_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|ICDAR2015 dataset|Japanese text recognition|
+|[korean_ocr_db_crnn_mobile](image/text_recognition/korean_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|ICDAR2015 dataset|Korean text recognition|
+|[german_ocr_db_crnn_mobile](image/text_recognition/german_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|ICDAR2015 dataset|German text recognition|
+|[french_ocr_db_crnn_mobile](image/text_recognition/french_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|ICDAR2015 dataset|French text recognition|
+|[latin_ocr_db_crnn_mobile](image/text_recognition/latin_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|ICDAR2015 dataset|Latin-script text recognition|
+|[cyrillic_ocr_db_crnn_mobile](image/text_recognition/cyrillic_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|ICDAR2015 dataset|Cyrillic-script text recognition|
+|[multi_languages_ocr_db_crnn](image/text_recognition/multi_languages_ocr_db_crnn)|Differentiable Binarization+CRNN|ICDAR2015 dataset|Multilingual text recognition|
+|[kannada_ocr_db_crnn_mobile](image/text_recognition/kannada_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|ICDAR2015 dataset|Kannada text recognition|
+|[arabic_ocr_db_crnn_mobile](image/text_recognition/arabic_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|ICDAR2015 dataset|Arabic text recognition|
+|[telugu_ocr_db_crnn_mobile](image/text_recognition/telugu_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|ICDAR2015 dataset|Telugu text recognition|
+|[devanagari_ocr_db_crnn_mobile](image/text_recognition/devanagari_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|ICDAR2015 dataset|Devanagari text recognition|
+|[tamil_ocr_db_crnn_mobile](image/text_recognition/tamil_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|ICDAR2015 dataset|Tamil text recognition|
+
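+A minimal OCR sketch, assuming a `recognize_text` method that takes a list of images, as in the chinese_ocr_db_crnn module docs; the image path is hypothetical:
+
+```python
+import cv2
+import paddlehub as hub
+
+ocr = hub.Module(name="chinese_ocr_db_crnn_mobile")
+# Each result contains the detected text boxes, recognized strings and confidences.
+results = ocr.recognize_text(images=[cv2.imread("/PATH/TO/IMAGE.jpg")])
+print(results)
+```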
+
+ - ### Image Editing
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[realsr](image/Image_editing/super_resolution/realsr)|LP-KPN|RealSR dataset|4x image/video super-resolution|
+|[deoldify](image/Image_editing/colorization/deoldify)|GAN|ILSVRC 2012|Black-and-white photo/video colorization|
+|[photo_restoration](image/Image_editing/colorization/photo_restoration)|Based on the deoldify and realsr models|-|Old photo restoration|
+|[user_guided_colorization](image/Image_editing/colorization/user_guided_colorization)|siggraph|ILSVRC 2012|Image colorization|
+|[falsr_c](image/Image_editing/super_resolution/falsr_c)|falsr_c|DIV2k|Lightweight 2x super-resolution|
+|[dcscn](image/Image_editing/super_resolution/dcscn)|dcscn|DIV2k|Lightweight 2x super-resolution|
+|[falsr_a](image/Image_editing/super_resolution/falsr_a)|falsr_a|DIV2k|Lightweight 2x super-resolution|
+|[falsr_b](image/Image_editing/super_resolution/falsr_b)|falsr_b|DIV2k|Lightweight 2x super-resolution|
+
+ - ### Instance Segmentation
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[solov2](image/instance_segmentation/solov2)|-|COCO2014|Instance segmentation|
+
+ - ### Object Detection
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[faster_rcnn_resnet50_coco2017](image/object_detection/faster_rcnn_resnet50_coco2017)|faster_rcnn|COCO2017||
+|[ssd_vgg16_512_coco2017](image/object_detection/ssd_vgg16_512_coco2017)|SSD|COCO2017||
+|[faster_rcnn_resnet50_fpn_venus](image/object_detection/faster_rcnn_resnet50_fpn_venus)|faster_rcnn|Baidu self-built dataset|Large-scale general object detection|
+|[ssd_vgg16_300_coco2017](image/object_detection/ssd_vgg16_300_coco2017)||||
+|[yolov3_resnet34_coco2017](image/object_detection/yolov3_resnet34_coco2017)|YOLOv3|COCO2017||
+|[yolov3_darknet53_pedestrian](image/object_detection/yolov3_darknet53_pedestrian)|YOLOv3|Baidu self-built large-scale pedestrian dataset|Pedestrian detection|
+|[yolov3_mobilenet_v1_coco2017](image/object_detection/yolov3_mobilenet_v1_coco2017)|YOLOv3|COCO2017||
+|[ssd_mobilenet_v1_pascal](image/object_detection/ssd_mobilenet_v1_pascal)|SSD|PASCAL VOC||
+|[faster_rcnn_resnet50_fpn_coco2017](image/object_detection/faster_rcnn_resnet50_fpn_coco2017)|faster_rcnn|COCO2017||
+|[yolov3_darknet53_coco2017](image/object_detection/yolov3_darknet53_coco2017)|YOLOv3|COCO2017||
+|[yolov3_darknet53_vehicles](image/object_detection/yolov3_darknet53_vehicles)|YOLOv3|Baidu self-built large-scale vehicle dataset|Vehicle detection|
+|[yolov3_darknet53_venus](image/object_detection/yolov3_darknet53_venus)|YOLOv3|Baidu self-built dataset|Large-scale general detection|
+|[yolov3_resnet50_vd_coco2017](image/object_detection/yolov3_resnet50_vd_coco2017)|YOLOv3|COCO2017||
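+
+A minimal object-detection sketch, assuming an `object_detection` method that takes a list of images, as in the yolov3 module docs; the image path is hypothetical:
+
+```python
+import cv2
+import paddlehub as hub
+
+detector = hub.Module(name="yolov3_darknet53_coco2017")
+# Each result lists the detected objects with label, confidence and bounding box.
+results = detector.object_detection(images=[cv2.imread("/PATH/TO/IMAGE.jpg")])
+```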
+
+ - ### Depth Estimation
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[MiDaS_Large](image/depth_estimation/MiDaS_Large)|-|3D Movies, WSVD, ReDWeb, MegaDepth||
+|[MiDaS_Small](image/depth_estimation/MiDaS_Small)|-|3D Movies, WSVD, ReDWeb, MegaDepth, etc.||
+
+## Text
+ - ### Text Generation
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[ernie_gen](text/text_generation/ernie_gen)|ERNIE-GEN|-|Pre-training/fine-tuning framework for generation tasks|
+|[ernie_gen_poetry](text/text_generation/ernie_gen_poetry)|ERNIE-GEN|Open-source poetry dataset|Poetry generation|
+|[ernie_gen_couplet](text/text_generation/ernie_gen_couplet)|ERNIE-GEN|Open-source couplet dataset|Couplet generation|
+|[ernie_gen_lover_words](text/text_generation/ernie_gen_lover_words)|ERNIE-GEN|Online love poem and sweet-talk data|Love-talk generation|
+|[ernie_tiny_couplet](text/text_generation/ernie_tiny_couplet)|ernie_tiny|Open-source couplet dataset|Couplet generation|
+|[ernie_gen_acrostic_poetry](text/text_generation/ernie_gen_acrostic_poetry)|ERNIE-GEN|Open-source poetry dataset|Acrostic poetry generation|
+|[Rumor_prediction](text/text_generation/Rumor_prediction)|-|Sina Weibo Chinese rumor data|Rumor prediction|
+|[plato-mini](text/text_generation/plato-mini)|Unified Transformer|Billion-scale Chinese dialogue data|Chinese dialogue|
+|[plato2_en_large](text/text_generation/plato2_en_large)|plato2|Open-domain multi-turn dialogue dataset|Ultra-large-scale generative dialogue|
+|[plato2_en_base](text/text_generation/plato2_en_base)|plato2|Open-domain multi-turn dialogue dataset|Ultra-large-scale generative dialogue|
+|[CPM_LM](text/text_generation/CPM_LM)|GPT-2|Self-built dataset|Chinese text generation|
+|[unified_transformer-12L-cn](text/text_generation/unified_transformer-12L-cn)|Unified Transformer|Tens-of-millions-scale Chinese conversation data|Multi-turn human-machine dialogue|
+|[unified_transformer-12L-cn-luge](text/text_generation/unified_transformer-12L-cn-luge)|Unified Transformer|LUGE dialogue dataset|Multi-turn human-machine dialogue|
+|[reading_pictures_writing_poems](text/text_generation/reading_pictures_writing_poems)|Multi-network cascade|-|Poem generation from images|
+|[GPT2_CPM_LM](text/text_generation/GPT2_CPM_LM)|||Q&A-style text generation|
+|[GPT2_Base_CN](text/text_generation/GPT2_Base_CN)|||Q&A-style text generation|
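+
+A minimal text-generation sketch, assuming a `generate` method that takes a list of input texts, as in the ernie_gen_couplet module docs; the example input is illustrative:
+
+```python
+import paddlehub as hub
+
+module = hub.Module(name="ernie_gen_couplet")
+# Generate candidate second lines for a couplet's first line.
+results = module.generate(texts=["人增福寿年增岁"], use_gpu=False, beam_width=5)
+print(results)
+```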
+
+ - ### Word Embedding
+
+
+<details><summary>expand</summary>
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[w2v_weibo_target_word-bigram_dim300](text/embedding/w2v_weibo_target_word-bigram_dim300)|w2v|weibo||
+|[w2v_baidu_encyclopedia_target_word-ngram_1-2_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-ngram_1-2_dim300)|w2v|baidu_encyclopedia||
+|[w2v_literature_target_word-word_dim300](text/embedding/w2v_literature_target_word-word_dim300)|w2v|literature||
+|[word2vec_skipgram](text/embedding/word2vec_skipgram)|skip-gram|Baidu self-built dataset||
+|[w2v_sogou_target_word-char_dim300](text/embedding/w2v_sogou_target_word-char_dim300)|w2v|sogou||
+|[w2v_weibo_target_bigram-char_dim300](text/embedding/w2v_weibo_target_bigram-char_dim300)|w2v|weibo||
+|[w2v_zhihu_target_word-bigram_dim300](text/embedding/w2v_zhihu_target_word-bigram_dim300)|w2v|zhihu||
+|[w2v_financial_target_word-word_dim300](text/embedding/w2v_financial_target_word-word_dim300)|w2v|financial||
+|[w2v_wiki_target_word-word_dim300](text/embedding/w2v_wiki_target_word-word_dim300)|w2v|wiki||
+|[w2v_baidu_encyclopedia_context_word-word_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-word_dim300)|w2v|baidu_encyclopedia||
+|[w2v_weibo_target_word-word_dim300](text/embedding/w2v_weibo_target_word-word_dim300)|w2v|weibo||
+|[w2v_zhihu_target_bigram-char_dim300](text/embedding/w2v_zhihu_target_bigram-char_dim300)|w2v|zhihu||
+|[w2v_zhihu_target_word-word_dim300](text/embedding/w2v_zhihu_target_word-word_dim300)|w2v|zhihu||
+|[w2v_people_daily_target_word-char_dim300](text/embedding/w2v_people_daily_target_word-char_dim300)|w2v|people_daily||
+|[w2v_sikuquanshu_target_word-word_dim300](text/embedding/w2v_sikuquanshu_target_word-word_dim300)|w2v|sikuquanshu||
+|[glove_twitter_target_word-word_dim200_en](text/embedding/glove_twitter_target_word-word_dim200_en)|fasttext|twitter||
+|[fasttext_crawl_target_word-word_dim300_en](text/embedding/fasttext_crawl_target_word-word_dim300_en)|fasttext|crawl||
+|[w2v_wiki_target_word-bigram_dim300](text/embedding/w2v_wiki_target_word-bigram_dim300)|w2v|wiki||
+|[w2v_baidu_encyclopedia_context_word-character_char1-1_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-character_char1-1_dim300)|w2v|baidu_encyclopedia||
+|[glove_wiki2014-gigaword_target_word-word_dim300_en](text/embedding/glove_wiki2014-gigaword_target_word-word_dim300_en)|glove|wiki2014-gigaword||
+|[glove_wiki2014-gigaword_target_word-word_dim50_en](text/embedding/glove_wiki2014-gigaword_target_word-word_dim50_en)|glove|wiki2014-gigaword||
+|[w2v_baidu_encyclopedia_context_word-ngram_2-2_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-ngram_2-2_dim300)|w2v|baidu_encyclopedia||
+|[w2v_wiki_target_bigram-char_dim300](text/embedding/w2v_wiki_target_bigram-char_dim300)|w2v|wiki||
+|[w2v_baidu_encyclopedia_target_word-character_char1-1_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-character_char1-1_dim300)|w2v|baidu_encyclopedia||
+|[w2v_financial_target_bigram-char_dim300](text/embedding/w2v_financial_target_bigram-char_dim300)|w2v|financial||
+|[glove_wiki2014-gigaword_target_word-word_dim200_en](text/embedding/glove_wiki2014-gigaword_target_word-word_dim200_en)|glove|wiki2014-gigaword||
+|[w2v_financial_target_word-bigram_dim300](text/embedding/w2v_financial_target_word-bigram_dim300)|w2v|financial||
+|[w2v_mixed-large_target_word-char_dim300](text/embedding/w2v_mixed-large_target_word-char_dim300)|w2v|mixed||
+|[w2v_baidu_encyclopedia_target_word-wordPosition_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-wordPosition_dim300)|w2v|baidu_encyclopedia||
+|[w2v_baidu_encyclopedia_context_word-ngram_1-3_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-ngram_1-3_dim300)|w2v|baidu_encyclopedia||
+|[w2v_baidu_encyclopedia_target_word-wordLR_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-wordLR_dim300)|w2v|baidu_encyclopedia||
+|[w2v_sogou_target_bigram-char_dim300](text/embedding/w2v_sogou_target_bigram-char_dim300)|w2v|sogou||
+|[w2v_weibo_target_word-char_dim300](text/embedding/w2v_weibo_target_word-char_dim300)|w2v|weibo||
+|[w2v_people_daily_target_word-word_dim300](text/embedding/w2v_people_daily_target_word-word_dim300)|w2v|people_daily||
+|[w2v_zhihu_target_word-char_dim300](text/embedding/w2v_zhihu_target_word-char_dim300)|w2v|zhihu||
+|[w2v_wiki_target_word-char_dim300](text/embedding/w2v_wiki_target_word-char_dim300)|w2v|wiki||
+|[w2v_sogou_target_word-bigram_dim300](text/embedding/w2v_sogou_target_word-bigram_dim300)|w2v|sogou||
+|[w2v_financial_target_word-char_dim300](text/embedding/w2v_financial_target_word-char_dim300)|w2v|financial||
+|[w2v_baidu_encyclopedia_target_word-ngram_1-3_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-ngram_1-3_dim300)|w2v|baidu_encyclopedia||
+|[glove_wiki2014-gigaword_target_word-word_dim100_en](text/embedding/glove_wiki2014-gigaword_target_word-word_dim100_en)|glove|wiki2014-gigaword||
+|[w2v_baidu_encyclopedia_target_word-character_char1-4_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-character_char1-4_dim300)|w2v|baidu_encyclopedia||
+|[w2v_sogou_target_word-word_dim300](text/embedding/w2v_sogou_target_word-word_dim300)|w2v|sogou||
+|[w2v_literature_target_word-char_dim300](text/embedding/w2v_literature_target_word-char_dim300)|w2v|literature||
+|[w2v_baidu_encyclopedia_target_bigram-char_dim300](text/embedding/w2v_baidu_encyclopedia_target_bigram-char_dim300)|w2v|baidu_encyclopedia||
+|[w2v_baidu_encyclopedia_target_word-word_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-word_dim300)|w2v|baidu_encyclopedia||
+|[glove_twitter_target_word-word_dim100_en](text/embedding/glove_twitter_target_word-word_dim100_en)|glove|crawl||
+|[w2v_baidu_encyclopedia_target_word-ngram_2-2_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-ngram_2-2_dim300)|w2v|baidu_encyclopedia||
+|[w2v_baidu_encyclopedia_context_word-character_char1-4_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-character_char1-4_dim300)|w2v|baidu_encyclopedia||
+|[w2v_literature_target_bigram-char_dim300](text/embedding/w2v_literature_target_bigram-char_dim300)|w2v|literature||
+|[fasttext_wiki-news_target_word-word_dim300_en](text/embedding/fasttext_wiki-news_target_word-word_dim300_en)|fasttext|wiki-news||
+|[w2v_people_daily_target_word-bigram_dim300](text/embedding/w2v_people_daily_target_word-bigram_dim300)|w2v|people_daily||
+|[w2v_mixed-large_target_word-word_dim300](text/embedding/w2v_mixed-large_target_word-word_dim300)|w2v|mixed||
+|[w2v_people_daily_target_bigram-char_dim300](text/embedding/w2v_people_daily_target_bigram-char_dim300)|w2v|people_daily||
+|[w2v_literature_target_word-bigram_dim300](text/embedding/w2v_literature_target_word-bigram_dim300)|w2v|literature||
+|[glove_twitter_target_word-word_dim25_en](text/embedding/glove_twitter_target_word-word_dim25_en)|glove|twitter||
+|[w2v_baidu_encyclopedia_context_word-ngram_1-2_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-ngram_1-2_dim300)|w2v|baidu_encyclopedia||
+|[w2v_sikuquanshu_target_word-bigram_dim300](text/embedding/w2v_sikuquanshu_target_word-bigram_dim300)|w2v|sikuquanshu||
+|[w2v_baidu_encyclopedia_context_word-character_char1-2_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-character_char1-2_dim300)|w2v|baidu_encyclopedia||
+|[glove_twitter_target_word-word_dim50_en](text/embedding/glove_twitter_target_word-word_dim50_en)|glove|twitter||
+|[w2v_baidu_encyclopedia_context_word-wordLR_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-wordLR_dim300)|w2v|baidu_encyclopedia||
+|[w2v_baidu_encyclopedia_target_word-character_char1-2_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-character_char1-2_dim300)|w2v|baidu_encyclopedia||
+|[w2v_baidu_encyclopedia_context_word-wordPosition_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-wordPosition_dim300)|w2v|baidu_encyclopedia||
+</details>
+
+
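+The word-embedding modules share a common lookup interface; a minimal sketch, assuming the `search` and `cosine_sim` methods described in the embedding module docs:
+
+```python
+import paddlehub as hub
+
+embedding = hub.Module(name="w2v_baidu_encyclopedia_target_word-word_dim300")
+# Look up the 300-dimensional vector of a word.
+vector = embedding.search("中国")
+# Cosine similarity between two words.
+similarity = embedding.cosine_sim("中国", "美国")
+```
+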
+ - ### Machine Translation
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[transformer_zh-en](text/machine_translation/transformer/transformer_zh-en)|Transformer|CWMT2021|Chinese-to-English translation|
+|[transformer_en-de](text/machine_translation/transformer/transformer_en-de)|Transformer|WMT14 EN-DE|English-to-German translation|
+
+ - ### Language Model
+
+
+<details><summary>expand</summary>
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[chinese_electra_small](text/language_model/chinese_electra_small)||||
+|[chinese_electra_base](text/language_model/chinese_electra_base)||||
+|[roberta-wwm-ext-large](text/language_model/roberta-wwm-ext-large)|roberta-wwm-ext-large|Baidu self-built dataset||
+|[chinese-bert-wwm-ext](text/language_model/chinese_bert_wwm_ext)|chinese-bert-wwm-ext|Baidu self-built dataset||
+|[lda_webpage](text/language_model/lda_webpage)|LDA|Baidu self-built web-page domain dataset||
+|[lda_novel](text/language_model/lda_novel)||||
+|[bert-base-multilingual-uncased](text/language_model/bert-base-multilingual-uncased)||||
+|[rbt3](text/language_model/rbt3)||||
+|[ernie_v2_eng_base](text/language_model/ernie_v2_eng_base)|ernie_v2_eng_base|Baidu self-built dataset||
+|[bert-base-multilingual-cased](text/language_model/bert-base-multilingual-cased)||||
+|[rbtl3](text/language_model/rbtl3)||||
+|[chinese-bert-wwm](text/language_model/chinese_bert_wwm)|chinese-bert-wwm|Baidu self-built dataset||
+|[bert-large-uncased](text/language_model/bert-large-uncased)||||
+|[slda_novel](text/language_model/slda_novel)||||
+|[slda_news](text/language_model/slda_news)||||
+|[electra_small](text/language_model/electra_small)||||
+|[slda_webpage](text/language_model/slda_webpage)||||
+|[bert-base-cased](text/language_model/bert-base-cased)||||
+|[slda_weibo](text/language_model/slda_weibo)||||
+|[roberta-wwm-ext](text/language_model/roberta-wwm-ext)|roberta-wwm-ext|Baidu self-built dataset||
+|[bert-base-uncased](text/language_model/bert-base-uncased)||||
+|[electra_large](text/language_model/electra_large)||||
+|[ernie](text/language_model/ernie)|ernie-1.0|Baidu self-built dataset||
+|[simnet_bow](text/language_model/simnet_bow)|BOW|Baidu self-built dataset||
+|[ernie_tiny](text/language_model/ernie_tiny)|ernie_tiny|Baidu self-built dataset||
+|[bert-base-chinese](text/language_model/bert-base-chinese)|bert-base-chinese|Baidu self-built dataset||
+|[lda_news](text/language_model/lda_news)|LDA|Baidu self-built news domain dataset||
+|[electra_base](text/language_model/electra_base)||||
+|[ernie_v2_eng_large](text/language_model/ernie_v2_eng_large)|ernie_v2_eng_large|Baidu self-built dataset||
+|[bert-large-cased](text/language_model/bert-large-cased)||||
+</details>
+
+
+
+ - ### Sentiment Analysis
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[ernie_skep_sentiment_analysis](text/sentiment_analysis/ernie_skep_sentiment_analysis)|SKEP|Baidu self-built dataset|Sentence-level sentiment analysis|
+|[emotion_detection_textcnn](text/sentiment_analysis/emotion_detection_textcnn)|TextCNN|Baidu self-built dataset|Dialogue emotion detection|
+|[senta_bilstm](text/sentiment_analysis/senta_bilstm)|BiLSTM|Baidu self-built dataset|Chinese sentiment analysis|
+|[senta_bow](text/sentiment_analysis/senta_bow)|BOW|Baidu self-built dataset|Chinese sentiment analysis|
+|[senta_gru](text/sentiment_analysis/senta_gru)|GRU|Baidu self-built dataset|Chinese sentiment analysis|
+|[senta_lstm](text/sentiment_analysis/senta_lstm)|LSTM|Baidu self-built dataset|Chinese sentiment analysis|
+|[senta_cnn](text/sentiment_analysis/senta_cnn)|CNN|Baidu self-built dataset|Chinese sentiment analysis|
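+
+A minimal sentiment-analysis sketch, assuming a `sentiment_classify` method that takes a list of texts, as in the senta module docs; the example sentence is illustrative:
+
+```python
+import paddlehub as hub
+
+senta = hub.Module(name="senta_bilstm")
+# Each result contains positive/negative probabilities and the predicted label.
+results = senta.sentiment_classify(texts=["这家餐厅的味道很不错"])
+print(results)
+```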
+
+ - ### Syntactic Analysis
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[DDParser](text/syntactic_analysis/DDParser)|Deep Biaffine Attention|Search queries, web text, voice input and other data|Syntactic parsing|
+
+ - ### Simultaneous Translation
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[transformer_nist_wait_1](text/simultaneous_translation/stacl/transformer_nist_wait_1)|transformer|NIST 2008 Chinese-English translation dataset|Chinese-to-English simultaneous translation, wait-1 policy|
+|[transformer_nist_wait_3](text/simultaneous_translation/stacl/transformer_nist_wait_3)|transformer|NIST 2008 Chinese-English translation dataset|Chinese-to-English simultaneous translation, wait-3 policy|
+|[transformer_nist_wait_5](text/simultaneous_translation/stacl/transformer_nist_wait_5)|transformer|NIST 2008 Chinese-English translation dataset|Chinese-to-English simultaneous translation, wait-5 policy|
+|[transformer_nist_wait_7](text/simultaneous_translation/stacl/transformer_nist_wait_7)|transformer|NIST 2008 Chinese-English translation dataset|Chinese-to-English simultaneous translation, wait-7 policy|
+|[transformer_nist_wait_all](text/simultaneous_translation/stacl/transformer_nist_wait_all)|transformer|NIST 2008 Chinese-English translation dataset|Chinese-to-English simultaneous translation, wait-k=-1 (wait-all) policy|
+
+
+ - ### Lexical Analysis
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[jieba_paddle](text/lexical_analysis/jieba_paddle)|BiGRU+CRF|Baidu self-built dataset|jieba's word-segmentation network (bidirectional GRU) built with Paddle. It also supports jieba's traditional segmentation modes such as accurate mode, full mode and search-engine mode.|
+|[lac](text/lexical_analysis/lac)|BiGRU+CRF|Baidu self-built dataset|Baidu's joint lexical analysis model, which performs Chinese word segmentation, part-of-speech tagging and named-entity recognition as a whole. Evaluated on Baidu's self-built dataset, LAC achieves Precision=88.0%, Recall=88.7%, F1-Score=88.4%.|
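+
+A minimal lexical-analysis sketch, assuming a `cut` method with a `return_tag` switch, as in the lac module docs; the example sentence is illustrative:
+
+```python
+import paddlehub as hub
+
+lac = hub.Module(name="lac")
+# Segment the text and also return part-of-speech / entity tags.
+results = lac.cut(text=["今天是个好日子"], use_gpu=False, batch_size=1, return_tag=True)
+print(results)
+```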
+
+ - ### Punctuation Restoration
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[auto_punc](text/punctuation_restoration/auto_punc)|Ernie-1.0|WuDaoCorpora 2.0|Automatically restores 7 kinds of punctuation marks|
+
+ - ### Text Review
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[porn_detection_cnn](text/text_review/porn_detection_cnn)|CNN|Baidu self-built dataset|Porn detection: automatically judges whether a text is pornographic and gives a confidence score, identifying pornographic descriptions, vulgar hook-up content and obscene text|
+|[porn_detection_gru](text/text_review/porn_detection_gru)|GRU|Baidu self-built dataset|Porn detection: automatically judges whether a text is pornographic and gives a confidence score, identifying pornographic descriptions, vulgar hook-up content and obscene text|
+|[porn_detection_lstm](text/text_review/porn_detection_lstm)|LSTM|Baidu self-built dataset|Porn detection: automatically judges whether a text is pornographic and gives a confidence score, identifying pornographic descriptions, vulgar hook-up content and obscene text|
+
+## Audio
+
+ - ### Voice Cloning
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[ge2e_fastspeech2_pwgan](audio/voice_cloning/ge2e_fastspeech2_pwgan)|FastSpeech2|AISHELL-3|Chinese voice cloning|
+|[lstm_tacotron2](audio/voice_cloning/lstm_tacotron2)|LSTM, Tacotron2, WaveFlow|AISHELL-3|Chinese voice cloning|
+
+ - ### Text to Speech
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[transformer_tts_ljspeech](audio/tts/transformer_tts_ljspeech)|Transformer|LJSpeech-1.1|English text-to-speech|
+|[fastspeech_ljspeech](audio/tts/fastspeech_ljspeech)|FastSpeech|LJSpeech-1.1|English text-to-speech|
+|[fastspeech2_baker](audio/tts/fastspeech2_baker)|FastSpeech2|Chinese Standard Mandarin Speech Corpus|Chinese text-to-speech|
+|[fastspeech2_ljspeech](audio/tts/fastspeech2_ljspeech)|FastSpeech2|LJSpeech-1.1|English text-to-speech|
+|[deepvoice3_ljspeech](audio/tts/deepvoice3_ljspeech)|DeepVoice3|LJSpeech-1.1|English text-to-speech|
+
+ - ### Automatic Speech Recognition
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[deepspeech2_aishell](audio/asr/deepspeech2_aishell)|DeepSpeech2|AISHELL-1|Chinese speech recognition|
+|[deepspeech2_librispeech](audio/asr/deepspeech2_librispeech)|DeepSpeech2|LibriSpeech|English speech recognition|
+|[u2_conformer_aishell](audio/asr/u2_conformer_aishell)|Conformer|AISHELL-1|Chinese speech recognition|
+|[u2_conformer_wenetspeech](audio/asr/u2_conformer_wenetspeech)|Conformer|WenetSpeech|Chinese speech recognition|
+|[u2_conformer_librispeech](audio/asr/u2_conformer_librispeech)|Conformer|LibriSpeech|English speech recognition|
+
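+A minimal speech-recognition sketch, assuming a `speech_recognize` method that takes the path of a 16 kHz WAV file, as in the u2_conformer module docs; the audio path is hypothetical:
+
+```python
+import paddlehub as hub
+
+model = hub.Module(name="u2_conformer_wenetspeech")
+# Transcribe a local audio file to text.
+text = model.speech_recognize("/PATH/TO/AUDIO.wav")
+print(text)
+```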
+
+ - ### Audio Classification
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[panns_cnn6](audio/audio_classification/PANNs/cnn6)|PANNs|Google Audioset|Mainly 4 convolutional layers and 2 fully connected layers, 4.5M parameters. After pre-training it can be used to extract 512-dimensional audio embeddings|
+|[panns_cnn14](audio/audio_classification/PANNs/cnn14)|PANNs|Google Audioset|Mainly 12 convolutional layers and 2 fully connected layers, 79.6M parameters. After pre-training it can be used to extract 2048-dimensional audio embeddings|
+|[panns_cnn10](audio/audio_classification/PANNs/cnn10)|PANNs|Google Audioset|Mainly 8 convolutional layers and 2 fully connected layers, 4.9M parameters. After pre-training it can be used to extract 512-dimensional audio embeddings|
+
+## Video
+ - ### Video Classification
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[videotag_tsn_lstm](video/classification/videotag_tsn_lstm)|TSN + AttentionLSTM|Baidu self-built dataset|Large-scale short-video classification and tagging|
+|[tsn_kinetics400](video/classification/tsn_kinetics400)|TSN|Kinetics-400|Video classification|
+|[tsm_kinetics400](video/classification/tsm_kinetics400)|TSM|Kinetics-400|Video classification|
+|[stnet_kinetics400](video/classification/stnet_kinetics400)|StNet|Kinetics-400|Video classification|
+|[nonlocal_kinetics400](video/classification/nonlocal_kinetics400)|Non-local|Kinetics-400|Video classification|
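+
+A minimal video-classification sketch, assuming a `classify` method that takes a list of video paths, as in the videotag_tsn_lstm module docs; the video path is hypothetical:
+
+```python
+import paddlehub as hub
+
+videotag = hub.Module(name="videotag_tsn_lstm")
+# Returns the top predicted tags with scores for each video.
+results = videotag.classify(paths=["/PATH/TO/VIDEO.mp4"], use_gpu=False)
+print(results)
+```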
+
+
+ - ### Video Editing
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[SkyAR](video/Video_editing/SkyAR)|UNet|UNet|Sky replacement in videos|
+
+ - ### Multiple Object tracking
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[fairmot_dla34](video/multiple_object_tracking/fairmot_dla34)|CenterNet|Caltech Pedestrian+CityPersons+CUHK-SYSU+PRW+ETHZ+MOT17|Real-time multi-object tracking|
+|[jde_darknet53](video/multiple_object_tracking/jde_darknet53)|YOLOv3|Caltech Pedestrian+CityPersons+CUHK-SYSU+PRW+ETHZ+MOT17|Multi-object tracking balancing accuracy and speed|
+
+## Industrial Application
+
+ - ### Meter Detection
+
+|module|Network|Dataset|Introduction|
+|--|--|--|--|
+|[WatermeterSegmentation](image/semantic_segmentation/WatermeterSegmentation)|DeepLabV3|Water meter digital dial segmentation dataset|Water meter digital dial segmentation|
diff --git a/modules/README_ch.md b/modules/README_ch.md
new file mode 100644
index 0000000000000000000000000000000000000000..d3389e3c3307494357437e2e65d933a9e40c6663
--- /dev/null
+++ b/modules/README_ch.md
@@ -0,0 +1,546 @@
+简体中文 | [English](README.md)
+
+# 目录
+|[图像](#图像) (212个)|[文本](#文本) (130个)|[语音](#语音) (15个)|[视频](#视频) (8个)|[工业应用](#工业应用) (1个)|
+|--|--|--|--|--|
+|[图像分类](#图像分类) (108)|[文本生成](#文本生成) (17)| [声音克隆](#声音克隆) (2)|[视频分类](#视频分类) (5)| [表针识别](#表针识别) (1)|
+|[图像生成](#图像生成) (26)|[词向量](#词向量) (62)|[语音合成](#语音合成) (5)|[视频修复](#视频修复) (1)|-|
+|[关键点检测](#关键点检测) (5)|[机器翻译](#机器翻译) (2)|[语音识别](#语音识别) (5)|[多目标追踪](#多目标追踪) (2)|-|
+|[图像分割](#图像分割) (25)|[语义模型](#语义模型) (30)|[声音分类](#声音分类) (3)| -|-|
+|[人脸检测](#人脸检测) (7)|[情感分析](#情感分析) (7)|-|-|-|
+|[文字识别](#文字识别) (17)|[句法分析](#句法分析) (1)|-|-|-|
+|[图像编辑](#图像编辑) (8)|[同声传译](#同声传译) (5)|-|-|-|
+|[实例分割](#实例分割) (1)|[词法分析](#词法分析) (2)|-|-|-|
+|[目标检测](#目标检测) (13)|[标点恢复](#标点恢复) (1)|-|-|-|
+|[深度估计](#深度估计) (2)|[文本审核](#文本审核) (3)|-|-|-|
+
+## 图像
+ - ### 图像分类
+
+
+<details><summary>expand</summary>
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[DriverStatusRecognition](image/classification/DriverStatusRecognition)|MobileNetV3_small_ssld|分心司机检测数据集||
+|[mobilenet_v2_animals](image/classification/mobilenet_v2_animals)|MobileNet_v2|百度自建动物数据集||
+|[repvgg_a1_imagenet](image/classification/repvgg_a1_imagenet)|RepVGG|ImageNet-2012||
+|[repvgg_a0_imagenet](image/classification/repvgg_a0_imagenet)|RepVGG|ImageNet-2012||
+|[resnext152_32x4d_imagenet](image/classification/resnext152_32x4d_imagenet)|ResNeXt|ImageNet-2012||
+|[resnet_v2_152_imagenet](image/classification/resnet_v2_152_imagenet)|ResNet V2|ImageNet-2012||
+|[resnet50_vd_animals](image/classification/resnet50_vd_animals)|ResNet50_vd|百度自建动物数据集||
+|[food_classification](image/classification/food_classification)|ResNet50_vd_ssld|美食数据集||
+|[mobilenet_v3_large_imagenet_ssld](image/classification/mobilenet_v3_large_imagenet_ssld)|Mobilenet_v3_large|ImageNet-2012||
+|[resnext152_vd_32x4d_imagenet](image/classification/resnext152_vd_32x4d_imagenet)||||
+|[ghostnet_x1_3_imagenet_ssld](image/classification/ghostnet_x1_3_imagenet_ssld)|GhostNet|ImageNet-2012||
+|[rexnet_1_5_imagenet](image/classification/rexnet_1_5_imagenet)|ReXNet|ImageNet-2012||
+|[resnext50_64x4d_imagenet](image/classification/resnext50_64x4d_imagenet)|ResNeXt|ImageNet-2012||
+|[resnext101_64x4d_imagenet](image/classification/resnext101_64x4d_imagenet)|ResNeXt|ImageNet-2012||
+|[efficientnetb0_imagenet](image/classification/efficientnetb0_imagenet)|EfficientNet|ImageNet-2012||
+|[efficientnetb1_imagenet](image/classification/efficientnetb1_imagenet)|EfficientNet|ImageNet-2012||
+|[mobilenet_v2_imagenet_ssld](image/classification/mobilenet_v2_imagenet_ssld)|Mobilenet_v2|ImageNet-2012||
+|[resnet50_vd_dishes](image/classification/resnet50_vd_dishes)|ResNet50_vd|百度自建菜品数据集||
+|[pnasnet_imagenet](image/classification/pnasnet_imagenet)|PNASNet|ImageNet-2012||
+|[rexnet_2_0_imagenet](image/classification/rexnet_2_0_imagenet)|ReXNet|ImageNet-2012||
+|[SnakeIdentification](image/classification/SnakeIdentification)|ResNet50_vd_ssld|蛇种数据集||
+|[hrnet40_imagenet](image/classification/hrnet40_imagenet)|HRNet|ImageNet-2012||
+|[resnet_v2_34_imagenet](image/classification/resnet_v2_34_imagenet)|ResNet V2|ImageNet-2012||
+|[mobilenet_v2_dishes](image/classification/mobilenet_v2_dishes)|MobileNet_v2|百度自建菜品数据集||
+|[resnext101_vd_32x4d_imagenet](image/classification/resnext101_vd_32x4d_imagenet)|ResNeXt|ImageNet-2012||
+|[repvgg_b2g4_imagenet](image/classification/repvgg_b2g4_imagenet)|RepVGG|ImageNet-2012||
+|[fix_resnext101_32x48d_wsl_imagenet](image/classification/fix_resnext101_32x48d_wsl_imagenet)|ResNeXt|ImageNet-2012||
+|[vgg13_imagenet](image/classification/vgg13_imagenet)|VGG|ImageNet-2012||
+|[se_resnext101_32x4d_imagenet](image/classification/se_resnext101_32x4d_imagenet)|SE_ResNeXt|ImageNet-2012||
+|[hrnet30_imagenet](image/classification/hrnet30_imagenet)|HRNet|ImageNet-2012||
+|[ghostnet_x1_3_imagenet](image/classification/ghostnet_x1_3_imagenet)|GhostNet|ImageNet-2012||
+|[dpn107_imagenet](image/classification/dpn107_imagenet)|DPN|ImageNet-2012||
+|[densenet161_imagenet](image/classification/densenet161_imagenet)|DenseNet|ImageNet-2012||
+|[vgg19_imagenet](image/classification/vgg19_imagenet)|vgg19_imagenet|ImageNet-2012||
+|[mobilenet_v2_imagenet](image/classification/mobilenet_v2_imagenet)|Mobilenet_v2|ImageNet-2012||
+|[resnet50_vd_10w](image/classification/resnet50_vd_10w)|ResNet_vd|百度自建数据集||
+|[resnet_v2_101_imagenet](image/classification/resnet_v2_101_imagenet)|ResNet V2 101|ImageNet-2012||
+|[darknet53_imagenet](image/classification/darknet53_imagenet)|DarkNet|ImageNet-2012||
+|[se_resnext50_32x4d_imagenet](image/classification/se_resnext50_32x4d_imagenet)|SE_ResNeXt|ImageNet-2012||
+|[se_hrnet64_imagenet_ssld](image/classification/se_hrnet64_imagenet_ssld)|HRNet|ImageNet-2012||
+|[resnext101_32x16d_wsl](image/classification/resnext101_32x16d_wsl)|ResNeXt_wsl|ImageNet-2012||
+|[hrnet18_imagenet](image/classification/hrnet18_imagenet)|HRNet|ImageNet-2012||
+|[spinalnet_res101_gemstone](image/classification/spinalnet_res101_gemstone)|resnet101|gemstone||
+|[densenet264_imagenet](image/classification/densenet264_imagenet)|DenseNet|ImageNet-2012||
+|[resnext50_vd_32x4d_imagenet](image/classification/resnext50_vd_32x4d_imagenet)|ResNeXt_vd|ImageNet-2012||
+|[SpinalNet_Gemstones](image/classification/SpinalNet_Gemstones)||||
+|[spinalnet_vgg16_gemstone](image/classification/spinalnet_vgg16_gemstone)|vgg16|gemstone||
+|[xception71_imagenet](image/classification/xception71_imagenet)|Xception|ImageNet-2012||
+|[repvgg_b2_imagenet](image/classification/repvgg_b2_imagenet)|RepVGG|ImageNet-2012||
+|[dpn68_imagenet](image/classification/dpn68_imagenet)|DPN|ImageNet-2012||
+|[alexnet_imagenet](image/classification/alexnet_imagenet)|AlexNet|ImageNet-2012||
+|[rexnet_1_3_imagenet](image/classification/rexnet_1_3_imagenet)|ReXNet|ImageNet-2012||
+|[hrnet64_imagenet](image/classification/hrnet64_imagenet)|HRNet|ImageNet-2012||
+|[efficientnetb7_imagenet](image/classification/efficientnetb7_imagenet)|EfficientNet|ImageNet-2012||
+|[efficientnetb0_small_imagenet](image/classification/efficientnetb0_small_imagenet)|EfficientNet|ImageNet-2012||
+|[efficientnetb6_imagenet](image/classification/efficientnetb6_imagenet)|EfficientNet|ImageNet-2012||
+|[hrnet48_imagenet](image/classification/hrnet48_imagenet)|HRNet|ImageNet-2012||
+|[rexnet_3_0_imagenet](image/classification/rexnet_3_0_imagenet)|ReXNet|ImageNet-2012||
+|[shufflenet_v2_imagenet](image/classification/shufflenet_v2_imagenet)|ShuffleNet V2|ImageNet-2012||
+|[ghostnet_x0_5_imagenet](image/classification/ghostnet_x0_5_imagenet)|GhostNet|ImageNet-2012||
+|[inception_v4_imagenet](image/classification/inception_v4_imagenet)|Inception_V4|ImageNet-2012||
+|[resnext101_vd_64x4d_imagenet](image/classification/resnext101_vd_64x4d_imagenet)|ResNeXt_vd|ImageNet-2012||
+|[densenet201_imagenet](image/classification/densenet201_imagenet)|DenseNet|ImageNet-2012||
+|[vgg16_imagenet](image/classification/vgg16_imagenet)|VGG|ImageNet-2012||
+|[mobilenet_v3_small_imagenet_ssld](image/classification/mobilenet_v3_small_imagenet_ssld)|Mobilenet_v3_Small|ImageNet-2012||
+|[hrnet18_imagenet_ssld](image/classification/hrnet18_imagenet_ssld)|HRNet|ImageNet-2012||
+|[resnext152_64x4d_imagenet](image/classification/resnext152_64x4d_imagenet)|ResNeXt|ImageNet-2012||
+|[efficientnetb3_imagenet](image/classification/efficientnetb3_imagenet)|EfficientNet|ImageNet-2012||
+|[efficientnetb2_imagenet](image/classification/efficientnetb2_imagenet)|EfficientNet|ImageNet-2012||
+|[repvgg_b1g4_imagenet](image/classification/repvgg_b1g4_imagenet)|RepVGG|ImageNet-2012||
+|[resnext101_32x4d_imagenet](image/classification/resnext101_32x4d_imagenet)|ResNeXt|ImageNet-2012||
+|[resnext50_32x4d_imagenet](image/classification/resnext50_32x4d_imagenet)|ResNeXt|ImageNet-2012||
+|[repvgg_a2_imagenet](image/classification/repvgg_a2_imagenet)|RepVGG|ImageNet-2012||
+|[resnext152_vd_64x4d_imagenet](image/classification/resnext152_vd_64x4d_imagenet)|ResNeXt_vd|ImageNet-2012||
+|[xception41_imagenet](image/classification/xception41_imagenet)|Xception|ImageNet-2012||
+|[googlenet_imagenet](image/classification/googlenet_imagenet)|GoogleNet|ImageNet-2012||
+|[resnet50_vd_imagenet_ssld](image/classification/resnet50_vd_imagenet_ssld)|ResNet_vd|ImageNet-2012||
+|[repvgg_b1_imagenet](image/classification/repvgg_b1_imagenet)|RepVGG|ImageNet-2012||
+|[repvgg_b0_imagenet](image/classification/repvgg_b0_imagenet)|RepVGG|ImageNet-2012||
+|[resnet_v2_50_imagenet](image/classification/resnet_v2_50_imagenet)|ResNet V2|ImageNet-2012||
+|[rexnet_1_0_imagenet](image/classification/rexnet_1_0_imagenet)|ReXNet|ImageNet-2012||
+|[resnet_v2_18_imagenet](image/classification/resnet_v2_18_imagenet)|ResNet V2|ImageNet-2012||
+|[resnext101_32x8d_wsl](image/classification/resnext101_32x8d_wsl)|ResNeXt_wsl|ImageNet-2012||
+|[efficientnetb4_imagenet](image/classification/efficientnetb4_imagenet)|EfficientNet|ImageNet-2012||
+|[efficientnetb5_imagenet](image/classification/efficientnetb5_imagenet)|EfficientNet|ImageNet-2012||
+|[repvgg_b1g2_imagenet](image/classification/repvgg_b1g2_imagenet)|RepVGG|ImageNet-2012||
+|[resnext101_32x48d_wsl](image/classification/resnext101_32x48d_wsl)|ResNeXt_wsl|ImageNet-2012||
+|[resnet50_vd_wildanimals](image/classification/resnet50_vd_wildanimals)|ResNet_vd|IFAW 自建野生动物数据集||
+|[nasnet_imagenet](image/classification/nasnet_imagenet)|NASNet|ImageNet-2012||
+|[se_resnet18_vd_imagenet](image/classification/se_resnet18_vd_imagenet)||||
+|[spinalnet_res50_gemstone](image/classification/spinalnet_res50_gemstone)|resnet50|gemstone||
+|[resnext50_vd_64x4d_imagenet](image/classification/resnext50_vd_64x4d_imagenet)|ResNeXt_vd|ImageNet-2012||
+|[resnext101_32x32d_wsl](image/classification/resnext101_32x32d_wsl)|ResNeXt_wsl|ImageNet-2012||
+|[dpn131_imagenet](image/classification/dpn131_imagenet)|DPN|ImageNet-2012||
+|[xception65_imagenet](image/classification/xception65_imagenet)|Xception|ImageNet-2012||
+|[repvgg_b3g4_imagenet](image/classification/repvgg_b3g4_imagenet)|RepVGG|ImageNet-2012||
+|[marine_biometrics](image/classification/marine_biometrics)|ResNet50_vd_ssld|Fish4Knowledge||
+|[res2net101_vd_26w_4s_imagenet](image/classification/res2net101_vd_26w_4s_imagenet)|Res2Net|ImageNet-2012||
+|[dpn98_imagenet](image/classification/dpn98_imagenet)|DPN|ImageNet-2012||
+|[resnet18_vd_imagenet](image/classification/resnet18_vd_imagenet)|ResNet_vd|ImageNet-2012||
+|[densenet121_imagenet](image/classification/densenet121_imagenet)|DenseNet|ImageNet-2012||
+|[vgg11_imagenet](image/classification/vgg11_imagenet)|VGG|ImageNet-2012||
+|[hrnet44_imagenet](image/classification/hrnet44_imagenet)|HRNet|ImageNet-2012||
+|[densenet169_imagenet](image/classification/densenet169_imagenet)|DenseNet|ImageNet-2012||
+|[hrnet32_imagenet](image/classification/hrnet32_imagenet)|HRNet|ImageNet-2012||
+|[dpn92_imagenet](image/classification/dpn92_imagenet)|DPN|ImageNet-2012||
+|[ghostnet_x1_0_imagenet](image/classification/ghostnet_x1_0_imagenet)|GhostNet|ImageNet-2012||
+|[hrnet48_imagenet_ssld](image/classification/hrnet48_imagenet_ssld)|HRNet|ImageNet-2012||
+</details>
+
+
+
+ - ### 图像生成
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[pixel2style2pixel](image/Image_gan/gan/pixel2style2pixel/)|Pixel2Style2Pixel|-|人脸转正|
+|[stgan_bald](image/Image_gan/gan/stgan_bald/)|STGAN|CelebA|秃头生成器|
+|[styleganv2_editing](image/Image_gan/gan/styleganv2_editing)|StyleGAN V2|-|人脸编辑|
+|[wav2lip](image/Image_gan/gan/wav2lip)|wav2lip|LRS2|唇形生成|
+|[attgan_celeba](image/Image_gan/attgan_celeba/)|AttGAN|Celeba|人脸编辑|
+|[cyclegan_cityscapes](image/Image_gan/cyclegan_cityscapes)|CycleGAN|Cityscapes|实景图和语义分割结果互相转换|
+|[stargan_celeba](image/Image_gan/stargan_celeba)|StarGAN|Celeba|人脸编辑|
+|[stgan_celeba](image/Image_gan/stgan_celeba/)|STGAN|Celeba|人脸编辑|
+|[ID_Photo_GEN](image/Image_gan/style_transfer/ID_Photo_GEN)|HRNet_W18|-|证件照生成|
+|[Photo2Cartoon](image/Image_gan/style_transfer/Photo2Cartoon)|U-GAT-IT|cartoon_data|人脸卡通化|
+|[U2Net_Portrait](image/Image_gan/style_transfer/U2Net_Portrait)|U^2Net|-|人脸素描化|
+|[UGATIT_100w](image/Image_gan/style_transfer/UGATIT_100w)|U-GAT-IT|selfie2anime|人脸动漫化|
+|[UGATIT_83w](image/Image_gan/style_transfer/UGATIT_83w)|U-GAT-IT|selfie2anime|人脸动漫化|
+|[UGATIT_92w](image/Image_gan/style_transfer/UGATIT_92w)| U-GAT-IT|selfie2anime|人脸动漫化|
+|[animegan_v1_hayao_60](image/Image_gan/style_transfer/animegan_v1_hayao_60)|AnimeGAN|The Wind Rises|图像风格迁移-宫崎骏|
+|[animegan_v2_hayao_64](image/Image_gan/style_transfer/animegan_v2_hayao_64)|AnimeGAN|The Wind Rises|图像风格迁移-宫崎骏|
+|[animegan_v2_hayao_99](image/Image_gan/style_transfer/animegan_v2_hayao_99)|AnimeGAN|The Wind Rises|图像风格迁移-宫崎骏|
+|[animegan_v2_paprika_54](image/Image_gan/style_transfer/animegan_v2_paprika_54)|AnimeGAN|Paprika|图像风格迁移-今敏|
+|[animegan_v2_paprika_74](image/Image_gan/style_transfer/animegan_v2_paprika_74)|AnimeGAN|Paprika|图像风格迁移-今敏|
+|[animegan_v2_paprika_97](image/Image_gan/style_transfer/animegan_v2_paprika_97)|AnimeGAN|Paprika|图像风格迁移-今敏|
+|[animegan_v2_paprika_98](image/Image_gan/style_transfer/animegan_v2_paprika_98)|AnimeGAN|Paprika|图像风格迁移-今敏|
+|[animegan_v2_shinkai_33](image/Image_gan/style_transfer/animegan_v2_shinkai_33)|AnimeGAN|Your Name, Weathering with you|图像风格迁移-新海诚|
+|[animegan_v2_shinkai_53](image/Image_gan/style_transfer/animegan_v2_shinkai_53)|AnimeGAN|Your Name, Weathering with you|图像风格迁移-新海诚|
+|[msgnet](image/Image_gan/style_transfer/msgnet)|msgnet|COCO2014|图像风格迁移|
+|[stylepro_artistic](image/Image_gan/style_transfer/stylepro_artistic)|StyleProNet|MS-COCO + WikiArt|艺术风格迁移|
+|stylegan_ffhq|StyleGAN|FFHQ|图像风格迁移|
+
+ - ### 关键点检测
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[face_landmark_localization](image/keypoint_detection/face_landmark_localization)|Face_Landmark|AFW/AFLW|人脸关键点检测|
+|[hand_pose_localization](image/keypoint_detection/hand_pose_localization)|-|MPII, NZSL|手部关键点检测|
+|[openpose_body_estimation](image/keypoint_detection/openpose_body_estimation)|two-branch multi-stage CNN|MPII, COCO 2016|肢体关键点检测|
+|[human_pose_estimation_resnet50_mpii](image/keypoint_detection/human_pose_estimation_resnet50_mpii)|Pose_Resnet50|MPII|人体骨骼关键点检测|
+|[openpose_hands_estimation](image/keypoint_detection/openpose_hands_estimation)|-|MPII, NZSL|手部关键点检测|
+
+ - ### 图像分割
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[deeplabv3p_xception65_humanseg](image/semantic_segmentation/deeplabv3p_xception65_humanseg)|deeplabv3p|百度自建数据集|人像分割|
+|[humanseg_server](image/semantic_segmentation/humanseg_server)|deeplabv3p|百度自建数据集|人像分割|
+|[humanseg_mobile](image/semantic_segmentation/humanseg_mobile)|hrnet|百度自建数据集|人像分割-移动端前置摄像头|
+|[humanseg_lite](image/semantic_segmentation/humanseg_lite)|shufflenet|百度自建数据集|轻量级人像分割-移动端实时|
+|[ExtremeC3_Portrait_Segmentation](image/semantic_segmentation/ExtremeC3_Portrait_Segmentation)|ExtremeC3|EG1800, Baidu fashion dataset|轻量化人像分割|
+|[SINet_Portrait_Segmentation](image/semantic_segmentation/SINet_Portrait_Segmentation)|SINet|EG1800, Baidu fashion dataset|轻量化人像分割|
+|[FCN_HRNet_W18_Face_Seg](image/semantic_segmentation/FCN_HRNet_W18_Face_Seg)|FCN_HRNet_W18|-|人像分割|
+|[ace2p](image/semantic_segmentation/ace2p)|ACE2P|LIP|人体解析|
+|[Pneumonia_CT_LKM_PP](image/semantic_segmentation/Pneumonia_CT_LKM_PP)|U-NET+|连心医疗授权脱敏数据集|肺炎CT影像分析|
+|[Pneumonia_CT_LKM_PP_lung](image/semantic_segmentation/Pneumonia_CT_LKM_PP_lung)|U-NET+|连心医疗授权脱敏数据集|肺炎CT影像分析|
+|[ocrnet_hrnetw18_voc](image/semantic_segmentation/ocrnet_hrnetw18_voc)|ocrnet, hrnet|PascalVoc2012||
+|[U2Net](image/semantic_segmentation/U2Net)|U^2Net|-|图像前景背景分割|
+|[U2Netp](image/semantic_segmentation/U2Netp)|U^2Net|-|图像前景背景分割|
+|[Extract_Line_Draft](image/semantic_segmentation/Extract_Line_Draft)|UNet|Pixiv|线稿提取|
+|[unet_cityscapes](image/semantic_segmentation/unet_cityscapes)|UNet|cityscapes||
+|[ocrnet_hrnetw18_cityscapes](image/semantic_segmentation/ocrnet_hrnetw18_cityscapes)|ocrnet_hrnetw18|cityscapes||
+|[hardnet_cityscapes](image/semantic_segmentation/hardnet_cityscapes)|hardnet|cityscapes||
+|[fcn_hrnetw48_voc](image/semantic_segmentation/fcn_hrnetw48_voc)|fcn_hrnetw48|PascalVoc2012||
+|[fcn_hrnetw48_cityscapes](image/semantic_segmentation/fcn_hrnetw48_cityscapes)|fcn_hrnetw48|cityscapes||
+|[fcn_hrnetw18_voc](image/semantic_segmentation/fcn_hrnetw18_voc)|fcn_hrnetw18|PascalVoc2012||
+|[fcn_hrnetw18_cityscapes](image/semantic_segmentation/fcn_hrnetw18_cityscapes)|fcn_hrnetw18|cityscapes||
+|[fastscnn_cityscapes](image/semantic_segmentation/fastscnn_cityscapes)|fastscnn|cityscapes||
+|[deeplabv3p_resnet50_voc](image/semantic_segmentation/deeplabv3p_resnet50_voc)|deeplabv3p, resnet50|PascalVoc2012||
+|[deeplabv3p_resnet50_cityscapes](image/semantic_segmentation/deeplabv3p_resnet50_cityscapes)|deeplabv3p, resnet50|cityscapes||
+|[bisenetv2_cityscapes](image/semantic_segmentation/bisenetv2_cityscapes)|bisenetv2|cityscapes||
+
+
+
+ - ### 人脸检测
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[pyramidbox_lite_mobile](image/face_detection/pyramidbox_lite_mobile)|PyramidBox|WIDER FACE数据集 + 百度自采人脸数据集|轻量级人脸检测-移动端|
+|[pyramidbox_lite_mobile_mask](image/face_detection/pyramidbox_lite_mobile_mask)|PyramidBox|WIDER FACE数据集 + 百度自采人脸数据集|轻量级人脸口罩检测-移动端|
+|[pyramidbox_lite_server_mask](image/face_detection/pyramidbox_lite_server_mask)|PyramidBox|WIDER FACE数据集 + 百度自采人脸数据集|轻量级人脸口罩检测|
+|[ultra_light_fast_generic_face_detector_1mb_640](image/face_detection/ultra_light_fast_generic_face_detector_1mb_640)|Ultra-Light-Fast-Generic-Face-Detector-1MB|WIDER FACE数据集|轻量级通用人脸检测-低算力设备|
+|[ultra_light_fast_generic_face_detector_1mb_320](image/face_detection/ultra_light_fast_generic_face_detector_1mb_320)|Ultra-Light-Fast-Generic-Face-Detector-1MB|WIDER FACE数据集|轻量级通用人脸检测-低算力设备|
+|[pyramidbox_lite_server](image/face_detection/pyramidbox_lite_server)|PyramidBox|WIDER FACE数据集 + 百度自采人脸数据集|轻量级人脸检测|
+|[pyramidbox_face_detection](image/face_detection/pyramidbox_face_detection)|PyramidBox|WIDER FACE数据集|人脸检测|
+
+ - ### 文字识别
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[chinese_ocr_db_crnn_mobile](image/text_recognition/chinese_ocr_db_crnn_mobile)|Differentiable Binarization+RCNN|icdar2015数据集|中文文字识别|
+|[chinese_text_detection_db_mobile](image/text_recognition/chinese_text_detection_db_mobile)|Differentiable Binarization|icdar2015数据集|中文文本检测|
+|[chinese_text_detection_db_server](image/text_recognition/chinese_text_detection_db_server)|Differentiable Binarization|icdar2015数据集|中文文本检测|
+|[chinese_ocr_db_crnn_server](image/text_recognition/chinese_ocr_db_crnn_server)|Differentiable Binarization+RCNN|icdar2015数据集|中文文字识别|
+|[Vehicle_License_Plate_Recognition](image/text_recognition/Vehicle_License_Plate_Recognition)|-|CCPD|车牌识别|
+|[chinese_cht_ocr_db_crnn_mobile](image/text_recognition/chinese_cht_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|icdar2015数据集|繁体中文文字识别|
+|[japan_ocr_db_crnn_mobile](image/text_recognition/japan_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|icdar2015数据集|日文文字识别|
+|[korean_ocr_db_crnn_mobile](image/text_recognition/korean_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|icdar2015数据集|韩文文字识别|
+|[german_ocr_db_crnn_mobile](image/text_recognition/german_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|icdar2015数据集|德文文字识别|
+|[french_ocr_db_crnn_mobile](image/text_recognition/french_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|icdar2015数据集|法文文字识别|
+|[latin_ocr_db_crnn_mobile](image/text_recognition/latin_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|icdar2015数据集|拉丁文文字识别|
+|[cyrillic_ocr_db_crnn_mobile](image/text_recognition/cyrillic_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|icdar2015数据集|斯拉夫文文字识别|
+|[multi_languages_ocr_db_crnn](image/text_recognition/multi_languages_ocr_db_crnn)|Differentiable Binarization+RCNN|icdar2015数据集|多语言文字识别|
+|[kannada_ocr_db_crnn_mobile](image/text_recognition/kannada_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|icdar2015数据集|卡纳达文文字识别|
+|[arabic_ocr_db_crnn_mobile](image/text_recognition/arabic_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|icdar2015数据集|阿拉伯文文字识别|
+|[telugu_ocr_db_crnn_mobile](image/text_recognition/telugu_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|icdar2015数据集|泰卢固文文字识别|
+|[devanagari_ocr_db_crnn_mobile](image/text_recognition/devanagari_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|icdar2015数据集|梵文文字识别|
+|[tamil_ocr_db_crnn_mobile](image/text_recognition/tamil_ocr_db_crnn_mobile)|Differentiable Binarization+CRNN|icdar2015数据集|泰米尔文文字识别|
+
+
+ - ### 图像编辑
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[realsr](image/Image_editing/super_resolution/realsr)|LP-KPN|RealSR dataset|图像/视频超分-4倍|
+|[deoldify](image/Image_editing/colorization/deoldify)|GAN|ILSVRC 2012|黑白照片/视频着色|
+|[photo_restoration](image/Image_editing/colorization/photo_restoration)|基于deoldify和realsr模型|-|老照片修复|
+|[user_guided_colorization](image/Image_editing/colorization/user_guided_colorization)|siggraph|ILSVRC 2012|图像着色|
+|[falsr_c](image/Image_editing/super_resolution/falsr_c)|falsr_c| DIV2k|轻量化超分-2倍|
+|[dcscn](image/Image_editing/super_resolution/dcscn)|dcscn| DIV2k|轻量化超分-2倍|
+|[falsr_a](image/Image_editing/super_resolution/falsr_a)|falsr_a| DIV2k|轻量化超分-2倍|
+|[falsr_b](image/Image_editing/super_resolution/falsr_b)|falsr_b|DIV2k|轻量化超分-2倍|
+
+ - ### 实例分割
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[solov2](image/instance_segmentation/solov2)|-|COCO2014|实例分割|
+
+ - ### 目标检测
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[faster_rcnn_resnet50_coco2017](image/object_detection/faster_rcnn_resnet50_coco2017)|faster_rcnn|COCO2017||
+|[ssd_vgg16_512_coco2017](image/object_detection/ssd_vgg16_512_coco2017)|SSD|COCO2017||
+|[faster_rcnn_resnet50_fpn_venus](image/object_detection/faster_rcnn_resnet50_fpn_venus)|faster_rcnn|百度自建数据集|大规模通用目标检测|
+|[ssd_vgg16_300_coco2017](image/object_detection/ssd_vgg16_300_coco2017)|SSD|COCO2017||
+|[yolov3_resnet34_coco2017](image/object_detection/yolov3_resnet34_coco2017)|YOLOv3|COCO2017||
+|[yolov3_darknet53_pedestrian](image/object_detection/yolov3_darknet53_pedestrian)|YOLOv3|百度自建大规模行人数据集|行人检测|
+|[yolov3_mobilenet_v1_coco2017](image/object_detection/yolov3_mobilenet_v1_coco2017)|YOLOv3|COCO2017||
+|[ssd_mobilenet_v1_pascal](image/object_detection/ssd_mobilenet_v1_pascal)|SSD|PASCAL VOC||
+|[faster_rcnn_resnet50_fpn_coco2017](image/object_detection/faster_rcnn_resnet50_fpn_coco2017)|faster_rcnn|COCO2017||
+|[yolov3_darknet53_coco2017](image/object_detection/yolov3_darknet53_coco2017)|YOLOv3|COCO2017||
+|[yolov3_darknet53_vehicles](image/object_detection/yolov3_darknet53_vehicles)|YOLOv3|百度自建大规模车辆数据集|车辆检测|
+|[yolov3_darknet53_venus](image/object_detection/yolov3_darknet53_venus)|YOLOv3|百度自建数据集|大规模通用检测|
+|[yolov3_resnet50_vd_coco2017](image/object_detection/yolov3_resnet50_vd_coco2017)|YOLOv3|COCO2017||
+
+ - ### 深度估计
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[MiDaS_Large](image/depth_estimation/MiDaS_Large)|-|3D Movies, WSVD, ReDWeb, MegaDepth||
+|[MiDaS_Small](image/depth_estimation/MiDaS_Small)|-|3D Movies, WSVD, ReDWeb, MegaDepth, etc.||
+
+## 文本
+ - ### 文本生成
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[ernie_gen](text/text_generation/ernie_gen)|ERNIE-GEN|-|面向生成任务的预训练-微调框架|
+|[ernie_gen_poetry](text/text_generation/ernie_gen_poetry)|ERNIE-GEN|开源诗歌数据集|诗歌生成|
+|[ernie_gen_couplet](text/text_generation/ernie_gen_couplet)|ERNIE-GEN|开源对联数据集|对联生成|
+|[ernie_gen_lover_words](text/text_generation/ernie_gen_lover_words)|ERNIE-GEN|网络情诗、情话数据|情话生成|
+|[ernie_tiny_couplet](text/text_generation/ernie_tiny_couplet)|ernie_tiny|开源对联数据集|对联生成|
+|[ernie_gen_acrostic_poetry](text/text_generation/ernie_gen_acrostic_poetry)|ERNIE-GEN|开源诗歌数据集|藏头诗生成|
+|[Rumor_prediction](text/text_generation/Rumor_prediction)|-|新浪微博中文谣言数据|谣言预测|
+|[plato-mini](text/text_generation/plato-mini)|Unified Transformer|十亿级别的中文对话数据|中文对话|
+|[plato2_en_large](text/text_generation/plato2_en_large)|plato2|开放域多轮数据集|超大规模生成式对话|
+|[plato2_en_base](text/text_generation/plato2_en_base)|plato2|开放域多轮数据集|超大规模生成式对话|
+|[CPM_LM](text/text_generation/CPM_LM)|GPT-2|自建数据集|中文文本生成|
+|[unified_transformer-12L-cn](text/text_generation/unified_transformer-12L-cn)|Unified Transformer|千万级别中文会话数据|人机多轮对话|
+|[unified_transformer-12L-cn-luge](text/text_generation/unified_transformer-12L-cn-luge)|Unified Transformer|千言对话数据集|人机多轮对话|
+|[reading_pictures_writing_poems](text/text_generation/reading_pictures_writing_poems)|多网络级联|-|看图写诗|
+|[GPT2_CPM_LM](text/text_generation/GPT2_CPM_LM)|||问答类文本生成|
+|[GPT2_Base_CN](text/text_generation/GPT2_Base_CN)|||问答类文本生成|
+
+ - ### 词向量
+
+
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[w2v_weibo_target_word-bigram_dim300](text/embedding/w2v_weibo_target_word-bigram_dim300)|w2v|weibo||
+|[w2v_baidu_encyclopedia_target_word-ngram_1-2_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-ngram_1-2_dim300)|w2v|baidu_encyclopedia||
+|[w2v_literature_target_word-word_dim300](text/embedding/w2v_literature_target_word-word_dim300)|w2v|literature||
+|[word2vec_skipgram](text/embedding/word2vec_skipgram)|skip-gram|百度自建数据集||
+|[w2v_sogou_target_word-char_dim300](text/embedding/w2v_sogou_target_word-char_dim300)|w2v|sogou||
+|[w2v_weibo_target_bigram-char_dim300](text/embedding/w2v_weibo_target_bigram-char_dim300)|w2v|weibo||
+|[w2v_zhihu_target_word-bigram_dim300](text/embedding/w2v_zhihu_target_word-bigram_dim300)|w2v|zhihu||
+|[w2v_financial_target_word-word_dim300](text/embedding/w2v_financial_target_word-word_dim300)|w2v|financial||
+|[w2v_wiki_target_word-word_dim300](text/embedding/w2v_wiki_target_word-word_dim300)|w2v|wiki||
+|[w2v_baidu_encyclopedia_context_word-word_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-word_dim300)|w2v|baidu_encyclopedia||
+|[w2v_weibo_target_word-word_dim300](text/embedding/w2v_weibo_target_word-word_dim300)|w2v|weibo||
+|[w2v_zhihu_target_bigram-char_dim300](text/embedding/w2v_zhihu_target_bigram-char_dim300)|w2v|zhihu||
+|[w2v_zhihu_target_word-word_dim300](text/embedding/w2v_zhihu_target_word-word_dim300)|w2v|zhihu||
+|[w2v_people_daily_target_word-char_dim300](text/embedding/w2v_people_daily_target_word-char_dim300)|w2v|people_daily||
+|[w2v_sikuquanshu_target_word-word_dim300](text/embedding/w2v_sikuquanshu_target_word-word_dim300)|w2v|sikuquanshu||
+|[glove_twitter_target_word-word_dim200_en](text/embedding/glove_twitter_target_word-word_dim200_en)|glove|twitter||
+|[fasttext_crawl_target_word-word_dim300_en](text/embedding/fasttext_crawl_target_word-word_dim300_en)|fasttext|crawl||
+|[w2v_wiki_target_word-bigram_dim300](text/embedding/w2v_wiki_target_word-bigram_dim300)|w2v|wiki||
+|[w2v_baidu_encyclopedia_context_word-character_char1-1_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-character_char1-1_dim300)|w2v|baidu_encyclopedia||
+|[glove_wiki2014-gigaword_target_word-word_dim300_en](text/embedding/glove_wiki2014-gigaword_target_word-word_dim300_en)|glove|wiki2014-gigaword||
+|[glove_wiki2014-gigaword_target_word-word_dim50_en](text/embedding/glove_wiki2014-gigaword_target_word-word_dim50_en)|glove|wiki2014-gigaword||
+|[w2v_baidu_encyclopedia_context_word-ngram_2-2_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-ngram_2-2_dim300)|w2v|baidu_encyclopedia||
+|[w2v_wiki_target_bigram-char_dim300](text/embedding/w2v_wiki_target_bigram-char_dim300)|w2v|wiki||
+|[w2v_baidu_encyclopedia_target_word-character_char1-1_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-character_char1-1_dim300)|w2v|baidu_encyclopedia||
+|[w2v_financial_target_bigram-char_dim300](text/embedding/w2v_financial_target_bigram-char_dim300)|w2v|financial||
+|[glove_wiki2014-gigaword_target_word-word_dim200_en](text/embedding/glove_wiki2014-gigaword_target_word-word_dim200_en)|glove|wiki2014-gigaword||
+|[w2v_financial_target_word-bigram_dim300](text/embedding/w2v_financial_target_word-bigram_dim300)|w2v|financial||
+|[w2v_mixed-large_target_word-char_dim300](text/embedding/w2v_mixed-large_target_word-char_dim300)|w2v|mixed||
+|[w2v_baidu_encyclopedia_target_word-wordPosition_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-wordPosition_dim300)|w2v|baidu_encyclopedia||
+|[w2v_baidu_encyclopedia_context_word-ngram_1-3_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-ngram_1-3_dim300)|w2v|baidu_encyclopedia||
+|[w2v_baidu_encyclopedia_target_word-wordLR_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-wordLR_dim300)|w2v|baidu_encyclopedia||
+|[w2v_sogou_target_bigram-char_dim300](text/embedding/w2v_sogou_target_bigram-char_dim300)|w2v|sogou||
+|[w2v_weibo_target_word-char_dim300](text/embedding/w2v_weibo_target_word-char_dim300)|w2v|weibo||
+|[w2v_people_daily_target_word-word_dim300](text/embedding/w2v_people_daily_target_word-word_dim300)|w2v|people_daily||
+|[w2v_zhihu_target_word-char_dim300](text/embedding/w2v_zhihu_target_word-char_dim300)|w2v|zhihu||
+|[w2v_wiki_target_word-char_dim300](text/embedding/w2v_wiki_target_word-char_dim300)|w2v|wiki||
+|[w2v_sogou_target_word-bigram_dim300](text/embedding/w2v_sogou_target_word-bigram_dim300)|w2v|sogou||
+|[w2v_financial_target_word-char_dim300](text/embedding/w2v_financial_target_word-char_dim300)|w2v|financial||
+|[w2v_baidu_encyclopedia_target_word-ngram_1-3_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-ngram_1-3_dim300)|w2v|baidu_encyclopedia||
+|[glove_wiki2014-gigaword_target_word-word_dim100_en](text/embedding/glove_wiki2014-gigaword_target_word-word_dim100_en)|glove|wiki2014-gigaword||
+|[w2v_baidu_encyclopedia_target_word-character_char1-4_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-character_char1-4_dim300)|w2v|baidu_encyclopedia||
+|[w2v_sogou_target_word-word_dim300](text/embedding/w2v_sogou_target_word-word_dim300)|w2v|sogou||
+|[w2v_literature_target_word-char_dim300](text/embedding/w2v_literature_target_word-char_dim300)|w2v|literature||
+|[w2v_baidu_encyclopedia_target_bigram-char_dim300](text/embedding/w2v_baidu_encyclopedia_target_bigram-char_dim300)|w2v|baidu_encyclopedia||
+|[w2v_baidu_encyclopedia_target_word-word_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-word_dim300)|w2v|baidu_encyclopedia||
+|[glove_twitter_target_word-word_dim100_en](text/embedding/glove_twitter_target_word-word_dim100_en)|glove|twitter||
+|[w2v_baidu_encyclopedia_target_word-ngram_2-2_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-ngram_2-2_dim300)|w2v|baidu_encyclopedia||
+|[w2v_baidu_encyclopedia_context_word-character_char1-4_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-character_char1-4_dim300)|w2v|baidu_encyclopedia||
+|[w2v_literature_target_bigram-char_dim300](text/embedding/w2v_literature_target_bigram-char_dim300)|w2v|literature||
+|[fasttext_wiki-news_target_word-word_dim300_en](text/embedding/fasttext_wiki-news_target_word-word_dim300_en)|fasttext|wiki-news||
+|[w2v_people_daily_target_word-bigram_dim300](text/embedding/w2v_people_daily_target_word-bigram_dim300)|w2v|people_daily||
+|[w2v_mixed-large_target_word-word_dim300](text/embedding/w2v_mixed-large_target_word-word_dim300)|w2v|mixed||
+|[w2v_people_daily_target_bigram-char_dim300](text/embedding/w2v_people_daily_target_bigram-char_dim300)|w2v|people_daily||
+|[w2v_literature_target_word-bigram_dim300](text/embedding/w2v_literature_target_word-bigram_dim300)|w2v|literature||
+|[glove_twitter_target_word-word_dim25_en](text/embedding/glove_twitter_target_word-word_dim25_en)|glove|twitter||
+|[w2v_baidu_encyclopedia_context_word-ngram_1-2_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-ngram_1-2_dim300)|w2v|baidu_encyclopedia||
+|[w2v_sikuquanshu_target_word-bigram_dim300](text/embedding/w2v_sikuquanshu_target_word-bigram_dim300)|w2v|sikuquanshu||
+|[w2v_baidu_encyclopedia_context_word-character_char1-2_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-character_char1-2_dim300)|w2v|baidu_encyclopedia||
+|[glove_twitter_target_word-word_dim50_en](text/embedding/glove_twitter_target_word-word_dim50_en)|glove|twitter||
+|[w2v_baidu_encyclopedia_context_word-wordLR_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-wordLR_dim300)|w2v|baidu_encyclopedia||
+|[w2v_baidu_encyclopedia_target_word-character_char1-2_dim300](text/embedding/w2v_baidu_encyclopedia_target_word-character_char1-2_dim300)|w2v|baidu_encyclopedia||
+|[w2v_baidu_encyclopedia_context_word-wordPosition_dim300](text/embedding/w2v_baidu_encyclopedia_context_word-wordPosition_dim300)|w2v|baidu_encyclopedia||
+
+
+
+ - ### 机器翻译
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[transformer_zh-en](text/machine_translation/transformer/transformer_zh-en)|Transformer|CWMT2021|中文译英文|
+|[transformer_en-de](text/machine_translation/transformer/transformer_en-de)|Transformer|WMT14 EN-DE|英文译德文|
+
+ - ### 语义模型
+
+
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[chinese_electra_small](text/language_model/chinese_electra_small)||||
+|[chinese_electra_base](text/language_model/chinese_electra_base)||||
+|[roberta-wwm-ext-large](text/language_model/roberta-wwm-ext-large)|roberta-wwm-ext-large|百度自建数据集||
+|[chinese-bert-wwm-ext](text/language_model/chinese_bert_wwm_ext)|chinese-bert-wwm-ext|百度自建数据集||
+|[lda_webpage](text/language_model/lda_webpage)|LDA|百度自建网页领域数据集||
+|[lda_novel](text/language_model/lda_novel)||||
+|[bert-base-multilingual-uncased](text/language_model/bert-base-multilingual-uncased)||||
+|[rbt3](text/language_model/rbt3)||||
+|[ernie_v2_eng_base](text/language_model/ernie_v2_eng_base)|ernie_v2_eng_base|百度自建数据集||
+|[bert-base-multilingual-cased](text/language_model/bert-base-multilingual-cased)||||
+|[rbtl3](text/language_model/rbtl3)||||
+|[chinese-bert-wwm](text/language_model/chinese_bert_wwm)|chinese-bert-wwm|百度自建数据集||
+|[bert-large-uncased](text/language_model/bert-large-uncased)||||
+|[slda_novel](text/language_model/slda_novel)||||
+|[slda_news](text/language_model/slda_news)||||
+|[electra_small](text/language_model/electra_small)||||
+|[slda_webpage](text/language_model/slda_webpage)||||
+|[bert-base-cased](text/language_model/bert-base-cased)||||
+|[slda_weibo](text/language_model/slda_weibo)||||
+|[roberta-wwm-ext](text/language_model/roberta-wwm-ext)|roberta-wwm-ext|百度自建数据集||
+|[bert-base-uncased](text/language_model/bert-base-uncased)||||
+|[electra_large](text/language_model/electra_large)||||
+|[ernie](text/language_model/ernie)|ernie-1.0|百度自建数据集||
+|[simnet_bow](text/language_model/simnet_bow)|BOW|百度自建数据集||
+|[ernie_tiny](text/language_model/ernie_tiny)|ernie_tiny|百度自建数据集||
+|[bert-base-chinese](text/language_model/bert-base-chinese)|bert-base-chinese|百度自建数据集||
+|[lda_news](text/language_model/lda_news)|LDA|百度自建新闻领域数据集||
+|[electra_base](text/language_model/electra_base)||||
+|[ernie_v2_eng_large](text/language_model/ernie_v2_eng_large)|ernie_v2_eng_large|百度自建数据集||
+|[bert-large-cased](text/language_model/bert-large-cased)||||
+
+
+
+
+ - ### 情感分析
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[ernie_skep_sentiment_analysis](text/sentiment_analysis/ernie_skep_sentiment_analysis)|SKEP|百度自建数据集|句子级情感分析|
+|[emotion_detection_textcnn](text/sentiment_analysis/emotion_detection_textcnn)|TextCNN|百度自建数据集|对话情绪识别|
+|[senta_bilstm](text/sentiment_analysis/senta_bilstm)|BiLSTM|百度自建数据集|中文情感倾向分析|
+|[senta_bow](text/sentiment_analysis/senta_bow)|BOW|百度自建数据集|中文情感倾向分析|
+|[senta_gru](text/sentiment_analysis/senta_gru)|GRU|百度自建数据集|中文情感倾向分析|
+|[senta_lstm](text/sentiment_analysis/senta_lstm)|LSTM|百度自建数据集|中文情感倾向分析|
+|[senta_cnn](text/sentiment_analysis/senta_cnn)|CNN|百度自建数据集|中文情感倾向分析|
+
+ - ### 句法分析
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[DDParser](text/syntactic_analysis/DDParser)|Deep Biaffine Attention|搜索query、网页文本、语音输入等数据|句法分析|
+
+ - ### 同声传译
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[transformer_nist_wait_1](text/simultaneous_translation/stacl/transformer_nist_wait_1)|transformer|NIST 2008-中英翻译数据集|中译英-wait-1策略|
+|[transformer_nist_wait_3](text/simultaneous_translation/stacl/transformer_nist_wait_3)|transformer|NIST 2008-中英翻译数据集|中译英-wait-3策略|
+|[transformer_nist_wait_5](text/simultaneous_translation/stacl/transformer_nist_wait_5)|transformer|NIST 2008-中英翻译数据集|中译英-wait-5策略|
+|[transformer_nist_wait_7](text/simultaneous_translation/stacl/transformer_nist_wait_7)|transformer|NIST 2008-中英翻译数据集|中译英-wait-7策略|
+|[transformer_nist_wait_all](text/simultaneous_translation/stacl/transformer_nist_wait_all)|transformer|NIST 2008-中英翻译数据集|中译英-waitk=-1策略|
+
+
+ - ### 词法分析
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[jieba_paddle](text/lexical_analysis/jieba_paddle)|BiGRU+CRF|百度自建数据集|jieba使用Paddle搭建的切词网络(双向GRU)。同时支持jieba的传统切词方法,如精确模式、全模式、搜索引擎模式等切词模式。|
+|[lac](text/lexical_analysis/lac)|BiGRU+CRF|百度自建数据集|百度自研联合的词法分析模型,能整体性地完成中文分词、词性标注、专名识别任务。在百度自建数据集上评测,LAC效果:Precision=88.0%,Recall=88.7%,F1-Score=88.4%。|
+
+ - ### 标点恢复
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[auto_punc](text/punctuation_restoration/auto_punc)|Ernie-1.0|WuDaoCorpora 2.0|自动添加7种标点符号|
+
+ - ### 文本审核
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[porn_detection_cnn](text/text_review/porn_detection_cnn)|CNN|百度自建数据集|色情检测,自动判别文本是否涉黄并给出相应的置信度,对文本中的色情描述、低俗交友、污秽文案进行识别|
+|[porn_detection_gru](text/text_review/porn_detection_gru)|GRU|百度自建数据集|色情检测,自动判别文本是否涉黄并给出相应的置信度,对文本中的色情描述、低俗交友、污秽文案进行识别|
+|[porn_detection_lstm](text/text_review/porn_detection_lstm)|LSTM|百度自建数据集|色情检测,自动判别文本是否涉黄并给出相应的置信度,对文本中的色情描述、低俗交友、污秽文案进行识别|
+
+## 语音
+ - ### 声音克隆
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[ge2e_fastspeech2_pwgan](audio/voice_cloning/ge2e_fastspeech2_pwgan)|FastSpeech2|AISHELL-3|中文语音克隆|
+|[lstm_tacotron2](audio/voice_cloning/lstm_tacotron2)|LSTM、Tacotron2、WaveFlow|AISHELL-3|中文语音克隆|
+
+ - ### 语音合成
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[transformer_tts_ljspeech](audio/tts/transformer_tts_ljspeech)|Transformer|LJSpeech-1.1|英文语音合成|
+|[fastspeech_ljspeech](audio/tts/fastspeech_ljspeech)|FastSpeech|LJSpeech-1.1|英文语音合成|
+|[fastspeech2_baker](audio/tts/fastspeech2_baker)|FastSpeech2|Chinese Standard Mandarin Speech Corpus|中文语音合成|
+|[fastspeech2_ljspeech](audio/tts/fastspeech2_ljspeech)|FastSpeech2|LJSpeech-1.1|英文语音合成|
+|[deepvoice3_ljspeech](audio/tts/deepvoice3_ljspeech)|DeepVoice3|LJSpeech-1.1|英文语音合成|
+
+ - ### 语音识别
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[deepspeech2_aishell](audio/asr/deepspeech2_aishell)|DeepSpeech2|AISHELL-1|中文语音识别|
+|[deepspeech2_librispeech](audio/asr/deepspeech2_librispeech)|DeepSpeech2|LibriSpeech|英文语音识别|
+|[u2_conformer_aishell](audio/asr/u2_conformer_aishell)|Conformer|AISHELL-1|中文语音识别|
+|[u2_conformer_wenetspeech](audio/asr/u2_conformer_wenetspeech)|Conformer|WenetSpeech|中文语音识别|
+|[u2_conformer_librispeech](audio/asr/u2_conformer_librispeech)|Conformer|LibriSpeech|英文语音识别|
+
+
+ - ### 声音分类
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[panns_cnn6](audio/audio_classification/PANNs/cnn6)|PANNs|Google Audioset|主要包含4个卷积层和2个全连接层,模型参数为4.5M。经过预训练后,可以用于提取音频的embedding,维度是512|
+|[panns_cnn14](audio/audio_classification/PANNs/cnn14)|PANNs|Google Audioset|主要包含12个卷积层和2个全连接层,模型参数为79.6M。经过预训练后,可以用于提取音频的embedding,维度是2048|
+|[panns_cnn10](audio/audio_classification/PANNs/cnn10)|PANNs|Google Audioset|主要包含8个卷积层和2个全连接层,模型参数为4.9M。经过预训练后,可以用于提取音频的embedding,维度是512|
+
+## 视频
+ - ### 视频分类
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[videotag_tsn_lstm](video/classification/videotag_tsn_lstm)|TSN + AttentionLSTM|百度自建数据集|大规模短视频分类打标签|
+|[tsn_kinetics400](video/classification/tsn_kinetics400)|TSN|Kinetics-400|视频分类|
+|[tsm_kinetics400](video/classification/tsm_kinetics400)|TSM|Kinetics-400|视频分类|
+|[stnet_kinetics400](video/classification/stnet_kinetics400)|StNet|Kinetics-400|视频分类|
+|[nonlocal_kinetics400](video/classification/nonlocal_kinetics400)|Non-local|Kinetics-400|视频分类|
+
+
+ - ### 视频修复
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[SkyAR](video/Video_editing/SkyAR)|UNet|UNet|视频换天|
+
+ - ### 多目标追踪
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[fairmot_dla34](video/multiple_object_tracking/fairmot_dla34)|CenterNet|Caltech Pedestrian+CityPersons+CUHK-SYSU+PRW+ETHZ+MOT17|实时多目标跟踪|
+|[jde_darknet53](video/multiple_object_tracking/jde_darknet53)|YOLOv3|Caltech Pedestrian+CityPersons+CUHK-SYSU+PRW+ETHZ+MOT17|多目标跟踪-兼顾精度和速度|
+
+## 工业应用
+
+ - ### 表针识别
+
+|module|网络|数据集|简介|
+|--|--|--|--|
+|[WatermeterSegmentation](image/semantic_segmentation/WatermeterSegmentation)|DeepLabV3|水表的数字表盘分割数据集|水表的数字表盘分割|
diff --git a/modules/audio/asr/u2_conformer_aishell/README.md b/modules/audio/asr/u2_conformer_aishell/README.md
index bd0bc64f7d200d22ad6d437541d92fdb4c405610..0f3ef2b17089711b8b107fdb905546900a9c8e3f 100644
--- a/modules/audio/asr/u2_conformer_aishell/README.md
+++ b/modules/audio/asr/u2_conformer_aishell/README.md
@@ -3,7 +3,7 @@
|模型名称|u2_conformer_aishell|
| :--- | :---: |
|类别|语音-语音识别|
-|网络|DeepSpeech2|
+|网络|Conformer|
|数据集|AISHELL-1|
|是否支持Fine-tuning|否|
|模型大小|284MB|
diff --git a/modules/audio/asr/u2_conformer_librispeech/README.md b/modules/audio/asr/u2_conformer_librispeech/README.md
index f16da3f58cda36d36337c9f974b7464da38e8a19..8e4f12fef792ba73bb7651a4f96a336e58fa8f00 100644
--- a/modules/audio/asr/u2_conformer_librispeech/README.md
+++ b/modules/audio/asr/u2_conformer_librispeech/README.md
@@ -3,7 +3,7 @@
|模型名称|u2_conformer_librispeech|
| :--- | :---: |
|类别|语音-语音识别|
-|网络|DeepSpeech2|
+|网络|Conformer|
|数据集|LibriSpeech|
|是否支持Fine-tuning|否|
|模型大小|191MB|
diff --git a/modules/audio/asr/u2_conformer_wenetspeech/README.md b/modules/audio/asr/u2_conformer_wenetspeech/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..3cc2442c3b577f4059202ea9f53a5f2eaa9cf192
--- /dev/null
+++ b/modules/audio/asr/u2_conformer_wenetspeech/README.md
@@ -0,0 +1,157 @@
+# u2_conformer_wenetspeech
+
+|模型名称|u2_conformer_wenetspeech|
+| :--- | :---: |
+|类别|语音-语音识别|
+|网络|Conformer|
+|数据集|WenetSpeech|
+|是否支持Fine-tuning|否|
+|模型大小|494MB|
+|最新更新日期|2021-12-10|
+|数据指标|中文CER 0.087 |
+
+## 一、模型基本信息
+
+### 模型介绍
+
+U2 Conformer模型是一种适用于英文和中文的end-to-end语音识别模型。u2_conformer_wenetspeech采用conformer作为encoder、transformer作为decoder的模型结构;解码时先使用CTC prefix beam search进行第一遍打分得到n-best候选,再利用attention decoder对候选进行第二遍打分(重打分),取综合得分最高的候选作为最终识别结果。
+
+u2_conformer_wenetspeech在中文普通话开源语音数据集[WenetSpeech](https://wenet-e2e.github.io/WenetSpeech/)进行了预训练,该模型在其DEV测试集上的CER指标是0.087。
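+
+为便于理解上述两遍解码的打分方式,下面给出一段极简的示意代码(纯Python,候选文本与分数均为虚构,仅演示"CTC一遍打分 + attention二次打分后加权重排"的思路,并非PaddleSpeech的实际实现):
+
+```python
+# 两遍解码打分示意(候选、分数与权重均为假设值)
+ctc_weight = 0.5
+
+# 第一遍:CTC prefix beam search 产生的 n-best 候选及其 CTC 分数
+nbest = [("我认为跑步最重要的就是给我带来了身体健康", -3.2),
+         ("我认为跑步最重要的就是给我带来的身体健康", -3.8)]
+
+# 第二遍:attention decoder 对每个候选重新打分
+att_score = {"我认为跑步最重要的就是给我带来了身体健康": -2.9,
+             "我认为跑步最重要的就是给我带来的身体健康": -3.5}
+
+# 取加权和最高的候选作为最终识别结果
+best_text, _ = max(nbest, key=lambda c: ctc_weight * c[1] + (1 - ctc_weight) * att_score[c[0]])
+print(best_text)
+```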
+
+
+
+
+
+
+
+
+
+更多详情请参考:
+- [Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition](https://arxiv.org/abs/2012.05481)
+- [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100)
+- [WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition](https://arxiv.org/abs/2110.03370)
+
+## 二、安装
+
+- ### 1、系统依赖
+
+ - libsndfile
+ - Linux
+ ```shell
+      # Ubuntu / Debian
+      $ sudo apt-get install libsndfile
+      # CentOS
+      $ sudo yum install libsndfile
+ ```
+    - macOS
+      ```shell
+ $ brew install libsndfile
+ ```
+
+- ### 2、环境依赖
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 3、安装
+
+ - ```shell
+ $ hub install u2_conformer_wenetspeech
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1、预测代码示例
+
+ ```python
+ import paddlehub as hub
+
+ # 采样率为16k,格式为wav的中文语音音频
+ wav_file = '/PATH/TO/AUDIO'
+
+ model = hub.Module(
+ name='u2_conformer_wenetspeech',
+ version='1.0.0')
+ text = model.speech_recognize(wav_file)
+
+ print(text)
+ ```
+
+- ### 2、API
+ - ```python
+ def check_audio(audio_file)
+ ```
+  - 检查输入音频的格式是否为wav、采样率是否为16000Hz;如果不满足,则重新采样至16000Hz,并将重采样后的音频文件保存至相同目录。
+
+ - **参数**
+
+ - `audio_file`:本地音频文件(*.wav)的路径,如`/path/to/input.wav`
+
+ - ```python
+ def speech_recognize(
+ audio_file,
+ device='cpu',
+ )
+ ```
+ - 将输入的音频识别成文字
+
+ - **参数**
+
+ - `audio_file`:本地音频文件(*.wav)的路径,如`/path/to/input.wav`
+ - `device`:预测时使用的设备,默认为`cpu`,如需使用gpu预测,请设置为`gpu`。
+
+ - **返回**
+
+ - `text`:str类型,返回输入音频的识别文字结果。
+
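+  - 使用示意(结合上述两个API;音频路径为假设值,`speech_recognize`内部也会自动调用`check_audio`进行采样率检查):
+
+  - ```python
+    import paddlehub as hub
+
+    model = hub.Module(name='u2_conformer_wenetspeech')
+
+    wav = '/PATH/TO/AUDIO'            # 假设的本地wav文件路径
+    wav_16k = model.check_audio(wav)  # 采样率不为16000Hz时会重采样并返回新文件路径
+    print(model.speech_recognize(wav_16k, device='cpu'))
+    ```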
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线的语音识别服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - ```shell
+ $ hub serving start -m u2_conformer_wenetspeech
+ ```
+
+ - 这样就完成了一个语音识别服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量;如使用CPU预测则无需设置。
+
+- ### 第二步:发送预测请求
+
+  - 配置好服务端后,使用以下几行代码即可发送预测请求并获取预测结果
+
+ - ```python
+ import requests
+ import json
+
+ # 需要识别的音频的存放路径,确保部署服务的机器可访问
+ file = '/path/to/input.wav'
+
+    # 以key-value的方式指定传入预测方法的参数,此例中key为"audio_file"
+ data = {"audio_file": file}
+
+ # 发送post请求,content-type类型应指定json方式,url中的ip地址需改为对应机器的ip
+ url = "http://127.0.0.1:8866/predict/u2_conformer_wenetspeech"
+
+ # 指定post请求的headers为application/json方式
+ headers = {"Content-Type": "application/json"}
+
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ print(r.json())
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ ```shell
+ $ hub install u2_conformer_wenetspeech
+ ```
diff --git a/modules/audio/asr/u2_conformer_wenetspeech/__init__.py b/modules/audio/asr/u2_conformer_wenetspeech/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/audio/asr/u2_conformer_wenetspeech/module.py b/modules/audio/asr/u2_conformer_wenetspeech/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..51ff08c77a2baf29e31ca70dac9d9109279b00c1
--- /dev/null
+++ b/modules/audio/asr/u2_conformer_wenetspeech/module.py
@@ -0,0 +1,56 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+import paddle
+from paddleaudio import load, save_wav
+from paddlespeech.cli import ASRExecutor
+from paddlehub.module.module import moduleinfo, serving
+from paddlehub.utils.log import logger
+
+
+@moduleinfo(
+ name="u2_conformer_wenetspeech", version="1.0.0", summary="", author="Wenet", author_email="", type="audio/asr")
+class U2Conformer(paddle.nn.Layer):
+ def __init__(self):
+ super(U2Conformer, self).__init__()
+ self.asr_executor = ASRExecutor()
+ self.asr_kw_args = {
+ 'model': 'conformer_wenetspeech',
+ 'lang': 'zh',
+ 'sample_rate': 16000,
+ 'config': None, # Set `config` and `ckpt_path` to None to use pretrained model.
+ 'ckpt_path': None,
+ }
+
+ @staticmethod
+ def check_audio(audio_file):
+ assert audio_file.endswith('.wav'), 'Input file must be a wave file `*.wav`.'
+ sig, sample_rate = load(audio_file)
+ if sample_rate != 16000:
+ sig, _ = load(audio_file, 16000)
+ audio_file_16k = audio_file[:audio_file.rindex('.')] + '_16k.wav'
+ logger.info('Resampling to 16000 sample rate to new audio file: {}'.format(audio_file_16k))
+ save_wav(sig, 16000, audio_file_16k)
+ return audio_file_16k
+ else:
+ return audio_file
+
+ @serving
+ def speech_recognize(self, audio_file, device='cpu'):
+ assert os.path.isfile(audio_file), 'File not exists: {}'.format(audio_file)
+ audio_file = self.check_audio(audio_file)
+ text = self.asr_executor(audio_file=audio_file, device=device, **self.asr_kw_args)
+ return text
diff --git a/modules/audio/asr/u2_conformer_wenetspeech/requirements.txt b/modules/audio/asr/u2_conformer_wenetspeech/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..697ab54b76553598c45dfe7764a014826b393114
--- /dev/null
+++ b/modules/audio/asr/u2_conformer_wenetspeech/requirements.txt
@@ -0,0 +1 @@
+paddlespeech==0.1.0a9
diff --git a/modules/audio/keyword_spotting/kwmlp_speech_commands/README.md b/modules/audio/keyword_spotting/kwmlp_speech_commands/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..3e3357a09341435e312e2c12314e1e85b30cff53
--- /dev/null
+++ b/modules/audio/keyword_spotting/kwmlp_speech_commands/README.md
@@ -0,0 +1,98 @@
+# kwmlp_speech_commands
+
+|模型名称|kwmlp_speech_commands|
+| :--- | :---: |
+|类别|语音-关键词识别|
+|网络|Keyword-MLP|
+|数据集|Google Speech Commands V2|
+|是否支持Fine-tuning|否|
+|模型大小|1.6MB|
+|最新更新日期|2022-01-04|
+|数据指标|ACC 97.56%|
+
+## 一、模型基本信息
+
+### 模型介绍
+
+kwmlp_speech_commands采用了 [Keyword-MLP](https://arxiv.org/pdf/2110.07749v1.pdf) 的轻量级模型结构,并在 [Google Speech Commands V2](https://arxiv.org/abs/1804.03209) 数据集上进行了预训练,在其测试集的测试结果为 ACC 97.56%。
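+
+该模型的输入为1秒、16kHz采样音频的MFCC特征。按本模块feature.py中的默认配置(n_mels=40、n_fft=480、hop_length=160、center=False),可以推算出特征尺寸,也即网络配置中input_res=[40, 98]的由来(以下仅为帮助理解的示意计算):
+
+```python
+# 推算1秒16kHz音频在默认MFCC配置下的特征尺寸
+sr, n_fft, hop_length, n_mels = 16000, 480, 160, 40
+num_frames = (sr - n_fft) // hop_length + 1   # center=False时的帧数
+print(n_mels, num_frames)                     # 40 98,对应模型的input_res=[40, 98]
+```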
+
+
+
+
+
+
+更多详情请参考
+- [Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition](https://arxiv.org/abs/1804.03209)
+- [ATTENTION-FREE KEYWORD SPOTTING](https://arxiv.org/pdf/2110.07749v1.pdf)
+- [Keyword-MLP](https://github.com/AI-Research-BD/Keyword-MLP)
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.2.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install kwmlp_speech_commands
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1、预测代码示例
+
+ ```python
+ import paddlehub as hub
+
+ model = hub.Module(
+ name='kwmlp_speech_commands',
+ version='1.0.0')
+
+ # 通过下列链接可下载示例音频
+ # https://paddlehub.bj.bcebos.com/paddlehub_dev/go.wav
+
+ # Keyword spotting
+ score, label = model.keyword_recognize('no.wav')
+ print(score, label)
+ # [0.89498246] no
+ score, label = model.keyword_recognize('go.wav')
+ print(score, label)
+ # [0.8997176] go
+ score, label = model.keyword_recognize('one.wav')
+ print(score, label)
+ # [0.88598305] one
+ ```
+
+- ### 2、API
+ - ```python
+ def keyword_recognize(
+ wav: os.PathLike,
+ )
+ ```
+ - 检测音频中包含的关键词。
+
+ - **参数**
+
+ - `wav`:输入的包含关键词的音频文件,格式为`*.wav`。
+
+ - **返回**
+
+ - 输出结果的得分和对应的关键词标签。
+
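+  - 使用建议:可对返回的得分做一个简单的阈值判断来过滤不可信的结果(阈值0.8仅为示例假设,并非模块内置参数):
+
+  - ```python
+    score, label = model.keyword_recognize('go.wav')
+    if float(score) >= 0.8:
+        print('检测到关键词:', label)
+    else:
+        print('未检测到足够可信的关键词')
+    ```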
+
+## 四、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ ```shell
+ $ hub install kwmlp_speech_commands
+ ```
diff --git a/modules/audio/keyword_spotting/kwmlp_speech_commands/__init__.py b/modules/audio/keyword_spotting/kwmlp_speech_commands/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..185a92b8d94d3426d616c0624f0f2ee04339349e
--- /dev/null
+++ b/modules/audio/keyword_spotting/kwmlp_speech_commands/__init__.py
@@ -0,0 +1,13 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
diff --git a/modules/audio/keyword_spotting/kwmlp_speech_commands/feature.py b/modules/audio/keyword_spotting/kwmlp_speech_commands/feature.py
new file mode 100644
index 0000000000000000000000000000000000000000..900a2eab26e4414b487d6d7858381ee302a107e8
--- /dev/null
+++ b/modules/audio/keyword_spotting/kwmlp_speech_commands/feature.py
@@ -0,0 +1,59 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import math
+
+import numpy as np
+import paddle
+import paddleaudio
+
+
+def create_dct(n_mfcc: int, n_mels: int, norm: str = 'ortho'):
+ n = paddle.arange(float(n_mels))
+ k = paddle.arange(float(n_mfcc)).unsqueeze(1)
+ dct = paddle.cos(math.pi / float(n_mels) * (n + 0.5) * k) # size (n_mfcc, n_mels)
+ if norm is None:
+ dct *= 2.0
+ else:
+ assert norm == "ortho"
+ dct[0] *= 1.0 / math.sqrt(2.0)
+ dct *= math.sqrt(2.0 / float(n_mels))
+ return dct.t()
+
+
+def compute_mfcc(
+ x: paddle.Tensor,
+ sr: int = 16000,
+ n_mels: int = 40,
+ n_fft: int = 480,
+ win_length: int = 480,
+ hop_length: int = 160,
+ f_min: float = 0.0,
+ f_max: float = None,
+ center: bool = False,
+ top_db: float = 80.0,
+ norm: str = 'ortho',
+):
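+    # 先计算log-Mel频谱并转为分贝,再与DCT矩阵相乘得到MFCC;输出形状为 (B, n_mels, 帧数)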
+ fbank = paddleaudio.features.spectrum.MelSpectrogram(
+ sr=sr,
+ n_mels=n_mels,
+ n_fft=n_fft,
+ win_length=win_length,
+ hop_length=hop_length,
+        f_min=f_min,
+ f_max=f_max,
+ center=center)(x) # waveforms batch ~ (B, T)
+ log_fbank = paddleaudio.features.spectrum.power_to_db(fbank, top_db=top_db)
+ dct_matrix = create_dct(n_mfcc=n_mels, n_mels=n_mels, norm=norm)
+ mfcc = paddle.matmul(log_fbank.transpose((0, 2, 1)), dct_matrix).transpose((0, 2, 1)) # (B, n_mels, L)
+ return mfcc
diff --git a/modules/audio/keyword_spotting/kwmlp_speech_commands/kwmlp.py b/modules/audio/keyword_spotting/kwmlp_speech_commands/kwmlp.py
new file mode 100644
index 0000000000000000000000000000000000000000..df8c37e6fb14d1f5c43b0080410d3288406cfa77
--- /dev/null
+++ b/modules/audio/keyword_spotting/kwmlp_speech_commands/kwmlp.py
@@ -0,0 +1,143 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+
+class Residual(nn.Layer):
+ def __init__(self, fn):
+ super().__init__()
+ self.fn = fn
+
+ def forward(self, x):
+ return self.fn(x) + x
+
+
+class PreNorm(nn.Layer):
+ def __init__(self, dim, fn):
+ super().__init__()
+ self.fn = fn
+ self.norm = nn.LayerNorm(dim)
+
+ def forward(self, x, **kwargs):
+ x = self.norm(x)
+ return self.fn(x, **kwargs)
+
+
+class PostNorm(nn.Layer):
+ def __init__(self, dim, fn):
+ super().__init__()
+ self.norm = nn.LayerNorm(dim)
+ self.fn = fn
+
+ def forward(self, x, **kwargs):
+ return self.norm(self.fn(x, **kwargs))
+
+
+class SpatialGatingUnit(nn.Layer):
+ def __init__(self, dim, dim_seq, act=nn.Identity(), init_eps=1e-3):
+ super().__init__()
+ dim_out = dim // 2
+
+ self.norm = nn.LayerNorm(dim_out)
+ self.proj = nn.Conv1D(dim_seq, dim_seq, 1)
+
+ self.act = act
+
+ init_eps /= dim_seq
+
+ def forward(self, x):
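+        # 空间门控:沿最后一维二等分为res与gate,对gate做LayerNorm和跨patch的1x1卷积,再与res逐元素相乘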
+ res, gate = x.split(2, axis=-1)
+ gate = self.norm(gate)
+
+ weight, bias = self.proj.weight, self.proj.bias
+ gate = F.conv1d(gate, weight, bias)
+
+ return self.act(gate) * res
+
+
+class gMLPBlock(nn.Layer):
+ def __init__(self, *, dim, dim_ff, seq_len, act=nn.Identity()):
+ super().__init__()
+ self.proj_in = nn.Sequential(nn.Linear(dim, dim_ff), nn.GELU())
+
+ self.sgu = SpatialGatingUnit(dim_ff, seq_len, act)
+ self.proj_out = nn.Linear(dim_ff // 2, dim)
+
+ def forward(self, x):
+ x = self.proj_in(x)
+ x = self.sgu(x)
+ x = self.proj_out(x)
+ return x
+
+
+class Rearrange(nn.Layer):
+ def __init__(self):
+ super().__init__()
+
+ def forward(self, x):
+ x = x.transpose([0, 1, 3, 2]).squeeze(1)
+ return x
+
+
+class Reduce(nn.Layer):
+ def __init__(self, axis=1):
+ super().__init__()
+ self.axis = axis
+
+ def forward(self, x):
+ x = x.mean(axis=self.axis, keepdim=False)
+ return x
+
+
+class KW_MLP(nn.Layer):
+ """Keyword-MLP."""
+
+ def __init__(self,
+ input_res=[40, 98],
+ patch_res=[40, 1],
+ num_classes=35,
+ dim=64,
+ depth=12,
+ ff_mult=4,
+ channels=1,
+ prob_survival=0.9,
+ pre_norm=False,
+ **kwargs):
+ super().__init__()
+ image_height, image_width = input_res
+ patch_height, patch_width = patch_res
+ assert (image_height % patch_height) == 0 and (
+ image_width % patch_width) == 0, 'image height and width must be divisible by patch size'
+ num_patches = (image_height // patch_height) * (image_width // patch_width)
+
+ P_Norm = PreNorm if pre_norm else PostNorm
+
+ dim_ff = dim * ff_mult
+
+ self.to_patch_embed = nn.Sequential(Rearrange(), nn.Linear(channels * patch_height * patch_width, dim))
+
+ self.prob_survival = prob_survival
+
+ self.layers = nn.LayerList(
+ [Residual(P_Norm(dim, gMLPBlock(dim=dim, dim_ff=dim_ff, seq_len=num_patches))) for i in range(depth)])
+
+ self.to_logits = nn.Sequential(nn.LayerNorm(dim), Reduce(axis=1), nn.Linear(dim, num_classes))
+
+ def forward(self, x):
+ x = self.to_patch_embed(x)
+ layers = self.layers
+ x = nn.Sequential(*layers)(x)
+ return self.to_logits(x)
diff --git a/modules/audio/keyword_spotting/kwmlp_speech_commands/module.py b/modules/audio/keyword_spotting/kwmlp_speech_commands/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..34342de360f2927236429baaa41789993038bd5a
--- /dev/null
+++ b/modules/audio/keyword_spotting/kwmlp_speech_commands/module.py
@@ -0,0 +1,86 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+
+import numpy as np
+import paddle
+import paddleaudio
+
+from .feature import compute_mfcc
+from .kwmlp import KW_MLP
+from paddlehub.module.module import moduleinfo
+from paddlehub.utils.log import logger
+
+
+@moduleinfo(
+ name="kwmlp_speech_commands",
+ version="1.0.0",
+ summary="",
+ author="paddlepaddle",
+ author_email="",
+ type="audio/language_identification")
+class KWS(paddle.nn.Layer):
+ def __init__(self):
+ super(KWS, self).__init__()
+ ckpt_path = os.path.join(self.directory, 'assets', 'model.pdparams')
+ label_path = os.path.join(self.directory, 'assets', 'label.txt')
+
+ self.label_list = []
+ with open(label_path, 'r') as f:
+ for l in f:
+ self.label_list.append(l.strip())
+
+ self.sr = 16000
+ model_conf = {
+ 'input_res': [40, 98],
+ 'patch_res': [40, 1],
+ 'num_classes': 35,
+ 'channels': 1,
+ 'dim': 64,
+ 'depth': 12,
+ 'pre_norm': False,
+ 'prob_survival': 0.9,
+ }
+ self.model = KW_MLP(**model_conf)
+ self.model.set_state_dict(paddle.load(ckpt_path))
+ self.model.eval()
+
+ def load_audio(self, wav):
+ wav = os.path.abspath(os.path.expanduser(wav))
+ assert os.path.isfile(wav), 'Please check wav file: {}'.format(wav)
+ waveform, _ = paddleaudio.load(wav, sr=self.sr, mono=True, normal=False)
+ return waveform
+
+ def keyword_recognize(self, wav):
+ waveform = self.load_audio(wav)
+
+ # fix_length to 1s
+ if len(waveform) > self.sr:
+ waveform = waveform[:self.sr]
+ else:
+ waveform = np.pad(waveform, (0, self.sr - len(waveform)))
+
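+        # 前向得到35类关键词的logits,softmax后取概率最大的类别作为识别结果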
+ logits = self(paddle.to_tensor(waveform)).reshape([-1])
+ probs = paddle.nn.functional.softmax(logits)
+ idx = paddle.argmax(probs)
+ return probs[idx].numpy(), self.label_list[idx]
+
+ def forward(self, x):
+ if len(x.shape) == 1: # x: waveform tensors with (B, T) shape
+ x = x.unsqueeze(0)
+
+ mfcc = compute_mfcc(x).unsqueeze(1) # (B, C, n_mels, L)
+ logits = self.model(mfcc).squeeze(1)
+
+ return logits
diff --git a/modules/audio/keyword_spotting/kwmlp_speech_commands/requirements.txt b/modules/audio/keyword_spotting/kwmlp_speech_commands/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..defe617fa36bc5ab7b72438034c785ee2b3ac3c9
--- /dev/null
+++ b/modules/audio/keyword_spotting/kwmlp_speech_commands/requirements.txt
@@ -0,0 +1 @@
+paddleaudio==0.1.0
diff --git a/modules/audio/language_identification/ecapa_tdnn_common_language/README.md b/modules/audio/language_identification/ecapa_tdnn_common_language/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..f648202e4c97c1a1707bb1f0a0d98949735f047d
--- /dev/null
+++ b/modules/audio/language_identification/ecapa_tdnn_common_language/README.md
@@ -0,0 +1,100 @@
+# ecapa_tdnn_common_language
+
+|模型名称|ecapa_tdnn_common_language|
+| :--- | :---: |
+|类别|语音-语言识别|
+|网络|ECAPA-TDNN|
+|数据集|CommonLanguage|
+|是否支持Fine-tuning|否|
+|模型大小|79MB|
+|最新更新日期|2021-12-30|
+|数据指标|ACC 84.9%|
+
+## 一、模型基本信息
+
+### 模型介绍
+
+ecapa_tdnn_common_language采用了[ECAPA-TDNN](https://arxiv.org/abs/2005.07143)的模型结构,并在[CommonLanguage](https://zenodo.org/record/5036977/)数据集上进行了预训练,在其测试集的测试结果为 ACC 84.9%。
+
+
+
+
+
+
+更多详情请参考
+- [CommonLanguage](https://zenodo.org/record/5036977#.Yc19b5Mzb0o)
+- [ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification](https://arxiv.org/pdf/2005.07143.pdf)
+- [The SpeechBrain Toolkit](https://github.com/speechbrain/speechbrain)
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.2.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install ecapa_tdnn_common_language
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1、预测代码示例
+
+ ```python
+ import paddlehub as hub
+
+ model = hub.Module(
+ name='ecapa_tdnn_common_language',
+ version='1.0.0')
+
+ # 通过下列链接可下载示例音频
+ # https://paddlehub.bj.bcebos.com/paddlehub_dev/zh.wav
+ # https://paddlehub.bj.bcebos.com/paddlehub_dev/en.wav
+ # https://paddlehub.bj.bcebos.com/paddlehub_dev/it.wav
+
+ # Language Identification
+  score, label = model.language_identify('zh.wav')
+  print(score, label)
+  # array([0.6214552], dtype=float32), 'Chinese_China'
+  score, label = model.language_identify('en.wav')
+  print(score, label)
+  # array([0.37193954], dtype=float32), 'English'
+  score, label = model.language_identify('it.wav')
+  print(score, label)
+  # array([0.46913534], dtype=float32), 'Italian'
+ ```
+
+- ### 2、API
+ - ```python
+ def language_identify(
+ wav: os.PathLike,
+ )
+ ```
+ - 判断输入人声音频的语言类别。
+
+ - **参数**
+
+ - `wav`:输入的说话人的音频文件,格式为`*.wav`。
+
+ - **返回**
+
+ - 输出结果的得分和对应的语言类别。
+
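+  - 使用示意:可按上文的示例音频批量识别语言类别:
+
+  - ```python
+    for wav in ['zh.wav', 'en.wav', 'it.wav']:
+        score, label = model.language_identify(wav)
+        print(wav, label, score)
+    ```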
+
+## 四、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ ```shell
+ $ hub install ecapa_tdnn_common_language
+ ```
diff --git a/modules/audio/language_identification/ecapa_tdnn_common_language/__init__.py b/modules/audio/language_identification/ecapa_tdnn_common_language/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..185a92b8d94d3426d616c0624f0f2ee04339349e
--- /dev/null
+++ b/modules/audio/language_identification/ecapa_tdnn_common_language/__init__.py
@@ -0,0 +1,13 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
diff --git a/modules/audio/language_identification/ecapa_tdnn_common_language/ecapa_tdnn.py b/modules/audio/language_identification/ecapa_tdnn_common_language/ecapa_tdnn.py
new file mode 100644
index 0000000000000000000000000000000000000000..950a9df7dd465abf56b30b5594e9b16adb49e573
--- /dev/null
+++ b/modules/audio/language_identification/ecapa_tdnn_common_language/ecapa_tdnn.py
@@ -0,0 +1,406 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import math
+import os
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+
+def length_to_mask(length, max_len=None, dtype=None):
+ assert len(length.shape) == 1
+
+ if max_len is None:
+ max_len = length.max().astype('int').item() # using arange to generate mask
+ mask = paddle.arange(max_len, dtype=length.dtype).expand((len(length), max_len)) < length.unsqueeze(1)
+
+ if dtype is None:
+ dtype = length.dtype
+
+ mask = paddle.to_tensor(mask, dtype=dtype)
+ return mask
+
+
+class Conv1d(nn.Layer):
+ def __init__(
+ self,
+ in_channels,
+ out_channels,
+ kernel_size,
+ stride=1,
+ padding="same",
+ dilation=1,
+ groups=1,
+ bias=True,
+ padding_mode="reflect",
+ ):
+ super(Conv1d, self).__init__()
+
+ self.kernel_size = kernel_size
+ self.stride = stride
+ self.dilation = dilation
+ self.padding = padding
+ self.padding_mode = padding_mode
+
+ self.conv = nn.Conv1D(
+ in_channels,
+ out_channels,
+ self.kernel_size,
+ stride=self.stride,
+ padding=0,
+ dilation=self.dilation,
+ groups=groups,
+ bias_attr=bias,
+ )
+
+ def forward(self, x):
+ if self.padding == "same":
+ x = self._manage_padding(x, self.kernel_size, self.dilation, self.stride)
+ else:
+            raise ValueError(f"Padding must be 'same'. Got {self.padding}")
+
+ return self.conv(x)
+
+ def _manage_padding(self, x, kernel_size: int, dilation: int, stride: int):
+ L_in = x.shape[-1] # Detecting input shape
+ padding = self._get_padding_elem(L_in, stride, kernel_size, dilation) # Time padding
+ x = F.pad(x, padding, mode=self.padding_mode, data_format="NCL") # Applying padding
+ return x
+
+ def _get_padding_elem(self, L_in: int, stride: int, kernel_size: int, dilation: int):
+ if stride > 1:
+ n_steps = math.ceil(((L_in - kernel_size * dilation) / stride) + 1)
+ L_out = stride * (n_steps - 1) + kernel_size * dilation
+ padding = [kernel_size // 2, kernel_size // 2]
+ else:
+ L_out = (L_in - dilation * (kernel_size - 1) - 1) // stride + 1
+
+ padding = [(L_in - L_out) // 2, (L_in - L_out) // 2]
+
+ return padding
+
+
+class BatchNorm1d(nn.Layer):
+ def __init__(
+ self,
+ input_size,
+ eps=1e-05,
+ momentum=0.9,
+ weight_attr=None,
+ bias_attr=None,
+ data_format='NCL',
+ use_global_stats=None,
+ ):
+ super(BatchNorm1d, self).__init__()
+
+ self.norm = nn.BatchNorm1D(
+ input_size,
+ epsilon=eps,
+ momentum=momentum,
+ weight_attr=weight_attr,
+ bias_attr=bias_attr,
+ data_format=data_format,
+ use_global_stats=use_global_stats,
+ )
+
+ def forward(self, x):
+ x_n = self.norm(x)
+ return x_n
+
+
+class TDNNBlock(nn.Layer):
+ def __init__(
+ self,
+ in_channels,
+ out_channels,
+ kernel_size,
+ dilation,
+ activation=nn.ReLU,
+ ):
+ super(TDNNBlock, self).__init__()
+ self.conv = Conv1d(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ dilation=dilation,
+ )
+ self.activation = activation()
+ self.norm = BatchNorm1d(input_size=out_channels)
+
+ def forward(self, x):
+ return self.norm(self.activation(self.conv(x)))
+
+
+class Res2NetBlock(nn.Layer):
+ def __init__(self, in_channels, out_channels, scale=8, dilation=1):
+ super(Res2NetBlock, self).__init__()
+ assert in_channels % scale == 0
+ assert out_channels % scale == 0
+
+ in_channel = in_channels // scale
+ hidden_channel = out_channels // scale
+
+ self.blocks = nn.LayerList(
+ [TDNNBlock(in_channel, hidden_channel, kernel_size=3, dilation=dilation) for i in range(scale - 1)])
+ self.scale = scale
+
+ def forward(self, x):
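+        # Res2Net多尺度结构:将通道均分为scale份;第1份直连,第2份经TDNN块,其余各份先与前一份输出相加再经TDNN块,最后拼接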
+ y = []
+ for i, x_i in enumerate(paddle.chunk(x, self.scale, axis=1)):
+ if i == 0:
+ y_i = x_i
+ elif i == 1:
+ y_i = self.blocks[i - 1](x_i)
+ else:
+ y_i = self.blocks[i - 1](x_i + y_i)
+ y.append(y_i)
+ y = paddle.concat(y, axis=1)
+ return y
+
+
+class SEBlock(nn.Layer):
+ def __init__(self, in_channels, se_channels, out_channels):
+ super(SEBlock, self).__init__()
+
+ self.conv1 = Conv1d(in_channels=in_channels, out_channels=se_channels, kernel_size=1)
+ self.relu = paddle.nn.ReLU()
+ self.conv2 = Conv1d(in_channels=se_channels, out_channels=out_channels, kernel_size=1)
+ self.sigmoid = paddle.nn.Sigmoid()
+
+ def forward(self, x, lengths=None):
+ L = x.shape[-1]
+ if lengths is not None:
+ mask = length_to_mask(lengths * L, max_len=L)
+ mask = mask.unsqueeze(1)
+ total = mask.sum(axis=2, keepdim=True)
+ s = (x * mask).sum(axis=2, keepdim=True) / total
+ else:
+ s = x.mean(axis=2, keepdim=True)
+
+ s = self.relu(self.conv1(s))
+ s = self.sigmoid(self.conv2(s))
+
+ return s * x
+
+
+class AttentiveStatisticsPooling(nn.Layer):
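+    """Attentive statistics pooling.
+
+    An attention branch (TDNN -> tanh -> 1x1 conv -> softmax over time) predicts
+    per-frame weights that are used to compute a weighted mean and standard deviation,
+    turning a variable-length (N, C, L) input into a fixed-size (N, 2C, 1) output.
+    """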
+ def __init__(self, channels, attention_channels=128, global_context=True):
+ super().__init__()
+
+ self.eps = 1e-12
+ self.global_context = global_context
+ if global_context:
+ self.tdnn = TDNNBlock(channels * 3, attention_channels, 1, 1)
+ else:
+ self.tdnn = TDNNBlock(channels, attention_channels, 1, 1)
+ self.tanh = nn.Tanh()
+ self.conv = Conv1d(in_channels=attention_channels, out_channels=channels, kernel_size=1)
+
+ def forward(self, x, lengths=None):
+ C, L = x.shape[1], x.shape[2] # KP: (N, C, L)
+
+ def _compute_statistics(x, m, axis=2, eps=self.eps):
+ mean = (m * x).sum(axis)
+ std = paddle.sqrt((m * (x - mean.unsqueeze(axis)).pow(2)).sum(axis).clip(eps))
+ return mean, std
+
+ if lengths is None:
+ lengths = paddle.ones([x.shape[0]])
+
+ # Make binary mask of shape [N, 1, L]
+ mask = length_to_mask(lengths * L, max_len=L)
+ mask = mask.unsqueeze(1)
+
+ # Expand the temporal context of the pooling layer by allowing the
+ # self-attention to look at global properties of the utterance.
+ if self.global_context:
+ total = mask.sum(axis=2, keepdim=True).astype('float32')
+ mean, std = _compute_statistics(x, mask / total)
+ mean = mean.unsqueeze(2).tile((1, 1, L))
+ std = std.unsqueeze(2).tile((1, 1, L))
+ attn = paddle.concat([x, mean, std], axis=1)
+ else:
+ attn = x
+
+ # Apply layers
+ attn = self.conv(self.tanh(self.tdnn(attn)))
+
+ # Filter out zero-paddings
+ attn = paddle.where(mask.tile((1, C, 1)) == 0, paddle.ones_like(attn) * float("-inf"), attn)
+
+ attn = F.softmax(attn, axis=2)
+ mean, std = _compute_statistics(x, attn)
+
+ # Append mean and std of the batch
+ pooled_stats = paddle.concat((mean, std), axis=1)
+ pooled_stats = pooled_stats.unsqueeze(2)
+
+ return pooled_stats
+
+
+class SERes2NetBlock(nn.Layer):
+ def __init__(
+ self,
+ in_channels,
+ out_channels,
+ res2net_scale=8,
+ se_channels=128,
+ kernel_size=1,
+ dilation=1,
+ activation=nn.ReLU,
+ ):
+ super(SERes2NetBlock, self).__init__()
+ self.out_channels = out_channels
+ self.tdnn1 = TDNNBlock(
+ in_channels,
+ out_channels,
+ kernel_size=1,
+ dilation=1,
+ activation=activation,
+ )
+ self.res2net_block = Res2NetBlock(out_channels, out_channels, res2net_scale, dilation)
+ self.tdnn2 = TDNNBlock(
+ out_channels,
+ out_channels,
+ kernel_size=1,
+ dilation=1,
+ activation=activation,
+ )
+ self.se_block = SEBlock(out_channels, se_channels, out_channels)
+
+ self.shortcut = None
+ if in_channels != out_channels:
+ self.shortcut = Conv1d(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ )
+
+ def forward(self, x, lengths=None):
+ residual = x
+ if self.shortcut:
+ residual = self.shortcut(x)
+
+ x = self.tdnn1(x)
+ x = self.res2net_block(x)
+ x = self.tdnn2(x)
+ x = self.se_block(x, lengths)
+
+ return x + residual
+
+
+class ECAPA_TDNN(nn.Layer):
+ def __init__(
+ self,
+ input_size,
+ lin_neurons=192,
+ activation=nn.ReLU,
+ channels=[512, 512, 512, 512, 1536],
+ kernel_sizes=[5, 3, 3, 3, 1],
+ dilations=[1, 2, 3, 4, 1],
+ attention_channels=128,
+ res2net_scale=8,
+ se_channels=128,
+ global_context=True,
+ ):
+
+ super(ECAPA_TDNN, self).__init__()
+ assert len(channels) == len(kernel_sizes)
+ assert len(channels) == len(dilations)
+ self.channels = channels
+ self.blocks = nn.LayerList()
+ self.emb_size = lin_neurons
+
+ # The initial TDNN layer
+ self.blocks.append(TDNNBlock(
+ input_size,
+ channels[0],
+ kernel_sizes[0],
+ dilations[0],
+ activation,
+ ))
+
+ # SE-Res2Net layers
+ for i in range(1, len(channels) - 1):
+ self.blocks.append(
+ SERes2NetBlock(
+ channels[i - 1],
+ channels[i],
+ res2net_scale=res2net_scale,
+ se_channels=se_channels,
+ kernel_size=kernel_sizes[i],
+ dilation=dilations[i],
+ activation=activation,
+ ))
+
+ # Multi-layer feature aggregation
+ self.mfa = TDNNBlock(
+ channels[-1],
+ channels[-1],
+ kernel_sizes[-1],
+ dilations[-1],
+ activation,
+ )
+
+ # Attentive Statistical Pooling
+ self.asp = AttentiveStatisticsPooling(
+ channels[-1],
+ attention_channels=attention_channels,
+ global_context=global_context,
+ )
+ self.asp_bn = BatchNorm1d(input_size=channels[-1] * 2)
+
+ # Final linear transformation
+ self.fc = Conv1d(
+ in_channels=channels[-1] * 2,
+ out_channels=self.emb_size,
+ kernel_size=1,
+ )
+
+ def forward(self, x, lengths=None):
+ xl = []
+ for layer in self.blocks:
+ try:
+ x = layer(x, lengths=lengths)
+ except TypeError:
+ x = layer(x)
+ xl.append(x)
+
+ # Multi-layer feature aggregation
+ x = paddle.concat(xl[1:], axis=1)
+ x = self.mfa(x)
+
+ # Attentive Statistical Pooling
+ x = self.asp(x, lengths=lengths)
+ x = self.asp_bn(x)
+
+ # Final linear transformation
+ x = self.fc(x)
+
+ return x
+
+
+class Classifier(nn.Layer):
+ def __init__(self, backbone, num_class, dtype=paddle.float32):
+ super(Classifier, self).__init__()
+ self.backbone = backbone
+ self.params = nn.ParameterList(
+ [paddle.create_parameter(shape=[num_class, self.backbone.emb_size], dtype=dtype)])
+
+ def forward(self, x):
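+        # Both the embedding and the class weight vectors are L2-normalized, so each
+        # logit is the cosine similarity between the utterance and a class prototype.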
+ emb = self.backbone(x.transpose([0, 2, 1])).transpose([0, 2, 1])
+ logits = F.linear(F.normalize(emb.squeeze(1)), F.normalize(self.params[0]).transpose([1, 0]))
+
+ return logits
diff --git a/modules/audio/language_identification/ecapa_tdnn_common_language/feature.py b/modules/audio/language_identification/ecapa_tdnn_common_language/feature.py
new file mode 100644
index 0000000000000000000000000000000000000000..09b930ebfd4cd56c9be1bc107f4ca6fc5f948027
--- /dev/null
+++ b/modules/audio/language_identification/ecapa_tdnn_common_language/feature.py
@@ -0,0 +1,112 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import paddle
+import paddleaudio
+from paddleaudio.features.spectrum import hz_to_mel
+from paddleaudio.features.spectrum import mel_to_hz
+from paddleaudio.features.spectrum import power_to_db
+from paddleaudio.features.spectrum import Spectrogram
+from paddleaudio.features.window import get_window
+
+
+def compute_fbank_matrix(sample_rate: int = 16000,
+ n_fft: int = 400,
+ n_mels: int = 80,
+                         f_min: float = 0.0,
+                         f_max: float = 8000.0):
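+    # Build a (n_mels, n_fft // 2 + 1) matrix of triangular mel filters: each row
+    # ramps up to its centre frequency and back down to zero at the neighbouring centres.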
+ mel = paddle.linspace(hz_to_mel(f_min, htk=True), hz_to_mel(f_max, htk=True), n_mels + 2, dtype=paddle.float32)
+ hz = mel_to_hz(mel, htk=True)
+
+ band = hz[1:] - hz[:-1]
+ band = band[:-1]
+ f_central = hz[1:-1]
+
+ n_stft = n_fft // 2 + 1
+ all_freqs = paddle.linspace(0, sample_rate // 2, n_stft)
+ all_freqs_mat = all_freqs.tile([f_central.shape[0], 1])
+
+ f_central_mat = f_central.tile([all_freqs_mat.shape[1], 1]).transpose([1, 0])
+ band_mat = band.tile([all_freqs_mat.shape[1], 1]).transpose([1, 0])
+
+ slope = (all_freqs_mat - f_central_mat) / band_mat
+ left_side = slope + 1.0
+ right_side = -slope + 1.0
+
+ fbank_matrix = paddle.maximum(paddle.zeros_like(left_side), paddle.minimum(left_side, right_side))
+
+ return fbank_matrix
+
+
+def compute_log_fbank(
+ x: paddle.Tensor,
+ sample_rate: int = 16000,
+ n_fft: int = 400,
+ hop_length: int = 160,
+ win_length: int = 400,
+ n_mels: int = 80,
+ window: str = 'hamming',
+ center: bool = True,
+ pad_mode: str = 'constant',
+ f_min: float = 0.0,
+ f_max: float = None,
+ top_db: float = 80.0,
+):
+
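+    # Pipeline: STFT spectrogram -> mel filterbank -> log compression (power_to_db),
+    # returning log-fbank features of shape (B, T, n_mels).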
+ if f_max is None:
+ f_max = sample_rate / 2
+
+ spect = Spectrogram(
+ n_fft=n_fft, hop_length=hop_length, win_length=win_length, window=window, center=center, pad_mode=pad_mode)(x)
+
+ fbank_matrix = compute_fbank_matrix(
+ sample_rate=sample_rate,
+ n_fft=n_fft,
+ n_mels=n_mels,
+ f_min=f_min,
+ f_max=f_max,
+ )
+ fbank = paddle.matmul(fbank_matrix, spect)
+ log_fbank = power_to_db(fbank, top_db=top_db).transpose([0, 2, 1])
+ return log_fbank
+
+
+def compute_stats(x: paddle.Tensor, mean_norm: bool = True, std_norm: bool = False, eps: float = 1e-10):
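+    # Per-utterance mean/std used for feature normalization; the std is floored by
+    # `eps` to avoid division by zero.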
+ if mean_norm:
+ current_mean = paddle.mean(x, axis=0)
+ else:
+ current_mean = paddle.to_tensor([0.0])
+
+ if std_norm:
+ current_std = paddle.std(x, axis=0)
+ else:
+ current_std = paddle.to_tensor([1.0])
+
+ current_std = paddle.maximum(current_std, eps * paddle.ones_like(current_std))
+
+ return current_mean, current_std
+
+
+def normalize(
+ x: paddle.Tensor,
+ global_mean: paddle.Tensor = None,
+ global_std: paddle.Tensor = None,
+):
+
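+    # Normalize each utterance with its own statistics, or with the provided global
+    # statistics when global_mean/global_std are given.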
+ for i in range(x.shape[0]): # (B, ...)
+ if global_mean is None and global_std is None:
+ mean, std = compute_stats(x[i])
+ x[i] = (x[i] - mean) / std
+ else:
+ x[i] = (x[i] - global_mean) / global_std
+ return x
diff --git a/modules/audio/language_identification/ecapa_tdnn_common_language/module.py b/modules/audio/language_identification/ecapa_tdnn_common_language/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..1950deaf1b5843c5f69269bb6982691739b0332e
--- /dev/null
+++ b/modules/audio/language_identification/ecapa_tdnn_common_language/module.py
@@ -0,0 +1,85 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+import re
+from typing import List
+from typing import Union
+
+import numpy as np
+import paddle
+import paddleaudio
+
+from .ecapa_tdnn import Classifier
+from .ecapa_tdnn import ECAPA_TDNN
+from .feature import compute_log_fbank
+from .feature import normalize
+from paddlehub.module.module import moduleinfo
+from paddlehub.utils.log import logger
+
+
+@moduleinfo(
+ name="ecapa_tdnn_common_language",
+ version="1.0.0",
+ summary="",
+ author="paddlepaddle",
+ author_email="",
+ type="audio/language_identification")
+class LanguageIdentification(paddle.nn.Layer):
+ def __init__(self):
+ super(LanguageIdentification, self).__init__()
+ ckpt_path = os.path.join(self.directory, 'assets', 'model.pdparams')
+ label_path = os.path.join(self.directory, 'assets', 'label.txt')
+
+ self.label_list = []
+ with open(label_path, 'r') as f:
+ for l in f:
+ self.label_list.append(l.strip())
+
+ self.sr = 16000
+ model_conf = {
+ 'input_size': 80,
+ 'channels': [1024, 1024, 1024, 1024, 3072],
+ 'kernel_sizes': [5, 3, 3, 3, 1],
+ 'dilations': [1, 2, 3, 4, 1],
+ 'attention_channels': 128,
+ 'lin_neurons': 192
+ }
+ self.model = Classifier(
+ backbone=ECAPA_TDNN(**model_conf),
+ num_class=45,
+ )
+ self.model.set_state_dict(paddle.load(ckpt_path))
+ self.model.eval()
+
+ def load_audio(self, wav):
+ wav = os.path.abspath(os.path.expanduser(wav))
+ assert os.path.isfile(wav), 'Please check wav file: {}'.format(wav)
+ waveform, _ = paddleaudio.load(wav, sr=self.sr, mono=True, normal=False)
+ return waveform
+
+ def language_identify(self, wav):
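+        # Run a single forward pass over the whole utterance and return the highest
+        # logit together with the corresponding language label.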
+ waveform = self.load_audio(wav)
+ logits = self(paddle.to_tensor(waveform)).reshape([-1])
+ idx = paddle.argmax(logits)
+ return logits[idx].numpy(), self.label_list[idx]
+
+ def forward(self, x):
+ if len(x.shape) == 1:
+ x = x.unsqueeze(0)
+
+ fbank = compute_log_fbank(x) # x: waveform tensors with (B, T) shape
+ norm_fbank = normalize(fbank)
+ logits = self.model(norm_fbank).squeeze(1)
+
+ return logits
diff --git a/modules/audio/language_identification/ecapa_tdnn_common_language/requirements.txt b/modules/audio/language_identification/ecapa_tdnn_common_language/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..defe617fa36bc5ab7b72438034c785ee2b3ac3c9
--- /dev/null
+++ b/modules/audio/language_identification/ecapa_tdnn_common_language/requirements.txt
@@ -0,0 +1 @@
+paddleaudio==0.1.0
diff --git a/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/README.md b/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..70da7371cc411e535a4b53fd74a46c9a2521a016
--- /dev/null
+++ b/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/README.md
@@ -0,0 +1,128 @@
+# ecapa_tdnn_voxceleb
+
+|模型名称|ecapa_tdnn_voxceleb|
+| :--- | :---: |
+|类别|语音-声纹识别|
+|网络|ECAPA-TDNN|
+|数据集|VoxCeleb|
+|是否支持Fine-tuning|否|
+|模型大小|79MB|
+|最新更新日期|2021-12-30|
+|数据指标|EER 0.69%|
+
+## 一、模型基本信息
+
+### 模型介绍
+
+ecapa_tdnn_voxceleb采用了[ECAPA-TDNN](https://arxiv.org/abs/2005.07143)的模型结构,并在[VoxCeleb](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/)数据集上进行了预训练,在VoxCeleb1的声纹识别测试集([veri_test.txt](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt))上的测试结果为 EER 0.69%,达到了该数据集的SOTA。
+
+
+
+
+
+
+
+更多详情请参考
+- [VoxCeleb: a large-scale speaker identification dataset](https://www.robots.ox.ac.uk/~vgg/publications/2017/Nagrani17/nagrani17.pdf)
+- [ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification](https://arxiv.org/pdf/2005.07143.pdf)
+- [The SpeechBrain Toolkit](https://github.com/speechbrain/speechbrain)
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.2.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install ecapa_tdnn_voxceleb
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1、预测代码示例
+
+ ```python
+ import paddlehub as hub
+
+ model = hub.Module(
+ name='ecapa_tdnn_voxceleb',
+ threshold=0.25,
+ version='1.0.0')
+
+ # 通过下列链接可下载示例音频
+ # https://paddlehub.bj.bcebos.com/paddlehub_dev/sv1.wav
+ # https://paddlehub.bj.bcebos.com/paddlehub_dev/sv2.wav
+
+ # Speaker Embedding
+ embedding = model.speaker_embedding('sv1.wav')
+ print(embedding.shape)
+ # (192,)
+
+ # Speaker Verification
+ score, pred = model.speaker_verify('sv1.wav', 'sv2.wav')
+ print(score, pred)
+ # [0.16354457], [False]
+ ```
+
+- ### 2、API
+ - ```python
+ def __init__(
+ threshold: float,
+ )
+ ```
+ - 初始化声纹模型,确定判别阈值。
+
+ - **参数**
+
+ - `threshold`:设定模型判别声纹相似度的得分阈值,默认为 0.25。
+
+ - ```python
+ def speaker_embedding(
+ wav: os.PathLike,
+ )
+ ```
+ - 获取输入音频的声纹特征
+
+ - **参数**
+
+ - `wav`:输入的说话人的音频文件,格式为`*.wav`。
+
+ - **返回**
+
+    - 输出维度为 (192,) 的声纹特征向量。
+
+ - ```python
+ def speaker_verify(
+ wav1: os.PathLike,
+ wav2: os.PathLike,
+ )
+ ```
+  - 分别提取两段音频的声纹特征,计算其相似度得分,并判断是否为同一说话人。
+
+ - **参数**
+
+ - `wav1`:输入的说话人1的音频文件,格式为`*.wav`。
+ - `wav2`:输入的说话人2的音频文件,格式为`*.wav`。
+
+ - **返回**
+
+ - 返回声纹相似度得分[-1, 1]和预测结果。
+
+
+## 四、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ ```shell
+ $ hub install ecapa_tdnn_voxceleb
+ ```
diff --git a/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/__init__.py b/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..185a92b8d94d3426d616c0624f0f2ee04339349e
--- /dev/null
+++ b/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/__init__.py
@@ -0,0 +1,13 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
diff --git a/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/ecapa_tdnn.py b/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/ecapa_tdnn.py
new file mode 100644
index 0000000000000000000000000000000000000000..59950860985414aaca3a46657cd11cd9645c223c
--- /dev/null
+++ b/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/ecapa_tdnn.py
@@ -0,0 +1,392 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import math
+import os
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+
+def length_to_mask(length, max_len=None, dtype=None):
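+    # Build a (len(length), max_len) mask whose entry (i, t) is 1 for t < length[i],
+    # so that zero-padded frames can be excluded from pooling statistics.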
+ assert len(length.shape) == 1
+
+ if max_len is None:
+ max_len = length.max().astype('int').item() # using arange to generate mask
+ mask = paddle.arange(max_len, dtype=length.dtype).expand((len(length), max_len)) < length.unsqueeze(1)
+
+ if dtype is None:
+ dtype = length.dtype
+
+ mask = paddle.to_tensor(mask, dtype=dtype)
+ return mask
+
+
+class Conv1d(nn.Layer):
+ def __init__(
+ self,
+ in_channels,
+ out_channels,
+ kernel_size,
+ stride=1,
+ padding="same",
+ dilation=1,
+ groups=1,
+ bias=True,
+ padding_mode="reflect",
+ ):
+ super(Conv1d, self).__init__()
+
+ self.kernel_size = kernel_size
+ self.stride = stride
+ self.dilation = dilation
+ self.padding = padding
+ self.padding_mode = padding_mode
+
+ self.conv = nn.Conv1D(
+ in_channels,
+ out_channels,
+ self.kernel_size,
+ stride=self.stride,
+ padding=0,
+ dilation=self.dilation,
+ groups=groups,
+ bias_attr=bias,
+ )
+
+ def forward(self, x):
+ if self.padding == "same":
+ x = self._manage_padding(x, self.kernel_size, self.dilation, self.stride)
+ else:
+            raise ValueError(f"Padding must be 'same'. Got {self.padding}")
+
+ return self.conv(x)
+
+ def _manage_padding(self, x, kernel_size: int, dilation: int, stride: int):
+ L_in = x.shape[-1] # Detecting input shape
+ padding = self._get_padding_elem(L_in, stride, kernel_size, dilation) # Time padding
+ x = F.pad(x, padding, mode=self.padding_mode, data_format="NCL") # Applying padding
+ return x
+
+ def _get_padding_elem(self, L_in: int, stride: int, kernel_size: int, dilation: int):
+ if stride > 1:
+ n_steps = math.ceil(((L_in - kernel_size * dilation) / stride) + 1)
+ L_out = stride * (n_steps - 1) + kernel_size * dilation
+ padding = [kernel_size // 2, kernel_size // 2]
+ else:
+ L_out = (L_in - dilation * (kernel_size - 1) - 1) // stride + 1
+
+ padding = [(L_in - L_out) // 2, (L_in - L_out) // 2]
+
+ return padding
+
+
+class BatchNorm1d(nn.Layer):
+ def __init__(
+ self,
+ input_size,
+ eps=1e-05,
+ momentum=0.9,
+ weight_attr=None,
+ bias_attr=None,
+ data_format='NCL',
+ use_global_stats=None,
+ ):
+ super(BatchNorm1d, self).__init__()
+
+ self.norm = nn.BatchNorm1D(
+ input_size,
+ epsilon=eps,
+ momentum=momentum,
+ weight_attr=weight_attr,
+ bias_attr=bias_attr,
+ data_format=data_format,
+ use_global_stats=use_global_stats,
+ )
+
+ def forward(self, x):
+ x_n = self.norm(x)
+ return x_n
+
+
+class TDNNBlock(nn.Layer):
+ def __init__(
+ self,
+ in_channels,
+ out_channels,
+ kernel_size,
+ dilation,
+ activation=nn.ReLU,
+ ):
+ super(TDNNBlock, self).__init__()
+ self.conv = Conv1d(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ dilation=dilation,
+ )
+ self.activation = activation()
+ self.norm = BatchNorm1d(input_size=out_channels)
+
+ def forward(self, x):
+ return self.norm(self.activation(self.conv(x)))
+
+
+class Res2NetBlock(nn.Layer):
+ def __init__(self, in_channels, out_channels, scale=8, dilation=1):
+ super(Res2NetBlock, self).__init__()
+ assert in_channels % scale == 0
+ assert out_channels % scale == 0
+
+ in_channel = in_channels // scale
+ hidden_channel = out_channels // scale
+
+ self.blocks = nn.LayerList(
+ [TDNNBlock(in_channel, hidden_channel, kernel_size=3, dilation=dilation) for i in range(scale - 1)])
+ self.scale = scale
+
+ def forward(self, x):
+ y = []
+ for i, x_i in enumerate(paddle.chunk(x, self.scale, axis=1)):
+ if i == 0:
+ y_i = x_i
+ elif i == 1:
+ y_i = self.blocks[i - 1](x_i)
+ else:
+ y_i = self.blocks[i - 1](x_i + y_i)
+ y.append(y_i)
+ y = paddle.concat(y, axis=1)
+ return y
+
+
+class SEBlock(nn.Layer):
+ def __init__(self, in_channels, se_channels, out_channels):
+ super(SEBlock, self).__init__()
+
+ self.conv1 = Conv1d(in_channels=in_channels, out_channels=se_channels, kernel_size=1)
+ self.relu = paddle.nn.ReLU()
+ self.conv2 = Conv1d(in_channels=se_channels, out_channels=out_channels, kernel_size=1)
+ self.sigmoid = paddle.nn.Sigmoid()
+
+ def forward(self, x, lengths=None):
+ L = x.shape[-1]
+ if lengths is not None:
+ mask = length_to_mask(lengths * L, max_len=L)
+ mask = mask.unsqueeze(1)
+ total = mask.sum(axis=2, keepdim=True)
+ s = (x * mask).sum(axis=2, keepdim=True) / total
+ else:
+ s = x.mean(axis=2, keepdim=True)
+
+ s = self.relu(self.conv1(s))
+ s = self.sigmoid(self.conv2(s))
+
+ return s * x
+
+
+class AttentiveStatisticsPooling(nn.Layer):
+ def __init__(self, channels, attention_channels=128, global_context=True):
+ super().__init__()
+
+ self.eps = 1e-12
+ self.global_context = global_context
+ if global_context:
+ self.tdnn = TDNNBlock(channels * 3, attention_channels, 1, 1)
+ else:
+ self.tdnn = TDNNBlock(channels, attention_channels, 1, 1)
+ self.tanh = nn.Tanh()
+ self.conv = Conv1d(in_channels=attention_channels, out_channels=channels, kernel_size=1)
+
+ def forward(self, x, lengths=None):
+ C, L = x.shape[1], x.shape[2] # KP: (N, C, L)
+
+ def _compute_statistics(x, m, axis=2, eps=self.eps):
+ mean = (m * x).sum(axis)
+ std = paddle.sqrt((m * (x - mean.unsqueeze(axis)).pow(2)).sum(axis).clip(eps))
+ return mean, std
+
+ if lengths is None:
+ lengths = paddle.ones([x.shape[0]])
+
+ # Make binary mask of shape [N, 1, L]
+ mask = length_to_mask(lengths * L, max_len=L)
+ mask = mask.unsqueeze(1)
+
+ # Expand the temporal context of the pooling layer by allowing the
+ # self-attention to look at global properties of the utterance.
+ if self.global_context:
+ total = mask.sum(axis=2, keepdim=True).astype('float32')
+ mean, std = _compute_statistics(x, mask / total)
+ mean = mean.unsqueeze(2).tile((1, 1, L))
+ std = std.unsqueeze(2).tile((1, 1, L))
+ attn = paddle.concat([x, mean, std], axis=1)
+ else:
+ attn = x
+
+ # Apply layers
+ attn = self.conv(self.tanh(self.tdnn(attn)))
+
+ # Filter out zero-paddings
+ attn = paddle.where(mask.tile((1, C, 1)) == 0, paddle.ones_like(attn) * float("-inf"), attn)
+
+ attn = F.softmax(attn, axis=2)
+ mean, std = _compute_statistics(x, attn)
+
+ # Append mean and std of the batch
+ pooled_stats = paddle.concat((mean, std), axis=1)
+ pooled_stats = pooled_stats.unsqueeze(2)
+
+ return pooled_stats
+
+
+class SERes2NetBlock(nn.Layer):
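+    """SE-Res2Net block used in ECAPA-TDNN: 1x1 TDNN -> Res2Net multi-scale
+    convolution -> 1x1 TDNN -> squeeze-and-excitation, wrapped in a residual
+    connection (projected by a 1x1 convolution when channel counts differ)."""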
+ def __init__(
+ self,
+ in_channels,
+ out_channels,
+ res2net_scale=8,
+ se_channels=128,
+ kernel_size=1,
+ dilation=1,
+ activation=nn.ReLU,
+ ):
+ super(SERes2NetBlock, self).__init__()
+ self.out_channels = out_channels
+ self.tdnn1 = TDNNBlock(
+ in_channels,
+ out_channels,
+ kernel_size=1,
+ dilation=1,
+ activation=activation,
+ )
+ self.res2net_block = Res2NetBlock(out_channels, out_channels, res2net_scale, dilation)
+ self.tdnn2 = TDNNBlock(
+ out_channels,
+ out_channels,
+ kernel_size=1,
+ dilation=1,
+ activation=activation,
+ )
+ self.se_block = SEBlock(out_channels, se_channels, out_channels)
+
+ self.shortcut = None
+ if in_channels != out_channels:
+ self.shortcut = Conv1d(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ )
+
+ def forward(self, x, lengths=None):
+ residual = x
+ if self.shortcut:
+ residual = self.shortcut(x)
+
+ x = self.tdnn1(x)
+ x = self.res2net_block(x)
+ x = self.tdnn2(x)
+ x = self.se_block(x, lengths)
+
+ return x + residual
+
+
+class ECAPA_TDNN(nn.Layer):
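+    """ECAPA-TDNN backbone: an initial TDNN layer, a stack of SE-Res2Net blocks,
+    multi-layer feature aggregation, attentive statistics pooling with batch
+    normalization, and a final 1x1 convolution producing a fixed-size embedding."""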
+ def __init__(
+ self,
+ input_size,
+ lin_neurons=192,
+ activation=nn.ReLU,
+ channels=[512, 512, 512, 512, 1536],
+ kernel_sizes=[5, 3, 3, 3, 1],
+ dilations=[1, 2, 3, 4, 1],
+ attention_channels=128,
+ res2net_scale=8,
+ se_channels=128,
+ global_context=True,
+ ):
+
+ super(ECAPA_TDNN, self).__init__()
+ assert len(channels) == len(kernel_sizes)
+ assert len(channels) == len(dilations)
+ self.channels = channels
+ self.blocks = nn.LayerList()
+ self.emb_size = lin_neurons
+
+ # The initial TDNN layer
+ self.blocks.append(TDNNBlock(
+ input_size,
+ channels[0],
+ kernel_sizes[0],
+ dilations[0],
+ activation,
+ ))
+
+ # SE-Res2Net layers
+ for i in range(1, len(channels) - 1):
+ self.blocks.append(
+ SERes2NetBlock(
+ channels[i - 1],
+ channels[i],
+ res2net_scale=res2net_scale,
+ se_channels=se_channels,
+ kernel_size=kernel_sizes[i],
+ dilation=dilations[i],
+ activation=activation,
+ ))
+
+ # Multi-layer feature aggregation
+ self.mfa = TDNNBlock(
+ channels[-1],
+ channels[-1],
+ kernel_sizes[-1],
+ dilations[-1],
+ activation,
+ )
+
+ # Attentive Statistical Pooling
+ self.asp = AttentiveStatisticsPooling(
+ channels[-1],
+ attention_channels=attention_channels,
+ global_context=global_context,
+ )
+ self.asp_bn = BatchNorm1d(input_size=channels[-1] * 2)
+
+ # Final linear transformation
+ self.fc = Conv1d(
+ in_channels=channels[-1] * 2,
+ out_channels=self.emb_size,
+ kernel_size=1,
+ )
+
+ def forward(self, x, lengths=None):
+ xl = []
+ for layer in self.blocks:
+ try:
+ x = layer(x, lengths=lengths)
+ except TypeError:
+ x = layer(x)
+ xl.append(x)
+
+ # Multi-layer feature aggregation
+ x = paddle.concat(xl[1:], axis=1)
+ x = self.mfa(x)
+
+ # Attentive Statistical Pooling
+ x = self.asp(x, lengths=lengths)
+ x = self.asp_bn(x)
+
+ # Final linear transformation
+ x = self.fc(x)
+
+ return x
diff --git a/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/feature.py b/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/feature.py
new file mode 100644
index 0000000000000000000000000000000000000000..09b930ebfd4cd56c9be1bc107f4ca6fc5f948027
--- /dev/null
+++ b/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/feature.py
@@ -0,0 +1,112 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import paddle
+import paddleaudio
+from paddleaudio.features.spectrum import hz_to_mel
+from paddleaudio.features.spectrum import mel_to_hz
+from paddleaudio.features.spectrum import power_to_db
+from paddleaudio.features.spectrum import Spectrogram
+from paddleaudio.features.window import get_window
+
+
+def compute_fbank_matrix(sample_rate: int = 16000,
+ n_fft: int = 400,
+ n_mels: int = 80,
+                         f_min: float = 0.0,
+                         f_max: float = 8000.0):
+ mel = paddle.linspace(hz_to_mel(f_min, htk=True), hz_to_mel(f_max, htk=True), n_mels + 2, dtype=paddle.float32)
+ hz = mel_to_hz(mel, htk=True)
+
+ band = hz[1:] - hz[:-1]
+ band = band[:-1]
+ f_central = hz[1:-1]
+
+ n_stft = n_fft // 2 + 1
+ all_freqs = paddle.linspace(0, sample_rate // 2, n_stft)
+ all_freqs_mat = all_freqs.tile([f_central.shape[0], 1])
+
+ f_central_mat = f_central.tile([all_freqs_mat.shape[1], 1]).transpose([1, 0])
+ band_mat = band.tile([all_freqs_mat.shape[1], 1]).transpose([1, 0])
+
+ slope = (all_freqs_mat - f_central_mat) / band_mat
+ left_side = slope + 1.0
+ right_side = -slope + 1.0
+
+ fbank_matrix = paddle.maximum(paddle.zeros_like(left_side), paddle.minimum(left_side, right_side))
+
+ return fbank_matrix
+
+
+def compute_log_fbank(
+ x: paddle.Tensor,
+ sample_rate: int = 16000,
+ n_fft: int = 400,
+ hop_length: int = 160,
+ win_length: int = 400,
+ n_mels: int = 80,
+ window: str = 'hamming',
+ center: bool = True,
+ pad_mode: str = 'constant',
+ f_min: float = 0.0,
+ f_max: float = None,
+ top_db: float = 80.0,
+):
+
+ if f_max is None:
+ f_max = sample_rate / 2
+
+ spect = Spectrogram(
+ n_fft=n_fft, hop_length=hop_length, win_length=win_length, window=window, center=center, pad_mode=pad_mode)(x)
+
+ fbank_matrix = compute_fbank_matrix(
+ sample_rate=sample_rate,
+ n_fft=n_fft,
+ n_mels=n_mels,
+ f_min=f_min,
+ f_max=f_max,
+ )
+ fbank = paddle.matmul(fbank_matrix, spect)
+ log_fbank = power_to_db(fbank, top_db=top_db).transpose([0, 2, 1])
+ return log_fbank
+
+
+def compute_stats(x: paddle.Tensor, mean_norm: bool = True, std_norm: bool = False, eps: float = 1e-10):
+ if mean_norm:
+ current_mean = paddle.mean(x, axis=0)
+ else:
+ current_mean = paddle.to_tensor([0.0])
+
+ if std_norm:
+ current_std = paddle.std(x, axis=0)
+ else:
+ current_std = paddle.to_tensor([1.0])
+
+ current_std = paddle.maximum(current_std, eps * paddle.ones_like(current_std))
+
+ return current_mean, current_std
+
+
+def normalize(
+ x: paddle.Tensor,
+ global_mean: paddle.Tensor = None,
+ global_std: paddle.Tensor = None,
+):
+
+ for i in range(x.shape[0]): # (B, ...)
+ if global_mean is None and global_std is None:
+ mean, std = compute_stats(x[i])
+ x[i] = (x[i] - mean) / std
+ else:
+ x[i] = (x[i] - global_mean) / global_std
+ return x
diff --git a/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/module.py b/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..11f7121a5f0a7eb2b330ffeedec821171bb30bef
--- /dev/null
+++ b/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/module.py
@@ -0,0 +1,93 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+import re
+from typing import List
+from typing import Union
+
+import numpy as np
+import paddle
+import paddleaudio
+
+from .ecapa_tdnn import ECAPA_TDNN
+from .feature import compute_log_fbank
+from .feature import normalize
+from paddlehub.module.module import moduleinfo
+from paddlehub.utils.log import logger
+
+
+@moduleinfo(
+ name="ecapa_tdnn_voxceleb",
+ version="1.0.0",
+ summary="",
+ author="paddlepaddle",
+ author_email="",
+ type="audio/speaker_recognition")
+class SpeakerRecognition(paddle.nn.Layer):
+ def __init__(self, threshold=0.25):
+ super(SpeakerRecognition, self).__init__()
+ global_stats_path = os.path.join(self.directory, 'assets', 'global_embedding_stats.npy')
+ ckpt_path = os.path.join(self.directory, 'assets', 'model.pdparams')
+
+ self.sr = 16000
+ self.threshold = threshold
+ model_conf = {
+ 'input_size': 80,
+ 'channels': [1024, 1024, 1024, 1024, 3072],
+ 'kernel_sizes': [5, 3, 3, 3, 1],
+ 'dilations': [1, 2, 3, 4, 1],
+ 'attention_channels': 128,
+ 'lin_neurons': 192
+ }
+ self.model = ECAPA_TDNN(**model_conf)
+ self.model.set_state_dict(paddle.load(ckpt_path))
+ self.model.eval()
+
+ global_embedding_stats = np.load(global_stats_path, allow_pickle=True)
+ self.global_emb_mean = paddle.to_tensor(global_embedding_stats.item().get('global_emb_mean'))
+ self.global_emb_std = paddle.to_tensor(global_embedding_stats.item().get('global_emb_std'))
+
+ self.similarity = paddle.nn.CosineSimilarity(axis=-1, eps=1e-6)
+
+ def load_audio(self, wav):
+ wav = os.path.abspath(os.path.expanduser(wav))
+ assert os.path.isfile(wav), 'Please check wav file: {}'.format(wav)
+ waveform, _ = paddleaudio.load(wav, sr=self.sr, mono=True, normal=False)
+ return waveform
+
+ def speaker_embedding(self, wav):
+ waveform = self.load_audio(wav)
+ embedding = self(paddle.to_tensor(waveform)).reshape([-1])
+ return embedding.numpy()
+
+ def speaker_verify(self, wav1, wav2):
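+        # Extract one embedding per utterance, compare them with cosine similarity,
+        # and report a same-speaker decision when the score exceeds the threshold.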
+ waveform1 = self.load_audio(wav1)
+ embedding1 = self(paddle.to_tensor(waveform1)).reshape([-1])
+
+ waveform2 = self.load_audio(wav2)
+ embedding2 = self(paddle.to_tensor(waveform2)).reshape([-1])
+
+ score = self.similarity(embedding1, embedding2).numpy()
+ return score, score > self.threshold
+
+ def forward(self, x):
+ if len(x.shape) == 1:
+ x = x.unsqueeze(0)
+
+ fbank = compute_log_fbank(x) # x: waveform tensors with (B, T) shape
+ norm_fbank = normalize(fbank)
+ embedding = self.model(norm_fbank.transpose([0, 2, 1])).transpose([0, 2, 1])
+ norm_embedding = normalize(x=embedding, global_mean=self.global_emb_mean, global_std=self.global_emb_std)
+
+ return norm_embedding
diff --git a/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/requirements.txt b/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..defe617fa36bc5ab7b72438034c785ee2b3ac3c9
--- /dev/null
+++ b/modules/audio/speaker_recognition/ecapa_tdnn_voxceleb/requirements.txt
@@ -0,0 +1 @@
+paddleaudio==0.1.0
diff --git a/modules/audio/tts/deepvoice3_ljspeech/README.md b/modules/audio/tts/deepvoice3_ljspeech/README.md
index a1a659d250f9d920e3d104092d033ea3921ab854..ea5d2636f092d36694015727833c0442a2cb247a 100644
--- a/modules/audio/tts/deepvoice3_ljspeech/README.md
+++ b/modules/audio/tts/deepvoice3_ljspeech/README.md
@@ -1,120 +1,147 @@
-## 概述
+# deepvoice3_ljspeech
+
+|模型名称|deepvoice3_ljspeech|
+| :--- | :---: |
+|类别|语音-语音合成|
+|网络|DeepVoice3|
+|数据集|LJSpeech-1.1|
+|是否支持Fine-tuning|否|
+|模型大小|58MB|
+|最新更新日期|2020-10-27|
+|数据指标|-|
+
+## 一、模型基本信息
+
+### 模型介绍
Deep Voice 3是百度研究院2017年发布的端到端的TTS模型(论文录用于ICLR 2018)。它是一个基于卷积神经网络和注意力机制的seq2seq模型,由于不包含循环神经网络,它可以并行训练,远快于基于循环神经网络的模型。Deep Voice 3可以学习到多个说话人的特征,也支持搭配多种声码器使用。deepvoice3_ljspeech是基于ljspeech英文语音数据集预训练得到的英文TTS模型,仅支持预测。
-
+
更多详情参考论文[Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning](https://arxiv.org/abs/1710.07654)
-## 命令行预测
-```shell
-$ hub run deepvoice3_ljspeech --input_text='Simple as this proposition is, it is necessary to be stated' --use_gpu True --vocoder griffin-lim
-```
+## 二、安装
-## API
+- ### 1、系统依赖
-```python
-def synthesize(texts, use_gpu=False, vocoder="griffin-lim"):
-```
+ 对于Ubuntu用户,请执行:
+ ```
+ sudo apt-get install libsndfile1
+ ```
+ 对于Centos用户,请执行:
+ ```
+ sudo yum install libsndfile
+ ```
-预测API,由输入文本合成对应音频波形。
+- ### 2、环境依赖
-**参数**
+ - 2.0.0 > paddlepaddle >= 1.8.2
-* texts (list\[str\]): 待预测文本;
-* use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA\_VISIBLE\_DEVICES环境变量**;
-* vocoder: 指定声码器,可选 "griffin-lim"或"waveflow"
+ - 2.0.0 > paddlehub >= 1.7.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
-**返回**
+- ### 3、安装
-* wavs (list): 语音合成结果列表,列表中每一个元素为对应输入文本的音频波形,可使用`soundfile.write`进一步处理或保存。
-* sample\_rate (int): 合成音频的采样率。
+ - ```shell
+ $ hub install deepvoice3_ljspeech
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
-**代码示例**
-```python
-import paddlehub as hub
-import soundfile as sf
+## 三、模型API预测
-# Load deepvoice3_ljspeech module.
-module = hub.Module(name="deepvoice3_ljspeech")
+- ### 1、命令行预测
-# Predict sentiment label
-test_texts = ['Simple as this proposition is, it is necessary to be stated',
- 'Parakeet stands for Paddle PARAllel text-to-speech toolkit']
-wavs, sample_rate = module.synthesize(texts=test_texts)
-for index, wav in enumerate(wavs):
- sf.write(f"{index}.wav", wav, sample_rate)
-```
+ - ```shell
+ $ hub run deepvoice3_ljspeech --input_text='Simple as this proposition is, it is necessary to be stated' --use_gpu True --vocoder griffin-lim
+ ```
+ - 通过命令行方式实现语音合成模型的调用,更多请见[PaddleHub命令行指令](https://github.com/shinichiye/PaddleHub/blob/release/v2.1/docs/docs_ch/tutorial/cmd_usage.rst)
-## 服务部署
+- ### 2、预测代码示例
-PaddleHub Serving 可以部署在线服务。
+ - ```python
+ import paddlehub as hub
+ import soundfile as sf
-### 第一步:启动PaddleHub Serving
+ # Load deepvoice3_ljspeech module.
+ module = hub.Module(name="deepvoice3_ljspeech")
-运行启动命令:
-```shell
-$ hub serving start -m deepvoice3_ljspeech
-```
+    # Synthesize speech from the input texts
+ test_texts = ['Simple as this proposition is, it is necessary to be stated',
+ 'Parakeet stands for Paddle PARAllel text-to-speech toolkit']
+ wavs, sample_rate = module.synthesize(texts=test_texts)
+ for index, wav in enumerate(wavs):
+ sf.write(f"{index}.wav", wav, sample_rate)
+ ```
-这样就完成了一个服务化API的部署,默认端口号为8866。
+- ### 3、API
-**NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+ - ```python
+ def synthesize(texts, use_gpu=False, vocoder="griffin-lim"):
+ ```
-### 第二步:发送预测请求
+ - 预测API,由输入文本合成对应音频波形。
-配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+ - **参数**
+ - texts (list\[str\]): 待预测文本;
+ - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA\_VISIBLE\_DEVICES环境变量**;
+ - vocoder: 指定声码器,可选 "griffin-lim"或"waveflow"
-```python
-import requests
-import json
+ - **返回**
+ - wavs (list): 语音合成结果列表,列表中每一个元素为对应输入文本的音频波形,可使用`soundfile.write`进一步处理或保存。
+ - sample\_rate (int): 合成音频的采样率。
-import soundfile as sf
-# 发送HTTP请求
+## 四、服务部署
-data = {'texts':['Simple as this proposition is, it is necessary to be stated',
- 'Parakeet stands for Paddle PARAllel text-to-speech toolkit'],
- 'use_gpu':False}
-headers = {"Content-type": "application/json"}
-url = "http://127.0.0.1:8866/predict/deepvoice3_ljspeech"
-r = requests.post(url=url, headers=headers, data=json.dumps(data))
+- PaddleHub Serving可以部署一个在线语音合成服务,可以将此接口用于在线web应用。
-# 保存结果
-result = r.json()["results"]
-wavs = result["wavs"]
-sample_rate = result["sample_rate"]
-for index, wav in enumerate(wavs):
- sf.write(f"{index}.wav", wav, sample_rate)
-```
+- ### 第一步:启动PaddleHub Serving
-## 查看代码
+ - 运行启动命令
+ - ```shell
+ $ hub serving start -m deepvoice3_ljspeech
+ ```
+ - 这样就完成了服务化API的部署,默认端口号为8866。
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量,否则无需设置。
-https://github.com/PaddlePaddle/Parakeet
+- ### 第二步:发送预测请求
-### 依赖
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
-paddlepaddle >= 1.8.2
+ - ```python
+ import requests
+ import json
-paddlehub >= 1.7.0
+ import soundfile as sf
-**NOTE:** 除了python依赖外还必须安装libsndfile库
+ # 发送HTTP请求
-对于Ubuntu用户,请执行:
-```
-sudo apt-get install libsndfile1
-```
-对于Centos用户,请执行:
-```
-sudo yum install libsndfile
-```
+ data = {'texts':['Simple as this proposition is, it is necessary to be stated',
+ 'Parakeet stands for Paddle PARAllel text-to-speech toolkit'],
+ 'use_gpu':False}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/deepvoice3_ljspeech"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
-## 更新历史
+ # 保存结果
+ result = r.json()["results"]
+ wavs = result["wavs"]
+ sample_rate = result["sample_rate"]
+ for index, wav in enumerate(wavs):
+ sf.write(f"{index}.wav", wav, sample_rate)
+ ```
+
+
+## 五、更新历史
* 1.0.0
初始发布
+
+ ```shell
+ $ hub install deepvoice3_ljspeech
+ ```
diff --git a/modules/audio/tts/fastspeech_ljspeech/README.md b/modules/audio/tts/fastspeech_ljspeech/README.md
index a2be971d3c301bb8c591d381ce43ab27e5beb65a..93dbe77c2b81059b0e52bb2935307c08c0372b2f 100644
--- a/modules/audio/tts/fastspeech_ljspeech/README.md
+++ b/modules/audio/tts/fastspeech_ljspeech/README.md
@@ -1,121 +1,148 @@
-## 概述
+# fastspeech_ljspeech
+
+|模型名称|fastspeech_ljspeech|
+| :--- | :---: |
+|类别|语音-语音合成|
+|网络|FastSpeech|
+|数据集|LJSpeech-1.1|
+|是否支持Fine-tuning|否|
+|模型大小|320MB|
+|最新更新日期|2020-10-27|
+|数据指标|-|
+
+## 一、模型基本信息
+
+### 模型介绍
FastSpeech是基于Transformer的前馈神经网络,作者从encoder-decoder结构的teacher model中提取attention对角线来做发音持续时间预测,即使用长度调节器对文本序列进行扩展来匹配目标梅尔频谱的长度,以便并行生成梅尔频谱。该模型基本上消除了复杂情况下的跳词和重复的问题,并且可以平滑地调整语音速度,更重要的是,该模型大幅度提升了梅尔频谱的生成速度。fastspeech_ljspeech是基于ljspeech英文语音数据集预训练得到的英文TTS模型,仅支持预测。
-
+
更多详情参考论文[FastSpeech: Fast, Robust and Controllable Text to Speech](https://arxiv.org/abs/1905.09263)
-## 命令行预测
-```shell
-$ hub run fastspeech_ljspeech --input_text='Simple as this proposition is, it is necessary to be stated' --use_gpu True --vocoder griffin-lim
-```
+## 二、安装
-## API
+- ### 1、系统依赖
-```python
-def synthesize(texts, use_gpu=False, speed=1.0, vocoder="griffin-lim"):
-```
+ 对于Ubuntu用户,请执行:
+ ```
+ sudo apt-get install libsndfile1
+ ```
+ 对于Centos用户,请执行:
+ ```
+ sudo yum install libsndfile
+ ```
-预测API,由输入文本合成对应音频波形。
+- ### 2、环境依赖
-**参数**
+ - 2.0.0 > paddlepaddle >= 1.8.2
-* texts (list\[str\]): 待预测文本;
-* use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA\_VISIBLE\_DEVICES环境变量**;
-* speed(float): 音频速度,1.0表示以原速输出。
-* vocoder: 指定声码器,可选 "griffin-lim"或"waveflow"
+ - 2.0.0 > paddlehub >= 1.7.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
-**返回**
+- ### 3、安装
-* wavs (list): 语音合成结果列表,列表中每一个元素为对应输入文本的音频波形,可使用`soundfile.write`进一步处理或保存。
-* sample\_rate (int): 合成音频的采样率。
+ - ```shell
+ $ hub install fastspeech_ljspeech
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
-**代码示例**
-```python
-import paddlehub as hub
-import soundfile as sf
+## 三、模型API预测
-# Load fastspeech_ljspeech module.
-module = hub.Module(name="fastspeech_ljspeech")
+- ### 1、命令行预测
-# Predict sentiment label
-test_texts = ['Simple as this proposition is, it is necessary to be stated',
- 'Parakeet stands for Paddle PARAllel text-to-speech toolkit']
-wavs, sample_rate = module.synthesize(texts=test_texts)
-for index, wav in enumerate(wavs):
- sf.write(f"{index}.wav", wav, sample_rate)
-```
+ - ```shell
+ $ hub run fastspeech_ljspeech --input_text='Simple as this proposition is, it is necessary to be stated' --use_gpu True --vocoder griffin-lim
+ ```
+ - 通过命令行方式实现语音合成模型的调用,更多请见[PaddleHub命令行指令](https://github.com/shinichiye/PaddleHub/blob/release/v2.1/docs/docs_ch/tutorial/cmd_usage.rst)
-## 服务部署
+- ### 2、预测代码示例
-PaddleHub Serving 可以部署在线服务。
+ - ```python
+ import paddlehub as hub
+ import soundfile as sf
-### 第一步:启动PaddleHub Serving
+ # Load fastspeech_ljspeech module.
+ module = hub.Module(name="fastspeech_ljspeech")
-运行启动命令:
-```shell
-$ hub serving start -m fastspeech_ljspeech
-```
+    # Synthesize speech from the input texts
+ test_texts = ['Simple as this proposition is, it is necessary to be stated',
+ 'Parakeet stands for Paddle PARAllel text-to-speech toolkit']
+ wavs, sample_rate = module.synthesize(texts=test_texts)
+ for index, wav in enumerate(wavs):
+ sf.write(f"{index}.wav", wav, sample_rate)
+ ```
-这样就完成了一个服务化API的部署,默认端口号为8866。
+- ### 3、API
-**NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+ - ```python
+ def synthesize(texts, use_gpu=False, speed=1.0, vocoder="griffin-lim"):
+ ```
-### 第二步:发送预测请求
+ - 预测API,由输入文本合成对应音频波形。
-配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+ - **参数**
+ - texts (list\[str\]): 待预测文本;
+ - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA\_VISIBLE\_DEVICES环境变量**;
+ - speed(float): 音频速度,1.0表示以原速输出。
+ - vocoder: 指定声码器,可选 "griffin-lim"或"waveflow"
-```python
-import requests
-import json
+ - **返回**
+ - wavs (list): 语音合成结果列表,列表中每一个元素为对应输入文本的音频波形,可使用`soundfile.write`进一步处理或保存。
+ - sample\_rate (int): 合成音频的采样率。
-import soundfile as sf
-# 发送HTTP请求
+## 四、服务部署
-data = {'texts':['Simple as this proposition is, it is necessary to be stated',
- 'Parakeet stands for Paddle PARAllel text-to-speech toolkit'],
- 'use_gpu':False}
-headers = {"Content-type": "application/json"}
-url = "http://127.0.0.1:8866/predict/fastspeech_ljspeech"
-r = requests.post(url=url, headers=headers, data=json.dumps(data))
+- PaddleHub Serving可以部署一个在线语音合成服务,可以将此接口用于在线web应用。
-# 保存结果
-result = r.json()["results"]
-wavs = result["wavs"]
-sample_rate = result["sample_rate"]
-for index, wav in enumerate(wavs):
- sf.write(f"{index}.wav", wav, sample_rate)
-```
+- ### 第一步:启动PaddleHub Serving
-## 查看代码
+ - 运行启动命令
+ - ```shell
+ $ hub serving start -m fastspeech_ljspeech
+ ```
+ - 这样就完成了服务化API的部署,默认端口号为8866。
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量,否则无需设置。
-https://github.com/PaddlePaddle/Parakeet
+- ### 第二步:发送预测请求
-### 依赖
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
-paddlepaddle >= 1.8.2
+ - ```python
+ import requests
+ import json
-paddlehub >= 1.7.0
+ import soundfile as sf
-**NOTE:** 除了python依赖外还必须安装libsndfile库
+ # 发送HTTP请求
-对于Ubuntu用户,请执行:
-```
-sudo apt-get install libsndfile1
-```
-对于Centos用户,请执行:
-```
-sudo yum install libsndfile
-```
+ data = {'texts':['Simple as this proposition is, it is necessary to be stated',
+ 'Parakeet stands for Paddle PARAllel text-to-speech toolkit'],
+ 'use_gpu':False}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/fastspeech_ljspeech"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
-## 更新历史
+ # 保存结果
+ result = r.json()["results"]
+ wavs = result["wavs"]
+ sample_rate = result["sample_rate"]
+ for index, wav in enumerate(wavs):
+ sf.write(f"{index}.wav", wav, sample_rate)
+ ```
+
+
+## 五、更新历史
* 1.0.0
初始发布
+
+ ```shell
+ $ hub install fastspeech_ljspeech
+ ```
diff --git a/modules/audio/tts/transformer_tts_ljspeech/README.md b/modules/audio/tts/transformer_tts_ljspeech/README.md
index 2be5603ed13006f6ab2d6f5ab2d21c6381b943a7..58d1bf569fe7e637a50bfe766bb95059f0486c3e 100644
--- a/modules/audio/tts/transformer_tts_ljspeech/README.md
+++ b/modules/audio/tts/transformer_tts_ljspeech/README.md
@@ -1,119 +1,147 @@
-## 概述
+# transformer_tts_ljspeech
+
+|模型名称|transformer_tts_ljspeech|
+| :--- | :---: |
+|类别|语音-语音合成|
+|网络|Transformer|
+|数据集|LJSpeech-1.1|
+|是否支持Fine-tuning|否|
+|模型大小|54MB|
+|最新更新日期|2020-10-27|
+|数据指标|-|
+
+## 一、模型基本信息
+
+### 模型介绍
TansformerTTS 是使用了 Transformer 结构的端到端语音合成模型,对 Transformer 和 Tacotron2 进行了融合,取得了令人满意的效果。因为删除了 RNN 的循环连接,可并行的提供 decoder 的输入,进行并行训练,大大提升了模型的训练速度。transformer_tts_ljspeech是基于ljspeech英文语音数据集预训练得到的英文TTS模型,仅支持预测。
-
+
更多详情参考论文[Neural Speech Synthesis with Transformer Network](https://arxiv.org/abs/1809.08895)
-## 命令行预测
-```shell
-$ hub run transformer_tts_ljspeech --input_text="Life was like a box of chocolates, you never know what you're gonna get." --use_gpu True --vocoder griffin-lim
-```
+## 二、安装
+
+- ### 1、系统依赖
-## API
+ 对于Ubuntu用户,请执行:
+ ```
+ sudo apt-get install libsndfile1
+ ```
+ 对于Centos用户,请执行:
+ ```
+ sudo yum install libsndfile
+ ```
-```python
-def synthesize(texts, use_gpu=False, vocoder="griffin-lim"):
-```
+- ### 2、环境依赖
-预测API,由输入文本合成对应音频波形。
+ - 2.0.0 > paddlepaddle >= 1.8.2
-**参数**
+ - 2.0.0 > paddlehub >= 1.7.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
-* texts (list\[str\]): 待预测文本;
-* use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA\_VISIBLE\_DEVICES环境变量**;
-* vocoder: 指定声码器,可选 "griffin-lim"或"waveflow"
+- ### 3、安装
-**返回**
+ - ```shell
+ $ hub install transformer_tts_ljspeech
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
-* wavs (list): 语音合成结果列表,列表中每一个元素为对应输入文本的音频波形,可使用`soundfile.write`进一步处理或保存。
-* sample\_rate (int): 合成音频的采样率。
-**代码示例**
+## 三、模型API预测
-```python
-import paddlehub as hub
-import soundfile as sf
+- ### 1、命令行预测
-# Load transformer_tts_ljspeech module.
-module = hub.Module(name="transformer_tts_ljspeech")
+ - ```shell
+ $ hub run transformer_tts_ljspeech --input_text="Life was like a box of chocolates, you never know what you're gonna get." --use_gpu True --vocoder griffin-lim
+ ```
+ - 通过命令行方式实现语音合成模型的调用,更多请见[PaddleHub命令行指令](https://github.com/shinichiye/PaddleHub/blob/release/v2.1/docs/docs_ch/tutorial/cmd_usage.rst)
-# Predict sentiment label
-test_texts = ["Life was like a box of chocolates, you never know what you're gonna get."]
-wavs, sample_rate = module.synthesize(texts=test_texts, use_gpu=True, vocoder="waveflow")
-for index, wav in enumerate(wavs):
- sf.write(f"{index}.wav", wav, sample_rate)
-```
+- ### 2、预测代码示例
-## 服务部署
+ - ```python
+ import paddlehub as hub
+ import soundfile as sf
-PaddleHub Serving 可以部署在线服务。
+ # Load transformer_tts_ljspeech module.
+ module = hub.Module(name="transformer_tts_ljspeech")
-### 第一步:启动PaddleHub Serving
+    # Synthesize speech from the input texts
+ test_texts = ["Life was like a box of chocolates, you never know what you're gonna get."]
+ wavs, sample_rate = module.synthesize(texts=test_texts, use_gpu=True, vocoder="waveflow")
+ for index, wav in enumerate(wavs):
+ sf.write(f"{index}.wav", wav, sample_rate)
+ ```
-运行启动命令:
-```shell
-$ hub serving start -m transformer_tts_ljspeech
-```
+- ### 3、API
-这样就完成了一个服务化API的部署,默认端口号为8866。
+ - ```python
+ def synthesize(texts, use_gpu=False, vocoder="griffin-lim"):
+ ```
-**NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+ - 预测API,由输入文本合成对应音频波形。
-### 第二步:发送预测请求
+ - **参数**
+ - texts (list\[str\]): 待预测文本;
+ - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA\_VISIBLE\_DEVICES环境变量**;
+ - vocoder: 指定声码器,可选 "griffin-lim"或"waveflow"
-配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+ - **返回**
+ - wavs (list): 语音合成结果列表,列表中每一个元素为对应输入文本的音频波形,可使用`soundfile.write`进一步处理或保存。
+ - sample\_rate (int): 合成音频的采样率。
-```python
-import requests
-import json
-import soundfile as sf
+## 四、服务部署
-# 发送HTTP请求
+- PaddleHub Serving可以部署一个在线语音合成服务,可以将此接口用于在线web应用。
-data = {'texts':['Simple as this proposition is, it is necessary to be stated',
- 'Parakeet stands for Paddle PARAllel text-to-speech toolkit'],
- 'use_gpu':False}
-headers = {"Content-type": "application/json"}
-url = "http://127.0.0.1:8866/predict/transformer_tts_ljspeech"
-r = requests.post(url=url, headers=headers, data=json.dumps(data))
+- ### 第一步:启动PaddleHub Serving
-# 保存结果
-result = r.json()["results"]
-wavs = result["wavs"]
-sample_rate = result["sample_rate"]
-for index, wav in enumerate(wavs):
- sf.write(f"{index}.wav", wav, sample_rate)
-```
+ - 运行启动命令
-## 查看代码
+ - ```shell
+ $ hub serving start -m transformer_tts_ljspeech
+ ```
+ - 这样就完成了服务化API的部署,默认端口号为8866。
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量,否则无需设置。
-https://github.com/PaddlePaddle/Parakeet
+- ### 第二步:发送预测请求
-### 依赖
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
-paddlepaddle >= 1.8.2
+ - ```python
+ import requests
+ import json
-paddlehub >= 1.7.0
+ import soundfile as sf
-**NOTE:** 除了python依赖外还必须安装libsndfile库
+ # 发送HTTP请求
-对于Ubuntu用户,请执行:
-```
-sudo apt-get install libsndfile1
-```
-对于Centos用户,请执行:
-```
-sudo yum install libsndfile
-```
+ data = {'texts':['Simple as this proposition is, it is necessary to be stated',
+ 'Parakeet stands for Paddle PARAllel text-to-speech toolkit'],
+ 'use_gpu':False}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/transformer_tts_ljspeech"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
-## 更新历史
+ # 保存结果
+ result = r.json()["results"]
+ wavs = result["wavs"]
+ sample_rate = result["sample_rate"]
+ for index, wav in enumerate(wavs):
+ sf.write(f"{index}.wav", wav, sample_rate)
+ ```
+
+
+## 五、更新历史
* 1.0.0
初始发布
+
+ ```shell
+ $ hub install transformer_tts_ljspeech
+ ```
diff --git a/modules/audio/voice_cloning/ge2e_fastspeech2_pwgan/README.md b/modules/audio/voice_cloning/ge2e_fastspeech2_pwgan/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a4f9ac8a29269f31ea653db70d5ff92f36718672
--- /dev/null
+++ b/modules/audio/voice_cloning/ge2e_fastspeech2_pwgan/README.md
@@ -0,0 +1,111 @@
+# ge2e_fastspeech2_pwgan
+
+|模型名称|ge2e_fastspeech2_pwgan|
+| :--- | :---: |
+|类别|语音-声音克隆|
+|网络|FastSpeech2|
+|数据集|AISHELL-3|
+|是否支持Fine-tuning|否|
+|模型大小|462MB|
+|最新更新日期|2021-12-17|
+|数据指标|-|
+
+## 一、模型基本信息
+
+### 模型介绍
+
+声音克隆是指使用特定的音色,结合文字的读音合成音频,使得合成后的音频具有目标说话人的特征,从而达到克隆的目的。
+
+在训练语音克隆模型时,目标音色作为Speaker Encoder的输入,模型会提取这段语音的说话人特征(音色)作为Speaker Embedding。接着,在训练模型重新合成此类音色的语音时,除了输入的目标文本外,说话人的特征也将成为额外条件加入模型的训练。
+
+在预测时,选取一段新的目标音色作为Speaker Encoder的输入,并提取其说话人特征,最终实现输入为一段文本和一段目标音色,模型生成目标音色说出此段文本的语音片段。
+
+![](https://ai-studio-static-online.cdn.bcebos.com/982ab955b87244d3bae3b003aff8e28d9ec159ff0d6246a79757339076dfe7d4)
+
+`ge2e_fastspeech2_pwgan`是一个支持中文的语音克隆模型,分别使用了LSTMSpeakerEncoder、FastSpeech2和PWGan模型分别用于语音特征提取、目标音频特征合成和语音波形转换。
+
+关于模型的详情可参考[PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech)。
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install ge2e_fastspeech2_pwgan
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ge2e_fastspeech2_pwgan', output_dir='./', speaker_audio='/data/man.wav') # 指定目标音色音频文件
+ texts = [
+ '语音的表现形式在未来将变得越来越重要$',
+ '今天的天气怎么样$', ]
+ wavs = model.generate(texts, use_gpu=True)
+
+ for text, wav in zip(texts, wavs):
+ print('='*30)
+ print(f'Text: {text}')
+ print(f'Wav: {wav}')
+ ```
+
+- ### 2、API
+ - ```python
+ def __init__(speaker_audio: str = None,
+ output_dir: str = './')
+ ```
+ - 初始化module,可配置模型的目标音色的音频文件和输出的路径。
+
+ - **参数**
+ - `speaker_audio`(str): 目标说话人语音音频文件(*.wav)的路径,默认为None(使用默认的女声作为目标音色)。
+      - `output_dir`(str): 合成音频的输出目录,默认为当前目录。
+
+
+ - ```python
+ def get_speaker_embedding()
+ ```
+ - 获取模型的目标说话人特征。
+
+ - **返回**
+ - `results`(numpy.ndarray): 长度为256的numpy数组,代表目标说话人的特征。
+
+ - ```python
+ def set_speaker_embedding(speaker_audio: str)
+ ```
+ - 设置模型的目标说话人特征。
+
+ - **参数**
+ - `speaker_audio`(str): 必填,目标说话人语音音频文件(*.wav)的路径。
+
+ - ```python
+ def generate(data: Union[str, List[str]], use_gpu: bool = False):
+ ```
+ - 根据输入文字,合成目标说话人的语音音频文件。
+
+ - **参数**
+ - `data`(Union[str, List[str]]): 必填,目标音频的内容文本列表,目前只支持中文,不支持添加标点符号。
+ - `use_gpu`(bool): 是否使用gpu执行计算,默认为False。
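+
+  - 下面给出一个组合使用上述接口的示例:先切换目标说话人音色,再查看当前的说话人特征(其中`/data/man2.wav`为假设的音频文件路径,请替换为实际存在的`*.wav`文件):
+
+  - ```python
+    import paddlehub as hub
+
+    model = hub.Module(name='ge2e_fastspeech2_pwgan')
+    model.set_speaker_embedding('/data/man2.wav')  # 切换目标说话人音色(示例路径,需替换)
+    print(model.get_speaker_embedding().shape)     # (256,)
+    ```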
+
+
+## 四、更新历史
+
+* 1.0.0
+
+ 初始发布。
+
+ ```shell
+ $ hub install ge2e_fastspeech2_pwgan
+ ```
diff --git a/modules/audio/voice_cloning/ge2e_fastspeech2_pwgan/__init__.py b/modules/audio/voice_cloning/ge2e_fastspeech2_pwgan/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/audio/voice_cloning/ge2e_fastspeech2_pwgan/module.py b/modules/audio/voice_cloning/ge2e_fastspeech2_pwgan/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..9bea0832b9d67319a9ecf318ca1f3df9128df305
--- /dev/null
+++ b/modules/audio/voice_cloning/ge2e_fastspeech2_pwgan/module.py
@@ -0,0 +1,160 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import List, Union
+
+import numpy as np
+import paddle
+import soundfile as sf
+import yaml
+from yacs.config import CfgNode
+
+from paddlehub.env import MODULE_HOME
+from paddlehub.module.module import moduleinfo, serving
+from paddlehub.utils.log import logger
+from paddlespeech.t2s.frontend.zh_frontend import Frontend
+from paddlespeech.t2s.models.fastspeech2 import FastSpeech2
+from paddlespeech.t2s.models.fastspeech2 import FastSpeech2Inference
+from paddlespeech.t2s.models.parallel_wavegan import PWGGenerator
+from paddlespeech.t2s.models.parallel_wavegan import PWGInference
+from paddlespeech.t2s.modules.normalizer import ZScore
+from paddlespeech.vector.exps.ge2e.audio_processor import SpeakerVerificationPreprocessor
+from paddlespeech.vector.models.lstm_speaker_encoder import LSTMSpeakerEncoder
+
+
+@moduleinfo(
+ name="ge2e_fastspeech2_pwgan",
+ version="1.0.0",
+ summary="",
+ author="paddlepaddle",
+ author_email="",
+ type="audio/voice_cloning",
+)
+class VoiceCloner(paddle.nn.Layer):
+ def __init__(self, speaker_audio: str = None, output_dir: str = './'):
+ super(VoiceCloner, self).__init__()
+
+ speaker_encoder_ckpt = os.path.join(MODULE_HOME, 'ge2e_fastspeech2_pwgan', 'assets',
+ 'ge2e_ckpt_0.3/step-3000000.pdparams')
+ synthesizer_res_dir = os.path.join(MODULE_HOME, 'ge2e_fastspeech2_pwgan', 'assets',
+ 'fastspeech2_nosil_aishell3_vc1_ckpt_0.5')
+ vocoder_res_dir = os.path.join(MODULE_HOME, 'ge2e_fastspeech2_pwgan', 'assets', 'pwg_aishell3_ckpt_0.5')
+
+ # Speaker encoder
+ self.speaker_processor = SpeakerVerificationPreprocessor(
+ sampling_rate=16000,
+ audio_norm_target_dBFS=-30,
+ vad_window_length=30,
+ vad_moving_average_width=8,
+ vad_max_silence_length=6,
+ mel_window_length=25,
+ mel_window_step=10,
+ n_mels=40,
+ partial_n_frames=160,
+ min_pad_coverage=0.75,
+ partial_overlap_ratio=0.5)
+ self.speaker_encoder = LSTMSpeakerEncoder(n_mels=40, num_layers=3, hidden_size=256, output_size=256)
+ self.speaker_encoder.set_state_dict(paddle.load(speaker_encoder_ckpt))
+ self.speaker_encoder.eval()
+
+ # Voice synthesizer
+ with open(os.path.join(synthesizer_res_dir, 'default.yaml'), 'r') as f:
+ fastspeech2_config = CfgNode(yaml.safe_load(f))
+ with open(os.path.join(synthesizer_res_dir, 'phone_id_map.txt'), 'r') as f:
+ phn_id = [line.strip().split() for line in f.readlines()]
+
+ model = FastSpeech2(idim=len(phn_id), odim=fastspeech2_config.n_mels, **fastspeech2_config["model"])
+ model.set_state_dict(paddle.load(os.path.join(synthesizer_res_dir, 'snapshot_iter_96400.pdz'))["main_params"])
+ model.eval()
+
+ stat = np.load(os.path.join(synthesizer_res_dir, 'speech_stats.npy'))
+ mu, std = stat
+ mu = paddle.to_tensor(mu)
+ std = paddle.to_tensor(std)
+ fastspeech2_normalizer = ZScore(mu, std)
+ self.sample_rate = fastspeech2_config.fs
+
+ self.fastspeech2_inference = FastSpeech2Inference(fastspeech2_normalizer, model)
+ self.fastspeech2_inference.eval()
+
+ # Vocoder
+ with open(os.path.join(vocoder_res_dir, 'default.yaml')) as f:
+ pwg_config = CfgNode(yaml.safe_load(f))
+
+ vocoder = PWGGenerator(**pwg_config["generator_params"])
+ vocoder.set_state_dict(
+ paddle.load(os.path.join(vocoder_res_dir, 'snapshot_iter_1000000.pdz'))["generator_params"])
+ vocoder.remove_weight_norm()
+ vocoder.eval()
+
+ stat = np.load(os.path.join(vocoder_res_dir, 'feats_stats.npy'))
+ mu, std = stat
+ mu = paddle.to_tensor(mu)
+ std = paddle.to_tensor(std)
+ pwg_normalizer = ZScore(mu, std)
+
+ self.pwg_inference = PWGInference(pwg_normalizer, vocoder)
+ self.pwg_inference.eval()
+
+ # Text frontend
+ self.frontend = Frontend(phone_vocab_path=os.path.join(synthesizer_res_dir, 'phone_id_map.txt'))
+
+        # Speaker embedding
+ self._speaker_embedding = None
+ if speaker_audio is None or not os.path.isfile(speaker_audio):
+ speaker_audio = os.path.join(MODULE_HOME, 'ge2e_fastspeech2_pwgan', 'assets', 'voice_cloning.wav')
+            logger.warning('No speaker audio is specified, so the speaker encoder will use the default '
+                           f'waveform ({speaker_audio}) to extract the speaker embedding. You can use the '
+                           '"set_speaker_embedding()" method to set a new speaker audio for voice cloning.')
+ self.set_speaker_embedding(speaker_audio)
+
+ self.output_dir = os.path.abspath(output_dir)
+ if not os.path.exists(self.output_dir):
+ os.makedirs(self.output_dir)
+
+ def get_speaker_embedding(self):
+ return self._speaker_embedding.numpy()
+
+ @paddle.no_grad()
+ def set_speaker_embedding(self, speaker_audio: str):
+        assert os.path.exists(speaker_audio), f'Speaker audio file: {speaker_audio} does not exist.'
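+        # Slice the reference waveform into partial utterances, extract mel features,
+        # and aggregate the per-partial GE2E embeddings into a single speaker embedding.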
+ mel_sequences = self.speaker_processor.extract_mel_partials(
+ self.speaker_processor.preprocess_wav(speaker_audio))
+ self._speaker_embedding = self.speaker_encoder.embed_utterance(paddle.to_tensor(mel_sequences))
+
+ logger.info(f'Speaker embedding has been set from file: {speaker_audio}')
+
+ @paddle.no_grad()
+ def generate(self, data: Union[str, List[str]], use_gpu: bool = False):
+ assert self._speaker_embedding is not None, f'Set speaker embedding before voice cloning.'
+
+ if isinstance(data, str):
+ data = [data]
+ elif isinstance(data, list):
+            assert len(data) > 0 and isinstance(data[0],
+                                                str) and len(data[0]) > 0, 'Input data should be str or List[str].'
+ else:
+            raise Exception('Input data should be str or List[str].')
+
+ paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
+ files = []
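+        # Per sentence: text -> phoneme ids (frontend) -> mel spectrogram (FastSpeech2
+        # conditioned on the speaker embedding) -> waveform (Parallel WaveGAN vocoder).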
+ for idx, text in enumerate(data):
+ phone_ids = self.frontend.get_input_ids(text, merge_sentences=True)["phone_ids"][0]
+ wav = self.pwg_inference(self.fastspeech2_inference(phone_ids, spk_emb=self._speaker_embedding))
+ output_wav = os.path.join(self.output_dir, f'{idx+1}.wav')
+ sf.write(output_wav, wav.numpy(), samplerate=self.sample_rate)
+ files.append(output_wav)
+
+ return files
diff --git a/modules/audio/voice_cloning/ge2e_fastspeech2_pwgan/requirements.txt b/modules/audio/voice_cloning/ge2e_fastspeech2_pwgan/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..120598fd26d619a674601ca3de0a9f7c1609ca99
--- /dev/null
+++ b/modules/audio/voice_cloning/ge2e_fastspeech2_pwgan/requirements.txt
@@ -0,0 +1 @@
+paddlespeech==0.1.0a13
diff --git a/modules/audio/voice_cloning/lstm_tacotron2/README.md b/modules/audio/voice_cloning/lstm_tacotron2/README.md
index 58d6e846a25ddded31a10d6632aaaf6d7563f723..dedd5017324c10bc7a1f466d4a7367d80237ae53 100644
--- a/modules/audio/voice_cloning/lstm_tacotron2/README.md
+++ b/modules/audio/voice_cloning/lstm_tacotron2/README.md
@@ -1,8 +1,18 @@
-```shell
-$ hub install lstm_tacotron2==1.0.0
-```
+# lstm_tacotron2
+
+|模型名称|lstm_tacotron2|
+| :--- | :---: |
+|类别|语音-语音合成|
+|网络|LSTM、Tacotron2、WaveFlow|
+|数据集|AISHELL-3|
+|是否支持Fine-tuning|否|
+|模型大小|327MB|
+|最新更新日期|2021-06-15|
+|数据指标|-|
+
+## 一、模型基本信息
-## 概述
+### 模型介绍
声音克隆是指使用特定的音色,结合文字的读音合成音频,使得合成后的音频具有目标说话人的特征,从而达到克隆的目的。
@@ -10,93 +20,107 @@ $ hub install lstm_tacotron2==1.0.0
在预测时,选取一段新的目标音色作为Speaker Encoder的输入,并提取其说话人特征,最终实现输入为一段文本和一段目标音色,模型生成目标音色说出此段文本的语音片段。
-![](https://ai-studio-static-online.cdn.bcebos.com/982ab955b87244d3bae3b003aff8e28d9ec159ff0d6246a79757339076dfe7d4)
+
+
+
 `lstm_tacotron2`是一个支持中文的语音克隆模型,使用了LSTMSpeakerEncoder、Tacotron2和WaveFlow模型,分别用于语音特征提取、目标音频特征合成和语音波形转换。
-关于模型的详请可参考[Parakeet](https://github.com/PaddlePaddle/Parakeet/tree/release/v0.3/parakeet/models)。
+更多详情请参考:
+- [Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis](https://arxiv.org/pdf/1806.04558.pdf)
+- [Parakeet](https://github.com/PaddlePaddle/Parakeet/tree/release/v0.3/parakeet/models)
-## API
+## 二、安装
-```python
-def __init__(speaker_audio: str = None,
- output_dir: str = './')
-```
-初始化module,可配置模型的目标音色的音频文件和输出的路径。
+- ### 1、环境依赖
-**参数**
-- `speaker_audio`(str): 目标说话人语音音频文件(*.wav)的路径,默认为None(使用默认的女声作为目标音色)。
-- `output_dir`(str): 合成音频的输出文件,默认为当前目录。
+ - paddlepaddle >= 2.0.0
+ - paddlehub >= 2.1.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
-```python
-def get_speaker_embedding()
-```
-获取模型的目标说话人特征。
+- ### 2、安装
-**返回**
-* `results`(numpy.ndarray): 长度为256的numpy数组,代表目标说话人的特征。
+ - ```shell
+ $ hub install lstm_tacotron2
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
-```python
-def set_speaker_embedding(speaker_audio: str)
-```
-设置模型的目标说话人特征。
-**参数**
-- `speaker_audio`(str): 必填,目标说话人语音音频文件(*.wav)的路径。
+## 三、模型API预测
-```python
-def generate(data: List[str], batch_size: int = 1, use_gpu: bool = False):
-```
-根据输入文字,合成目标说话人的语音音频文件。
+- ### 1、预测代码示例
-**参数**
-- `data`(List[str]): 必填,目标音频的内容文本列表,目前只支持中文,不支持添加标点符号。
-- `batch_size`(int): 可选,模型合成语音时的batch_size,默认为1。
-- `use_gpu`(bool): 是否使用gpu执行计算,默认为False。
+ - ```python
+ import paddlehub as hub
+ model = hub.Module(name='lstm_tacotron2', output_dir='/data', speaker_audio='/data/man.wav') # 指定目标音色音频文件
+ texts = [
+ '语音的表现形式在未来将变得越来越重要$',
+ '今天的天气怎么样$', ]
+ wavs = model.generate(texts, use_gpu=True)
-**代码示例**
+ for text, wav in zip(texts, wavs):
+ print('='*30)
+ print(f'Text: {text}')
+ print(f'Wav: {wav}')
+ ```
+ ```
+ ==============================
+ Text: 语音的表现形式在未来将变得越来越重要$
+ Wav: /data/1.wav
+ ==============================
+ Text: 今天的天气怎么样$
+ Wav: /data/2.wav
+ ```
-```python
-import paddlehub as hub
+- ### 2、API
-model = hub.Module(name='lstm_tacotron2', output_dir='./', speaker_audio='/data/man.wav') # 指定目标音色音频文件
-texts = [
- '语音的表现形式在未来将变得越来越重要$',
- '今天的天气怎么样$', ]
-wavs = model.generate(texts, use_gpu=True)
+ - ```python
+ def __init__(speaker_audio: str = None,
+ output_dir: str = './')
+ ```
+ - 初始化module,可配置模型的目标音色的音频文件和输出的路径。
-for text, wav in zip(texts, wavs):
- print('='*30)
- print(f'Text: {text}')
- print(f'Wav: {wav}')
-```
+ - **参数**
+ - `speaker_audio`(str): 目标说话人语音音频文件(*.wav)的路径,默认为None(使用默认的女声作为目标音色)。
+    - `output_dir`(str): 合成音频的输出目录,默认为当前目录。
-输出
-```
-==============================
-Text: 语音的表现形式在未来将变得越来越重要$
-Wav: /data/1.wav
-==============================
-Text: 今天的天气怎么样$
-Wav: /data/2.wav
-```
+ - ```python
+ def get_speaker_embedding()
+ ```
+ - 获取模型的目标说话人特征。
+
+ - **返回**
+ - `results`(numpy.ndarray): 长度为256的numpy数组,代表目标说话人的特征。
-## 查看代码
+ - ```python
+ def set_speaker_embedding(speaker_audio: str)
+ ```
+ - 设置模型的目标说话人特征。
-https://github.com/PaddlePaddle/Parakeet
+ - **参数**
+ - `speaker_audio`(str): 必填,目标说话人语音音频文件(*.wav)的路径。
-## 依赖
+ - ```python
+ def generate(data: List[str], batch_size: int = 1, use_gpu: bool = False):
+ ```
+ - 根据输入文字,合成目标说话人的语音音频文件。
-paddlepaddle >= 2.0.0
+ - **参数**
+ - `data`(List[str]): 必填,目标音频的内容文本列表,目前只支持中文,不支持添加标点符号。
+ - `batch_size`(int): 可选,模型合成语音时的batch_size,默认为1。
+ - `use_gpu`(bool): 是否使用gpu执行计算,默认为False。
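+
+  - 如需分批合成,可参考下面的示意代码(`batch_size` 取值与音频路径仅为示例假设):
+
+  - ```python
+    import paddlehub as hub
+
+    # 示意:按 batch_size=2 分批合成语音
+    model = hub.Module(name='lstm_tacotron2', output_dir='./', speaker_audio='/data/man.wav')
+    wavs = model.generate(['语音的表现形式在未来将变得越来越重要$', '今天的天气怎么样$'], batch_size=2, use_gpu=False)
+    ```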
-paddlehub >= 2.1.0
-## 更新历史
+## 四、更新历史
* 1.0.0
初始发布
+
+```shell
+$ hub install lstm_tacotron2==1.0.0
+```
diff --git a/modules/image/Image_editing/colorization/deoldify/README.md b/modules/image/Image_editing/colorization/deoldify/README.md
index a181b89bdcc802fa5c6129d5d466472e80bfb258..c4303720a52c02215250d23158798053659014f1 100644
--- a/modules/image/Image_editing/colorization/deoldify/README.md
+++ b/modules/image/Image_editing/colorization/deoldify/README.md
@@ -53,14 +53,14 @@
## 三、模型API预测
- - ### 1、代码示例
+ - ### 1、预测代码示例
- ```python
- import paddlehub as hub
+ - ```python
+ import paddlehub as hub
- model = hub.Module(name='deoldify')
- model.predict('/PATH/TO/IMAGE/OR/VIDEO')
- ```
+ model = hub.Module(name='deoldify')
+ model.predict('/PATH/TO/IMAGE/OR/VIDEO')
+ ```
- ### 2、API
diff --git a/modules/image/Image_editing/colorization/deoldify/README_en.md b/modules/image/Image_editing/colorization/deoldify/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..cbfcd6078a00e7dbf81b07c5a527e494dcad6093
--- /dev/null
+++ b/modules/image/Image_editing/colorization/deoldify/README_en.md
@@ -0,0 +1,171 @@
+# deoldify
+
+| Module Name |deoldify|
+| :--- | :---: |
+|Category|Image editing|
+|Network |NoGAN|
+|Dataset|ILSVRC 2012|
+|Fine-tuning supported or not |No|
+|Module Size |834MB|
+|Data indicators|-|
+|Latest update date |2021-04-13|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - Deoldify is a color rendering model for images and videos, which can restore color for black and white photos and videos.
+
+ - For more information, please refer to: [deoldify](https://github.com/jantic/DeOldify)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+    - NOTE: This Module relies on ffmpeg. Please install ffmpeg before using this Module.
+
+ ```shell
+ $ conda install x264=='1!152.20180717' ffmpeg=4.0.2 -c conda-forge
+ ```
+
+
+- ### 2、Installation
+ - ```shell
+ $ hub install deoldify
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+ - ### 1、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='deoldify')
+ model.predict('/PATH/TO/IMAGE/OR/VIDEO')
+ ```
+
+ - ### 2、API
+
+ - ```python
+ def predict(self, input):
+ ```
+
+ - Prediction API.
+
+ - **Parameter**
+
+ - input (str): Image path.
+
+ - **Return**
+
+ - If input is image path, the output is:
+ - pred_img(np.ndarray): image data, ndarray.shape is in the format [H, W, C], BGR.
+ - out_path(str): save path of images.
+
+ - If input is video path, the output is :
+ - frame_pattern_combined(str): save path of frames from output video.
+ - vid_out_path(str): save path of output video.
+
+ - ```python
+ def run_image(self, img):
+ ```
+ - Prediction API for image.
+
+ - **Parameter**
+
+ - img (str|np.ndarray): Image data, str or ndarray. ndarray.shape is in the format [H, W, C], BGR.
+
+ - **Return**
+
+ - pred_img(np.ndarray): Ndarray.shape is in the format [H, W, C], BGR.
+
+ - ```python
+ def run_video(self, video):
+ ```
+ - Prediction API for video.
+
+ - **Parameter**
+
+ - video(str): Video path.
+
+ - **Return**
+
+ - frame_pattern_combined(str): Save path of frames from output video.
+ - vid_out_path(str): Save path of output video.
+
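+  - A minimal usage sketch of `run_image`, based on the parameters and return value documented above (the image paths are placeholders):
+
+  - ```python
+    import cv2
+    import paddlehub as hub
+
+    # Sketch: colorize one image; input and output are BGR ndarrays.
+    model = hub.Module(name='deoldify')
+    img = cv2.imread('/PATH/TO/IMAGE')
+    pred_img = model.run_image(img)
+    cv2.imwrite('/PATH/TO/SAVE/IMAGE', pred_img)
+    ```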
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of coloring old photos or videos.
+
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m deoldify
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result.
+
+ - ```python
+ import requests
+ import json
+ import base64
+
+ import cv2
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # Send an HTTP request
+ org_im = cv2.imread('/PATH/TO/ORIGIN/IMAGE')
+ data = {'images':cv2_to_base64(org_im)}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/deoldify"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ img = base64_to_cv2(r.json()["results"])
+ cv2.imwrite('/PATH/TO/SAVE/IMAGE', img)
+ ```
+
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
+
+- 1.0.1
+
+ Adapt to paddlehub2.0
diff --git a/modules/image/Image_editing/colorization/photo_restoration/README.md b/modules/image/Image_editing/colorization/photo_restoration/README.md
index e3a2d5fd3459e07a4045ccfb3f20b5774826e773..fbb6332c95babf6d3ce3c43343af7711217fc59c 100644
--- a/modules/image/Image_editing/colorization/photo_restoration/README.md
+++ b/modules/image/Image_editing/colorization/photo_restoration/README.md
@@ -51,17 +51,17 @@
## 三、模型API预测
- - ### 1、代码示例
+ - ### 1、预测代码示例
- ```python
- import cv2
- import paddlehub as hub
+ - ```python
+ import cv2
+ import paddlehub as hub
- model = hub.Module(name='photo_restoration', visualization=True)
- im = cv2.imread('/PATH/TO/IMAGE')
- res = model.run_image(im)
+ model = hub.Module(name='photo_restoration', visualization=True)
+ im = cv2.imread('/PATH/TO/IMAGE')
+ res = model.run_image(im)
- ```
+ ```
- ### 2、API
diff --git a/modules/image/Image_editing/colorization/photo_restoration/README_en.md b/modules/image/Image_editing/colorization/photo_restoration/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..1ff585bddd0dc54768fb168999cfdddac266a6f8
--- /dev/null
+++ b/modules/image/Image_editing/colorization/photo_restoration/README_en.md
@@ -0,0 +1,151 @@
+# photo_restoration
+
+|Module Name|photo_restoration|
+| :--- | :---: |
+|Category|Image editing|
+|Network|deoldify and realsr|
+|Fine-tuning supported or not|No|
+|Module Size |64MB+834MB|
+|Data indicators|-|
+|Latest update date|2021-08-19|
+
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+
+- ### Module Introduction
+
+  - Photo_restoration can restore old photos. It mainly consists of two parts: colorization and super-resolution. The colorization model is deoldify and the super-resolution model is realsr; therefore, please install the deoldify and realsr modules in advance before using this module.
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+    - NOTE: This Module relies on ffmpeg. Please install ffmpeg before using this Module.
+
+ ```shell
+ $ conda install x264=='1!152.20180717' ffmpeg=4.0.2 -c conda-forge
+ ```
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install photo_restoration
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddlehub as hub
+
+ model = hub.Module(name='photo_restoration', visualization=True)
+ im = cv2.imread('/PATH/TO/IMAGE')
+ res = model.run_image(im)
+
+ ```
+- ### 2、API
+
+
+ - ```python
+ def run_image(self,
+ input,
+ model_select= ['Colorization', 'SuperResolution'],
+ save_path = 'photo_restoration'):
+ ```
+
+    - Prediction API for producing restored photos.
+
+ - **Parameter**
+
+ - input (numpy.ndarray|str): Image data,numpy.ndarray or str. ndarray.shape is in the format [H, W, C], BGR.
+
+    - model_select (list\[str\]): Mode selection. \['Colorization'\] only colorizes the input image, \['SuperResolution'\] only increases the image resolution;
+      the default is \['Colorization', 'SuperResolution'\] (see the sketch below).
+
+ - save_path (str): Save path, default is 'photo_restoration'.
+
+ - **Return**
+
+ - output (numpy.ndarray): Restoration result,ndarray.shape is in the format [H, W, C], BGR.
+
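+  - A minimal sketch showing the `model_select` option documented above, running only the colorization stage (the image path is a placeholder):
+
+  - ```python
+    import cv2
+    import paddlehub as hub
+
+    # Sketch: colorization only, skipping the super-resolution stage.
+    model = hub.Module(name='photo_restoration', visualization=True)
+    im = cv2.imread('/PATH/TO/IMAGE')
+    res = model.run_image(im, model_select=['Colorization'], save_path='photo_restoration')
+    ```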
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of photo restoration.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m photo_restoration
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import base64
+
+ import cv2
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # Send an HTTP request
+ org_im = cv2.imread('PATH/TO/IMAGE')
+ data = {'images':cv2_to_base64(org_im), 'model_select': ['Colorization', 'SuperResolution']}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/photo_restoration"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ img = base64_to_cv2(r.json()["results"])
+ cv2.imwrite('PATH/TO/SAVE/IMAGE', img)
+ ```
+
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
+
+- 1.0.1
+
+ Adapt to paddlehub2.0
+
diff --git a/modules/image/Image_editing/colorization/user_guided_colorization/README.md b/modules/image/Image_editing/colorization/user_guided_colorization/README.md
index 390f04e1500e1d3d0ae1215f798bb9f7902f1fdc..d5d13144eed22656bf7e5fd12343b2fec6cf7b34 100644
--- a/modules/image/Image_editing/colorization/user_guided_colorization/README.md
+++ b/modules/image/Image_editing/colorization/user_guided_colorization/README.md
@@ -22,7 +22,7 @@
![](https://user-images.githubusercontent.com/35907364/136648959-40493c9c-08ec-46cd-a2a2-5e2038dcbfa7.png)
- - user_guided_colorization 是基于''Real-Time User-Guided Image Colorization with Learned Deep Priors"的着色模型,该模型利用预先提供的着色块对图像进行着色。
+ - user_guided_colorization 是基于"Real-Time User-Guided Image Colorization with Learned Deep Priors"的着色模型,该模型利用预先提供的着色块对图像进行着色。
## 二、安装
diff --git a/modules/image/Image_editing/colorization/user_guided_colorization/README_en.md b/modules/image/Image_editing/colorization/user_guided_colorization/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..8e17592c87ca4ee428e98afc8478411803471cd8
--- /dev/null
+++ b/modules/image/Image_editing/colorization/user_guided_colorization/README_en.md
@@ -0,0 +1,205 @@
+# user_guided_colorization
+
+|Module Name|user_guided_colorization|
+| :--- | :---: |
+|Category |Image editing|
+|Network| Local and Global Hints Network |
+|Dataset|ILSVRC 2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|131MB|
+|Data indicators|-|
+|Latest update date |2021-02-26|
+
+
+
+## I. Basic Information
+
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - User_guided_colorization is a colorization model based on "Real-Time User-Guided Image Colorization with Learned Deep Priors",this model uses pre-supplied coloring blocks to color the gray image.
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install user_guided_colorization
+ ```
+
+ - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ ```shell
+ $ hub run user_guided_colorization --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+- ### 2、Prediction Code Example
+
+ ```python
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+
+ model = hub.Module(name='user_guided_colorization')
+ model.set_config(prob=0.1)
+ result = model.predict(images=['/PATH/TO/IMAGE'])
+ ```
+- ### 3.Fine-tune and Encapsulation
+
+ - After completing the installation of PaddlePaddle and PaddleHub, you can start using the user_guided_colorization model to fine-tune datasets such as [Canvas](../../docs/reference/datasets.md#class-hubdatasetsCanvas) by executing `python train.py`.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ import paddlehub.vision.transforms as T
+
+ transform = T.Compose([T.Resize((256, 256), interpolation='NEAREST'),
+ T.RandomPaddingCrop(crop_size=176),
+ T.RGB2LAB()], to_rgb=True)
+ ```
+
+ - `transforms`: The data enhancement module defines lots of data preprocessing methods. Users can replace the data preprocessing methods according to their needs.
+
+ - Step2: Download the dataset
+ - ```python
+ from paddlehub.datasets import Canvas
+
+ color_set = Canvas(transform=transform, mode='train')
+ ```
+
+ * `transforms`: Data preprocessing methods.
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+ * `hub.datasets.Canvas()`: The dataset will be automatically downloaded from the network and decompressed to the `$HOME/.paddlehub/dataset` directory under the user directory.
+
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ model = hub.Module(name='user_guided_colorization', load_checkpoint=None)
+ model.set_config(classification=True, prob=1)
+ ```
+ * `name`: Model name.
+ * `load_checkpoint`: Whether to load the self-trained model, if it is None, load the provided parameters.
+      * `classification`: The model is trained in two stages. At the beginning, `classification` is set to True for shallow network training. In the later stage of training, set `classification` to False to train the output layer of the network.
+      * `prob`: The probability that no prior color block is added to an input image; the default is 1, i.e. no prior color blocks are added. For example, when `prob` is set to 0.9, the probability that a picture carries two prior color blocks is (1-0.9)*(1-0.9)*0.9=0.009.
+
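+    - A minimal sketch of the later training stage mentioned above (the checkpoint path and `prob` value are illustrative assumptions, not prescribed settings):
+
+    - ```python
+      import paddlehub as hub
+
+      # Hypothetical second stage: load the stage-1 checkpoint and train the output layer
+      # with prior color blocks enabled (classification=False, prob < 1).
+      model = hub.Module(name='user_guided_colorization', load_checkpoint='/PATH/TO/STAGE1/CHECKPOINT')
+      model.set_config(classification=False, prob=0.125)  # prob value is an assumption for illustration
+      ```
+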
+ - Step4: Optimization strategy
+
+ ```python
+ optimizer = paddle.optimizer.Adam(learning_rate=0.0001, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='img_colorization_ckpt_cls_1')
+ trainer.train(color_set, epochs=201, batch_size=25, eval_dataset=color_set, log_interval=10, save_interval=10)
+ ```
+
+
+ - Run configuration
+
+    - `Trainer` mainly controls the fine-tuning process, including the following controllable parameters:
+
+ * `model`: Optimized model.
+ * `optimizer`: Optimizer selection.
+ * `use_vdl`: Whether to use vdl to visualize the training process.
+ * `checkpoint_dir`: The storage address of the model parameters.
+ * `compare_metrics`: The measurement index of the optimal model.
+
+    - `trainer.train` mainly controls the specific training process, including the following controllable parameters:
+
+ * `train_dataset`: Training dataset.
+ * `epochs`: Epochs of training process.
+ * `batch_size`: Batch size.
+ * `num_workers`: Number of workers.
+ * `eval_dataset`: Validation dataset.
+ * `log_interval`:The interval for printing logs.
+ * `save_interval`: The interval for saving model parameters.
+
+ - Model prediction
+
+ - When Fine-tune is completed, the model with the best performance on the verification set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ - ```python
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='user_guided_colorization', load_checkpoint='/PATH/TO/CHECKPOINT')
+ model.set_config(prob=0.1)
+ result = model.predict(images=['/PATH/TO/IMAGE'])
+ ```
+
+
+ - **NOTE:** If you want to get the oil painting style, please download the parameter file [Canvas colorization](https://paddlehub.bj.bcebos.com/dygraph/models/canvas_rc.pdparams)
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of colorization.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m user_guided_colorization
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # Send an HTTP request
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/user_guided_colorization"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ data = base64_to_cv2(r.json()["results"]['data'][0]['fake_reg'])
+ cv2.imwrite('color.png', data)
+ ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+ First release
diff --git a/modules/image/Image_editing/super_resolution/dcscn/README_en.md b/modules/image/Image_editing/super_resolution/dcscn/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..098d03657369d534d4975e27fc19b3d120ff3d97
--- /dev/null
+++ b/modules/image/Image_editing/super_resolution/dcscn/README_en.md
@@ -0,0 +1,172 @@
+# dcscn
+
+|Module Name|dcscn|
+| :--- | :---: |
+|Category |Image editing|
+|Network|dcscn|
+|Dataset|DIV2k|
+|Fine-tuning supported or not|No|
+|Module Size|260KB|
+|Data indicators|PSNR37.63|
+|Latest update date|2021-02-26|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+ - DCSCN is a super resolution model based on 'Fast and Accurate Image Super Resolution by Deep CNN with Skip Connection and Network in Network'. The model uses residual structure and skip connections to extract local and global features. It uses a parallel 1*1 convolutional network to learn detailed features to improve model performance. This model provides super resolution result with scale factor x2.
+
+ - For more information, please refer to: [dcscn](https://github.com/jiny2001/dcscn-super-resolution)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install dcscn
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```
+ $ hub run dcscn --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+- ### 2、Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddlehub as hub
+
+ sr_model = hub.Module(name='dcscn')
+ im = cv2.imread('/PATH/TO/IMAGE').astype('float32')
+ res = sr_model.reconstruct(images=[im], visualization=True)
+ print(res[0]['data'])
+ sr_model.save_inference_model()
+ ```
+
+- ### 3、API
+
+ - ```python
+ def reconstruct(self,
+ images=None,
+ paths=None,
+ use_gpu=False,
+ visualization=False,
+ output_dir="dcscn_output")
+ ```
+
+ - Prediction API.
+
+ - **Parameter**
+
+ * images (list\[numpy.ndarray\]): Image data,ndarray.shape is in the format \[H, W, C\],BGR.
+ * paths (list\[str\]): image path.
+ * use\_gpu (bool): Use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**.
+ * visualization (bool): Whether to save the recognition results as picture files.
+ * output\_dir (str): Save path of images, "dcscn_output" by default.
+
+ - **Return**
+ * res (list\[dict\]): The list of model results, where each element is dict and each field is:
+ * save\_path (str, optional): Save path of the result, save_path is '' if no image is saved.
+ * data (numpy.ndarray): Result of super resolution.
+
+ - ```python
+ def save_inference_model(self,
+ dirname='dcscn_save_model',
+ model_filename=None,
+ params_filename=None,
+ combined=False)
+ ```
+
+ - Save the model to the specified path.
+
+ - **Parameters**
+
+ * dirname: Save path.
+      * model\_filename: Model file name, default is \_\_model\_\_
+      * params\_filename: Parameter file name, default is \_\_params\_\_ (only takes effect when `combined` is True)
+ * combined: Whether to save the parameters to a unified file.
+
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of super resolution.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m dcscn
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import base64
+
+ import cv2
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/dcscn"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ sr = np.expand_dims(cv2.cvtColor(base64_to_cv2(r.json()["results"][0]['data']), cv2.COLOR_BGR2GRAY), axis=2)
+      shape = sr.shape
+ org_im = cv2.cvtColor(org_im, cv2.COLOR_BGR2YUV)
+ uv = cv2.resize(org_im[...,1:], (shape[1], shape[0]), interpolation=cv2.INTER_CUBIC)
+ combine_im = cv2.cvtColor(np.concatenate((sr, uv), axis=2), cv2.COLOR_YUV2BGR)
+ cv2.imwrite('dcscn_X2.png', combine_im)
+ print("save image as dcscn_X2.png")
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/Image_editing/super_resolution/falsr_a/README_en.md b/modules/image/Image_editing/super_resolution/falsr_a/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..aa677c6d5d5bbe480cad8049b7cad08e1ede441f
--- /dev/null
+++ b/modules/image/Image_editing/super_resolution/falsr_a/README_en.md
@@ -0,0 +1,173 @@
+# falsr_a
+
+|Module Name|falsr_a|
+| :--- | :---: |
+|Category |Image editing|
+|Network |falsr_a|
+|Dataset|DIV2k|
+|Fine-tuning supported or not|No|
+|Module Size |8.9MB|
+|Data indicators|PSNR37.82|
+|Latest update date|2021-02-26|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - Falsr_a is a lightweight super-resolution model based on "Accurate and Lightweight Super-Resolution with Neural Architecture Search". The model uses a multi-objective approach to handle the super-resolution problem, and uses an elastic search strategy based on a hybrid controller to improve the performance of the model. This model provides super resolution result with scale factor x2.
+
+ - For more information, please refer to: [falsr_a](https://github.com/xiaomi-automl/FALSR)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install falsr_a
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```
+ $ hub run falsr_a --input_path "/PATH/TO/IMAGE"
+ ```
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddlehub as hub
+
+ sr_model = hub.Module(name='falsr_a')
+ im = cv2.imread('/PATH/TO/IMAGE').astype('float32')
+ res = sr_model.reconstruct(images=[im], visualization=True)
+ print(res[0]['data'])
+ sr_model.save_inference_model()
+ ```
+
+- ### 3、API
+
+ - ```python
+ def reconstruct(self,
+ images=None,
+ paths=None,
+ use_gpu=False,
+ visualization=False,
+ output_dir="falsr_a_output")
+ ```
+
+ - Prediction API.
+
+ - **Parameter**
+
+ * images (list\[numpy.ndarray\]): image data,ndarray.shape is in the format \[H, W, C\],BGR.
+ * paths (list\[str\]): image path.
+ * use\_gpu (bool): use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**.
+ * visualization (bool): Whether to save the recognition results as picture files.
+        * output\_dir (str): save path of images, "falsr_a_output" by default.
+
+ - **Return**
+ * res (list\[dict\]): The list of model results, where each element is dict and each field is:
+ * save\_path (str, optional): Save path of the result, save_path is '' if no image is saved.
+ * data (numpy.ndarray): result of super resolution.
+
+ - ```python
+ def save_inference_model(self,
+ dirname='falsr_a_save_model',
+ model_filename=None,
+ params_filename=None,
+ combined=False)
+ ```
+
+ - Save the model to the specified path.
+
+ - **Parameters**
+
+ * dirname: Save path.
+      * model\_filename: model file name, default is \_\_model\_\_
+      * params\_filename: parameter file name, default is \_\_params\_\_ (only takes effect when `combined` is True)
+ * combined: Whether to save the parameters to a unified file.
+
+
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of super resolution.
+
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m falsr_a
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import base64
+
+ import cv2
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/falsr_a"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ sr = base64_to_cv2(r.json()["results"][0]['data'])
+ cv2.imwrite('falsr_a_X2.png', sr)
+ print("save image as falsr_a_X2.png")
+ ```
+
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
+
+
+
diff --git a/modules/image/Image_editing/super_resolution/falsr_b/README_en.md b/modules/image/Image_editing/super_resolution/falsr_b/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..5507b2ac6de0a89ce0c061b8651dcc59752b7079
--- /dev/null
+++ b/modules/image/Image_editing/super_resolution/falsr_b/README_en.md
@@ -0,0 +1,173 @@
+# falsr_b
+
+|Module Name|falsr_b|
+| :--- | :---: |
+|Category |Image editing|
+|Network |falsr_b|
+|Dataset|DIV2k|
+|Fine-tuning supported or not|No|
+|Module Size |4MB|
+|Data indicators|PSNR37.61|
+|Latest update date|2021-02-26|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - Falsr_b is a lightweight super-resolution model based on "Accurate and Lightweight Super-Resolution with Neural Architecture Search". The model uses a multi-objective approach to handle the super-resolution problem, and uses an elastic search strategy based on a hybrid controller to improve the performance of the model. This model provides super resolution result with scale factor x2.
+
+ - For more information, please refer to:[falsr_b](https://github.com/xiaomi-automl/FALSR)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install falsr_b
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```
+ $ hub run falsr_b --input_path "/PATH/TO/IMAGE"
+ ```
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+ ```python
+ import cv2
+ import paddlehub as hub
+
+ sr_model = hub.Module(name='falsr_b')
+ im = cv2.imread('/PATH/TO/IMAGE').astype('float32')
+ res = sr_model.reconstruct(images=[im], visualization=True)
+ print(res[0]['data'])
+ sr_model.save_inference_model()
+ ```
+
+- ### 3、API
+
+ - ```python
+ def reconstruct(self,
+ images=None,
+ paths=None,
+ use_gpu=False,
+ visualization=False,
+ output_dir="falsr_b_output")
+ ```
+
+ - Prediction API.
+
+ - **Parameter**
+
+ * images (list\[numpy.ndarray\]): Image data,ndarray.shape is in the format \[H, W, C\],BGR.
+ * paths (list\[str\]): Image path.
+ * use\_gpu (bool): Use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**.
+ * visualization (bool): Whether to save the recognition results as picture files.
+        * output\_dir (str): Save path of images, "falsr_b_output" by default.
+
+ - **Return**
+ * res (list\[dict\]): The list of model results, where each element is dict and each field is:
+ * save\_path (str, optional): Save path of the result, save_path is '' if no image is saved.
+ * data (numpy.ndarray): Result of super resolution.
+
+ - ```python
+ def save_inference_model(self,
+ dirname='falsr_b_save_model',
+ model_filename=None,
+ params_filename=None,
+ combined=False)
+ ```
+
+ - Save the model to the specified path.
+
+ - **Parameters**
+
+ * dirname: Save path.
+      * model\_filename: Model file name, default is \_\_model\_\_
+      * params\_filename: Parameter file name, default is \_\_params\_\_ (only takes effect when `combined` is True)
+ * combined: Whether to save the parameters to a unified file.
+
+
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of super resolution.
+
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m falsr_b
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import base64
+
+ import cv2
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/falsr_b"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ sr = base64_to_cv2(r.json()["results"][0]['data'])
+ cv2.imwrite('falsr_b_X2.png', sr)
+ print("save image as falsr_b_X2.png")
+ ```
+
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
+
+
+
diff --git a/modules/image/Image_editing/super_resolution/falsr_c/README.md b/modules/image/Image_editing/super_resolution/falsr_c/README.md
index 3227847494d5b34867aa7ee36e91ff789ad80574..2e7d35bbea7cc2eff7ab40af558942a826412a3f 100644
--- a/modules/image/Image_editing/super_resolution/falsr_c/README.md
+++ b/modules/image/Image_editing/super_resolution/falsr_c/README.md
@@ -51,7 +51,7 @@
- ```
$ hub run falsr_c --input_path "/PATH/TO/IMAGE"
```
-- ### 代码示例
+- ### 2、预测代码示例
```python
import cv2
@@ -65,7 +65,7 @@
sr_model.save_inference_model()
```
-- ### 2、API
+- ### 3、API
- ```python
def reconstruct(self,
diff --git a/modules/image/Image_editing/super_resolution/falsr_c/README_en.md b/modules/image/Image_editing/super_resolution/falsr_c/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..5e651a7ea9393c68af8e24a9bb34a741287ffd46
--- /dev/null
+++ b/modules/image/Image_editing/super_resolution/falsr_c/README_en.md
@@ -0,0 +1,173 @@
+# falsr_c
+
+|Module Name|falsr_c|
+| :--- | :---: |
+|Category |Image editing|
+|Network |falsr_c|
+|Dataset|DIV2k|
+|Fine-tuning supported or not|No|
+|Module Size |4.4MB|
+|Data indicators|PSNR37.66|
+|Latest update date|2021-02-26|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - Falsr_c is a lightweight super-resolution model based on "Accurate and Lightweight Super-Resolution with Neural Architecture Search". The model uses a multi-objective approach to handle the super-resolution problem, and uses an elastic search strategy based on a hybrid controller to improve the performance of the model. This model provides super resolution result with scale factor x2.
+
+ - For more information, please refer to:[falsr_c](https://github.com/xiaomi-automl/FALSR)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install falsr_c
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```
+ $ hub run falsr_c --input_path "/PATH/TO/IMAGE"
+ ```
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+ ```python
+ import cv2
+ import paddlehub as hub
+
+ sr_model = hub.Module(name='falsr_c')
+ im = cv2.imread('/PATH/TO/IMAGE').astype('float32')
+ res = sr_model.reconstruct(images=[im], visualization=True)
+ print(res[0]['data'])
+ sr_model.save_inference_model()
+ ```
+
+- ### 3、API
+
+ - ```python
+ def reconstruct(self,
+ images=None,
+ paths=None,
+ use_gpu=False,
+ visualization=False,
+ output_dir="falsr_c_output")
+ ```
+
+ - Prediction API.
+
+ - **Parameter**
+
+ * images (list\[numpy.ndarray\]): Image data,ndarray.shape is in the format \[H, W, C\],BGR.
+ * paths (list\[str\]): Image path.
+ * use\_gpu (bool): Use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**.
+ * visualization (bool): Whether to save the recognition results as picture files.
+        * output\_dir (str): Save path of images, "falsr_c_output" by default.
+
+ - **Return**
+ * res (list\[dict\]): The list of model results, where each element is dict and each field is:
+ * save\_path (str, optional): Save path of the result, save_path is '' if no image is saved.
+ * data (numpy.ndarray): Result of super resolution.
+
+ - ```python
+ def save_inference_model(self,
+ dirname='falsr_c_save_model',
+ model_filename=None,
+ params_filename=None,
+ combined=False)
+ ```
+
+ - Save the model to the specified path.
+
+ - **Parameters**
+
+ * dirname: Save path.
+      * model\_filename: Model file name, default is \_\_model\_\_
+      * params\_filename: Parameter file name, default is \_\_params\_\_ (only takes effect when `combined` is True)
+ * combined: Whether to save the parameters to a unified file.
+
+
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of super resolution.
+
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m falsr_c
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import base64
+
+ import cv2
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/falsr_c"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ sr = base64_to_cv2(r.json()["results"][0]['data'])
+ cv2.imwrite('falsr_c_X2.png', sr)
+ print("save image as falsr_c_X2.png")
+ ```
+
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
+
+
+
diff --git a/modules/image/Image_editing/super_resolution/realsr/README.md b/modules/image/Image_editing/super_resolution/realsr/README.md
index 02e66678c5926f6f9e54344d6f74a1bf91304b39..e5eebce61099444691edd8c084572398ccc785cd 100644
--- a/modules/image/Image_editing/super_resolution/realsr/README.md
+++ b/modules/image/Image_editing/super_resolution/realsr/README.md
@@ -57,7 +57,7 @@
## 三、模型API预测
- - ### 1、代码示例
+ - ### 1、预测代码示例
```python
import paddlehub as hub
diff --git a/modules/image/Image_editing/super_resolution/realsr/README_en.md b/modules/image/Image_editing/super_resolution/realsr/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..4e3eafba85acd21d185da2af622dffd82d7d09ee
--- /dev/null
+++ b/modules/image/Image_editing/super_resolution/realsr/README_en.md
@@ -0,0 +1,174 @@
+# realsr
+
+|Module Name|realsr|
+| :--- | :---: |
+|Category |Image editing|
+|Network|LP-KPN|
+|Dataset |RealSR dataset|
+|Fine-tuning supported or not|No|
+|Module Size |64MB|
+|Latest update date|2021-02-26|
+|Data indicators |PSNR29.05|
+
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - Realsr is a super resolution model for images and videos based on "Toward Real-World Single Image Super-Resolution: A New Benchmark and A New Model". This model provides super resolution result with scale factor x4.
+
+ - For more information, please refer to: [realsr](https://github.com/csjcai/RealSR)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+  - **NOTE**: This Module relies on ffmpeg. Please install ffmpeg before using this Module.
+ ```shell
+ $ conda install x264=='1!152.20180717' ffmpeg=4.0.2 -c conda-forge
+ ```
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install realsr
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+
+## III. Module API Prediction
+
+ - ### 1、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='realsr')
+ model.predict('/PATH/TO/IMAGE/OR/VIDEO')
+ ```
+ - ### 2、API
+
+ - ```python
+ def predict(self, input):
+ ```
+
+ - Prediction API.
+
+ - **Parameter**
+
+ - input (str): image path.
+
+ - **Return**
+
+ - If input is image path, the output is:
+ - pred_img(np.ndarray): image data, ndarray.shape is in the format [H, W, C], BGR.
+ - out_path(str): save path of images.
+
+ - If input is video path, the output is :
+ - frame_pattern_combined(str): save path of frames from output video.
+ - vid_out_path(str): save path of output video.
+
+ - ```python
+ def run_image(self, img):
+ ```
+ - Prediction API for images.
+
+ - **Parameter**
+
+ - img (str|np.ndarray): Image data, str or ndarray. ndarray.shape is in the format [H, W, C], BGR.
+
+ - **Return**
+
+ - pred_img(np.ndarray): Prediction result, ndarray.shape is in the format [H, W, C], BGR.
+
+ - ```python
+ def run_video(self, video):
+ ```
+ - Prediction API for video.
+
+ - **Parameter**
+
+ - video(str): Video path.
+
+ - **Return**
+
+ - frame_pattern_combined(str): Save path of frames from output video.
+ - vid_out_path(str): Save path of output video.
+
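+    - A minimal sketch of video super resolution with `run_video`, assuming it returns the two values documented above in that order (the video path is a placeholder):
+
+    - ```python
+      import paddlehub as hub
+
+      # Sketch: 4x super resolution on a video file.
+      model = hub.Module(name='realsr')
+      frame_pattern_combined, vid_out_path = model.run_video('/PATH/TO/VIDEO')
+      print(vid_out_path)  # save path of the upscaled output video
+      ```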
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image super resolution.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m realsr
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import base64
+
+ import cv2
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':cv2_to_base64(org_im)}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/realsr"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ img = base64_to_cv2(r.json()["results"])
+ cv2.imwrite('/PATH/TO/SAVE/IMAGE', img)
+
+ ```
+
+
+## V. Release Note
+
+
+- 1.0.0
+
+ First release
+
+- 1.0.1
+
+ Support paddlehub2.0
+
diff --git a/modules/image/Image_gan/attgan_celeba/README_en.md b/modules/image/Image_gan/attgan_celeba/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..48808475316d209f437e12021881c61c48c32d7e
--- /dev/null
+++ b/modules/image/Image_gan/attgan_celeba/README_en.md
@@ -0,0 +1,110 @@
+# attgan_celeba
+
+|Module Name|attgan_celeba|
+| :--- | :---: |
+|Category |image generation|
+|Network |AttGAN|
+|Dataset|Celeba|
+|Fine-tuning supported or not |No|
+|Module Size |167MB|
+|Latest update date|2021-02-26|
+|Data indicators |-|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+ ![](https://user-images.githubusercontent.com/35907364/137855667-43c5c40c-28f5-45d8-accc-028e185b988f.JPG)
+ The image attributes are: original image, Bald, Bangs, Black_Hair, Blond_Hair, Brown_Hair, Bushy_Eyebrows, Eyeglasses, Gender, Mouth_Slightly_Open, Mustache, No_Beard, Pale_Skin, Aged
+
+
+
+- ### Module Introduction
+
+  - AttGAN is a Generative Adversarial Network which uses classification loss and reconstruction loss to train the network. The PaddleHub Module is trained on the Celeba dataset and currently supports the attributes "Bald", "Bangs", "Black_Hair", "Blond_Hair", "Brown_Hair", "Bushy_Eyebrows", "Eyeglasses", "Gender", "Mouth_Slightly_Open", "Mustache", "No_Beard", "Pale_Skin", "Aged".
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 1.5.2
+
+ - paddlehub >= 1.0.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install attgan_celeba==1.0.0
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md).
+
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run attgan_celeba --image "/PATH/TO/IMAGE" --style "target_attribute"
+ ```
+
+ - **Parameters**
+
+ - image: Input image path.
+
+ - style: Specify the attributes to be converted. The options are "Bald", "Bangs", "Black_Hair", "Blond_Hair", "Brown_Hair", "Bushy_Eyebrows", "Eyeglasses", "Gender", "Mouth_Slightly_Open", "Mustache", "No_Beard", "Pale_Skin", "Aged". You can choose one of the options.
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+
+ attgan = hub.Module(name="attgan_celeba")
+
+ test_img_path = ["/PATH/TO/IMAGE"]
+ trans_attr = ["Bangs"]
+
+ # set input dict
+ input_dict = {"image": test_img_path, "style": trans_attr}
+
+ # execute predict and print the result
+ results = attgan.generate(data=input_dict)
+ print(results)
+ ```
+
+- ### 3、API
+
+ - ```python
+ def generate(data)
+ ```
+
+ - Style transfer API.
+
+ - **Parameter**
+
+      - data(list[dict]): Each element in the list is a dict, and each field is:
+        - image (list\[str\]): Each element in the list is the path of an image to be converted.
+        - style (list\[str\]): Each element in the list is a string specifying the face attribute to be converted.
+
+ - **Return**
+ - res (list\[str\]): Save path of the result.
+
+
+
+## IV. Release Note
+
+- 1.0.0
+
+ First release
+
+
diff --git a/modules/image/Image_gan/cyclegan_cityscapes/README_en.md b/modules/image/Image_gan/cyclegan_cityscapes/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..dc310e8f1592773400e1d413df5425f82742ff00
--- /dev/null
+++ b/modules/image/Image_gan/cyclegan_cityscapes/README_en.md
@@ -0,0 +1,109 @@
+# cyclegan_cityscapes
+
+|Module Name|cyclegan_cityscapes|
+| :--- | :---: |
+|Category |Image generation|
+|Network |CycleGAN|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not |No|
+|Module Size |33MB|
+|Latest update date |2021-02-26|
+|Data indicators|-|
+
+
+## I. Basic Information
+
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+ Input image
+
+
+
+ Output image
+
+
+
+
+- ### Module Introduction
+
+  - CycleGAN belongs to the family of Generative Adversarial Networks (GANs). Unlike a traditional GAN, which only maps images in one direction, CycleGAN learns the style transfer between two domains in both directions simultaneously. This PaddleHub Module is trained on the Cityscapes dataset and supports conversion from real images to semantic segmentation results, as well as the reverse conversion from semantic segmentation results to real images.
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 1.4.0
+
+ - paddlehub >= 1.1.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install cyclegan_cityscapes==1.0.0
+ ```
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+ - ```shell
+ $ hub run cyclegan_cityscapes --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - **Parameters**
+
+ - input_path: image path
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+
+ cyclegan = hub.Module(name="cyclegan_cityscapes")
+
+ test_img_path = "/PATH/TO/IMAGE"
+
+ # set input dict
+ input_dict = {"image": [test_img_path]}
+
+ # execute predict and print the result
+ results = cyclegan.generate(data=input_dict)
+ print(results)
+ ```
+
+- ### 3、API
+
+ - ```python
+ def generate(data)
+ ```
+
+ - Style transfer API.
+
+ - **Parameters**
+
+      - data(list[dict]): Each element in the list is a dict, and each field is:
+ - image (list\[str\]): Image path.
+
+ - **Return**
+      - res (list\[str\]): The list of style transfer results, where each element is a dict, and each field is:
+ - origin: Original input path.
+ - generated: Save path of images.
+
+
+
+## IV. Release Note
+
+* 1.0.0
+
+ First release
+
diff --git a/modules/image/Image_gan/gan/first_order_motion/README.md b/modules/image/Image_gan/gan/first_order_motion/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..ffca34eb9f96a9037a0b95e23b2ae20ded537b16
--- /dev/null
+++ b/modules/image/Image_gan/gan/first_order_motion/README.md
@@ -0,0 +1,95 @@
+# first_order_motion
+
+|模型名称|first_order_motion|
+| :--- | :---: |
+|类别|图像 - 图像生成|
+|网络|S3FD|
+|数据集|-|
+|是否支持Fine-tuning|否|
+|模型大小|343MB|
+|最新更新日期|2021-12-24|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+ - 样例结果示例:
+
+
+
+ 输入图像
+
+
+
+ 输入视频
+
+
+
+ 输出视频
+
+
+
+- ### 模型介绍
+
+ - First Order Motion的任务是图像动画/Image Animation,即输入为一张源图片和一个驱动视频,源图片中的人物则会做出驱动视频中的动作。
+
+
+## 二、安装
+
+- ### 1、环境依赖
+ - paddlepaddle >= 2.1.0
+ - paddlehub >= 2.1.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install first_order_motion
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run first_order_motion --source_image "/PATH/TO/IMAGE" --driving_video "/PATH/TO/VIDEO" --use_gpu
+ ```
+ - 通过命令行方式实现视频驱动生成模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+
+ module = hub.Module(name="first_order_motion")
+ module.generate(source_image="/PATH/TO/IMAGE", driving_video="/PATH/TO/VIDEO", ratio=0.4, image_size=256, output_dir='./motion_driving_result/', filename='result.mp4', use_gpu=False)
+ ```
+
+- ### 3、API
+
+ - ```python
+ generate(self, source_image=None, driving_video=None, ratio=0.4, image_size=256, output_dir='./motion_driving_result/', filename='result.mp4', use_gpu=False)
+ ```
+ - 视频驱动生成API。
+
+ - **参数**
+ - source_image (str): 原始图片,支持单人图片和多人图片,视频中人物的表情动作将迁移到该原始图片中的人物上。
+ - driving_video (str): 驱动视频,视频中人物的表情动作作为待迁移的对象。
+      - ratio (float): 贴回驱动生成的人脸区域占原图的比例, 用户需要根据生成的效果调整该参数,尤其是多人脸距离比较近的情况下更需要调整该参数, 默认为0.4,调整范围是[0.4, 0.5]。
+ - image_size (int): 图片人脸大小,默认为256,可设置为512。
+ - output\_dir (str): 结果保存的文件夹名;
+ - filename (str): 结果保存的文件名。
+ - use\_gpu (bool): 是否使用 GPU;
+
+
+## 四、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ - ```shell
+ $ hub install first_order_motion==1.0.0
+ ```
diff --git a/modules/image/Image_gan/gan/first_order_motion/model.py b/modules/image/Image_gan/gan/first_order_motion/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..35b180d4283f86644ab16d1170e99f6d8bb5d5cf
--- /dev/null
+++ b/modules/image/Image_gan/gan/first_order_motion/model.py
@@ -0,0 +1,352 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+import os
+import sys
+import math
+import pickle
+
+import yaml
+import imageio
+import numpy as np
+from tqdm import tqdm
+from scipy.spatial import ConvexHull
+import cv2
+import paddle
+from ppgan.utils.download import get_path_from_url
+from ppgan.utils.animate import normalize_kp
+from ppgan.modules.keypoint_detector import KPDetector
+from ppgan.models.generators.occlusion_aware import OcclusionAwareGenerator
+from ppgan.faceutils import face_detection
+
+
+class FirstOrderPredictor:
+ def __init__(self,
+ weight_path=None,
+ config=None,
+ image_size=256,
+ relative=True,
+ adapt_scale=False,
+ find_best_frame=False,
+ best_frame=None,
+ face_detector='sfd',
+ multi_person=False,
+ face_enhancement=True,
+ batch_size=1,
+ mobile_net=False):
+ if config is not None and isinstance(config, str):
+ with open(config) as f:
+ self.cfg = yaml.load(f, Loader=yaml.SafeLoader)
+ elif isinstance(config, dict):
+ self.cfg = config
+ elif config is None:
+ self.cfg = {
+ 'model': {
+ 'common_params': {
+ 'num_kp': 10,
+ 'num_channels': 3,
+ 'estimate_jacobian': True
+ },
+ 'generator': {
+ 'kp_detector_cfg': {
+ 'temperature': 0.1,
+ 'block_expansion': 32,
+ 'max_features': 1024,
+ 'scale_factor': 0.25,
+ 'num_blocks': 5
+ },
+ 'generator_cfg': {
+ 'block_expansion': 64,
+ 'max_features': 512,
+ 'num_down_blocks': 2,
+ 'num_bottleneck_blocks': 6,
+ 'estimate_occlusion_map': True,
+ 'dense_motion_params': {
+ 'block_expansion': 64,
+ 'max_features': 1024,
+ 'num_blocks': 5,
+ 'scale_factor': 0.25
+ }
+ }
+ }
+ }
+ }
+ self.image_size = image_size
+ if weight_path is None:
+ if mobile_net:
+ vox_cpk_weight_url = 'https://paddlegan.bj.bcebos.com/applications/first_order_model/vox-mobile.pdparams'
+
+ else:
+ if self.image_size == 512:
+ vox_cpk_weight_url = 'https://paddlegan.bj.bcebos.com/applications/first_order_model/vox-cpk-512.pdparams'
+ else:
+ vox_cpk_weight_url = 'https://paddlegan.bj.bcebos.com/applications/first_order_model/vox-cpk.pdparams'
+ weight_path = get_path_from_url(vox_cpk_weight_url)
+
+ self.weight_path = weight_path
+ self.relative = relative
+ self.adapt_scale = adapt_scale
+ self.find_best_frame = find_best_frame
+ self.best_frame = best_frame
+ self.face_detector = face_detector
+ self.generator, self.kp_detector = self.load_checkpoints(self.cfg, self.weight_path)
+ self.multi_person = multi_person
+ self.face_enhancement = face_enhancement
+ self.batch_size = batch_size
+ if face_enhancement:
+ from ppgan.faceutils.face_enhancement import FaceEnhancement
+ self.faceenhancer = FaceEnhancement(batch_size=batch_size)
+
+ def read_img(self, path):
+ img = imageio.imread(path)
+ if img.ndim == 2:
+ img = np.expand_dims(img, axis=2)
+        # some images have 4 channels
+ if img.shape[2] > 3:
+ img = img[:, :, :3]
+ return img
+
+ def run(self, source_image, driving_video, ratio, image_size, output_dir, filename):
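+        # End-to-end pipeline: read the inputs, detect faces in the source image, animate each detected face
+        # with the driving video, then paste the generated faces back into the source frame by frame and save the video.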
+ self.ratio = ratio
+ self.image_size = image_size
+ self.output = output_dir
+ self.filename = filename
+ if not os.path.exists(output_dir):
+ os.makedirs(output_dir)
+
+ def get_prediction(face_image):
+ if self.find_best_frame or self.best_frame is not None:
+ i = self.best_frame if self.best_frame is not None else self.find_best_frame_func(
+ source_image, driving_video)
+
+ print("Best frame: " + str(i))
+ driving_forward = driving_video[i:]
+ driving_backward = driving_video[:(i + 1)][::-1]
+ predictions_forward = self.make_animation(
+ face_image,
+ driving_forward,
+ self.generator,
+ self.kp_detector,
+ relative=self.relative,
+ adapt_movement_scale=self.adapt_scale)
+ predictions_backward = self.make_animation(
+ face_image,
+ driving_backward,
+ self.generator,
+ self.kp_detector,
+ relative=self.relative,
+ adapt_movement_scale=self.adapt_scale)
+ predictions = predictions_backward[::-1] + predictions_forward[1:]
+ else:
+ predictions = self.make_animation(
+ face_image,
+ driving_video,
+ self.generator,
+ self.kp_detector,
+ relative=self.relative,
+ adapt_movement_scale=self.adapt_scale)
+ return predictions
+
+ source_image = self.read_img(source_image)
+ reader = imageio.get_reader(driving_video)
+ fps = reader.get_meta_data()['fps']
+ driving_video = []
+ try:
+ for im in reader:
+ driving_video.append(im)
+ except RuntimeError:
+ print("Read driving video error!")
+ pass
+ reader.close()
+
+ driving_video = [cv2.resize(frame, (self.image_size, self.image_size)) / 255.0 for frame in driving_video]
+ results = []
+
+ bboxes = self.extract_bbox(source_image.copy())
+ print(str(len(bboxes)) + " persons have been detected")
+
+ # for multi person
+ for rec in bboxes:
+ face_image = source_image.copy()[rec[1]:rec[3], rec[0]:rec[2]]
+ face_image = cv2.resize(face_image, (self.image_size, self.image_size)) / 255.0
+ predictions = get_prediction(face_image)
+ results.append({'rec': rec, 'predict': [predictions[i] for i in range(predictions.shape[0])]})
+ if len(bboxes) == 1 or not self.multi_person:
+ break
+ out_frame = []
+
+ for i in range(len(driving_video)):
+ frame = source_image.copy()
+ for result in results:
+ x1, y1, x2, y2, _ = result['rec']
+ h = y2 - y1
+ w = x2 - x1
+ out = result['predict'][i]
+ out = cv2.resize(out.astype(np.uint8), (x2 - x1, y2 - y1))
+ if len(results) == 1:
+ frame[y1:y2, x1:x2] = out
+ break
+ else:
+ patch = np.zeros(frame.shape).astype('uint8')
+ patch[y1:y2, x1:x2] = out
+ mask = np.zeros(frame.shape[:2]).astype('uint8')
+ cx = int((x1 + x2) / 2)
+ cy = int((y1 + y2) / 2)
+ cv2.circle(mask, (cx, cy), math.ceil(h * self.ratio), (255, 255, 255), -1, 8, 0)
+ frame = cv2.copyTo(patch, mask, frame)
+
+ out_frame.append(frame)
+ imageio.mimsave(os.path.join(self.output, self.filename), [frame for frame in out_frame], fps=fps)
+
+ def load_checkpoints(self, config, checkpoint_path):
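+        # Build the generator and keypoint detector from the config and restore their weights from the checkpoint.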
+
+ generator = OcclusionAwareGenerator(
+ **config['model']['generator']['generator_cfg'], **config['model']['common_params'], inference=True)
+
+ kp_detector = KPDetector(**config['model']['generator']['kp_detector_cfg'], **config['model']['common_params'])
+
+ checkpoint = paddle.load(self.weight_path)
+ generator.set_state_dict(checkpoint['generator'])
+
+ kp_detector.set_state_dict(checkpoint['kp_detector'])
+
+ generator.eval()
+ kp_detector.eval()
+
+ return generator, kp_detector
+
+ def make_animation(self,
+ source_image,
+ driving_video,
+ generator,
+ kp_detector,
+ relative=True,
+ adapt_movement_scale=True):
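+        # Transfer the driving video's keypoint motion onto the source face, generating output frames in batches of self.batch_size.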
+ with paddle.no_grad():
+ predictions = []
+ source = paddle.to_tensor(source_image[np.newaxis].astype(np.float32)).transpose([0, 3, 1, 2])
+
+ driving = paddle.to_tensor(np.array(driving_video).astype(np.float32)).transpose([0, 3, 1, 2])
+ kp_source = kp_detector(source)
+ kp_driving_initial = kp_detector(driving[0:1])
+ kp_source_batch = {}
+ kp_source_batch["value"] = paddle.tile(kp_source["value"], repeat_times=[self.batch_size, 1, 1])
+ kp_source_batch["jacobian"] = paddle.tile(kp_source["jacobian"], repeat_times=[self.batch_size, 1, 1, 1])
+ source = paddle.tile(source, repeat_times=[self.batch_size, 1, 1, 1])
+ begin_idx = 0
+ for frame_idx in tqdm(range(int(np.ceil(float(driving.shape[0]) / self.batch_size)))):
+ frame_num = min(self.batch_size, driving.shape[0] - begin_idx)
+ driving_frame = driving[begin_idx:begin_idx + frame_num]
+ kp_driving = kp_detector(driving_frame)
+ kp_source_img = {}
+ kp_source_img["value"] = kp_source_batch["value"][0:frame_num]
+ kp_source_img["jacobian"] = kp_source_batch["jacobian"][0:frame_num]
+
+ kp_norm = normalize_kp(
+ kp_source=kp_source,
+ kp_driving=kp_driving,
+ kp_driving_initial=kp_driving_initial,
+ use_relative_movement=relative,
+ use_relative_jacobian=relative,
+ adapt_movement_scale=adapt_movement_scale)
+
+ out = generator(source[0:frame_num], kp_source=kp_source_img, kp_driving=kp_norm)
+ img = np.transpose(out['prediction'].numpy(), [0, 2, 3, 1]) * 255.0
+
+ if self.face_enhancement:
+ img = self.faceenhancer.enhance_from_batch(img)
+
+ predictions.append(img)
+ begin_idx += frame_num
+ return np.concatenate(predictions)
+
+ def find_best_frame_func(self, source, driving):
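+        # Pick the driving frame whose normalized facial landmarks are closest to the source image's,
+        # so the animation starts from the best-aligned pose.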
+ import face_alignment
+
+ def normalize_kp(kp):
+ kp = kp - kp.mean(axis=0, keepdims=True)
+ area = ConvexHull(kp[:, :2]).volume
+ area = np.sqrt(area)
+ kp[:, :2] = kp[:, :2] / area
+ return kp
+
+ fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=True)
+
+ kp_source = fa.get_landmarks(255 * source)[0]
+ kp_source = normalize_kp(kp_source)
+ norm = float('inf')
+ frame_num = 0
+ for i, image in tqdm(enumerate(driving)):
+ kp_driving = fa.get_landmarks(255 * image)[0]
+ kp_driving = normalize_kp(kp_driving)
+ new_norm = (np.abs(kp_source - kp_driving)**2).sum()
+ if new_norm < norm:
+ norm = new_norm
+ frame_num = i
+ return frame_num
+
+ def extract_bbox(self, image):
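+        # Detect faces in the source image and return enlarged, de-duplicated bounding boxes
+        # as an (N, 5) array of [x1, y1, x2, y2, area].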
+ detector = face_detection.FaceAlignment(
+ face_detection.LandmarksType._2D, flip_input=False, face_detector=self.face_detector)
+
+ frame = [image]
+ predictions = detector.get_detections_for_image(np.array(frame))
+ person_num = len(predictions)
+ if person_num == 0:
+ return np.array([])
+ results = []
+ face_boxs = []
+ h, w, _ = image.shape
+ for rect in predictions:
+ bh = rect[3] - rect[1]
+ bw = rect[2] - rect[0]
+ cy = rect[1] + int(bh / 2)
+ cx = rect[0] + int(bw / 2)
+ margin = max(bh, bw)
+ y1 = max(0, cy - margin)
+ x1 = max(0, cx - int(0.8 * margin))
+ y2 = min(h, cy + margin)
+ x2 = min(w, cx + int(0.8 * margin))
+ area = (y2 - y1) * (x2 - x1)
+ results.append([x1, y1, x2, y2, area])
+ # if a person has more than one bbox, keep the largest one
+ # maybe greedy will be better?
+        results = sorted(results, key=lambda box: box[4], reverse=True)
+ results_box = [results[0]]
+ for i in range(1, person_num):
+ num = len(results_box)
+ add_person = True
+ for j in range(num):
+ pre_person = results_box[j]
+ iou = self.IOU(pre_person[0], pre_person[1], pre_person[2], pre_person[3], pre_person[4], results[i][0],
+ results[i][1], results[i][2], results[i][3], results[i][4])
+ if iou > 0.5:
+ add_person = False
+ break
+ if add_person:
+ results_box.append(results[i])
+ boxes = np.array(results_box)
+ return boxes
+
+ def IOU(self, ax1, ay1, ax2, ay2, sa, bx1, by1, bx2, by2, sb):
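+        # Intersection-over-union of boxes a and b; sa and sb are their precomputed areas.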
+ #sa = abs((ax2 - ax1) * (ay2 - ay1))
+ #sb = abs((bx2 - bx1) * (by2 - by1))
+ x1, y1 = max(ax1, bx1), max(ay1, by1)
+ x2, y2 = min(ax2, bx2), min(ay2, by2)
+ w = x2 - x1
+ h = y2 - y1
+ if w < 0 or h < 0:
+ return 0.0
+ else:
+ return 1.0 * w * h / (sa + sb - w * h)
diff --git a/modules/image/Image_gan/gan/first_order_motion/module.py b/modules/image/Image_gan/gan/first_order_motion/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..b3d5ecb07b5756865d0e41678f2234520cbd46f6
--- /dev/null
+++ b/modules/image/Image_gan/gan/first_order_motion/module.py
@@ -0,0 +1,106 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import argparse
+import copy
+
+import paddle
+import paddlehub as hub
+from paddlehub.module.module import moduleinfo, runnable, serving
+import numpy as np
+import cv2
+from skimage.io import imread
+from skimage.transform import rescale, resize
+
+from .model import FirstOrderPredictor
+
+
+@moduleinfo(
+ name="first_order_motion", type="CV/gan", author="paddlepaddle", author_email="", summary="", version="1.0.0")
+class FirstOrderMotion:
+ def __init__(self):
+ self.pretrained_model = os.path.join(self.directory, "vox-cpk.pdparams")
+ self.network = FirstOrderPredictor(weight_path=self.pretrained_model, face_enhancement=True)
+
+ def generate(self,
+ source_image=None,
+ driving_video=None,
+ ratio=0.4,
+ image_size=256,
+ output_dir='./motion_driving_result/',
+ filename='result.mp4',
+ use_gpu=False):
+ '''
+ source_image (str): path to image
+ driving_video (str) : path to driving_video
+ ratio: margin ratio
+ image_size: size of image
+ output_dir: the dir to save the results
+ filename: filename to save the results
+ use_gpu: if True, use gpu to perform the computation, otherwise cpu.
+ '''
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+        if source_image is None or driving_video is None:
+ print('No image or driving video provided. Please input an image and a driving video.')
+ return
+ self.network.run(source_image, driving_video, ratio, image_size, output_dir, filename)
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+        self.generate(
+            source_image=self.args.source_image,
+            driving_video=self.args.driving_video,
+            ratio=self.args.ratio,
+            image_size=self.args.image_size,
+            output_dir=self.args.output_dir,
+            filename=self.args.filename,
+            use_gpu=self.args.use_gpu)
+ return
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default='motion_driving_result', help='output directory for saving result.')
+ self.arg_config_group.add_argument("--filename", default='result.mp4', help="filename to output")
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument("--source_image", type=str, help="path to source image")
+ self.arg_input_group.add_argument("--driving_video", type=str, help="path to driving video")
+ self.arg_input_group.add_argument("--ratio", dest="ratio", type=float, default=0.4, help="margin ratio")
+ self.arg_input_group.add_argument(
+ "--image_size", dest="image_size", type=int, default=256, help="size of image")
diff --git a/modules/image/Image_gan/gan/first_order_motion/requirements.txt b/modules/image/Image_gan/gan/first_order_motion/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..67e9bb6fa840355e9ed0d44b7134850f1fe22fe1
--- /dev/null
+++ b/modules/image/Image_gan/gan/first_order_motion/requirements.txt
@@ -0,0 +1 @@
+ppgan
diff --git a/modules/image/Image_gan/gan/photopen/README.md b/modules/image/Image_gan/gan/photopen/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..73c80f9ad381b2adaeb7ab28d95c702b6cc55102
--- /dev/null
+++ b/modules/image/Image_gan/gan/photopen/README.md
@@ -0,0 +1,126 @@
+# photopen
+
+|模型名称|photopen|
+| :--- | :---: |
+|类别|图像 - 图像生成|
+|网络|SPADEGenerator|
+|数据集|coco_stuff|
+|是否支持Fine-tuning|否|
+|模型大小|74MB|
+|最新更新日期|2021-12-14|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+ - 样例结果示例:
+
+
+
+
+- ### 模型介绍
+
+ - 本模块采用一个像素风格迁移网络 Pix2PixHD,能够根据输入的语义分割标签生成照片风格的图片。为了解决模型归一化层导致标签语义信息丢失的问题,向 Pix2PixHD 的生成器网络中添加了 SPADE(Spatially-Adaptive
+ Normalization)空间自适应归一化模块,通过两个卷积层保留了归一化时训练的缩放与偏置参数的空间维度,以增强生成图片的质量。语义风格标签图像可以参考[coco_stuff数据集](https://github.com/nightrome/cocostuff)获取, 也可以通过[PaddleGAN repo中的该项目](https://github.com/PaddlePaddle/PaddleGAN/blob/87537ad9d4eeda17eaa5916c6a585534ab989ea8/docs/zh_CN/tutorials/photopen.md)来自定义生成图像进行体验。
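+
+  - 下面给出一个 SPADE 归一化机制的极简示意(假设性草图,仅用于说明“空间自适应的缩放与偏置”这一原理,并非本模块的实际实现):
+
+  - ```python
+    import paddle
+    import paddle.nn as nn
+    import paddle.nn.functional as F
+
+    class SPADESketch(nn.Layer):
+        """按语义图逐像素地生成缩放/偏置参数,保留其空间维度(示意)。"""
+
+        def __init__(self, num_channels, label_nc, hidden=128):
+            super().__init__()
+            self.norm = nn.BatchNorm2D(num_channels)  # 归一化部分(此处做了简化)
+            self.shared = nn.Sequential(nn.Conv2D(label_nc, hidden, 3, padding=1), nn.ReLU())
+            self.gamma = nn.Conv2D(hidden, num_channels, 3, padding=1)  # 空间自适应缩放
+            self.beta = nn.Conv2D(hidden, num_channels, 3, padding=1)   # 空间自适应偏置
+
+        def forward(self, x, segmap):
+            normalized = self.norm(x)
+            # 将语义图插值到特征图大小,再由卷积得到逐像素的 gamma/beta
+            segmap = F.interpolate(segmap, size=x.shape[2:], mode='nearest')
+            feat = self.shared(segmap)
+            return normalized * (1 + self.gamma(feat)) + self.beta(feat)
+    ```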
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+ - ppgan
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install photopen
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ # Read from a file
+ $ hub run photopen --input_path "/PATH/TO/IMAGE"
+ ```
+ - 通过命令行方式实现图像生成模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+
+ module = hub.Module(name="photopen")
+ input_path = ["/PATH/TO/IMAGE"]
+ # Read from a file
+ module.photo_transfer(paths=input_path, output_dir='./transfer_result/', use_gpu=True)
+ ```
+
+- ### 3、API
+
+ - ```python
+ photo_transfer(images=None, paths=None, output_dir='./transfer_result/', use_gpu=False, visualization=True):
+ ```
+ - 图像转换生成API。
+
+ - **参数**
+
+ - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\];
+ - paths (list\[str\]): 图片的路径;
+ - output\_dir (str): 结果保存的路径;
+ - use\_gpu (bool): 是否使用 GPU;
+ - visualization(bool): 是否保存结果到本地文件夹
+
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像转换生成服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m photopen
+ ```
+
+ - 这样就完成了一个图像转换生成的在线服务API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量,否则无需设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/photopen"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(r.json()["results"])
+    ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ - ```shell
+ $ hub install photopen==1.0.0
+ ```
diff --git a/modules/image/Image_gan/gan/photopen/model.py b/modules/image/Image_gan/gan/photopen/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..4a0b0a4836b010ca4d72995c8857a8bb0ddd7aa2
--- /dev/null
+++ b/modules/image/Image_gan/gan/photopen/model.py
@@ -0,0 +1,62 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+
+import cv2
+import numpy as np
+import paddle
+from PIL import Image
+from PIL import ImageOps
+from ppgan.models.generators import SPADEGenerator
+from ppgan.utils.filesystem import load
+from ppgan.utils.photopen import data_onehot_pro
+
+
+class PhotoPenPredictor:
+ def __init__(self, weight_path, gen_cfg):
+
+ # 初始化模型
+ gen = SPADEGenerator(
+ gen_cfg.ngf,
+ gen_cfg.num_upsampling_layers,
+ gen_cfg.crop_size,
+ gen_cfg.aspect_ratio,
+ gen_cfg.norm_G,
+ gen_cfg.semantic_nc,
+ gen_cfg.use_vae,
+ gen_cfg.nef,
+ )
+ gen.eval()
+ para = load(weight_path)
+ if 'net_gen' in para:
+ gen.set_state_dict(para['net_gen'])
+ else:
+ gen.set_state_dict(para)
+
+ self.gen = gen
+ self.gen_cfg = gen_cfg
+
+ def run(self, image):
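+        # Resize the semantic label map to the crop size, one-hot encode it, and run the SPADE generator to produce an RGB image.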
+ sem = Image.fromarray(image).convert('L')
+ sem = sem.resize((self.gen_cfg.crop_size, self.gen_cfg.crop_size), Image.NEAREST)
+ sem = np.array(sem).astype('float32')
+ sem = paddle.to_tensor(sem)
+ sem = sem.reshape([1, 1, self.gen_cfg.crop_size, self.gen_cfg.crop_size])
+
+ one_hot = data_onehot_pro(sem, self.gen_cfg)
+ predicted = self.gen(one_hot)
+ pic = predicted.numpy()[0].reshape((3, 256, 256)).transpose((1, 2, 0))
+ pic = ((pic + 1.) / 2. * 255).astype('uint8')
+
+ return pic
diff --git a/modules/image/Image_gan/gan/photopen/module.py b/modules/image/Image_gan/gan/photopen/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..f8a23e574c9823c52daf2e07a318e344b8220b70
--- /dev/null
+++ b/modules/image/Image_gan/gan/photopen/module.py
@@ -0,0 +1,133 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import cv2
+import numpy as np
+import paddle
+from ppgan.utils.config import get_config
+from skimage.io import imread
+from skimage.transform import rescale
+from skimage.transform import resize
+
+import paddlehub as hub
+from .model import PhotoPenPredictor
+from .util import base64_to_cv2
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(
+ name="photopen", type="CV/style_transfer", author="paddlepaddle", author_email="", summary="", version="1.0.0")
+class Photopen:
+ def __init__(self):
+ self.pretrained_model = os.path.join(self.directory, "photopen.pdparams")
+ cfg = get_config(os.path.join(self.directory, "photopen.yaml"))
+ self.network = PhotoPenPredictor(weight_path=self.pretrained_model, gen_cfg=cfg.predict)
+
+ def photo_transfer(self,
+ images: list = None,
+ paths: list = None,
+ output_dir: str = './transfer_result/',
+ use_gpu: bool = False,
+ visualization: bool = True):
+ '''
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR(read by cv2).
+ paths (list[str]): paths to images
+
+ output_dir (str): the dir to save the results
+ use_gpu (bool): if True, use gpu to perform the computation, otherwise cpu.
+ visualization (bool): if True, save results in output_dir.
+ '''
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+        if images is None and paths is None:
+ print('No image provided. Please input an image or a image path.')
+ return
+
+        if images is not None:
+ for image in images:
+ image = image[:, :, ::-1]
+ out = self.network.run(image)
+ results.append(out)
+
+        if paths is not None:
+ for path in paths:
+ image = cv2.imread(path)[:, :, ::-1]
+ out = self.network.run(image)
+ results.append(out)
+
+        if visualization:
+ if not os.path.exists(output_dir):
+ os.makedirs(output_dir, exist_ok=True)
+ for i, out in enumerate(results):
+ if out is not None:
+ cv2.imwrite(os.path.join(output_dir, 'output_{}.png'.format(i)), out[:, :, ::-1])
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.photo_transfer(
+ paths=[self.args.input_path],
+ output_dir=self.args.output_dir,
+ use_gpu=self.args.use_gpu,
+ visualization=self.args.visualization)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.photo_transfer(images=images_decode, **kwargs)
+ tolist = [result.tolist() for result in results]
+ return tolist
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default='transfer_result', help='output directory for saving result.')
+ self.arg_config_group.add_argument('--visualization', type=bool, default=False, help='save results or not.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/Image_gan/gan/photopen/photopen.yaml b/modules/image/Image_gan/gan/photopen/photopen.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..178f361736c06f1f816997dc4a52a9a6bd62bcc9
--- /dev/null
+++ b/modules/image/Image_gan/gan/photopen/photopen.yaml
@@ -0,0 +1,95 @@
+total_iters: 1
+output_dir: output_dir
+checkpoints_dir: checkpoints
+
+model:
+ name: PhotoPenModel
+ generator:
+ name: SPADEGenerator
+ ngf: 24
+ num_upsampling_layers: normal
+ crop_size: 256
+ aspect_ratio: 1.0
+ norm_G: spectralspadebatch3x3
+ semantic_nc: 14
+ use_vae: False
+ nef: 16
+ discriminator:
+ name: MultiscaleDiscriminator
+ ndf: 128
+ num_D: 4
+ crop_size: 256
+ label_nc: 12
+ output_nc: 3
+ contain_dontcare_label: True
+ no_instance: False
+ n_layers_D: 6
+ criterion:
+ name: PhotoPenPerceptualLoss
+ crop_size: 224
+ lambda_vgg: 1.6
+ label_nc: 12
+ contain_dontcare_label: True
+ batchSize: 1
+ crop_size: 256
+ lambda_feat: 10.0
+
+dataset:
+ train:
+ name: PhotoPenDataset
+ content_root: test/coco_stuff
+ load_size: 286
+ crop_size: 256
+ num_workers: 0
+ batch_size: 1
+ test:
+ name: PhotoPenDataset_test
+ content_root: test/coco_stuff
+ load_size: 286
+ crop_size: 256
+ num_workers: 0
+ batch_size: 1
+
+lr_scheduler: # abandoned
+ name: LinearDecay
+ learning_rate: 0.0001
+ start_epoch: 99999
+ decay_epochs: 99999
+ # will get from real dataset
+ iters_per_epoch: 1
+
+optimizer:
+ lr: 0.0001
+ optimG:
+ name: Adam
+ net_names:
+ - net_gen
+ beta1: 0.9
+ beta2: 0.999
+ optimD:
+ name: Adam
+ net_names:
+ - net_des
+ beta1: 0.9
+ beta2: 0.999
+
+log_config:
+ interval: 1
+ visiual_interval: 1
+
+snapshot_config:
+ interval: 1
+
+predict:
+ name: SPADEGenerator
+ ngf: 24
+ num_upsampling_layers: normal
+ crop_size: 256
+ aspect_ratio: 1.0
+ norm_G: spectralspadebatch3x3
+ semantic_nc: 14
+ use_vae: False
+ nef: 16
+ contain_dontcare_label: True
+ label_nc: 12
+ batchSize: 1
diff --git a/modules/image/Image_gan/gan/photopen/requirements.txt b/modules/image/Image_gan/gan/photopen/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..67e9bb6fa840355e9ed0d44b7134850f1fe22fe1
--- /dev/null
+++ b/modules/image/Image_gan/gan/photopen/requirements.txt
@@ -0,0 +1 @@
+ppgan
diff --git a/modules/image/Image_gan/gan/photopen/util.py b/modules/image/Image_gan/gan/photopen/util.py
new file mode 100644
index 0000000000000000000000000000000000000000..531a0ae0d487822a870ba7f09817e658967aff10
--- /dev/null
+++ b/modules/image/Image_gan/gan/photopen/util.py
@@ -0,0 +1,11 @@
+import base64
+
+import cv2
+import numpy as np
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
diff --git a/modules/image/Image_gan/gan/pixel2style2pixel/README.md b/modules/image/Image_gan/gan/pixel2style2pixel/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..fa0c3925e23e62f30d6c4b3635c62a0ba1dfb6dd
--- /dev/null
+++ b/modules/image/Image_gan/gan/pixel2style2pixel/README.md
@@ -0,0 +1,133 @@
+# pixel2style2pixel
+
+|模型名称|pixel2style2pixel|
+| :--- | :---: |
+|类别|图像 - 图像生成|
+|网络|Pixel2Style2Pixel|
+|数据集|-|
+|是否支持Fine-tuning|否|
+|模型大小|1.7GB|
+|最新更新日期|2021-12-14|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+ - 样例结果示例:
+
+
+
+ 输入图像
+
+
+
+ 输出图像
+
+
+
+- ### 模型介绍
+
+ - Pixel2Style2Pixel使用相当大的模型对图像进行编码,将图像编码到StyleGAN V2的风格向量空间中,使编码前的图像和解码后的图像具有强关联性。该模块应用于人脸转正任务。
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.1.0
+ - paddlehub >= 2.1.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+- ### 2、安装
+
+ - ```shell
+ $ hub install pixel2style2pixel
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ # Read from a file
+ $ hub run pixel2style2pixel --input_path "/PATH/TO/IMAGE"
+ ```
+ - 通过命令行方式实现人脸转正模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+
+ module = hub.Module(name="pixel2style2pixel")
+ input_path = ["/PATH/TO/IMAGE"]
+ # Read from a file
+ module.style_transfer(paths=input_path, output_dir='./transfer_result/', use_gpu=True)
+ ```
+
+- ### 3、API
+
+ - ```python
+ style_transfer(images=None, paths=None, output_dir='./transfer_result/', use_gpu=False, visualization=True):
+ ```
+ - 人脸转正生成API。
+
+ - **参数**
+
+ - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\];
+ - paths (list\[str\]): 图片的路径;
+ - output\_dir (str): 结果保存的路径;
+ - use\_gpu (bool): 是否使用 GPU;
+ - visualization(bool): 是否保存结果到本地文件夹
+
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线人脸转正服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m pixel2style2pixel
+ ```
+
+ - 这样就完成了一个人脸转正的在线服务API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量,否则无需设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/pixel2style2pixel"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(r.json()["results"])
+    ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ - ```shell
+ $ hub install pixel2style2pixel==1.0.0
+ ```
diff --git a/modules/image/Image_gan/gan/pixel2style2pixel/model.py b/modules/image/Image_gan/gan/pixel2style2pixel/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..e82fbc8ead5e2545628e59fff817b3a378d63560
--- /dev/null
+++ b/modules/image/Image_gan/gan/pixel2style2pixel/model.py
@@ -0,0 +1,205 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import cv2
+import scipy
+import random
+import numpy as np
+import paddle
+import paddle.vision.transforms as T
+import ppgan.faceutils as futils
+from ppgan.models.generators import Pixel2Style2Pixel
+from ppgan.utils.download import get_path_from_url
+from PIL import Image
+
+model_cfgs = {
+ 'ffhq-inversion': {
+ 'model_urls':
+ 'https://paddlegan.bj.bcebos.com/models/pSp-ffhq-inversion.pdparams',
+ 'transform':
+ T.Compose([T.Resize((256, 256)),
+ T.Transpose(),
+ T.Normalize([127.5, 127.5, 127.5], [127.5, 127.5, 127.5])]),
+ 'size':
+ 1024,
+ 'style_dim':
+ 512,
+ 'n_mlp':
+ 8,
+ 'channel_multiplier':
+ 2
+ },
+ 'ffhq-toonify': {
+ 'model_urls':
+ 'https://paddlegan.bj.bcebos.com/models/pSp-ffhq-toonify.pdparams',
+ 'transform':
+ T.Compose([T.Resize((256, 256)),
+ T.Transpose(),
+ T.Normalize([127.5, 127.5, 127.5], [127.5, 127.5, 127.5])]),
+ 'size':
+ 1024,
+ 'style_dim':
+ 512,
+ 'n_mlp':
+ 8,
+ 'channel_multiplier':
+ 2
+ },
+ 'default': {
+ 'transform':
+ T.Compose([T.Resize((256, 256)),
+ T.Transpose(),
+ T.Normalize([127.5, 127.5, 127.5], [127.5, 127.5, 127.5])])
+ }
+}
+
+
+def run_alignment(image):
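+    # FFHQ-style face alignment: detect dlib landmarks, build an oriented crop around the eyes and mouth,
+    # pad and blur the borders, then warp to a square crop for the encoder.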
+ img = Image.fromarray(image).convert("RGB")
+ face = futils.dlib.detect(img)
+ if not face:
+ raise Exception('Could not find a face in the given image.')
+ face_on_image = face[0]
+ lm = futils.dlib.landmarks(img, face_on_image)
+ lm = np.array(lm)[:, ::-1]
+ lm_eye_left = lm[36:42]
+ lm_eye_right = lm[42:48]
+ lm_mouth_outer = lm[48:60]
+
+ output_size = 1024
+ transform_size = 4096
+ enable_padding = True
+
+ # Calculate auxiliary vectors.
+ eye_left = np.mean(lm_eye_left, axis=0)
+ eye_right = np.mean(lm_eye_right, axis=0)
+ eye_avg = (eye_left + eye_right) * 0.5
+ eye_to_eye = eye_right - eye_left
+ mouth_left = lm_mouth_outer[0]
+ mouth_right = lm_mouth_outer[6]
+ mouth_avg = (mouth_left + mouth_right) * 0.5
+ eye_to_mouth = mouth_avg - eye_avg
+
+ # Choose oriented crop rectangle.
+ x = eye_to_eye - np.flipud(eye_to_mouth) * [-1, 1]
+ x /= np.hypot(*x)
+ x *= max(np.hypot(*eye_to_eye) * 2.0, np.hypot(*eye_to_mouth) * 1.8)
+ y = np.flipud(x) * [-1, 1]
+ c = eye_avg + eye_to_mouth * 0.1
+ quad = np.stack([c - x - y, c - x + y, c + x + y, c + x - y])
+ qsize = np.hypot(*x) * 2
+
+ # Shrink.
+ shrink = int(np.floor(qsize / output_size * 0.5))
+ if shrink > 1:
+ rsize = (int(np.rint(float(img.size[0]) / shrink)), int(np.rint(float(img.size[1]) / shrink)))
+ img = img.resize(rsize, Image.ANTIALIAS)
+ quad /= shrink
+ qsize /= shrink
+
+ # Crop.
+ border = max(int(np.rint(qsize * 0.1)), 3)
+ crop = (int(np.floor(min(quad[:, 0]))), int(np.floor(min(quad[:, 1]))), int(np.ceil(max(quad[:, 0]))),
+ int(np.ceil(max(quad[:, 1]))))
+ crop = (max(crop[0] - border, 0), max(crop[1] - border, 0), min(crop[2] + border, img.size[0]),
+ min(crop[3] + border, img.size[1]))
+ if crop[2] - crop[0] < img.size[0] or crop[3] - crop[1] < img.size[1]:
+ img = img.crop(crop)
+ quad -= crop[0:2]
+
+ # Pad.
+ pad = (int(np.floor(min(quad[:, 0]))), int(np.floor(min(quad[:, 1]))), int(np.ceil(max(quad[:, 0]))),
+ int(np.ceil(max(quad[:, 1]))))
+ pad = (max(-pad[0] + border, 0), max(-pad[1] + border, 0), max(pad[2] - img.size[0] + border, 0),
+ max(pad[3] - img.size[1] + border, 0))
+ if enable_padding and max(pad) > border - 4:
+ pad = np.maximum(pad, int(np.rint(qsize * 0.3)))
+ img = np.pad(np.float32(img), ((pad[1], pad[3]), (pad[0], pad[2]), (0, 0)), 'reflect')
+ h, w, _ = img.shape
+ y, x, _ = np.ogrid[:h, :w, :1]
+ mask = np.maximum(1.0 - np.minimum(np.float32(x) / pad[0],
+ np.float32(w - 1 - x) / pad[2]),
+ 1.0 - np.minimum(np.float32(y) / pad[1],
+ np.float32(h - 1 - y) / pad[3]))
+ blur = qsize * 0.02
+ img += (scipy.ndimage.gaussian_filter(img, [blur, blur, 0]) - img) * np.clip(mask * 3.0 + 1.0, 0.0, 1.0)
+ img += (np.median(img, axis=(0, 1)) - img) * np.clip(mask, 0.0, 1.0)
+ img = Image.fromarray(np.uint8(np.clip(np.rint(img), 0, 255)), 'RGB')
+ quad += pad[:2]
+
+ # Transform.
+ img = img.transform((transform_size, transform_size), Image.QUAD, (quad + 0.5).flatten(), Image.BILINEAR)
+
+ return img
+
+
+class AttrDict(dict):
+ def __init__(self, *args, **kwargs):
+ super(AttrDict, self).__init__(*args, **kwargs)
+ self.__dict__ = self
+
+
+class Pixel2Style2PixelPredictor:
+ def __init__(self,
+ weight_path=None,
+ model_type=None,
+ seed=None,
+ size=1024,
+ style_dim=512,
+ n_mlp=8,
+ channel_multiplier=2):
+
+ if weight_path is None and model_type != 'default':
+ if model_type in model_cfgs.keys():
+ weight_path = get_path_from_url(model_cfgs[model_type]['model_urls'])
+ size = model_cfgs[model_type].get('size', size)
+ style_dim = model_cfgs[model_type].get('style_dim', style_dim)
+ n_mlp = model_cfgs[model_type].get('n_mlp', n_mlp)
+ channel_multiplier = model_cfgs[model_type].get('channel_multiplier', channel_multiplier)
+ checkpoint = paddle.load(weight_path)
+ else:
+ raise ValueError('Predictor need a weight path or a pretrained model type')
+ else:
+ checkpoint = paddle.load(weight_path)
+
+ opts = checkpoint.pop('opts')
+ opts = AttrDict(opts)
+ opts['size'] = size
+ opts['style_dim'] = style_dim
+ opts['n_mlp'] = n_mlp
+ opts['channel_multiplier'] = channel_multiplier
+
+ self.generator = Pixel2Style2Pixel(opts)
+ self.generator.set_state_dict(checkpoint)
+ self.generator.eval()
+
+ if seed is not None:
+ paddle.seed(seed)
+ random.seed(seed)
+ np.random.seed(seed)
+
+ self.model_type = 'default' if model_type is None else model_type
+
+ def run(self, image):
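+        # Align and preprocess the face, then run it through the pSp encoder-decoder to get the output image and its latent codes.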
+ src_img = run_alignment(image)
+ src_img = np.asarray(src_img)
+ transformed_image = model_cfgs[self.model_type]['transform'](src_img)
+ dst_img, latents = self.generator(
+ paddle.to_tensor(transformed_image[None, ...]), resize=False, return_latents=True)
+ dst_img = (dst_img * 0.5 + 0.5)[0].numpy() * 255
+ dst_img = dst_img.transpose((1, 2, 0))
+ dst_npy = latents[0].numpy()
+
+ return dst_img, dst_npy
diff --git a/modules/image/Image_gan/gan/pixel2style2pixel/module.py b/modules/image/Image_gan/gan/pixel2style2pixel/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..fb054a6f09becd52790df9437abb6de28f42118d
--- /dev/null
+++ b/modules/image/Image_gan/gan/pixel2style2pixel/module.py
@@ -0,0 +1,137 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import argparse
+import copy
+
+import paddle
+import paddlehub as hub
+from paddlehub.module.module import moduleinfo, runnable, serving
+import numpy as np
+import cv2
+from skimage.io import imread
+from skimage.transform import rescale, resize
+
+from .model import Pixel2Style2PixelPredictor
+from .util import base64_to_cv2
+
+
+@moduleinfo(
+ name="pixel2style2pixel",
+ type="CV/style_transfer",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class pixel2style2pixel:
+ def __init__(self):
+ self.pretrained_model = os.path.join(self.directory, "pSp-ffhq-inversion.pdparams")
+
+ self.network = Pixel2Style2PixelPredictor(weight_path=self.pretrained_model, model_type='ffhq-inversion')
+
+ def style_transfer(self,
+ images=None,
+ paths=None,
+ output_dir='./transfer_result/',
+ use_gpu=False,
+ visualization=True):
+ '''
+
+
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR(read by cv2).
+ paths (list[str]): paths to images
+ output_dir: the dir to save the results
+ use_gpu: if True, use gpu to perform the computation, otherwise cpu.
+ visualization: if True, save results in output_dir.
+ '''
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+        if images is None and paths is None:
+ print('No image provided. Please input an image or a image path.')
+ return
+
+        if images is not None:
+ for image in images:
+ image = image[:, :, ::-1]
+ out = self.network.run(image)
+ results.append(out)
+
+        if paths is not None:
+ for path in paths:
+ image = cv2.imread(path)[:, :, ::-1]
+ out = self.network.run(image)
+ results.append(out)
+
+        if visualization:
+ if not os.path.exists(output_dir):
+ os.makedirs(output_dir, exist_ok=True)
+ for i, out in enumerate(results):
+ if out is not None:
+ cv2.imwrite(os.path.join(output_dir, 'output_{}.png'.format(i)), out[0][:, :, ::-1])
+ np.save(os.path.join(output_dir, 'output_{}.npy'.format(i)), out[1])
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.style_transfer(
+ paths=[self.args.input_path],
+ output_dir=self.args.output_dir,
+ use_gpu=self.args.use_gpu,
+ visualization=self.args.visualization)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.style_transfer(images=images_decode, **kwargs)
+        tolist = [[result[0].tolist(), result[1].tolist()] for result in results]  # each result is (generated image, latent codes)
+ return tolist
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default='transfer_result', help='output directory for saving result.')
+ self.arg_config_group.add_argument('--visualization', type=bool, default=False, help='save results or not.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/Image_gan/gan/pixel2style2pixel/requirements.txt b/modules/image/Image_gan/gan/pixel2style2pixel/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..d9bfc85782a3ee323241fe7beb87a9f281c120fe
--- /dev/null
+++ b/modules/image/Image_gan/gan/pixel2style2pixel/requirements.txt
@@ -0,0 +1,2 @@
+ppgan
+dlib
diff --git a/modules/image/Image_gan/gan/pixel2style2pixel/util.py b/modules/image/Image_gan/gan/pixel2style2pixel/util.py
new file mode 100644
index 0000000000000000000000000000000000000000..b88ac3562b74cadc1d4d6459a56097ca4a938a0b
--- /dev/null
+++ b/modules/image/Image_gan/gan/pixel2style2pixel/util.py
@@ -0,0 +1,10 @@
+import base64
+import cv2
+import numpy as np
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
diff --git a/modules/image/Image_gan/gan/stgan_bald/requirements.txt b/modules/image/Image_gan/gan/stgan_bald/requirements.txt
index 2d8443d02d090d830649fbfacbc11c8cebea8d34..00a00fcc8e48e65538cf8b73b2fd4e1157362f20 100644
--- a/modules/image/Image_gan/gan/stgan_bald/requirements.txt
+++ b/modules/image/Image_gan/gan/stgan_bald/requirements.txt
@@ -1,2 +1 @@
-paddlepaddle>=1.8.4
paddlehub>=1.8.0
diff --git a/modules/image/Image_gan/gan/styleganv2_editing/README.md b/modules/image/Image_gan/gan/styleganv2_editing/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..4728207bff29bfc281a799f2bc6581634ebaecfc
--- /dev/null
+++ b/modules/image/Image_gan/gan/styleganv2_editing/README.md
@@ -0,0 +1,134 @@
+# styleganv2_editing
+
+|模型名称|styleganv2_editing|
+| :--- | :---: |
+|类别|图像 - 图像生成|
+|网络|StyleGAN V2|
+|数据集|-|
+|是否支持Fine-tuning|否|
+|模型大小|190MB|
+|最新更新日期|2021-12-15|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+ - 样例结果示例:
+
+
+
+ 输入图像
+
+
+
+ 输出图像(修改age)
+
+
+
+- ### 模型介绍
+
+  - StyleGAN V2 的任务是使用风格向量进行图像生成,而 Editing 模块则利用预先对大量图像的风格向量做分类回归得到的属性操纵向量,来编辑生成图像的对应属性。
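+
+  - 属性编辑的核心思路可以用下面的示意代码说明(假设性草图,数据均为随机生成,仅演示“风格向量 + 偏移强度 × 属性方向向量”的做法,并非本模块的实际接口):
+
+  - ```python
+    import numpy as np
+
+    # 假设:latent 为某张图像在 W+ 空间的风格向量,directions 为“属性名 -> 操纵方向向量”的字典
+    latent = np.random.randn(18, 512).astype('float32')
+    directions = {'age': np.random.randn(18, 512).astype('float32')}
+
+    direction_name, direction_offset = 'age', 5.0
+    edited_latent = latent + direction_offset * directions[direction_name]
+    # 将 edited_latent 送入 StyleGAN V2 生成器,即可得到编辑了对应属性的图像
+    print(edited_latent.shape)
+    ```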
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+ - ppgan
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install styleganv2_editing
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ # Read from a file
+ $ hub run styleganv2_editing --input_path "/PATH/TO/IMAGE" --direction_name age --direction_offset 5
+ ```
+ - 通过命令行方式实现人脸编辑模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+
+ module = hub.Module(name="styleganv2_editing")
+ input_path = ["/PATH/TO/IMAGE"]
+ # Read from a file
+ module.generate(paths=input_path, direction_name = 'age', direction_offset = 5, output_dir='./editing_result/', use_gpu=True)
+ ```
+
+- ### 3、API
+
+ - ```python
+ generate(self, images=None, paths=None, direction_name = 'age', direction_offset = 0.0, output_dir='./editing_result/', use_gpu=False, visualization=True)
+ ```
+ - 人脸编辑生成API。
+
+ - **参数**
+
+ - images (list\[numpy.ndarray\]): 图片数据
+ - paths (list\[str\]): 图片路径;
+ - direction_name (str): 要编辑的属性名称,对于ffhq-conf-f有预先准备的这些属性: age、eyes_open、eye_distance、eye_eyebrow_distance、eye_ratio、gender、lip_ratio、mouth_open、mouth_ratio、nose_mouth_distance、nose_ratio、nose_tip、pitch、roll、smile、yaw
+ - direction_offset (float): 属性的偏移强度
+ - output\_dir (str): 结果保存的路径;
+ - use\_gpu (bool): 是否使用 GPU;
+ - visualization(bool): 是否保存结果到本地文件夹
+
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线人脸编辑服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m styleganv2_editing
+ ```
+
+ - 这样就完成了一个人脸编辑的在线服务API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量,否则无需设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/styleganv2_editing"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(r.json()["results"])
+    ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ - ```shell
+ $ hub install styleganv2_editing==1.0.0
+ ```
diff --git a/modules/image/Image_gan/gan/styleganv2_editing/basemodel.py b/modules/image/Image_gan/gan/styleganv2_editing/basemodel.py
new file mode 100644
index 0000000000000000000000000000000000000000..37eca73d4e14965a1f69e818744aa435a7e3600f
--- /dev/null
+++ b/modules/image/Image_gan/gan/styleganv2_editing/basemodel.py
@@ -0,0 +1,140 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import random
+import numpy as np
+import paddle
+from ppgan.models.generators import StyleGANv2Generator
+from ppgan.utils.download import get_path_from_url
+from ppgan.utils.visual import make_grid, tensor2img, save_image
+
+model_cfgs = {
+ 'ffhq-config-f': {
+ 'model_urls': 'https://paddlegan.bj.bcebos.com/models/stylegan2-ffhq-config-f.pdparams',
+ 'size': 1024,
+ 'style_dim': 512,
+ 'n_mlp': 8,
+ 'channel_multiplier': 2
+ },
+ 'animeface-512': {
+ 'model_urls': 'https://paddlegan.bj.bcebos.com/models/stylegan2-animeface-512.pdparams',
+ 'size': 512,
+ 'style_dim': 512,
+ 'n_mlp': 8,
+ 'channel_multiplier': 2
+ }
+}
+
+
+@paddle.no_grad()
+def get_mean_style(generator):
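+    # Average several estimates of the generator's mean latent to get the style vector used for truncation.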
+ mean_style = None
+
+ for i in range(10):
+ style = generator.mean_latent(1024)
+
+ if mean_style is None:
+ mean_style = style
+
+ else:
+ mean_style += style
+
+ mean_style /= 10
+ return mean_style
+
+
+@paddle.no_grad()
+def sample(generator, mean_style, n_sample):
+ image = generator(
+ [paddle.randn([n_sample, generator.style_dim])],
+ truncation=0.7,
+ truncation_latent=mean_style,
+ )[0]
+
+ return image
+
+
+@paddle.no_grad()
+def style_mixing(generator, mean_style, n_source, n_target):
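+    # Build the classic style-mixing grid: the top row shows source samples, the left column shows target samples,
+    # and each remaining cell mixes a target code with a source code.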
+ source_code = paddle.randn([n_source, generator.style_dim])
+ target_code = paddle.randn([n_target, generator.style_dim])
+
+ resolution = 2**((generator.n_latent + 2) // 2)
+
+ images = [paddle.ones([1, 3, resolution, resolution]) * -1]
+
+ source_image = generator([source_code], truncation_latent=mean_style, truncation=0.7)[0]
+ target_image = generator([target_code], truncation_latent=mean_style, truncation=0.7)[0]
+
+ images.append(source_image)
+
+ for i in range(n_target):
+ image = generator(
+ [target_code[i].unsqueeze(0).tile([n_source, 1]), source_code],
+ truncation_latent=mean_style,
+ truncation=0.7,
+ )[0]
+ images.append(target_image[i].unsqueeze(0))
+ images.append(image)
+
+ images = paddle.concat(images, 0)
+
+ return images
+
+
+class StyleGANv2Predictor:
+ def __init__(self,
+ output_path='output_dir',
+ weight_path=None,
+ model_type=None,
+ seed=None,
+ size=1024,
+ style_dim=512,
+ n_mlp=8,
+ channel_multiplier=2):
+ self.output_path = output_path
+
+ if weight_path is None:
+ if model_type in model_cfgs.keys():
+ weight_path = get_path_from_url(model_cfgs[model_type]['model_urls'])
+ size = model_cfgs[model_type].get('size', size)
+ style_dim = model_cfgs[model_type].get('style_dim', style_dim)
+ n_mlp = model_cfgs[model_type].get('n_mlp', n_mlp)
+ channel_multiplier = model_cfgs[model_type].get('channel_multiplier', channel_multiplier)
+ checkpoint = paddle.load(weight_path)
+ else:
+ raise ValueError('Predictor need a weight path or a pretrained model type')
+ else:
+ checkpoint = paddle.load(weight_path)
+
+ self.generator = StyleGANv2Generator(size, style_dim, n_mlp, channel_multiplier)
+ self.generator.set_state_dict(checkpoint)
+ self.generator.eval()
+
+ if seed is not None:
+ paddle.seed(seed)
+ random.seed(seed)
+ np.random.seed(seed)
+
+ def run(self, n_row=3, n_col=5):
+ os.makedirs(self.output_path, exist_ok=True)
+ mean_style = get_mean_style(self.generator)
+
+ img = sample(self.generator, mean_style, n_row * n_col)
+ save_image(tensor2img(make_grid(img, nrow=n_col)), f'{self.output_path}/sample.png')
+
+ for j in range(2):
+ img = style_mixing(self.generator, mean_style, n_col, n_row)
+ save_image(tensor2img(make_grid(img, nrow=n_col + 1)), f'{self.output_path}/sample_mixing_{j}.png')
diff --git a/modules/image/Image_gan/gan/styleganv2_editing/model.py b/modules/image/Image_gan/gan/styleganv2_editing/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..ccdadeaa8b125bfd98a86ae5a895d543914d5d9d
--- /dev/null
+++ b/modules/image/Image_gan/gan/styleganv2_editing/model.py
@@ -0,0 +1,58 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import cv2
+import numpy as np
+import paddle
+
+from ppgan.utils.download import get_path_from_url
+from .basemodel import StyleGANv2Predictor
+
+model_cfgs = {
+ 'ffhq-config-f': {
+ 'direction_urls': 'https://paddlegan.bj.bcebos.com/models/stylegan2-ffhq-config-f-directions.pdparams'
+ }
+}
+
+
+def make_image(tensor):
+ return (((tensor.detach() + 1) / 2 * 255).clip(min=0, max=255).transpose((0, 2, 3, 1)).numpy().astype('uint8'))
+
+
+class StyleGANv2EditingPredictor(StyleGANv2Predictor):
+ def __init__(self, model_type=None, direction_path=None, **kwargs):
+ super().__init__(model_type=model_type, **kwargs)
+
+ if direction_path is None and model_type is not None:
+ assert model_type in model_cfgs, f'There is not any pretrained direction file for {model_type} model.'
+ direction_path = get_path_from_url(model_cfgs[model_type]['direction_urls'])
+ self.directions = paddle.load(direction_path)
+
+ @paddle.no_grad()
+ def run(self, latent, direction, offset):
+
+ latent = paddle.to_tensor(latent).unsqueeze(0).astype('float32')
+ direction = self.directions[direction].unsqueeze(0).astype('float32')
+
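+        # Stack the original latent and the latent shifted along the attribute direction so a single forward pass yields both the source and the edited image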
+ latent_n = paddle.concat([latent, latent + offset * direction], 0)
+ generator = self.generator
+ img_gen, _ = generator([latent_n], input_is_latent=True, randomize_noise=False)
+ imgs = make_image(img_gen)
+ src_img = imgs[0]
+ dst_img = imgs[1]
+
+ dst_latent = (latent + offset * direction)[0].numpy().astype('float32')
+
+ return src_img, dst_img, dst_latent
diff --git a/modules/image/Image_gan/gan/styleganv2_editing/module.py b/modules/image/Image_gan/gan/styleganv2_editing/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..1e90060bd1005f3c91708e2ccd44a34e6132aef3
--- /dev/null
+++ b/modules/image/Image_gan/gan/styleganv2_editing/module.py
@@ -0,0 +1,155 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import argparse
+import copy
+
+import paddle
+import paddlehub as hub
+from paddlehub.module.module import moduleinfo, runnable, serving
+import numpy as np
+import cv2
+from skimage.io import imread
+from skimage.transform import rescale, resize
+
+from .model import StyleGANv2EditingPredictor
+from .util import base64_to_cv2
+
+
+@moduleinfo(
+ name="styleganv2_editing",
+ type="CV/style_transfer",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class styleganv2_editing:
+ def __init__(self):
+ self.pretrained_model = os.path.join(self.directory, "stylegan2-ffhq-config-f-directions.pdparams")
+
+ self.network = StyleGANv2EditingPredictor(direction_path=self.pretrained_model, model_type='ffhq-config-f')
+ self.pixel2style2pixel_module = hub.Module(name='pixel2style2pixel')
+
+ def generate(self,
+ images=None,
+ paths=None,
+ direction_name='age',
+ direction_offset=0.0,
+ output_dir='./editing_result/',
+ use_gpu=False,
+ visualization=True):
+        '''
+        Face attribute editing API.
+
+        images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR(read by cv2).
+        paths (list[str]): paths to images.
+        direction_name(str): Attribute to be manipulated. For ffhq-config-f, the available attributes are: age, eyes_open, eye_distance, eye_eyebrow_distance, eye_ratio, gender, lip_ratio, mouth_open, mouth_ratio, nose_mouth_distance, nose_ratio, nose_tip, pitch, roll, smile, yaw.
+ direction_offset(float): Offset strength of the attribute.
+ output_dir: the dir to save the results
+ use_gpu: if True, use gpu to perform the computation, otherwise cpu.
+ visualization: if True, save results in output_dir.
+ '''
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+ if images == None and paths == None:
+ print('No image provided. Please input an image or a image path.')
+ return
+
+ if images != None:
+ for image in images:
+ image = image[:, :, ::-1]
+ _, latent = self.pixel2style2pixel_module.network.run(image)
+ out = self.network.run(latent, direction_name, direction_offset)
+ results.append(out)
+
+ if paths != None:
+ for path in paths:
+ image = cv2.imread(path)[:, :, ::-1]
+ _, latent = self.pixel2style2pixel_module.network.run(image)
+ out = self.network.run(latent, direction_name, direction_offset)
+ results.append(out)
+
+ if visualization == True:
+ if not os.path.exists(output_dir):
+ os.makedirs(output_dir, exist_ok=True)
+ for i, out in enumerate(results):
+ if out is not None:
+ cv2.imwrite(os.path.join(output_dir, 'src_{}.png'.format(i)), out[0][:, :, ::-1])
+ cv2.imwrite(os.path.join(output_dir, 'dst_{}.png'.format(i)), out[1][:, :, ::-1])
+ np.save(os.path.join(output_dir, 'dst_{}.npy'.format(i)), out[2])
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.generate(
+ paths=[self.args.input_path],
+ direction_name=self.args.direction_name,
+ direction_offset=self.args.direction_offset,
+ output_dir=self.args.output_dir,
+ use_gpu=self.args.use_gpu,
+ visualization=self.args.visualization)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.generate(images=images_decode, **kwargs)
+        # each result is a (src_img, dst_img, dst_latent) tuple of ndarrays; convert every array to a JSON-serializable list
+        tolist = [[arr.tolist() for arr in result] for result in results]
+ return tolist
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default='editing_result', help='output directory for saving result.')
+ self.arg_config_group.add_argument('--visualization', type=bool, default=False, help='save results or not.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
+ self.arg_input_group.add_argument(
+ '--direction_name',
+ type=str,
+ default='age',
+ help=
+ "Attribute to be manipulated,For ffhq-conf-f, we have: age, eyes_open, eye_distance, eye_eyebrow_distance, eye_ratio, gender, lip_ratio, mouth_open, mouth_ratio, nose_mouth_distance, nose_ratio, nose_tip, pitch, roll, smile, yaw."
+ )
+ self.arg_input_group.add_argument('--direction_offset', type=float, help="Offset strength of the attribute.")
diff --git a/modules/image/Image_gan/gan/styleganv2_editing/requirements.txt b/modules/image/Image_gan/gan/styleganv2_editing/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..67e9bb6fa840355e9ed0d44b7134850f1fe22fe1
--- /dev/null
+++ b/modules/image/Image_gan/gan/styleganv2_editing/requirements.txt
@@ -0,0 +1 @@
+ppgan
diff --git a/modules/image/Image_gan/gan/styleganv2_editing/util.py b/modules/image/Image_gan/gan/styleganv2_editing/util.py
new file mode 100644
index 0000000000000000000000000000000000000000..b88ac3562b74cadc1d4d6459a56097ca4a938a0b
--- /dev/null
+++ b/modules/image/Image_gan/gan/styleganv2_editing/util.py
@@ -0,0 +1,10 @@
+import base64
+import cv2
+import numpy as np
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
diff --git a/modules/image/Image_gan/gan/styleganv2_mixing/README.md b/modules/image/Image_gan/gan/styleganv2_mixing/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..6623f0f6f4d40962b41ef409e736bb230617a913
--- /dev/null
+++ b/modules/image/Image_gan/gan/styleganv2_mixing/README.md
@@ -0,0 +1,143 @@
+# styleganv2_mixing
+
+|模型名称|styleganv2_mixing|
+| :--- | :---: |
+|类别|图像 - 图像生成|
+|网络|StyleGAN V2|
+|数据集|-|
+|是否支持Fine-tuning|否|
+|模型大小|190MB|
+|最新更新日期|2021-12-23|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+ - 样例结果示例:
+
+
+
+    (示意图:输入图像1、输入图像2 与两者混合后的输出图像)
+
+
+
+- ### 模型介绍
+
+  - StyleGAN V2 使用风格向量进行图像生成,Mixing 模块则利用风格向量,将两张生成图像在不同层次上按不同比例进行混合。
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+ - paddlepaddle >= 2.1.0
+ - paddlehub >= 2.1.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install styleganv2_mixing
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ # Read from a file
+ $ hub run styleganv2_mixing --image1 "/PATH/TO/IMAGE1" --image2 "/PATH/TO/IMAGE2"
+ ```
+ - 通过命令行方式实现人脸融合模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+
+ module = hub.Module(name="styleganv2_mixing")
+    input_path = [{'image1': "/PATH/TO/IMAGE1", 'image2': "/PATH/TO/IMAGE2"}]
+    # Read from files
+    module.generate(paths=input_path, weights=[0.5] * 18, output_dir='./mixing_result/', use_gpu=True)
+ ```
+
+- ### 3、API
+
+ - ```python
+ generate(self, images=None, paths=None, weights = [0.5] * 18, output_dir='./mixing_result/', use_gpu=False, visualization=True)
+ ```
+ - 人脸融合生成API。
+
+ - **参数**
+ - images (list[dict]): data of images, 每一个元素都为一个 dict,有关键字 image1, image2, 相应取值为:
+ - image1 (numpy.ndarray): 待融合的图片1,shape 为 \[H, W, C\],BGR格式;
+ - image2 (numpy.ndarray) : 待融合的图片2,shape为 \[H, W, C\],BGR格式;
+      - paths (list[dict]): paths to images, 每一个元素都为一个 dict,有关键字 image1, image2, 相应取值为:
+ - image1 (str): 待融合的图片1的路径;
+ - image2 (str) : 待融合的图片2的路径;
+      - weights (list\[float\]): 融合的权重,长度需与 latent 层数一致(默认为 \[0.5\] * 18),见下方示例
+ - output\_dir (str): 结果保存的路径;
+ - use\_gpu (bool): 是否使用 GPU;
+ - visualization(bool): 是否保存结果到本地文件夹
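+
+  - 一般来说,靠前的层影响姿态、脸型等较粗粒度的特征,靠后的层影响颜色、纹理等细节。下面给出一个构造权重的示意(取值仅供参考):
+
+  - ```python
+    import paddlehub as hub
+
+    module = hub.Module(name="styleganv2_mixing")
+    # 前 9 层更多保留 image1 的结构,后 9 层更多采用 image2 的细节
+    weights = [0.8] * 9 + [0.2] * 9
+    module.generate(paths=[{'image1': "/PATH/TO/IMAGE1", 'image2': "/PATH/TO/IMAGE2"}],
+                    weights=weights,
+                    output_dir='./mixing_result/',
+                    use_gpu=False)
+    ```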
+
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线人脸融合服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m styleganv2_mixing
+ ```
+
+ - 这样就完成了一个人脸融合的在线服务API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[{'image1': cv2_to_base64(cv2.imread("/PATH/TO/IMAGE1")),'image2': cv2_to_base64(cv2.imread("/PATH/TO/IMAGE2"))}]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/styleganv2_mixing"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+      print(r.json()["results"])
+    ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ - ```shell
+ $ hub install styleganv2_mixing==1.0.0
+ ```
diff --git a/modules/image/Image_gan/gan/styleganv2_mixing/basemodel.py b/modules/image/Image_gan/gan/styleganv2_mixing/basemodel.py
new file mode 100644
index 0000000000000000000000000000000000000000..37eca73d4e14965a1f69e818744aa435a7e3600f
--- /dev/null
+++ b/modules/image/Image_gan/gan/styleganv2_mixing/basemodel.py
@@ -0,0 +1,140 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import random
+import numpy as np
+import paddle
+from ppgan.models.generators import StyleGANv2Generator
+from ppgan.utils.download import get_path_from_url
+from ppgan.utils.visual import make_grid, tensor2img, save_image
+
+model_cfgs = {
+ 'ffhq-config-f': {
+ 'model_urls': 'https://paddlegan.bj.bcebos.com/models/stylegan2-ffhq-config-f.pdparams',
+ 'size': 1024,
+ 'style_dim': 512,
+ 'n_mlp': 8,
+ 'channel_multiplier': 2
+ },
+ 'animeface-512': {
+ 'model_urls': 'https://paddlegan.bj.bcebos.com/models/stylegan2-animeface-512.pdparams',
+ 'size': 512,
+ 'style_dim': 512,
+ 'n_mlp': 8,
+ 'channel_multiplier': 2
+ }
+}
+
+
+@paddle.no_grad()
+def get_mean_style(generator):
+ mean_style = None
+
+ for i in range(10):
+ style = generator.mean_latent(1024)
+
+ if mean_style is None:
+ mean_style = style
+
+ else:
+ mean_style += style
+
+ mean_style /= 10
+ return mean_style
+
+
+@paddle.no_grad()
+def sample(generator, mean_style, n_sample):
+ image = generator(
+ [paddle.randn([n_sample, generator.style_dim])],
+ truncation=0.7,
+ truncation_latent=mean_style,
+ )[0]
+
+ return image
+
+
+@paddle.no_grad()
+def style_mixing(generator, mean_style, n_source, n_target):
+ source_code = paddle.randn([n_source, generator.style_dim])
+ target_code = paddle.randn([n_target, generator.style_dim])
+
+ resolution = 2**((generator.n_latent + 2) // 2)
+
+ images = [paddle.ones([1, 3, resolution, resolution]) * -1]
+
+ source_image = generator([source_code], truncation_latent=mean_style, truncation=0.7)[0]
+ target_image = generator([target_code], truncation_latent=mean_style, truncation=0.7)[0]
+
+ images.append(source_image)
+
+ for i in range(n_target):
+ image = generator(
+ [target_code[i].unsqueeze(0).tile([n_source, 1]), source_code],
+ truncation_latent=mean_style,
+ truncation=0.7,
+ )[0]
+ images.append(target_image[i].unsqueeze(0))
+ images.append(image)
+
+ images = paddle.concat(images, 0)
+
+ return images
+
+
+class StyleGANv2Predictor:
+ def __init__(self,
+ output_path='output_dir',
+ weight_path=None,
+ model_type=None,
+ seed=None,
+ size=1024,
+ style_dim=512,
+ n_mlp=8,
+ channel_multiplier=2):
+ self.output_path = output_path
+
+ if weight_path is None:
+ if model_type in model_cfgs.keys():
+ weight_path = get_path_from_url(model_cfgs[model_type]['model_urls'])
+ size = model_cfgs[model_type].get('size', size)
+ style_dim = model_cfgs[model_type].get('style_dim', style_dim)
+ n_mlp = model_cfgs[model_type].get('n_mlp', n_mlp)
+ channel_multiplier = model_cfgs[model_type].get('channel_multiplier', channel_multiplier)
+ checkpoint = paddle.load(weight_path)
+ else:
+ raise ValueError('Predictor need a weight path or a pretrained model type')
+ else:
+ checkpoint = paddle.load(weight_path)
+
+ self.generator = StyleGANv2Generator(size, style_dim, n_mlp, channel_multiplier)
+ self.generator.set_state_dict(checkpoint)
+ self.generator.eval()
+
+ if seed is not None:
+ paddle.seed(seed)
+ random.seed(seed)
+ np.random.seed(seed)
+
+ def run(self, n_row=3, n_col=5):
+ os.makedirs(self.output_path, exist_ok=True)
+ mean_style = get_mean_style(self.generator)
+
+ img = sample(self.generator, mean_style, n_row * n_col)
+ save_image(tensor2img(make_grid(img, nrow=n_col)), f'{self.output_path}/sample.png')
+
+ for j in range(2):
+ img = style_mixing(self.generator, mean_style, n_col, n_row)
+ save_image(tensor2img(make_grid(img, nrow=n_col + 1)), f'{self.output_path}/sample_mixing_{j}.png')
diff --git a/modules/image/Image_gan/gan/styleganv2_mixing/model.py b/modules/image/Image_gan/gan/styleganv2_mixing/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..5e2287df0c7bb22854e56a023f2278dd7981360c
--- /dev/null
+++ b/modules/image/Image_gan/gan/styleganv2_mixing/model.py
@@ -0,0 +1,47 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import cv2
+import numpy as np
+import paddle
+
+from .basemodel import StyleGANv2Predictor
+
+
+def make_image(tensor):
+ return (((tensor.detach() + 1) / 2 * 255).clip(min=0, max=255).transpose((0, 2, 3, 1)).numpy().astype('uint8'))
+
+
+class StyleGANv2MixingPredictor(StyleGANv2Predictor):
+ @paddle.no_grad()
+ def run(self, latent1, latent2, weights=[0.5] * 18):
+
+ latent1 = paddle.to_tensor(latent1).unsqueeze(0)
+ latent2 = paddle.to_tensor(latent2).unsqueeze(0)
+ assert latent1.shape[1] == latent2.shape[1] == len(
+ weights), 'latents and their weights should have the same level nums.'
+ mix_latent = []
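+        # Blend the two latent codes level by level: level i takes weights[i] from latent1 and (1 - weights[i]) from latent2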
+ for i, weight in enumerate(weights):
+ mix_latent.append(latent1[:, i:i + 1] * weight + latent2[:, i:i + 1] * (1 - weight))
+ mix_latent = paddle.concat(mix_latent, 1)
+ latent_n = paddle.concat([latent1, latent2, mix_latent], 0)
+ generator = self.generator
+ img_gen, _ = generator([latent_n], input_is_latent=True, randomize_noise=False)
+ imgs = make_image(img_gen)
+ src_img1 = imgs[0]
+ src_img2 = imgs[1]
+ dst_img = imgs[2]
+
+ return src_img1, src_img2, dst_img
diff --git a/modules/image/Image_gan/gan/styleganv2_mixing/module.py b/modules/image/Image_gan/gan/styleganv2_mixing/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..fbc10091c3ef86676f520c20b2d1704294c36fe1
--- /dev/null
+++ b/modules/image/Image_gan/gan/styleganv2_mixing/module.py
@@ -0,0 +1,161 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import argparse
+import copy
+
+import paddle
+import paddlehub as hub
+from paddlehub.module.module import moduleinfo, runnable, serving
+import numpy as np
+import cv2
+from skimage.io import imread
+from skimage.transform import rescale, resize
+
+from .model import StyleGANv2MixingPredictor
+from .util import base64_to_cv2
+
+
+@moduleinfo(
+ name="styleganv2_mixing",
+ type="CV/style_transfer",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class styleganv2_mixing:
+ def __init__(self):
+ self.pretrained_model = os.path.join(self.directory, "stylegan2-ffhq-config-f.pdparams")
+ self.network = StyleGANv2MixingPredictor(weight_path=self.pretrained_model, model_type='ffhq-config-f')
+ self.pixel2style2pixel_module = hub.Module(name='pixel2style2pixel')
+
+ def generate(self,
+ images=None,
+ paths=None,
+ weights=[0.5] * 18,
+ output_dir='./mixing_result/',
+ use_gpu=False,
+ visualization=True):
+ '''
+        images (list[dict]): data of images, each element is a dict; the keys are as below:
+            - image1 (numpy.ndarray): image1 to be mixed, shape is [H, W, C], BGR format;
+            - image2 (numpy.ndarray): image2 to be mixed, shape is [H, W, C], BGR format;
+        paths (list[dict]): paths to images, each element is a dict; the keys are as below:
+ - image1 (str): path to image1;
+ - image2 (str) : path to image2;
+ weights (list(float)): weight for mixing
+ output_dir: the dir to save the results
+ use_gpu: if True, use gpu to perform the computation, otherwise cpu.
+ visualization: if True, save results in output_dir.
+ '''
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+ if images == None and paths == None:
+ print('No image provided. Please input an image or a image path.')
+ return
+ if images != None:
+ for image_dict in images:
+ image1 = image_dict['image1'][:, :, ::-1]
+ image2 = image_dict['image2'][:, :, ::-1]
+ _, latent1 = self.pixel2style2pixel_module.network.run(image1)
+ _, latent2 = self.pixel2style2pixel_module.network.run(image2)
+ results.append(self.network.run(latent1, latent2, weights))
+
+ if paths != None:
+ for path_dict in paths:
+ path1 = path_dict['image1']
+ path2 = path_dict['image2']
+ image1 = cv2.imread(path1)[:, :, ::-1]
+ image2 = cv2.imread(path2)[:, :, ::-1]
+ _, latent1 = self.pixel2style2pixel_module.network.run(image1)
+ _, latent2 = self.pixel2style2pixel_module.network.run(image2)
+ results.append(self.network.run(latent1, latent2, weights))
+
+ if visualization == True:
+ if not os.path.exists(output_dir):
+ os.makedirs(output_dir, exist_ok=True)
+ for i, out in enumerate(results):
+ if out is not None:
+ cv2.imwrite(os.path.join(output_dir, 'src_{}_image1.png'.format(i)), out[0][:, :, ::-1])
+ cv2.imwrite(os.path.join(output_dir, 'src_{}_image2.png'.format(i)), out[1][:, :, ::-1])
+ cv2.imwrite(os.path.join(output_dir, 'dst_{}.png'.format(i)), out[2][:, :, ::-1])
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.generate(
+ paths=[{
+ 'image1': self.args.image1,
+ 'image2': self.args.image2
+ }],
+ weights=self.args.weights,
+ output_dir=self.args.output_dir,
+ use_gpu=self.args.use_gpu,
+ visualization=self.args.visualization)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = copy.deepcopy(images)
+ for image in images_decode:
+ image['image1'] = base64_to_cv2(image['image1'])
+ image['image2'] = base64_to_cv2(image['image2'])
+ results = self.generate(images_decode, **kwargs)
+        # each result is a (src_img1, src_img2, dst_img) tuple of ndarrays; convert every array to a JSON-serializable list
+        tolist = [[arr.tolist() for arr in result] for result in results]
+ return tolist
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default='mixing_result', help='output directory for saving result.')
+ self.arg_config_group.add_argument('--visualization', type=bool, default=False, help='save results or not.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--image1', type=str, help="path to input image1.")
+ self.arg_input_group.add_argument('--image2', type=str, help="path to input image2.")
+ self.arg_input_group.add_argument(
+ "--weights",
+ type=float,
+ nargs="+",
+ default=[0.5] * 18,
+ help="different weights at each level of two latent codes")
diff --git a/modules/image/Image_gan/gan/styleganv2_mixing/requirements.txt b/modules/image/Image_gan/gan/styleganv2_mixing/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..67e9bb6fa840355e9ed0d44b7134850f1fe22fe1
--- /dev/null
+++ b/modules/image/Image_gan/gan/styleganv2_mixing/requirements.txt
@@ -0,0 +1 @@
+ppgan
diff --git a/modules/image/Image_gan/gan/styleganv2_mixing/util.py b/modules/image/Image_gan/gan/styleganv2_mixing/util.py
new file mode 100644
index 0000000000000000000000000000000000000000..b88ac3562b74cadc1d4d6459a56097ca4a938a0b
--- /dev/null
+++ b/modules/image/Image_gan/gan/styleganv2_mixing/util.py
@@ -0,0 +1,10 @@
+import base64
+import cv2
+import numpy as np
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
diff --git a/modules/image/Image_gan/gan/wav2lip/README.md b/modules/image/Image_gan/gan/wav2lip/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..5305725a65bb12a8d4cf4c0f18c655b4c07c2841
--- /dev/null
+++ b/modules/image/Image_gan/gan/wav2lip/README.md
@@ -0,0 +1,94 @@
+# wav2lip
+
+|模型名称|wav2lip|
+| :--- | :---: |
+|类别|图像 - 视频生成|
+|网络|Wav2Lip|
+|数据集|LRS2|
+|是否支持Fine-tuning|否|
+|模型大小|139MB|
+|最新更新日期|2021-12-14|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+ - 样例结果示例:
+
+
+
+    (示意图:输入图像,以及唇形与音频同步后的输出视频)
+
+
+
+
+- ### 模型介绍
+
+  - Wav2Lip能够根据输入音频,为视频中的人物生成与语音同步的唇形。它不仅可以基于静态图像输出与目标语音匹配的唇形同步视频,还可以直接对动态视频进行唇形转换,输出与目标语音匹配的视频。Wav2Lip实现唇形与语音精准同步的关键在于采用了唇形同步判别器,以强制生成器持续产生准确而逼真的唇部运动;此外,它在判别器中使用多个连续帧而非单帧,并引入视觉质量损失(而不仅仅是对比损失)来考虑时间相关性,从而改善了视觉质量。Wav2Lip适用于任何人脸、任何语言,对任意视频都能达到很高的准确率,可以无缝地与原始视频融合,还可以用于转换动画人脸。
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+ - ffmpeg
+ - libsndfile
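+
+  - 例如,在 Ubuntu 下可通过如下命令安装上述系统依赖(命令仅供参考,其他系统请使用对应的包管理器):
+
+  - ```shell
+    $ sudo apt-get install ffmpeg libsndfile1
+    ```
+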
+- ### 2、安装
+
+ - ```shell
+ $ hub install wav2lip
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ # Read from a file
+ $ hub run wav2lip --face "/PATH/TO/VIDEO or IMAGE" --audio "/PATH/TO/AUDIO"
+ ```
+  - 通过命令行方式实现人物唇形生成模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+
+ module = hub.Module(name="wav2lip")
+ face_input_path = "/PATH/TO/VIDEO or IMAGE"
+ audio_input_path = "/PATH/TO/AUDIO"
+ module.wav2lip_transfer(face=face_input_path, audio=audio_input_path, output_dir='./transfer_result/', use_gpu=True)
+ ```
+
+- ### 3、API
+
+ - ```python
+ def wav2lip_transfer(face, audio, output_dir ='./output_result/', use_gpu=False, visualization=True):
+ ```
+ - 人脸唇形生成API。
+
+ - **参数**
+
+ - face (str): 视频或图像文件的路径
+ - audio (str): 音频文件的路径
+ - output\_dir (str): 结果保存的路径;
+ - use\_gpu (bool): 是否使用 GPU;
+ - visualization(bool): 是否保存结果到本地文件夹
+
+
+## 四、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ - ```shell
+ $ hub install wav2lip==1.0.0
+ ```
diff --git a/modules/image/Image_gan/gan/wav2lip/model.py b/modules/image/Image_gan/gan/wav2lip/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..3fa32ed9c384e74cf569ef0daa09215539355d8e
--- /dev/null
+++ b/modules/image/Image_gan/gan/wav2lip/model.py
@@ -0,0 +1,259 @@
+from os import listdir, path, makedirs
+import platform
+import numpy as np
+import scipy, cv2, os, sys, argparse
+import json, subprocess, random, string
+from tqdm import tqdm
+from glob import glob
+import paddle
+from paddle.utils.download import get_weights_path_from_url
+from ppgan.faceutils import face_detection
+from ppgan.utils import audio
+from ppgan.models.generators.wav2lip import Wav2Lip
+
+WAV2LIP_WEIGHT_URL = 'https://paddlegan.bj.bcebos.com/models/wav2lip_hq.pdparams'
+mel_step_size = 16
+
+
+class Wav2LipPredictor:
+ def __init__(self,
+ checkpoint_path=None,
+ static=False,
+ fps=25,
+ pads=[0, 10, 0, 0],
+ face_det_batch_size=16,
+ wav2lip_batch_size=128,
+ resize_factor=1,
+ crop=[0, -1, 0, -1],
+ box=[-1, -1, -1, -1],
+ rotate=False,
+ nosmooth=False,
+ face_detector='sfd',
+ face_enhancement=False):
+ self.img_size = 96
+ self.checkpoint_path = checkpoint_path
+ self.static = static
+ self.fps = fps
+ self.pads = pads
+ self.face_det_batch_size = face_det_batch_size
+ self.wav2lip_batch_size = wav2lip_batch_size
+ self.resize_factor = resize_factor
+ self.crop = crop
+ self.box = box
+ self.rotate = rotate
+ self.nosmooth = nosmooth
+ self.face_detector = face_detector
+ self.face_enhancement = face_enhancement
+ if face_enhancement:
+ from ppgan.faceutils.face_enhancement import FaceEnhancement
+ self.faceenhancer = FaceEnhancement()
+ makedirs('./temp', exist_ok=True)
+
+ def get_smoothened_boxes(self, boxes, T):
+ for i in range(len(boxes)):
+ if i + T > len(boxes):
+ window = boxes[len(boxes) - T:]
+ else:
+ window = boxes[i:i + T]
+ boxes[i] = np.mean(window, axis=0)
+ return boxes
+
+ def face_detect(self, images):
+ detector = face_detection.FaceAlignment(
+ face_detection.LandmarksType._2D, flip_input=False, face_detector=self.face_detector)
+
+ batch_size = self.face_det_batch_size
+
+ while 1:
+ predictions = []
+ try:
+ for i in tqdm(range(0, len(images), batch_size)):
+ predictions.extend(detector.get_detections_for_batch(np.array(images[i:i + batch_size])))
+ except RuntimeError:
+ if batch_size == 1:
+ raise RuntimeError(
+ 'Image too big to run face detection on GPU. Please use the --resize_factor argument')
+ batch_size //= 2
+ print('Recovering from OOM error; New batch size: {}'.format(batch_size))
+ continue
+ break
+
+ results = []
+ pady1, pady2, padx1, padx2 = self.pads
+ for rect, image in zip(predictions, images):
+ if rect is None:
+ cv2.imwrite('temp/faulty_frame.jpg', image) # check this frame where the face was not detected.
+ raise ValueError('Face not detected! Ensure the video contains a face in all the frames.')
+
+ y1 = max(0, rect[1] - pady1)
+ y2 = min(image.shape[0], rect[3] + pady2)
+ x1 = max(0, rect[0] - padx1)
+ x2 = min(image.shape[1], rect[2] + padx2)
+
+ results.append([x1, y1, x2, y2])
+
+ boxes = np.array(results)
+ if not self.nosmooth: boxes = self.get_smoothened_boxes(boxes, T=5)
+ results = [[image[y1:y2, x1:x2], (y1, y2, x1, x2)] for image, (x1, y1, x2, y2) in zip(images, boxes)]
+
+ del detector
+ return results
+
+ def datagen(self, frames, mels):
+ img_batch, mel_batch, frame_batch, coords_batch = [], [], [], []
+
+ if self.box[0] == -1:
+ if not self.static:
+ face_det_results = self.face_detect(frames) # BGR2RGB for CNN face detection
+ else:
+ face_det_results = self.face_detect([frames[0]])
+ else:
+ print('Using the specified bounding box instead of face detection...')
+ y1, y2, x1, x2 = self.box
+ face_det_results = [[f[y1:y2, x1:x2], (y1, y2, x1, x2)] for f in frames]
+
+ for i, m in enumerate(mels):
+ idx = 0 if self.static else i % len(frames)
+ frame_to_save = frames[idx].copy()
+ face, coords = face_det_results[idx].copy()
+
+ face = cv2.resize(face, (self.img_size, self.img_size))
+
+ img_batch.append(face)
+ mel_batch.append(m)
+ frame_batch.append(frame_to_save)
+ coords_batch.append(coords)
+
+ if len(img_batch) >= self.wav2lip_batch_size:
+ img_batch, mel_batch = np.asarray(img_batch), np.asarray(mel_batch)
+
+ img_masked = img_batch.copy()
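+                # Mask the lower half of each face crop so the model reconstructs the mouth region from the audio features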
+ img_masked[:, self.img_size // 2:] = 0
+
+ img_batch = np.concatenate((img_masked, img_batch), axis=3) / 255.
+ mel_batch = np.reshape(mel_batch, [len(mel_batch), mel_batch.shape[1], mel_batch.shape[2], 1])
+
+ yield img_batch, mel_batch, frame_batch, coords_batch
+ img_batch, mel_batch, frame_batch, coords_batch = [], [], [], []
+
+ if len(img_batch) > 0:
+ img_batch, mel_batch = np.asarray(img_batch), np.asarray(mel_batch)
+
+ img_masked = img_batch.copy()
+ img_masked[:, self.img_size // 2:] = 0
+
+ img_batch = np.concatenate((img_masked, img_batch), axis=3) / 255.
+ mel_batch = np.reshape(mel_batch, [len(mel_batch), mel_batch.shape[1], mel_batch.shape[2], 1])
+
+ yield img_batch, mel_batch, frame_batch, coords_batch
+
+ def run(self, face, audio_seq, output_dir, visualization=True):
+ if os.path.isfile(face) and path.basename(face).split('.')[1] in ['jpg', 'png', 'jpeg']:
+ self.static = True
+
+ if not os.path.isfile(face):
+ raise ValueError('--face argument must be a valid path to video/image file')
+
+ elif path.basename(face).split('.')[1] in ['jpg', 'png', 'jpeg']:
+ full_frames = [cv2.imread(face)]
+ fps = self.fps
+
+ else:
+ video_stream = cv2.VideoCapture(face)
+ fps = video_stream.get(cv2.CAP_PROP_FPS)
+
+ print('Reading video frames...')
+
+ full_frames = []
+ while 1:
+ still_reading, frame = video_stream.read()
+ if not still_reading:
+ video_stream.release()
+ break
+ if self.resize_factor > 1:
+ frame = cv2.resize(frame,
+ (frame.shape[1] // self.resize_factor, frame.shape[0] // self.resize_factor))
+
+ if self.rotate:
+ frame = cv2.rotate(frame, cv2.cv2.ROTATE_90_CLOCKWISE)
+
+ y1, y2, x1, x2 = self.crop
+ if x2 == -1: x2 = frame.shape[1]
+ if y2 == -1: y2 = frame.shape[0]
+
+ frame = frame[y1:y2, x1:x2]
+
+ full_frames.append(frame)
+
+ print("Number of frames available for inference: " + str(len(full_frames)))
+
+ if not audio_seq.endswith('.wav'):
+ print('Extracting raw audio...')
+ command = 'ffmpeg -y -i {} -strict -2 {}'.format(audio_seq, 'temp/temp.wav')
+
+ subprocess.call(command, shell=True)
+ audio_seq = 'temp/temp.wav'
+
+ wav = audio.load_wav(audio_seq, 16000)
+ mel = audio.melspectrogram(wav)
+ if np.isnan(mel.reshape(-1)).sum() > 0:
+ raise ValueError(
+ 'Mel contains nan! Using a TTS voice? Add a small epsilon noise to the wav file and try again')
+
+ mel_chunks = []
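+        # The mel spectrogram has roughly 80 frames per second; slice it into mel_step_size windows aligned with each video frame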
+ mel_idx_multiplier = 80. / fps
+ i = 0
+ while 1:
+ start_idx = int(i * mel_idx_multiplier)
+ if start_idx + mel_step_size > len(mel[0]):
+ mel_chunks.append(mel[:, len(mel[0]) - mel_step_size:])
+ break
+ mel_chunks.append(mel[:, start_idx:start_idx + mel_step_size])
+ i += 1
+
+ print("Length of mel chunks: {}".format(len(mel_chunks)))
+
+ full_frames = full_frames[:len(mel_chunks)]
+
+ batch_size = self.wav2lip_batch_size
+ gen = self.datagen(full_frames.copy(), mel_chunks)
+
+ model = Wav2Lip()
+ if self.checkpoint_path is None:
+ model_weights_path = get_weights_path_from_url(WAV2LIP_WEIGHT_URL)
+ weights = paddle.load(model_weights_path)
+ else:
+ weights = paddle.load(self.checkpoint_path)
+ model.load_dict(weights)
+ model.eval()
+ print("Model loaded")
+ for i, (img_batch, mel_batch, frames, coords) in enumerate(
+ tqdm(gen, total=int(np.ceil(float(len(mel_chunks)) / batch_size)))):
+ if i == 0:
+
+ frame_h, frame_w = full_frames[0].shape[:-1]
+ out = cv2.VideoWriter('temp/result.avi', cv2.VideoWriter_fourcc(*'DIVX'), fps, (frame_w, frame_h))
+
+ img_batch = paddle.to_tensor(np.transpose(img_batch, (0, 3, 1, 2))).astype('float32')
+ mel_batch = paddle.to_tensor(np.transpose(mel_batch, (0, 3, 1, 2))).astype('float32')
+
+ with paddle.no_grad():
+ pred = model(mel_batch, img_batch)
+
+ pred = pred.numpy().transpose(0, 2, 3, 1) * 255.
+
+ for p, f, c in zip(pred, frames, coords):
+ y1, y2, x1, x2 = c
+ if self.face_enhancement:
+ p = self.faceenhancer.enhance_from_image(p)
+ p = cv2.resize(p.astype(np.uint8), (x2 - x1, y2 - y1))
+
+ f[y1:y2, x1:x2] = p
+ out.write(f)
+
+ out.release()
+ os.makedirs(output_dir, exist_ok=True)
+ if visualization:
+ command = 'ffmpeg -y -i {} -i {} -strict -2 -q:v 1 {}'.format(audio_seq, 'temp/result.avi',
+ os.path.join(output_dir, 'result.avi'))
+ subprocess.call(command, shell=platform.system() != 'Windows')
diff --git a/modules/image/Image_gan/gan/wav2lip/module.py b/modules/image/Image_gan/gan/wav2lip/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..f16191d8984e33f38246e7985a8bb3f7f2aa74b0
--- /dev/null
+++ b/modules/image/Image_gan/gan/wav2lip/module.py
@@ -0,0 +1,101 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import argparse
+import copy
+
+import paddle
+import paddlehub as hub
+from paddlehub.module.module import moduleinfo, runnable, serving
+import numpy as np
+import cv2
+
+from .model import Wav2LipPredictor
+
+
+@moduleinfo(name="wav2lip", type="CV/generation", author="paddlepaddle", author_email="", summary="", version="1.0.0")
+class wav2lip:
+ def __init__(self):
+ self.pretrained_model = os.path.join(self.directory, "wav2lip_hq.pdparams")
+
+ self.network = Wav2LipPredictor(
+ checkpoint_path=self.pretrained_model,
+ static=False,
+ fps=25,
+ pads=[0, 10, 0, 0],
+ face_det_batch_size=16,
+ wav2lip_batch_size=128,
+ resize_factor=1,
+ crop=[0, -1, 0, -1],
+ box=[-1, -1, -1, -1],
+ rotate=False,
+ nosmooth=False,
+ face_detector='sfd',
+ face_enhancement=True)
+
+ def wav2lip_transfer(self, face, audio, output_dir='./output_result/', use_gpu=False, visualization=True):
+ '''
+ face (str): path to video/image that contains faces to use.
+ audio (str): path to input audio.
+ output_dir: the dir to save the results
+ use_gpu: if True, use gpu to perform the computation, otherwise cpu.
+ visualization: if True, save results in output_dir.
+ '''
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+ self.network.run(face, audio, output_dir, visualization)
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ self.wav2lip_transfer(
+ face=self.args.face,
+ audio=self.args.audio,
+ output_dir=self.args.output_dir,
+ use_gpu=self.args.use_gpu,
+ visualization=self.args.visualization)
+ return
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default='output_result', help='output directory for saving result.')
+ self.arg_config_group.add_argument('--visualization', type=bool, default=False, help='save results or not.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--audio', type=str, help="path to input audio.")
+ self.arg_input_group.add_argument('--face', type=str, help="path to video/image that contains faces to use.")
diff --git a/modules/image/Image_gan/gan/wav2lip/requirements.txt b/modules/image/Image_gan/gan/wav2lip/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..67e9bb6fa840355e9ed0d44b7134850f1fe22fe1
--- /dev/null
+++ b/modules/image/Image_gan/gan/wav2lip/requirements.txt
@@ -0,0 +1 @@
+ppgan
diff --git a/modules/image/Image_gan/stargan_celeba/README_en.md b/modules/image/Image_gan/stargan_celeba/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..a79a091aa017e1caafae05d142bd48b29cf61aa1
--- /dev/null
+++ b/modules/image/Image_gan/stargan_celeba/README_en.md
@@ -0,0 +1,101 @@
+# stargan_celeba
+
+|Module Name|stargan_celeba|
+| :--- | :---: |
+|Category|image generation|
+|Network|STGAN|
+|Dataset|Celeba|
+|Fine-tuning supported or not|No|
+|Module Size |33MB|
+|Latest update date|2021-02-26|
+|Data indicators|-|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+ ![](https://user-images.githubusercontent.com/35907364/137855887-f0abca76-2735-4275-b7ad-242decf31bb3.PNG)
+    The image attributes are: original image, Black_Hair, Blond_Hair, Brown_Hair, Male, Aged
+
+
+
+- ### Module Introduction
+
+  - STGAN takes the original attribute and the target attribute as input, and proposes STUs (Selective transfer units) to select and modify features of the encoder. The PaddleHub Module is trained on the Celeba dataset and currently supports the attributes "Black_Hair", "Blond_Hair", "Brown_Hair", "Female", "Male", "Aged".
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 1.5.2
+
+ - paddlehub >= 1.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install stargan_celeba==1.0.0
+ ```
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run stargan_celeba --image "/PATH/TO/IMAGE" --style "target_attribute"
+ ```
+
+ - **Parameters**
+
+ - image: image path
+
+ - style: Specify the attributes to be converted. The options are "Black_Hair", "Blond_Hair", "Brown_Hair", "Female", "Male", "Aged". You can choose one of the options.
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+
+ stargan = hub.Module(name="stargan_celeba")
+ test_img_path = ["/PATH/TO/IMAGE"]
+ trans_attr = ["Blond_Hair"]
+
+ # set input dict
+ input_dict = {"image": test_img_path, "style": trans_attr}
+
+ # execute predict and print the result
+ results = stargan.generate(data=input_dict)
+ print(results)
+ ```
+
+- ### 3、API
+
+ - ```python
+ def generate(data)
+ ```
+
+ - Style transfer API.
+
+ - **Parameter**
+
+ - data(list[dict]): each element in the list is dict and each field is:
+ - image (list\[str\]): Each element in the list is the path of the image to be converted.
+ - style (list\[str\]): Each element in the list is a string, fill in the face attributes to be converted.
+
+ - **Return**
+ - res (list\[str\]): Save path of the result.
+
+## IV. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/Image_gan/stgan_celeba/README_en.md b/modules/image/Image_gan/stgan_celeba/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..c48718c792d5678a2a1980d9502e8f813e896ed7
--- /dev/null
+++ b/modules/image/Image_gan/stgan_celeba/README_en.md
@@ -0,0 +1,105 @@
+# stgan_celeba
+
+|Module Name|stgan_celeba|
+| :--- | :---: |
+|Category|image generation|
+|Network|STGAN|
+|Dataset|Celeba|
+|Fine-tuning supported or not|No|
+|Module Size |287MB|
+|Latest update date|2021-02-26|
+|Data indicators|-|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+ ![](https://user-images.githubusercontent.com/35907364/137856070-2a43facd-cda0-473f-8935-e61f5dd583d8.JPG)
+ The image attributes are: original image, Bald, Bangs, Black_Hair, Blond_Hair, Brown_Hair, Bushy_Eyebrows, Eyeglasses, Gender, Mouth_Slightly_Open, Mustache, No_Beard, Pale_Skin, Aged
+
+
+
+- ### Module Introduction
+
+  - STGAN takes the original attribute and the target attribute as input, and proposes STUs (Selective transfer units) to select and modify features of the encoder. The PaddleHub Module is trained on the Celeba dataset and currently supports the attributes "Bald", "Bangs", "Black_Hair", "Blond_Hair", "Brown_Hair", "Bushy_Eyebrows", "Eyeglasses", "Gender", "Mouth_Slightly_Open", "Mustache", "No_Beard", "Pale_Skin", "Aged".
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 1.5.2
+
+ - paddlehub >= 1.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install stgan_celeba==1.0.0
+ ```
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run stgan_celeba --image "/PATH/TO/IMAGE" --info "original_attributes" --style "target_attribute"
+ ```
+ - **Parameters**
+
+ - image: Image path
+
+    - info: Attributes of the original image. Gender ("Male" or "Female") must be included. The other options are "Bald", "Bangs", "Black_Hair", "Blond_Hair", "Brown_Hair", "Bushy_Eyebrows", "Eyeglasses", "Mouth_Slightly_Open", "Mustache", "No_Beard", "Pale_Skin", "Aged". For example, if the input picture is a girl with black hair, fill in "Female,Black_Hair".
+
+ - style: Specify the attributes to be converted. The options are "Bald", "Bangs", "Black_Hair", "Blond_Hair", "Brown_Hair", "Bushy_Eyebrows", "Eyeglasses", "Gender", "Mouth_Slightly_Open", "Mustache", "No_Beard", "Pale_Skin", "Aged". You can choose one of the options.
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+
+ stgan = hub.Module(name="stgan_celeba")
+
+ test_img_path = ["/PATH/TO/IMAGE"]
+ org_info = ["Female,Black_Hair"]
+ trans_attr = ["Bangs"]
+
+ # set input dict
+ input_dict = {"image": test_img_path, "style": trans_attr, "info": org_info}
+
+ # execute predict and print the result
+ results = stgan.generate(data=input_dict)
+ print(results)
+ ```
+
+- ### 3、API
+
+ - ```python
+ def generate(data)
+ ```
+
+ - Style transfer API.
+
+ - **Parameter**
+
+ - data(list[dict]): Each element in the list is dict and each field is:
+ - image (list\[str\]): Each element in the list is the path of the image to be converted.
+ - style (list\[str\]): Each element in the list is a string, fill in the face attributes to be converted.
+ - info (list\[str\]): Represents the face attributes of the original image. Different attributes are separated by commas.
+
+
+ - **Return**
+ - res (list\[str\]): Save path of the result.
+
+## IV. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/Image_gan/style_transfer/ID_Photo_GEN/README_en.md b/modules/image/Image_gan/style_transfer/ID_Photo_GEN/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..ba06c5e7bcf9b7b2291c48958756b84f4cc3234d
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/ID_Photo_GEN/README_en.md
@@ -0,0 +1,98 @@
+# ID_Photo_GEN
+
+|Module Name |ID_Photo_GEN|
+| :--- | :---: |
+|Category|Image generation|
+|Network|HRNet_W18|
+|Dataset |-|
+|Fine-tuning supported or not |No|
+|Module Size|28KB|
+|Latest update date|2021-02-26|
+|Data indicators|-|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - This model is based on face_landmark_localization and FCN_HRNet_W18_Face_Seg. It can generate ID photos with white, red, and blue backgrounds.
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ID_Photo_GEN
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddlehub as hub
+
+ model = hub.Module(name='ID_Photo_GEN')
+
+ result = model.Photo_GEN(
+ images=[cv2.imread('/PATH/TO/IMAGE')],
+ paths=None,
+ batch_size=1,
+ output_dir='output',
+ visualization=True,
+ use_gpu=False)
+ ```
+
+- ### 2、API
+
+ - ```python
+ def Photo_GEN(
+ images=None,
+ paths=None,
+ batch_size=1,
+ output_dir='output',
+ visualization=False,
+ use_gpu=False):
+ ```
+
+ - Prediction API, generating ID photos.
+
+ - **Parameter**
+ * images (list[np.ndarray]): Image data, ndarray.shape is in the format [H, W, C], BGR.
+ * paths (list[str]): Image path
+ * batch_size (int): Batch size
+ * output_dir (str): Save path of images, output by default.
+ * visualization (bool): Whether to save the recognition results as picture files.
+ * use_gpu (bool): Use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**
+
+ **NOTE:** Choose one of `paths` and `images` to provide input data.
+
+ - **Return**
+
+ * results (list[dict{"write":np.ndarray,"blue":np.ndarray,"red":np.ndarray}]): The list of generation results.
+
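+  - A minimal sketch for saving each generated background, assuming the result keys listed above:
+
+  - ```python
+    import cv2
+
+    # `result` is the return value of model.Photo_GEN(...) in the example above
+    for i, res in enumerate(result):
+        for key in ('write', 'blue', 'red'):
+            cv2.imwrite('id_photo_{}_{}.png'.format(key, i), res[key])
+    ```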
+
+## IV. Release Note
+
+- 1.0.0
+
+ First release
\ No newline at end of file
diff --git a/modules/image/Image_gan/style_transfer/U2Net_Portrait/README.md b/modules/image/Image_gan/style_transfer/U2Net_Portrait/README.md
index e2724618d42095ebf05e410ece2ced9b06c831d6..4175ec598c6e02a65a744a9b26dd7c00aa2efd43 100644
--- a/modules/image/Image_gan/style_transfer/U2Net_Portrait/README.md
+++ b/modules/image/Image_gan/style_transfer/U2Net_Portrait/README.md
@@ -50,16 +50,16 @@
## 三、模型API预测
-- ### 1、代码示例
+- ### 1、预测代码示例
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="U2Net_Portrait")
- result = model.Cartoon_GEN(images=[cv2.imread('/PATH/TO/IMAGE')])
+ result = model.Portrait_GEN(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
- # result = model.Cartoon_GEN(paths=['/PATH/TO/IMAGE'])
+ # result = model.Portrait_GEN(paths=['/PATH/TO/IMAGE'])
```
- ### 2、API
diff --git a/modules/image/Image_gan/style_transfer/UGATIT_83w/README_en.md b/modules/image/Image_gan/style_transfer/UGATIT_83w/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..b4afce178b13910e0de350bf2fd20c1532aad355
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/UGATIT_83w/README_en.md
@@ -0,0 +1,134 @@
+# UGATIT_83w
+
+|Module Name|UGATIT_83w|
+| :--- | :---: |
+|Category|Image editing|
+|Network |U-GAT-IT|
+|Dataset|selfie2anime|
+|Fine-tuning supported or not|No|
+|Module Size|41MB|
+|Latest update date |2021-02-26|
+|Data indicators|-|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+
+
+- ### Module Introduction
+
+ - UGATIT can transfer the input face image into the anime style.
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 1.8.2
+
+ - paddlehub >= 1.8.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install UGATIT_83w
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddlehub as hub
+
+ model = hub.Module(name='UGATIT_83w', use_gpu=False)
+ result = model.style_transfer(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = model.style_transfer(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 2、API
+
+ - ```python
+ def style_transfer(
+ self,
+ images=None,
+ paths=None,
+ batch_size=1,
+ output_dir='output',
+ visualization=False
+ )
+ ```
+
+ - Style transfer API, convert the input face image into anime style.
+
+ - **Parameters**
+ * images (list\[numpy.ndarray\]): Image data, ndarray.shape is in the format [H, W, C], BGR.
+      * paths (list\[str\]): Image paths, default is None;
+ * batch\_size (int): Batch size, default is 1;
+ * visualization (bool): Whether to save the recognition results as picture files, default is False.
+ * output\_dir (str): Save path of images, `output` by default.
+
+ **NOTE:** Choose one of `paths` and `images` to provide data.
+
+ - **Return**
+
+ - res (list\[numpy.ndarray\]): Result, ndarray.shape is in the format [H, W, C].
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of Style transfer task.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m UGATIT_83w
+ ```
+
+  - The serving API is now deployed and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+ # Send an HTTP request
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/UGATIT_83w"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # print prediction results
+ print(r.json()["results"])
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
\ No newline at end of file
diff --git a/modules/image/Image_gan/style_transfer/UGATIT_92w/README_en.md b/modules/image/Image_gan/style_transfer/UGATIT_92w/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..ef7a22a493e58d3745b383b60e0dccc44ae1fdf9
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/UGATIT_92w/README_en.md
@@ -0,0 +1,134 @@
+# UGATIT_92w
+
+|Module Name|UGATIT_92w|
+| :--- | :---: |
+|Category|Image editing|
+|Network |U-GAT-IT|
+|Dataset|selfie2anime|
+|Fine-tuning supported or not|No|
+|Module Size|41MB|
+|Latest update date |2021-02-26|
+|Data indicators|-|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+
+
+- ### Module Introduction
+
+ - UGATIT can transfer the input face image into the anime style.
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 1.8.2
+
+ - paddlehub >= 1.8.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install UGATIT_92w
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddlehub as hub
+
+ model = hub.Module(name='UGATIT_92w', use_gpu=False)
+ result = model.style_transfer(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = model.style_transfer(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 2、API
+
+ - ```python
+ def style_transfer(
+ self,
+ images=None,
+ paths=None,
+ batch_size=1,
+ output_dir='output',
+ visualization=False
+ )
+ ```
+
+ - Style transfer API, convert the input face image into anime style.
+
+ - **Parameters**
+ * images (list\[numpy.ndarray\]): Image data, ndarray.shape is in the format [H, W, C], BGR.
+    * paths (list\[str\]): Image path, default is None;
+ * batch\_size (int): Batch size, default is 1;
+    * visualization (bool): Whether to save the results as picture files, default is False.
+    * output\_dir (str): Save path of images, `output` by default.
+
+ **NOTE:** Choose one of `paths` and `images` to provide input data.
+
+ - **Return**
+
+    - res (list\[numpy.ndarray\]): Style transfer result, ndarray.shape is in the format [H, W, C].
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of Style transfer task.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m UGATIT_92w
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+ # Send an HTTP request
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/UGATIT_92w"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # print prediction results
+ print(r.json()["results"])
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
\ No newline at end of file
diff --git a/modules/image/Image_gan/style_transfer/animegan_v2_paprika_54/README_en.md b/modules/image/Image_gan/style_transfer/animegan_v2_paprika_54/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..77d724986f3442b9ba35ef789381a338c6208c28
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/animegan_v2_paprika_54/README_en.md
@@ -0,0 +1,151 @@
+# animegan_v2_paprika_54
+
+|Module Name |animegan_v2_paprika_54|
+| :--- | :---: |
+|Category |Image generation|
+|Network|AnimeGAN|
+|Dataset|Paprika|
+|Fine-tuning supported or not|No|
+|Module Size|9.4MB|
+|Latest update date|2021-02-26|
+|Data indicators|-|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+ Input image
+
+
+
+ Output image
+
+
+
+
+
+- ### Module Introduction
+
+  - AnimeGAN V2 is an image style transfer model that converts the input image into the anime style of the film Paprika. The model weights are converted from the [AnimeGAN V2 official repo](https://github.com/TachibanaYoshino/AnimeGAN).
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 1.8.0
+
+ - paddlehub >= 1.8.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install animegan_v2_paprika_54
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="animegan_v2_paprika_54")
+ result = model.style_transfer(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = model.style_transfer(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 2、API
+
+ - ```python
+ def style_transfer(images=None,
+ paths=None,
+ output_dir='output',
+ visualization=False,
+ min_size=32,
+ max_size=1024)
+ ```
+
+ - Style transfer API.
+
+ - **Parameters**
+
+ - images (list\[numpy.ndarray\]): Image data, ndarray.shape is in the format [H, W, C], BGR.
+ - paths (list\[str\]): Image path.
+ - output\_dir (str): Save path of images, `output` by default.
+ - visualization (bool): Whether to save the results as picture files.
+ - min\_size (int): Minimum size, default is 32.
+ - max\_size (int): Maximum size, default is 1024.
+
+ **NOTE:** Choose one of `paths` and `images` to provide input data.
+
+ - **Return**
+ - res (list\[numpy.ndarray\]): The list of style transfer results,ndarray.shape is in the format [H, W, C].
+
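+  - A short sketch (the size cap and the output directory name are illustrative) showing how `max_size` and `visualization` can be combined to bound the processing resolution and save results automatically:
+
+  - ```python
+    import cv2
+    import paddlehub as hub
+
+    model = hub.Module(name="animegan_v2_paprika_54")
+    # Process at no more than 512 pixels and write the stylized image to 'paprika_output'
+    result = model.style_transfer(images=[cv2.imread('/PATH/TO/IMAGE')],
+                                  max_size=512,
+                                  visualization=True,
+                                  output_dir='paprika_output')
+    ```
+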
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of style transfer.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m animegan_v2_paprika_54
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ # Send an HTTP request
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/animegan_v2_paprika_54"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # print prediction results
+ print(r.json()["results"])
+ ```
+
+
+## V. Release Note
+
+- 1.0.0
+
+ First release.
+
+* 1.0.1
+
+ Support paddlehub2.0.
+
+* 1.0.2
+
+ Delete batch_size.
diff --git a/modules/image/Image_gan/style_transfer/animegan_v2_paprika_97/README_en.md b/modules/image/Image_gan/style_transfer/animegan_v2_paprika_97/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..fa2a8953a76dab2cac7fef702977291cc5504303
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/animegan_v2_paprika_97/README_en.md
@@ -0,0 +1,151 @@
+# animegan_v2_paprika_97
+
+|Module Name |animegan_v2_paprika_97|
+| :--- | :---: |
+|Category |Image generation|
+|Network|AnimeGAN|
+|Dataset|Paprika|
+|Fine-tuning supported or not|No|
+|Module Size|9.7MB|
+|Latest update date|2021-07-30|
+|Data indicators|-|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+ Input image
+
+
+
+ Output image
+
+
+
+
+
+- ### Module Introduction
+
+  - AnimeGAN V2 is an image style transfer model that converts the input image into the anime style of the film Paprika. The model weights are converted from the [AnimeGAN V2 official repo](https://github.com/TachibanaYoshino/AnimeGAN).
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 1.8.0
+
+  - paddlehub >= 1.8.0  | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install animegan_v2_paprika_97
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="animegan_v2_paprika_97")
+ result = model.style_transfer(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = model.style_transfer(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 2、API
+
+ - ```python
+ def style_transfer(images=None,
+ paths=None,
+ output_dir='output',
+ visualization=False,
+ min_size=32,
+ max_size=1024)
+ ```
+
+ - Style transfer API.
+
+ - **Parameters**
+
+ - images (list\[numpy.ndarray\]): Image data, ndarray.shape is in the format [H, W, C], BGR.
+ - paths (list\[str\]): Image path.
+ - output\_dir (str): Save path of images, `output` by default.
+ - visualization (bool): Whether to save the results as picture files.
+ - min\_size (int): Minimum size, default is 32.
+ - max\_size (int): Maximum size, default is 1024.
+
+ **NOTE:** Choose one of `paths` and `images` to provide input data.
+
+ - **Return**
+ - res (list\[numpy.ndarray\]): The list of style transfer results,ndarray.shape is in the format [H, W, C].
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of style transfer.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m animegan_v2_paprika_97
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ # Send an HTTP request
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/animegan_v2_paprika_97"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # print prediction results
+ print(r.json()["results"])
+ ```
+
+
+## V. Release Note
+
+- 1.0.0
+
+ First release.
+
+* 1.0.1
+
+ Support paddlehub2.0.
+
+* 1.0.2
+
+ Delete batch_size.
diff --git a/modules/image/Image_gan/style_transfer/face_parse/README.md b/modules/image/Image_gan/style_transfer/face_parse/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..8d9716150c156912c42eebe67bf0cd38db9f2bcd
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/face_parse/README.md
@@ -0,0 +1,133 @@
+# face_parse
+
+|模型名称|face_parse|
+| :--- | :---: |
+|类别|图像 - 人脸解析|
+|网络|BiSeNet|
+|数据集|COCO-Stuff|
+|是否支持Fine-tuning|否|
+|模型大小|77MB|
+|最新更新日期|2021-12-07|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+ - 样例结果示例:
+
+
+
+ 输入图像
+
+
+
+ 输出图像
+
+
+
+- ### 模型介绍
+
+ - 人脸解析是语义图像分割的一种特殊情况,人脸解析是计算人脸图像中不同语义成分(如头发、嘴唇、鼻子、眼睛等)的像素级标签映射。给定一个输入的人脸图像,人脸解析将为每个语义成分分配一个像素级标签。
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+ - ppgan
+ - dlib
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install face_parse
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ # Read from a file
+ $ hub run face_parse --input_path "/PATH/TO/IMAGE"
+ ```
+ - 通过命令行方式实现人脸解析模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+
+ module = hub.Module(name="face_parse")
+ input_path = ["/PATH/TO/IMAGE"]
+ # Read from a file
+ module.style_transfer(paths=input_path, output_dir='./transfer_result/', use_gpu=True)
+ ```
+
+- ### 3、API
+
+ - ```python
+ style_transfer(images=None, paths=None, output_dir='./transfer_result/', use_gpu=False, visualization=True):
+ ```
+ - 人脸解析转换API。
+
+ - **参数**
+
+ - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\];
+ - paths (list\[str\]): 图片的路径;
+ - output\_dir (str): 结果保存的路径;
+ - use\_gpu (bool): 是否使用 GPU;
+ - visualization(bool): 是否保存结果到本地文件夹
+
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线人脸解析转换服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m face_parse
+ ```
+
+ - 这样就完成了一个人脸解析转换的在线服务API的部署,默认端口号为8866。
+
+ - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/face_parse"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+      print(r.json()["results"])
+      ```
+
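+  - 本模块的 serving 接口会将解析结果(RGB 图像数组)序列化为嵌套列表返回(见本模块 module.py 中的 `serving_method`)。以下示例承接上面的请求代码,将结果还原并保存为图片(输出文件名仅作示意):
+
+    - ```python
+      import numpy as np
+
+      # 将返回的嵌套列表还原为 uint8 图像数组(RGB),转为 BGR 后用 cv2 保存
+      for i, mask in enumerate(r.json()["results"]):
+          mask = np.array(mask, dtype=np.uint8)
+          cv2.imwrite("face_parse_result_{}.png".format(i), mask[:, :, ::-1])
+      ```
+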
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ - ```shell
+ $ hub install face_parse==1.0.0
+ ```
diff --git a/modules/image/Image_gan/style_transfer/face_parse/model.py b/modules/image/Image_gan/style_transfer/face_parse/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..c5df633416cd0ddc199bbb4bc7908e9dec008c58
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/face_parse/model.py
@@ -0,0 +1,51 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import sys
+import argparse
+
+from PIL import Image
+import numpy as np
+import cv2
+
+import ppgan.faceutils as futils
+from ppgan.utils.preprocess import *
+from ppgan.utils.visual import mask2image
+
+
+class FaceParsePredictor:
+ def __init__(self):
+ self.input_size = (512, 512)
+ self.up_ratio = 0.6 / 0.85
+ self.down_ratio = 0.2 / 0.85
+ self.width_ratio = 0.2 / 0.85
+ self.face_parser = futils.mask.FaceParser()
+
+ def run(self, image):
+ image = Image.fromarray(image)
+ face = futils.dlib.detect(image)
+
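+        # If dlib finds no face, return None so the caller can skip this image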
+ if not face:
+ return
+ face_on_image = face[0]
+ image, face, crop_face = futils.dlib.crop(image, face_on_image, self.up_ratio, self.down_ratio,
+ self.width_ratio)
+ np_image = np.array(image)
+ mask = self.face_parser.parse(np.float32(cv2.resize(np_image, self.input_size)))
+ mask = cv2.resize(mask.numpy(), (256, 256))
+ mask = mask.astype(np.uint8)
+ mask = mask2image(mask)
+
+ return mask
diff --git a/modules/image/Image_gan/style_transfer/face_parse/module.py b/modules/image/Image_gan/style_transfer/face_parse/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..f1985f9ba23faf68a74e07315d2dc766ffb4f0fc
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/face_parse/module.py
@@ -0,0 +1,133 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import cv2
+import numpy as np
+import paddle
+from skimage.io import imread
+from skimage.transform import rescale
+from skimage.transform import resize
+
+import paddlehub as hub
+from .model import FaceParsePredictor
+from .util import base64_to_cv2
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(
+ name="face_parse", type="CV/style_transfer", author="paddlepaddle", author_email="", summary="", version="1.0.0")
+class Face_parse:
+ def __init__(self):
+ self.pretrained_model = os.path.join(self.directory, "bisenet.pdparams")
+
+ self.network = FaceParsePredictor()
+
+ def style_transfer(self,
+ images: list = None,
+ paths: list = None,
+ output_dir: str = './transfer_result/',
+ use_gpu: bool = False,
+ visualization: bool = True):
+        '''
+        Parse each input face image into a pixel-level semantic mask.
+
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR(read by cv2).
+ paths (list[str]): paths to images
+ output_dir (str): the dir to save the results
+ use_gpu (bool): if True, use gpu to perform the computation, otherwise cpu.
+ visualization (bool): if True, save results in output_dir.
+ '''
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+ if images == None and paths == None:
+            print('No image provided. Please input an image or an image path.')
+ return
+
+ if images != None:
+ for image in images:
+ image = image[:, :, ::-1]
+ out = self.network.run(image)
+ results.append(out)
+
+ if paths != None:
+ for path in paths:
+ image = cv2.imread(path)[:, :, ::-1]
+ out = self.network.run(image)
+ results.append(out)
+
+ if visualization == True:
+ if not os.path.exists(output_dir):
+ os.makedirs(output_dir, exist_ok=True)
+ for i, out in enumerate(results):
+ if out is not None:
+ cv2.imwrite(os.path.join(output_dir, 'output_{}.png'.format(i)), out[:, :, ::-1])
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.style_transfer(
+ paths=[self.args.input_path],
+ output_dir=self.args.output_dir,
+ use_gpu=self.args.use_gpu,
+ visualization=self.args.visualization)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.style_transfer(images=images_decode, **kwargs)
+ tolist = [result.tolist() for result in results]
+ return tolist
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default='transfer_result', help='output directory for saving result.')
+ self.arg_config_group.add_argument('--visualization', type=bool, default=False, help='save results or not.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/Image_gan/style_transfer/face_parse/requirements.txt b/modules/image/Image_gan/style_transfer/face_parse/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..d9bfc85782a3ee323241fe7beb87a9f281c120fe
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/face_parse/requirements.txt
@@ -0,0 +1,2 @@
+ppgan
+dlib
diff --git a/modules/image/Image_gan/style_transfer/face_parse/util.py b/modules/image/Image_gan/style_transfer/face_parse/util.py
new file mode 100644
index 0000000000000000000000000000000000000000..b88ac3562b74cadc1d4d6459a56097ca4a938a0b
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/face_parse/util.py
@@ -0,0 +1,10 @@
+import base64
+import cv2
+import numpy as np
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
diff --git a/modules/image/Image_gan/style_transfer/msgnet/README.md b/modules/image/Image_gan/style_transfer/msgnet/README.md
index b2ead3a2a4c3e185ef2edf31c8b0e8ceac817451..8314a252f61cb92a8d121d129c6ee47ea9f8ad65 100644
--- a/modules/image/Image_gan/style_transfer/msgnet/README.md
+++ b/modules/image/Image_gan/style_transfer/msgnet/README.md
@@ -50,13 +50,14 @@ $ hub run msgnet --input_path "/PATH/TO/ORIGIN/IMAGE" --style_path "/PATH/TO/STY
- ### 2.预测代码示例
+
```python
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='msgnet')
- result = model.predict(origin=["venice-boat.jpg"], style="candy.jpg", visualization=True, save_path ='style_tranfer')
+ result = model.predict(origin=["/PATH/TO/ORIGIN/IMAGE"], style="/PATH/TO/STYLE/IMAGE", visualization=True, save_path ="/PATH/TO/SAVE/IMAGE")
```
@@ -86,7 +87,7 @@ if __name__ == '__main__':
- `transforms`: 数据预处理方式。
- `mode`: 选择数据模式,可选项有 `train`, `test`, 默认为`train`。
- - 数据集的准备代码可以参考 [minicoco.py](../../paddlehub/datasets/flowers.py)。`hub.datasets.MiniCOCO()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+ - 数据集的准备代码可以参考 [minicoco.py](../../paddlehub/datasets/minicoco.py)。`hub.datasets.MiniCOCO()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
- Step3: 加载预训练模型
@@ -117,7 +118,7 @@ if __name__ == '__main__':
if __name__ == '__main__':
model = hub.Module(name='msgnet', load_checkpoint="/PATH/TO/CHECKPOINT")
- result = model.predict(origin=["venice-boat.jpg"], style="candy.jpg", visualization=True, save_path ='style_tranfer')
+ result = model.predict(origin=["/PATH/TO/ORIGIN/IMAGE"], style="/PATH/TO/STYLE/IMAGE", visualization=True, save_path ="/PATH/TO/SAVE/IMAGE")
```
- 参数配置正确后,请执行脚本`python predict.py`, 加载模型具体可参见[加载](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0-rc/api/paddle/framework/io/load_cn.html#load)。
diff --git a/modules/image/Image_gan/style_transfer/msgnet/README_en.md b/modules/image/Image_gan/style_transfer/msgnet/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..30d978b85d329b6fe64d2f86d3b868486e56af95
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/msgnet/README_en.md
@@ -0,0 +1,185 @@
+# msgnet
+
+|Module Name|msgnet|
+| :--- | :---: |
+|Category|Image editing|
+|Network|msgnet|
+|Dataset|COCO2014|
+|Fine-tuning supported or not|Yes|
+|Module Size|68MB|
+|Data indicators|-|
+|Latest update date|2021-07-29|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - Msgnet is a style transfer model. This document shows how to use PaddleHub to fine-tune the pre-trained model and run prediction.
+ - For more information, please refer to [msgnet](https://github.com/zhanghang1989/PyTorch-Multi-Style-Transfer)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install msgnet
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```
+ $ hub run msgnet --input_path "/PATH/TO/ORIGIN/IMAGE" --style_path "/PATH/TO/STYLE/IMAGE"
+ ```
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='msgnet')
+ result = model.predict(origin=["/PATH/TO/ORIGIN/IMAGE"], style="/PATH/TO/STYLE/IMAGE", visualization=True, save_path ="/PATH/TO/SAVE/IMAGE")
+ ```
+
+- ### 3.Fine-tune and Encapsulation
+
+ - After completing the installation of PaddlePaddle and PaddleHub, you can start using the msgnet model to fine-tune datasets such as [MiniCOCO](../../docs/reference/datasets.md#class-hubdatasetsMiniCOCO) by executing `python train.py`.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ import paddlehub.vision.transforms as T
+
+ transform = T.Compose([T.Resize((256, 256), interpolation='LINEAR')])
+ ```
+
+      - `transforms`: The data augmentation module defines many data preprocessing methods; users can replace them according to their needs.
+
+ - Step2: Download the dataset
+ - ```python
+ from paddlehub.datasets.minicoco import MiniCOCO
+
+ styledata = MiniCOCO(transform=transform, mode='train')
+
+ ```
+ * `transforms`: data preprocessing methods.
+      * `mode`: Select the data mode; the options are `train` and `test`. Default is `train`.
+
+ - Dataset preparation can be referred to [minicoco.py](../../paddlehub/datasets/minicoco.py). `hub.datasets.MiniCOCO()` will be automatically downloaded from the network and decompressed to the `$HOME/.paddlehub/dataset` directory under the user directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ model = hub.Module(name='msgnet', load_checkpoint=None)
+ ```
+ * `name`: model name.
+ * `load_checkpoint`: Whether to load the self-trained model, if it is None, load the provided parameters.
+
+ - Step4: Optimization strategy
+
+ - ```python
+        import paddle
+        from paddlehub.finetune.trainer import Trainer
+
+        optimizer = paddle.optimizer.Adam(learning_rate=0.0001, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_style_ckpt')
+ trainer.train(styledata, epochs=101, batch_size=4, eval_dataset=styledata, log_interval=10, save_interval=10)
+ ```
+
+
+ - Model prediction
+
+ - When Fine-tune is completed, the model with the best performance on the verification set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+ - ```python
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='msgnet', load_checkpoint="/PATH/TO/CHECKPOINT")
+ result = model.predict(origin=["/PATH/TO/ORIGIN/IMAGE"], style="/PATH/TO/STYLE/IMAGE", visualization=True, save_path ="/PATH/TO/SAVE/IMAGE")
+ ```
+
+ - **Parameters**
+ * `origin`: Image path or ndarray data with format [H, W, C], BGR.
+ * `style`: Style image path.
+      * `visualization`: Whether to save the results as picture files.
+ * `save_path`: Save path of the result, default is 'style_tranfer'.
+
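+    - As noted in the parameters above, `origin` also accepts in-memory BGR arrays rather than file paths. A minimal sketch (paths are illustrative):
+
+    - ```python
+      import cv2
+      import paddlehub as hub
+
+      if __name__ == '__main__':
+          model = hub.Module(name='msgnet', load_checkpoint="/PATH/TO/CHECKPOINT")
+          # Pass a decoded BGR array for the content image instead of a path
+          origin_img = cv2.imread('/PATH/TO/ORIGIN/IMAGE')
+          result = model.predict(origin=[origin_img], style="/PATH/TO/STYLE/IMAGE", visualization=True, save_path="/PATH/TO/SAVE/IMAGE")
+      ```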
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of style transfer.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m msgnet
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # Send an HTTP request
+ org_im = cv2.imread('/PATH/TO/ORIGIN/IMAGE')
+ style_im = cv2.imread('/PATH/TO/STYLE/IMAGE')
+ data = {'images':[[cv2_to_base64(org_im)], cv2_to_base64(style_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/msgnet"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ data = base64_to_cv2(r.json()["results"]['data'][0])
+ cv2.imwrite('style.png', data)
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/Image_gan/style_transfer/psgan/README.md b/modules/image/Image_gan/style_transfer/psgan/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..3d0b63dc1558f861d13b801c58a8a8206eac10ea
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/psgan/README.md
@@ -0,0 +1,143 @@
+# psgan
+
+|模型名称|psgan|
+| :--- | :---: |
+|类别|图像 - 妆容迁移|
+|网络|PSGAN|
+|数据集|-|
+|是否支持Fine-tuning|否|
+|模型大小|121MB|
+|最新更新日期|2021-12-07|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+ - 样例结果示例:
+
+
+
+ 输入内容图形
+
+
+
+ 输入妆容图形
+
+
+
+ 输出图像
+
+
+
+- ### 模型介绍
+
+ - PSGAN模型的任务是妆容迁移, 即将任意参照图像上的妆容迁移到不带妆容的源图像上。很多人像美化应用都需要这种技术。
+
+ - 更多详情参考:[PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer](https://arxiv.org/pdf/1909.06956.pdf)
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+ - ppgan
+ - dlib
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install psgan
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ # Read from a file
+ $ hub run psgan --content "/PATH/TO/IMAGE" --style "/PATH/TO/IMAGE1"
+ ```
+ - 通过命令行方式实现妆容转换模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+
+ module = hub.Module(name="psgan")
+ content = cv2.imread("/PATH/TO/IMAGE")
+ style = cv2.imread("/PATH/TO/IMAGE1")
+ results = module.makeup_transfer(images=[{'content':content, 'style':style}], output_dir='./transfer_result', use_gpu=True)
+ ```
+
+- ### 3、API
+
+ - ```python
+ makeup_transfer(images=None, paths=None, output_dir='./transfer_result/', use_gpu=False, visualization=True)
+ ```
+ - 妆容风格转换API。
+
+ - **参数**
+
+ - images (list[dict]): data of images, 每一个元素都为一个 dict,有关键字 content, style, 相应取值为:
+ - content (numpy.ndarray): 待转换的图片,shape 为 \[H, W, C\],BGR格式;
+ - style (numpy.ndarray) : 风格图像,shape为 \[H, W, C\],BGR格式;
+ - paths (list[str]): paths to images, 每一个元素都为一个dict, 有关键字 content, style, 相应取值为:
+ - content (str): 待转换的图片的路径;
+ - style (str) : 风格图像的路径;
+ - output\_dir (str): 结果保存的路径;
+ - use\_gpu (bool): 是否使用 GPU;
+ - visualization(bool): 是否保存结果到本地文件夹
+
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线妆容风格转换服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m psgan
+ ```
+
+ - 这样就完成了一个妆容风格转换的在线服务API的部署,默认端口号为8866。
+
+ - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[{'content': cv2_to_base64(cv2.imread("/PATH/TO/IMAGE")), 'style': cv2_to_base64(cv2.imread("/PATH/TO/IMAGE1"))}]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/psgan"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+      print(r.json()["results"])
+      ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ - ```shell
+ $ hub install psgan==1.0.0
+ ```
diff --git a/modules/image/Image_gan/style_transfer/psgan/makeup.yaml b/modules/image/Image_gan/style_transfer/psgan/makeup.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..05723e02b4c96c460e18affbb8774b36c5c6b532
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/psgan/makeup.yaml
@@ -0,0 +1,76 @@
+epochs: 100
+output_dir: tmp
+checkpoints_dir: checkpoints
+find_unused_parameters: True
+
+model:
+ name: MakeupModel
+ generator:
+ name: GeneratorPSGANAttention
+ conv_dim: 64
+ repeat_num: 6
+ discriminator:
+ name: NLayerDiscriminator
+ ndf: 64
+ n_layers: 3
+ input_nc: 3
+ norm_type: spectral
+ cycle_criterion:
+ name: L1Loss
+ idt_criterion:
+ name: L1Loss
+ loss_weight: 0.5
+ l1_criterion:
+ name: L1Loss
+ l2_criterion:
+ name: MSELoss
+ gan_criterion:
+ name: GANLoss
+ gan_mode: lsgan
+
+dataset:
+ train:
+ name: MakeupDataset
+ trans_size: 256
+ dataroot: data/MT-Dataset
+ cls_list: [non-makeup, makeup]
+ phase: train
+ test:
+ name: MakeupDataset
+ trans_size: 256
+ dataroot: data/MT-Dataset
+ cls_list: [non-makeup, makeup]
+ phase: test
+
+
+lr_scheduler:
+ name: LinearDecay
+ learning_rate: 0.0002
+ start_epoch: 100
+ decay_epochs: 100
+ # will get from real dataset
+ iters_per_epoch: 1
+
+optimizer:
+ optimizer_G:
+ name: Adam
+ net_names:
+ - netG
+ beta1: 0.5
+ optimizer_DA:
+ name: Adam
+ net_names:
+ - netD_A
+ beta1: 0.5
+ optimizer_DB:
+ name: Adam
+ net_names:
+ - netD_B
+ beta1: 0.5
+
+log_config:
+ interval: 10
+ visiual_interval: 500
+
+snapshot_config:
+ interval: 5
diff --git a/modules/image/Image_gan/style_transfer/psgan/model.py b/modules/image/Image_gan/style_transfer/psgan/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..c4dcf64157b1a3a3d5a55da56cd5c82d49c13ce6
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/psgan/model.py
@@ -0,0 +1,170 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import os
+import sys
+from pathlib import Path
+
+import numpy as np
+import paddle
+import paddle.vision.transforms as T
+import ppgan.faceutils as futils
+from paddle.utils.download import get_weights_path_from_url
+from PIL import Image
+from ppgan.models.builder import build_model
+from ppgan.utils.config import get_config
+from ppgan.utils.filesystem import load
+from ppgan.utils.options import parse_args
+from ppgan.utils.preprocess import *
+
+
+def toImage(net_output):
+ img = net_output.squeeze(0).transpose((1, 2, 0)).numpy() # [1,c,h,w]->[h,w,c]
+ img = (img * 255.0).clip(0, 255)
+ img = np.uint8(img)
+ img = Image.fromarray(img, mode='RGB')
+ return img
+
+
+PS_WEIGHT_URL = "https://paddlegan.bj.bcebos.com/models/psgan_weight.pdparams"
+
+
+class PreProcess:
+ def __init__(self, config, need_parser=True):
+ self.img_size = 256
+ self.transform = transform = T.Compose([
+ T.Resize(size=256),
+ T.ToTensor(),
+ ])
+ self.norm = T.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
+ if need_parser:
+ self.face_parser = futils.mask.FaceParser()
+ self.up_ratio = 0.6 / 0.85
+ self.down_ratio = 0.2 / 0.85
+ self.width_ratio = 0.2 / 0.85
+
+ def __call__(self, image):
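+        # Detect the face with dlib; if none is found, the method returns None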
+ face = futils.dlib.detect(image)
+
+ if not face:
+ return
+ face_on_image = face[0]
+ image, face, crop_face = futils.dlib.crop(image, face_on_image, self.up_ratio, self.down_ratio,
+ self.width_ratio)
+ np_image = np.array(image)
+ image_trans = self.transform(np_image)
+ mask = self.face_parser.parse(np.float32(cv2.resize(np_image, (512, 512))))
+ mask = cv2.resize(mask.numpy(), (self.img_size, self.img_size), interpolation=cv2.INTER_NEAREST)
+ mask = mask.astype(np.uint8)
+ mask_tensor = paddle.to_tensor(mask)
+
+ lms = futils.dlib.landmarks(image, face) / image_trans.shape[:2] * self.img_size
+ lms = lms.round()
+
+ P_np = generate_P_from_lmks(lms, self.img_size, self.img_size, self.img_size)
+
+ mask_aug = generate_mask_aug(mask, lms)
+
+ return [self.norm(image_trans).unsqueeze(0),
+ np.float32(mask_aug),
+ np.float32(P_np),
+ np.float32(mask)], face_on_image, crop_face
+
+
+class PostProcess:
+ def __init__(self, config):
+ self.denoise = True
+ self.img_size = 256
+
+ def __call__(self, source: Image, result: Image):
+        # TODO: Refactor -> name, resize
+ source = np.array(source)
+ result = np.array(result)
+
+ height, width = source.shape[:2]
+ small_source = cv2.resize(source, (self.img_size, self.img_size))
+        laplacian_diff = source.astype(float) - cv2.resize(small_source, (width, height)).astype(float)
+ result = (cv2.resize(result, (width, height)) + laplacian_diff).round().clip(0, 255).astype(np.uint8)
+ if self.denoise:
+ result = cv2.fastNlMeansDenoisingColored(result)
+ result = Image.fromarray(result).convert('RGB')
+ return result
+
+
+class Inference:
+ def __init__(self, config, model_path=''):
+ self.model = build_model(config.model)
+ self.preprocess = PreProcess(config)
+ self.model_path = model_path
+
+ def transfer(self, source, reference, with_face=False):
+ source_input, face, crop_face = self.preprocess(source)
+ reference_input, face, crop_face = self.preprocess(reference)
+
+ consis_mask = np.float32(calculate_consis_mask(source_input[1], reference_input[1]))
+ consis_mask = paddle.to_tensor(np.expand_dims(consis_mask, 0))
+
+ if not (source_input and reference_input):
+ if with_face:
+ return None, None
+ return
+
+ for i in range(1, len(source_input) - 1):
+ source_input[i] = paddle.to_tensor(np.expand_dims(source_input[i], 0))
+
+ for i in range(1, len(reference_input) - 1):
+ reference_input[i] = paddle.to_tensor(np.expand_dims(reference_input[i], 0))
+
+ input_data = {
+ 'image_A': source_input[0],
+ 'image_B': reference_input[0],
+ 'mask_A_aug': source_input[1],
+ 'mask_B_aug': reference_input[1],
+ 'P_A': source_input[2],
+ 'P_B': reference_input[2],
+ 'consis_mask': consis_mask
+ }
+
+ state_dicts = load(self.model_path)
+ for net_name, net in self.model.nets.items():
+ net.set_state_dict(state_dicts[net_name])
+ result, _ = self.model.test(input_data)
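+        # Min-max normalize the raw network output to [0, 1] before converting it to a PIL image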
+ min_, max_ = result.min(), result.max()
+ result += -min_
+ result = paddle.divide(result, max_ - min_ + 1e-5)
+ img = toImage(result)
+
+ if with_face:
+ return img, crop_face
+
+ return img
+
+
+class PSGANPredictor:
+ def __init__(self, cfg, weight_path):
+ self.cfg = cfg
+ self.weight_path = weight_path
+
+ def run(self, source, reference):
+ source = Image.fromarray(source)
+ reference = Image.fromarray(reference)
+ inference = Inference(self.cfg, self.weight_path)
+ postprocess = PostProcess(self.cfg)
+
+ # Transfer the psgan from reference to source.
+ image, face = inference.transfer(source, reference, with_face=True)
+ source_crop = source.crop((face.left(), face.top(), face.right(), face.bottom()))
+ image = postprocess(source_crop, image)
+ image = np.array(image)
+ return image
diff --git a/modules/image/Image_gan/style_transfer/psgan/module.py b/modules/image/Image_gan/style_transfer/psgan/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..754af703458578fbda1e06e623b5ae91d3c807c0
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/psgan/module.py
@@ -0,0 +1,144 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import cv2
+import numpy as np
+import paddle
+from ppgan.utils.config import get_config
+from skimage.io import imread
+from skimage.transform import rescale
+from skimage.transform import resize
+
+import paddlehub as hub
+from .model import PSGANPredictor
+from .util import base64_to_cv2
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="psgan", type="CV/gan", author="paddlepaddle", author_email="", summary="", version="1.0.0")
+class psgan:
+ def __init__(self):
+ self.pretrained_model = os.path.join(self.directory, "psgan_weight.pdparams")
+ cfg = get_config(os.path.join(self.directory, 'makeup.yaml'))
+ self.network = PSGANPredictor(cfg, self.pretrained_model)
+
+ def makeup_transfer(self,
+ images=None,
+ paths=None,
+ output_dir='./transfer_result/',
+ use_gpu=False,
+ visualization=True):
+ '''
+        Transfer the makeup style from a reference image onto a source face image.
+
+        images (list[dict]): data of images, each element is a dict with keys 'content' and 'style':
+          - content (numpy.ndarray): image to be transferred, shape is [H, W, C], BGR format;
+          - style (numpy.ndarray): makeup reference image, shape is [H, W, C], BGR format;
+        paths (list[str]): paths to images, each element is a dict with keys 'content' and 'style':
+          - content (str): path of the image to be transferred;
+          - style (str): path of the makeup reference image;
+
+ output_dir: the dir to save the results
+ use_gpu: if True, use gpu to perform the computation, otherwise cpu.
+ visualization: if True, save results in output_dir.
+ '''
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+ if images == None and paths == None:
+            print('No image provided. Please input an image or an image path.')
+ return
+
+ if images != None:
+ for image_dict in images:
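+                # Inputs follow the cv2 BGR convention; reverse channels to RGB for the predictor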
+ content_img = image_dict['content'][:, :, ::-1]
+ style_img = image_dict['style'][:, :, ::-1]
+ results.append(self.network.run(content_img, style_img))
+
+ if paths != None:
+ for path_dict in paths:
+ content_img = cv2.imread(path_dict['content'])[:, :, ::-1]
+ style_img = cv2.imread(path_dict['style'])[:, :, ::-1]
+ results.append(self.network.run(content_img, style_img))
+
+ if visualization == True:
+ if not os.path.exists(output_dir):
+ os.makedirs(output_dir, exist_ok=True)
+ for i, out in enumerate(results):
+ cv2.imwrite(os.path.join(output_dir, 'output_{}.png'.format(i)), out[:, :, ::-1])
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+
+ self.makeup_transfer(
+ paths=[{
+ 'content': self.args.content,
+ 'style': self.args.style
+ }],
+ output_dir=self.args.output_dir,
+ use_gpu=self.args.use_gpu,
+ visualization=self.args.visualization)
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = copy.deepcopy(images)
+ for image in images_decode:
+ image['content'] = base64_to_cv2(image['content'])
+ image['style'] = base64_to_cv2(image['style'])
+ results = self.makeup_transfer(images_decode, **kwargs)
+ tolist = [result.tolist() for result in results]
+ return tolist
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default='transfer_result', help='output directory for saving result.')
+ self.arg_config_group.add_argument('--visualization', type=bool, default=False, help='save results or not.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--content', type=str, help="path to content image.")
+ self.arg_input_group.add_argument('--style', type=str, help="path to style image.")
diff --git a/modules/image/Image_gan/style_transfer/psgan/requirements.txt b/modules/image/Image_gan/style_transfer/psgan/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..d9bfc85782a3ee323241fe7beb87a9f281c120fe
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/psgan/requirements.txt
@@ -0,0 +1,2 @@
+ppgan
+dlib
diff --git a/modules/image/Image_gan/style_transfer/psgan/util.py b/modules/image/Image_gan/style_transfer/psgan/util.py
new file mode 100644
index 0000000000000000000000000000000000000000..531a0ae0d487822a870ba7f09817e658967aff10
--- /dev/null
+++ b/modules/image/Image_gan/style_transfer/psgan/util.py
@@ -0,0 +1,11 @@
+import base64
+
+import cv2
+import numpy as np
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
diff --git a/modules/image/classification/food_classification/requirements.txt b/modules/image/classification/food_classification/requirements.txt
index f3c5b8fb12473794251e0a4669dac313cb93eff4..0e661622dcc3a99395344610b83850e5535961b6 100644
--- a/modules/image/classification/food_classification/requirements.txt
+++ b/modules/image/classification/food_classification/requirements.txt
@@ -1,3 +1,2 @@
-paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
paddlex == 1.3.7
diff --git a/modules/image/classification/resnet50_vd_animals/README.md b/modules/image/classification/resnet50_vd_animals/README.md
index a42168e27330a2e66d93a463ca8ce87553c2a2c8..0b7deba6c890ab1fd3a9d57d9c28afddb01b8940 100644
--- a/modules/image/classification/resnet50_vd_animals/README.md
+++ b/modules/image/classification/resnet50_vd_animals/README.md
@@ -44,7 +44,7 @@
hub run resnet50_vd_animals --input_path "/PATH/TO/IMAGE"
```
-- ### 2、代码示例
+- ### 2、预测代码示例
- ```python
import paddlehub as hub
diff --git a/modules/image/classification/resnet50_vd_animals/README_en.md b/modules/image/classification/resnet50_vd_animals/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..031f469fc7025996491dcd5613a1d5cc7bd8f817
--- /dev/null
+++ b/modules/image/classification/resnet50_vd_animals/README_en.md
@@ -0,0 +1,173 @@
+# resnet50_vd_animals
+
+|Module Name|resnet50_vd_animals|
+| :--- | :---: |
+|Category |Image classification|
+|Network|ResNet50_vd|
+|Dataset|Baidu self-built dataset|
+|Fine-tuning supported or not|No|
+|Module Size|154MB|
+|Latest update date|2021-02-26|
+|Data indicators|-|
+
+
+## I. Basic Information
+
+- ### Module Introduction
+
+  - ResNet-vd is a variant of ResNet that can be used for image classification and feature extraction. This module is trained on a Baidu self-built animal dataset and supports the classification of 7,978 animal categories.
+ - For more information, please refer to [ResNet-vd](https://arxiv.org/pdf/1812.01187.pdf)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install resnet50_vd_animals
+ ```
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run resnet50_vd_animals --input_path "/PATH/TO/IMAGE"
+ ```
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="resnet50_vd_animals")
+
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3、API
+
+ - ```python
+ def get_expected_image_width()
+ ```
+
+ - Returns the preprocessed image width, which is 224.
+
+ - ```python
+ def get_expected_image_height()
+ ```
+
+ - Returns the preprocessed image height, which is 224.
+
+ - ```python
+ def get_pretrained_images_mean()
+ ```
+
+ - Returns the mean value of the preprocessed image, which is \[0.485, 0.456, 0.406\].
+
+ - ```python
+ def get_pretrained_images_std()
+ ```
+
+ - Return the standard deviation of the preprocessed image, which is \[0.229, 0.224, 0.225\].
+
+
+ - ```python
+ def classification(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ top_k=1):
+ ```
+
+ - **Parameter**
+
+ * images (list\[numpy.ndarray\]): image data, ndarray.shape is in the format [H, W, C], BGR;
+ * paths (list\[str\]): image path;
+ * batch\_size (int): batch size;
+ * use\_gpu (bool): use GPU or not; **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**
+ * top\_k (int): return the top k prediction results.
+
+ - **Return**
+
+ - res (list\[dict\]): the list of classification results,key is the prediction label, value is the corresponding confidence.
+
+ - ```python
+ def save_inference_model(dirname,
+ model_filename=None,
+ params_filename=None,
+ combined=True)
+ ```
+
+ - Save the model to the specified path.
+
+ - **Parameters**
+ * dirname: Save path.
+      * model\_filename: Model file name, default is \_\_model\_\_
+      * params\_filename: Parameter file name, default is \_\_params\_\_ (only takes effect when `combined` is True)
+ * combined: Whether to save the parameters to a unified file.
+
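+  - A minimal end-to-end sketch (the image path and the export directory name are illustrative) that combines classification with inference-model export:
+
+  - ```python
+    import cv2
+    import paddlehub as hub
+
+    classifier = hub.Module(name="resnet50_vd_animals")
+    # Top-5 predictions for a single image
+    results = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')], top_k=5)
+    print(results)
+    # Export the inference model for offline deployment
+    classifier.save_inference_model(dirname='resnet50_vd_animals_infer')
+    ```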
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of animal classification.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m resnet50_vd_animals
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+ # Send an HTTP request
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/resnet50_vd_animals"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # print prediction results
+ print(r.json()["results"])
+ ```
+
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
+
diff --git a/modules/image/classification/resnet50_vd_imagenet_ssld/README.md b/modules/image/classification/resnet50_vd_imagenet_ssld/README.md
index 229e5d0c8400152d73f354f13dab546a3f8b749c..7563ae023257a077ef302d8992ee51307246e3c4 100644
--- a/modules/image/classification/resnet50_vd_imagenet_ssld/README.md
+++ b/modules/image/classification/resnet50_vd_imagenet_ssld/README.md
@@ -50,7 +50,7 @@
if __name__ == '__main__':
model = hub.Module(name='resnet50_vd_imagenet_ssld')
- result = model.predict(['flower.jpg'])
+ result = model.predict(['/PATH/TO/IMAGE'])
```
- ### 3.如何开始Fine-tune
@@ -134,7 +134,7 @@
if __name__ == '__main__':
model = hub.Module(name='resnet50_vd_imagenet_ssld', label_list=["roses", "tulips", "daisy", "sunflowers", "dandelion"], load_checkpoint='/PATH/TO/CHECKPOINT')
- result = model.predict(['flower.jpg'])
+ result = model.predict(['/PATH/TO/IMAGE'])
```
diff --git a/modules/image/classification/resnet50_vd_imagenet_ssld/README_en.md b/modules/image/classification/resnet50_vd_imagenet_ssld/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..9cf41b043fd11c4e30181dfccafe04d39533d083
--- /dev/null
+++ b/modules/image/classification/resnet50_vd_imagenet_ssld/README_en.md
@@ -0,0 +1,198 @@
+# resnet50_vd_imagenet_ssld
+
+|Module Name|resnet50_vd_imagenet_ssld|
+| :--- | :---: |
+|Category |Image classification|
+|Network|ResNet_vd|
+|Dataset|ImageNet-2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|148MB|
+|Data indicators|-|
+|Latest update date|2021-02-26|
+
+
+## I. Basic Information
+
+- ### Module Introduction
+
+ - ResNet-vd is a variant of ResNet, which can be used for image classification and feature extraction.
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install resnet50_vd_imagenet_ssld
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ ```shell
+ $ hub run resnet50_vd_imagenet_ssld --input_path "/PATH/TO/IMAGE" --top_k 5
+ ```
+- ### 2、Prediction Code Example
+
+ ```python
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+
+ model = hub.Module(name='resnet50_vd_imagenet_ssld')
+ result = model.predict(['/PATH/TO/IMAGE'])
+ ```
+- ### 3.Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can start using the resnet50_vd_imagenet_ssld model to fine-tune datasets such as [Flowers](../../docs/reference/datasets.md#class-hubdatasetsflowers) by executing `python train.py`.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+ - ```python
+ import paddlehub.vision.transforms as T
+
+ transforms = T.Compose([T.Resize((256, 256)),
+ T.CenterCrop(224),
+ T.Normalize(mean=[0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225])],
+ to_rgb=True)
+ ```
+
+      - `transforms`: The data augmentation module defines many data preprocessing methods; users can replace them according to their needs.
+
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import Flowers
+
+ flowers = Flowers(transforms)
+
+ flowers_validate = Flowers(transforms, mode='val')
+ ```
+
+ * `transforms`: data preprocessing methods.
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+      * `hub.datasets.Flowers()` automatically downloads the dataset from the network and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user's home directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ model = hub.Module(name="resnet50_vd_imagenet_ssld", label_list=["roses", "tulips", "daisy", "sunflowers", "dandelion"])
+ ```
+ * `name`: model name.
+ * `label_list`: set the output classification category. Default is Imagenet2012 category.
+
+ - Step4: Optimization strategy
+
+ ```python
+ optimizer = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='img_classification_ckpt')
+
+ trainer.train(flowers, epochs=100, batch_size=32, eval_dataset=flowers_validate, save_interval=1)
+ ```
+
+
+ - Run configuration
+
+    - `Trainer` mainly controls the Fine-tune training process and accepts the following configurable parameters:
+
+ * `model`: Optimized model.
+ * `optimizer`: Optimizer selection.
+ * `use_vdl`: Whether to use vdl to visualize the training process.
+ * `checkpoint_dir`: The storage address of the model parameters.
+ * `compare_metrics`: The measurement index of the optimal model.
+
+    - `trainer.train` mainly controls the specific training process and accepts the following configurable parameters (a configuration sketch follows this list):
+
+ * `train_dataset`: Training dataset.
+ * `epochs`: Epochs of training process.
+ * `batch_size`: Batch size.
+ * `num_workers`: Number of workers.
+ * `eval_dataset`: Validation dataset.
+ * `log_interval`:The interval for printing logs.
+ * `save_interval`: The interval for saving model parameters.
+
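+
+    - As a hedged illustration only (the parameter names are taken from the two lists above, the concrete values are arbitrary examples, and `Trainer` is assumed to be importable as in the official PaddleHub 2.x fine-tuning demos, e.g. `from paddlehub.finetune.trainer import Trainer`), the trainer from Step4 could be configured like this, reusing `model`, `optimizer`, `flowers` and `flowers_validate` defined in the previous steps:
+
+    - ```python
+      trainer = Trainer(model,
+                        optimizer,
+                        use_vdl=True,                        # visualize the training process with VisualDL
+                        checkpoint_dir='img_classification_ckpt')
+      trainer.train(flowers,                                 # training dataset from Step2
+                    epochs=100,
+                    batch_size=32,
+                    num_workers=2,                           # data loading workers
+                    eval_dataset=flowers_validate,           # validation dataset from Step2
+                    log_interval=10,                         # print logs every 10 steps
+                    save_interval=1)                         # save a checkpoint every epoch
+      ```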
+
+ - Model prediction
+
+      - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ - ```python
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+
+ model = hub.Module(name='resnet50_vd_imagenet_ssld', label_list=["roses", "tulips", "daisy", "sunflowers", "dandelion"], load_checkpoint='/PATH/TO/CHECKPOINT')
+ result = model.predict(['/PATH/TO/IMAGE'])
+ ```
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of classification.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m resnet50_vd_imagenet_ssld
+ ```
+
+  - The model serving API is now deployed, listening on port 8866 by default.
+
+  - **NOTE:** If a GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # Send an HTTP request
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+
+ data = {'images':[cv2_to_base64(org_im)], 'top_k':2}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/resnet50_vd_imagenet_ssld"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ data =r.json()["results"]['data']
+ ```
+## V. Release Note
+
+* 1.0.0
+
+ First release
+
+* 1.1.0
+
+ Upgrade to dynamic version
diff --git a/modules/image/classification/resnet_v2_50_imagenet/README_en.md b/modules/image/classification/resnet_v2_50_imagenet/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..76e7dfd87ba1379a803d1def080b674ac60bc685
--- /dev/null
+++ b/modules/image/classification/resnet_v2_50_imagenet/README_en.md
@@ -0,0 +1,87 @@
+# resnet_v2_50_imagenet
+
+|Module Name|resnet_v2_50_imagenet|
+| :--- | :---: |
+|Category |Image classification|
+|Network|ResNet V2|
+|Dataset|ImageNet-2012|
+|Fine-tuning supported or not|No|
+|Module Size|99MB|
+|Latest update date|2021-02-26|
+|Data indicators|-|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+  - This module uses the ResNet50 V2 architecture and is trained on ImageNet-2012.
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 1.4.0
+
+ - paddlehub >= 1.0.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install resnet_v2_50_imagenet
+ ```
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run resnet_v2_50_imagenet --input_path "/PATH/TO/IMAGE"
+ ```
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="resnet_v2_50_imagenet")
+ test_img_path = "/PATH/TO/IMAGE"
+ input_dict = {"image": [test_img_path]}
+ result = classifier.classification(data=input_dict)
+ ```
+
+- ### 3、API
+
+ - ```python
+ def classification(data)
+ ```
+ - Prediction API for classification.
+
+ - **Parameter**
+      - data (dict): Key is 'image', value is a list of image paths.
+
+ - **Return**
+      - result (list[dict]): The list of classification results; in each dict the key is the predicted label and the value is the corresponding confidence (see the sketch below).
+
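+  - A minimal usage sketch (not part of the original docs) of consuming the structure described above, assuming one dict per input image that maps each predicted label to its confidence:
+
+  - ```python
+    # `result` comes from classifier.classification(data=input_dict) in the example above
+    for per_image in result:
+        top_label = max(per_image, key=per_image.get)
+        print(top_label, per_image[top_label])
+    ```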
+
+
+
+## IV. Release Note
+
+- 1.0.0
+
+ First release
+
+- 1.0.1
+
+ Fix encoding problem in python2
+
+ - ```shell
+ $ hub install resnet_v2_50_imagenet==1.0.1
+ ```
diff --git a/modules/image/industrial_application/meter_readings/barometer_reader/requirements.txt b/modules/image/industrial_application/meter_readings/barometer_reader/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..2b801c41fabee3640316419344f036d9c963e36a
--- /dev/null
+++ b/modules/image/industrial_application/meter_readings/barometer_reader/requirements.txt
@@ -0,0 +1 @@
+paddlex == 1.3.0
\ No newline at end of file
diff --git a/modules/image/matting/dim_vgg16_matting/README.md b/modules/image/matting/dim_vgg16_matting/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..07f8e1ac0d4673c164e692d3854efc077494be44
--- /dev/null
+++ b/modules/image/matting/dim_vgg16_matting/README.md
@@ -0,0 +1,154 @@
+# dim_vgg16_matting
+
+|模型名称|dim_vgg16_matting|
+| :--- | :---: |
+|类别|图像-抠图|
+|网络|dim_vgg16|
+|数据集|百度自建数据集|
+|是否支持Fine-tuning|否|
+|模型大小|164MB|
+|指标|SAD112.73|
+|最新更新日期|2021-12-03|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+
+ - 样例结果示例(左为原图,右为效果图):
+
+
+
+
+
+- ### 模型介绍
+
+ - Matting(精细化分割/影像去背/抠图)是指借由计算前景的颜色和透明度,将前景从影像中撷取出来的技术,可用于替换背景、影像合成、视觉特效,在电影工业中被广泛地使用。影像中的每个像素会有代表其前景透明度的值,称作阿法值(Alpha),一张影像中所有阿法值的集合称作阿法遮罩(Alpha Matte),将影像被遮罩所涵盖的部分取出即可完成前景的分离。dim_vgg16_matting是一种需要trimap作为输入的matting模型。
+
+
+
+ - 更多详情请参考:[dim_vgg16_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install dim_vgg16_matting
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run dim_vgg16_matting --input_path "/PATH/TO/IMAGE" --trimap_path "/PATH/TO/TRIMAP"
+ ```
+
+ - 通过命令行方式实现hub模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="dim_vgg16_matting")
+
+    result = model.predict(image_list=["/PATH/TO/IMAGE"], trimap_list=["/PATH/TO/TRIMAP"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ trimap_list,
+ visualization,
+ save_path):
+ ```
+
+ - 人像matting预测API,用于将输入图片中的人像分割出来。
+
+ - 参数
+
+ - image_list (list(str | numpy.ndarray)):图片输入路径或者BGR格式numpy数据。
+ - trimap_list(list(str | numpy.ndarray)):trimap输入路径或者单通道灰度图片。
+ - visualization (bool): 是否进行可视化,默认为False。
+ - save_path (str): 当visualization为True时,保存图片的路径,默认为"dim_vgg16_matting_output" 。
+
+ - 返回
+
+ - result (list(numpy.ndarray)):模型分割结果:
+
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署人像matting在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m dim_vgg16_matting
+ ```
+
+ - 这样就完成了一个人像matting在线服务API的部署,默认端口号为8866。
+
+ - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))], 'trimaps':[cv2_to_base64(cv2.imread("/PATH/TO/TRIMAP"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/dim_vgg16_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
diff --git a/modules/image/matting/dim_vgg16_matting/README_en.md b/modules/image/matting/dim_vgg16_matting/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..aaffb278a85f8076fd0ed5d536e2d5870bb478ca
--- /dev/null
+++ b/modules/image/matting/dim_vgg16_matting/README_en.md
@@ -0,0 +1,156 @@
+# dim_vgg16_matting
+
+|Module Name|dim_vgg16_matting|
+| :--- | :---: |
+|Category|Matting|
+|Network|dim_vgg16|
+|Dataset|Baidu self-built dataset|
+|Support Fine-tuning|No|
+|Module Size|164MB|
+|Data Indicators|-|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - Image matting is the technique of extracting the foreground from an image by estimating its color and transparency. It is widely used in the film industry for background replacement, image compositing, and visual effects. Each pixel in the image has a value that represents its foreground transparency, called alpha; the set of all alpha values in an image is called the alpha matte. The foreground can be separated by extracting the part of the image covered by the matte. dim_vgg16_matting is a matting model that requires a trimap as an additional input.
+
+
+
+ - For more information, please refer to: [dim_vgg16_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install dim_vgg16_matting
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run dim_vgg16_matting --input_path "/PATH/TO/IMAGE" --trimap_path "/PATH/TO/TRIMAP"
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="dim_vgg16_matting")
+
+    result = model.predict(image_list=["/PATH/TO/IMAGE"], trimap_list=["/PATH/TO/TRIMAP"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ trimap_list,
+ visualization,
+ save_path):
+ ```
+
+ - Prediction API for matting.
+
+ - **Parameter**
+
+      - image_list (list(str | numpy.ndarray)): Image paths or image data with ndarray.shape in the format \[H, W, C\], BGR channel order.
+      - trimap_list (list(str | numpy.ndarray)): Trimap paths or trimap data with ndarray.shape in the format \[H, W\], grayscale.
+ - visualization (bool): Whether to save the recognition results as picture files, default is False.
+ - save_path (str): Save path of images, "dim_vgg16_matting_output" by default.
+
+ - **Return**
+
+      - result (list(numpy.ndarray)): The list of model results (predicted alpha mattes); see the usage sketch below.
+
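+  - A small usage sketch (illustrative only; it uses the parameters documented above together with hypothetical example paths) that also saves the visualized results to disk:
+
+  - ```python
+    import paddlehub as hub
+
+    model = hub.Module(name="dim_vgg16_matting")
+    # save the visualized alpha mattes under ./matting_results in addition to returning them
+    results = model.predict(image_list=["/PATH/TO/IMAGE"],
+                            trimap_list=["/PATH/TO/TRIMAP"],
+                            visualization=True,
+                            save_path="matting_results")
+    ```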
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of matting.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m dim_vgg16_matting
+ ```
+
+  - The model serving API is now deployed, listening on port 8866 by default.
+
+  - **NOTE:** If a GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))], 'trimaps':[cv2_to_base64(cv2.imread("/PATH/TO/TRIMAP"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/dim_vgg16_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/matting/dim_vgg16_matting/module.py b/modules/image/matting/dim_vgg16_matting/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..2ae3c0d36fbdf6a827bb1093a80c1def67de17cd
--- /dev/null
+++ b/modules/image/matting/dim_vgg16_matting/module.py
@@ -0,0 +1,288 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import time
+import argparse
+from typing import Callable, Union, List, Tuple
+
+import numpy as np
+import cv2
+import scipy
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.module import moduleinfo, runnable, serving
+from paddleseg.models import layers
+
+from dim_vgg16_matting.vgg import VGG16
+import dim_vgg16_matting.processor as P
+
+
+@moduleinfo(
+ name="dim_vgg16_matting",
+ type="CV/matting",
+ author="paddlepaddle",
+ summary="dim_vgg16_matting is a matting model",
+ version="1.0.0"
+)
+class DIMVGG16(nn.Layer):
+ """
+ The DIM implementation based on PaddlePaddle.
+
+ The original article refers to
+    Ning Xu, et al. "Deep Image Matting"
+ (https://arxiv.org/pdf/1908.07919.pdf).
+
+ Args:
+        stage (int, optional): The stage of the model. Default: 3.
+        decoder_input_channels (int, optional): The channel of the decoder input. Default: 512.
+        pretrained (str, optional): The path of the pretrained model. Default: None.
+
+ """
+ def __init__(self,
+ stage: int = 3,
+ decoder_input_channels: int = 512,
+ pretrained: str = None):
+ super(DIMVGG16, self).__init__()
+
+ self.backbone = VGG16()
+ self.pretrained = pretrained
+ self.stage = stage
+
+ decoder_output_channels = [64, 128, 256, 512]
+ self.decoder = Decoder(
+ input_channels=decoder_input_channels,
+ output_channels=decoder_output_channels)
+ if self.stage == 2:
+ for param in self.backbone.parameters():
+ param.stop_gradient = True
+ for param in self.decoder.parameters():
+ param.stop_gradient = True
+ if self.stage >= 2:
+ self.refine = Refine()
+
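+        # Default preprocessing pipeline: load the image (and trimap), cap the long edge at 3840 px, then normalize.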
+        self.transforms = P.Compose([P.LoadImages(), P.LimitLong(max_long=3840), P.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'dim-vgg16.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+    def preprocess(self, img: Union[str, np.ndarray], transforms: Callable, trimap: Union[str, np.ndarray] = None) -> dict:
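+        # Pack the raw image (and optional trimap) into the dict format expected by the
+        # preprocessing pipeline stored in self.transforms, run the pipeline, then convert
+        # the arrays to tensors and add a batch dimension.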
+ data = {}
+ data['img'] = img
+ if trimap is not None:
+ data['trimap'] = trimap
+ data['gt_fields'] = ['trimap']
+ data['trans_info'] = []
+ data = self.transforms(data)
+ data['img'] = paddle.to_tensor(data['img'])
+ data['img'] = data['img'].unsqueeze(0)
+ if trimap is not None:
+ data['trimap'] = paddle.to_tensor(data['trimap'])
+ data['trimap'] = data['trimap'].unsqueeze((0, 1))
+
+ return data
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ input_shape = paddle.shape(inputs['img'])[-2:]
+ x = paddle.concat([inputs['img'], inputs['trimap'] / 255], axis=1)
+ fea_list = self.backbone(x)
+
+ # decoder stage
+ up_shape = []
+ for i in range(5):
+ up_shape.append(paddle.shape(fea_list[i])[-2:])
+ alpha_raw = self.decoder(fea_list, up_shape)
+ alpha_raw = F.interpolate(
+ alpha_raw, input_shape, mode='bilinear', align_corners=False)
+ logit_dict = {'alpha_raw': alpha_raw}
+ if self.stage < 2:
+ return logit_dict
+
+ if self.stage >= 2:
+ # refine stage
+ refine_input = paddle.concat([inputs['img'], alpha_raw], axis=1)
+ alpha_refine = self.refine(refine_input)
+
+ # finally alpha
+ alpha_pred = alpha_refine + alpha_raw
+ alpha_pred = F.interpolate(
+ alpha_pred, input_shape, mode='bilinear', align_corners=False)
+ if not self.training:
+ alpha_pred = paddle.clip(alpha_pred, min=0, max=1)
+ logit_dict['alpha_pred'] = alpha_pred
+
+ return alpha_pred
+
+ def predict(self, image_list: list, trimap_list: list, visualization: bool =False, save_path: str = "dim_vgg16_matting_output") -> list:
+ self.eval()
+ result= []
+ with paddle.no_grad():
+ for i, im_path in enumerate(image_list):
+ trimap = trimap_list[i] if trimap_list is not None else None
+ data = self.preprocess(img=im_path, transforms=self.transforms, trimap=trimap)
+ alpha_pred = self.forward(data)
+ alpha_pred = P.reverse_transform(alpha_pred, data['trans_info'])
+ alpha_pred = (alpha_pred.numpy()).squeeze()
+ alpha_pred = (alpha_pred * 255).astype('uint8')
+ alpha_pred = P.save_alpha_pred(alpha_pred, trimap)
+ result.append(alpha_pred)
+ if visualization:
+ if not os.path.exists(save_path):
+ os.makedirs(save_path)
+ img_name = str(time.time()) + '.png'
+ image_save_path = os.path.join(save_path, img_name)
+ cv2.imwrite(image_save_path, alpha_pred)
+
+ return result
+
+ @serving
+ def serving_method(self, images: list, trimaps:list, **kwargs) -> dict:
+ """
+ Run as a service.
+ """
+ images_decode = [P.base64_to_cv2(image) for image in images]
+
+ if trimaps is not None:
+ trimap_decoder = [cv2.cvtColor(P.base64_to_cv2(trimap), cv2.COLOR_BGR2GRAY) for trimap in trimaps]
+ else:
+ trimap_decoder = None
+
+ outputs = self.predict(image_list=images_decode, trimap_list= trimap_decoder, **kwargs)
+
+ serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
+ results = {'data': serving_data}
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list) -> list:
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ args = self.parser.parse_args(argvs)
+ if args.trimap_path is not None:
+ trimap_list = [args.trimap_path]
+ else:
+ trimap_list = None
+
+ results = self.predict(image_list=[args.input_path], trimap_list=trimap_list, save_path=args.output_dir, visualization=args.visualization)
+
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default="dim_vgg16_matting_output", help="The directory to save output images.")
+ self.arg_config_group.add_argument(
+ '--visualization', type=bool, default=True, help="whether to save output as images.")
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
+ self.arg_input_group.add_argument('--trimap_path', type=str, help="path to trimap.")
+
+
+class Up(nn.Layer):
+ def __init__(self, input_channels: int, output_channels: int):
+ super().__init__()
+ self.conv = layers.ConvBNReLU(
+ input_channels,
+ output_channels,
+ kernel_size=5,
+ padding=2,
+ bias_attr=False)
+
+ def forward(self, x: paddle.Tensor, skip: paddle.Tensor, output_shape: list) -> paddle.Tensor:
+ x = F.interpolate(
+ x, size=output_shape, mode='bilinear', align_corners=False)
+ x = x + skip
+ x = self.conv(x)
+ x = F.relu(x)
+
+ return x
+
+
+class Decoder(nn.Layer):
+ def __init__(self, input_channels: int, output_channels: list = [64, 128, 256, 512]):
+ super().__init__()
+ self.deconv6 = nn.Conv2D(
+ input_channels, input_channels, kernel_size=1, bias_attr=False)
+ self.deconv5 = Up(input_channels, output_channels[-1])
+ self.deconv4 = Up(output_channels[-1], output_channels[-2])
+ self.deconv3 = Up(output_channels[-2], output_channels[-3])
+ self.deconv2 = Up(output_channels[-3], output_channels[-4])
+ self.deconv1 = Up(output_channels[-4], 64)
+
+ self.alpha_conv = nn.Conv2D(
+ 64, 1, kernel_size=5, padding=2, bias_attr=False)
+
+ def forward(self, fea_list: list, shape_list: list) -> paddle.Tensor:
+ x = fea_list[-1]
+ x = self.deconv6(x)
+ x = self.deconv5(x, fea_list[4], shape_list[4])
+ x = self.deconv4(x, fea_list[3], shape_list[3])
+ x = self.deconv3(x, fea_list[2], shape_list[2])
+ x = self.deconv2(x, fea_list[1], shape_list[1])
+ x = self.deconv1(x, fea_list[0], shape_list[0])
+ alpha = self.alpha_conv(x)
+ alpha = F.sigmoid(alpha)
+
+ return alpha
+
+
+class Refine(nn.Layer):
+ def __init__(self):
+ super().__init__()
+ self.conv1 = layers.ConvBNReLU(
+ 4, 64, kernel_size=3, padding=1, bias_attr=False)
+ self.conv2 = layers.ConvBNReLU(
+ 64, 64, kernel_size=3, padding=1, bias_attr=False)
+ self.conv3 = layers.ConvBNReLU(
+ 64, 64, kernel_size=3, padding=1, bias_attr=False)
+ self.alpha_pred = layers.ConvBNReLU(
+ 64, 1, kernel_size=3, padding=1, bias_attr=False)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.conv1(x)
+ x = self.conv2(x)
+ x = self.conv3(x)
+ alpha = self.alpha_pred(x)
+
+ return alpha
diff --git a/modules/image/matting/dim_vgg16_matting/processor.py b/modules/image/matting/dim_vgg16_matting/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..87e499c2960bb0e76ba6e498a2f00ca508ee19a6
--- /dev/null
+++ b/modules/image/matting/dim_vgg16_matting/processor.py
@@ -0,0 +1,220 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import random
+import base64
+from typing import Callable, Union, List, Tuple
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+from paddleseg.transforms import functional
+from PIL import Image
+
+
+class Compose:
+ """
+ Do transformation on input data with corresponding pre-processing and augmentation operations.
+ The shape of input data to all operations is [height, width, channels].
+ """
+
+ def __init__(self, transforms: Callable, to_rgb: bool = True):
+ if not isinstance(transforms, list):
+ raise TypeError('The transforms must be a list!')
+ self.transforms = transforms
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if 'trans_info' not in data:
+ data['trans_info'] = []
+ for op in self.transforms:
+ data = op(data)
+ if data is None:
+ return None
+
+ data['img'] = np.transpose(data['img'], (2, 0, 1))
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = np.transpose(data[key], (2, 0, 1))
+
+ return data
+
+
+class LoadImages:
+ """
+ Read images from image path.
+
+ Args:
+ to_rgb (bool, optional): If converting image to RGB color space. Default: True.
+ """
+ def __init__(self, to_rgb: bool = True):
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if isinstance(data['img'], str):
+ data['img'] = cv2.imread(data['img'])
+
+ for key in data.get('gt_fields', []):
+ if isinstance(data[key], str):
+ data[key] = cv2.imread(data[key], cv2.IMREAD_UNCHANGED)
+ # if alpha and trimap has 3 channels, extract one.
+ if key in ['alpha', 'trimap']:
+ if len(data[key].shape) > 2:
+ data[key] = data[key][:, :, 0]
+
+ if self.to_rgb:
+ data['img'] = cv2.cvtColor(data['img'], cv2.COLOR_BGR2RGB)
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = cv2.cvtColor(data[key], cv2.COLOR_BGR2RGB)
+
+ return data
+
+
+class LimitLong:
+ """
+ Limit the long edge of image.
+
+ If the long edge is larger than max_long, resize the long edge
+ to max_long, while scale the short edge proportionally.
+
+ If the long edge is smaller than min_long, resize the long edge
+ to min_long, while scale the short edge proportionally.
+
+ Args:
+        max_long (int, optional): If the long edge of the image is larger than max_long,
+            it will be resized to max_long. Default: None.
+        min_long (int, optional): If the long edge of the image is smaller than min_long,
+            it will be resized to min_long. Default: None.
+ """
+
+ def __init__(self, max_long=None, min_long=None):
+ if max_long is not None:
+ if not isinstance(max_long, int):
+ raise TypeError(
+ "Type of `max_long` is invalid. It should be int, but it is {}"
+ .format(type(max_long)))
+ if min_long is not None:
+ if not isinstance(min_long, int):
+ raise TypeError(
+ "Type of `min_long` is invalid. It should be int, but it is {}"
+ .format(type(min_long)))
+ if (max_long is not None) and (min_long is not None):
+ if min_long > max_long:
+ raise ValueError(
+                    '`max_long` should not be smaller than `min_long`, but they are {} and {}'
+ .format(max_long, min_long))
+ self.max_long = max_long
+ self.min_long = min_long
+
+ def __call__(self, data):
+ h, w = data['img'].shape[:2]
+ long_edge = max(h, w)
+ target = long_edge
+ if (self.max_long is not None) and (long_edge > self.max_long):
+ target = self.max_long
+ elif (self.min_long is not None) and (long_edge < self.min_long):
+ target = self.min_long
+
+ if target != long_edge:
+ data['trans_info'].append(('resize', data['img'].shape[0:2]))
+ data['img'] = functional.resize_long(data['img'], target)
+ for key in data.get('gt_fields', []):
+ data[key] = functional.resize_long(data[key], target)
+
+ return data
+
+
+class Normalize:
+ """
+ Normalize an image.
+
+ Args:
+ mean (list, optional): The mean value of a data set. Default: [0.5, 0.5, 0.5].
+ std (list, optional): The standard deviation of a data set. Default: [0.5, 0.5, 0.5].
+
+ Raises:
+ ValueError: When mean/std is not list or any value in std is 0.
+ """
+
+ def __init__(self, mean: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5), std: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5)):
+ self.mean = mean
+ self.std = std
+ if not (isinstance(self.mean, (list, tuple))
+ and isinstance(self.std, (list, tuple))):
+ raise ValueError(
+ "{}: input type is invalid. It should be list or tuple".format(
+ self))
+ from functools import reduce
+ if reduce(lambda x, y: x * y, self.std) == 0:
+ raise ValueError('{}: std is invalid!'.format(self))
+
+ def __call__(self, data: dict) -> dict:
+ mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
+ std = np.array(self.std)[np.newaxis, np.newaxis, :]
+ data['img'] = functional.normalize(data['img'], mean, std)
+ if 'fg' in data.get('gt_fields', []):
+ data['fg'] = functional.normalize(data['fg'], mean, std)
+ if 'bg' in data.get('gt_fields', []):
+ data['bg'] = functional.normalize(data['bg'], mean, std)
+
+ return data
+
+
+def reverse_transform(alpha: paddle.Tensor, trans_info: List[str]):
+ """recover pred to origin shape"""
+ for item in trans_info[::-1]:
+ if item[0] == 'resize':
+ h, w = item[1][0], item[1][1]
+ alpha = F.interpolate(alpha, [h, w], mode='bilinear')
+ elif item[0] == 'padding':
+ h, w = item[1][0], item[1][1]
+ alpha = alpha[:, :, 0:h, 0:w]
+ else:
+ raise Exception("Unexpected info '{}' in im_info".format(item[0]))
+ return alpha
+
+def save_alpha_pred(alpha: np.ndarray, trimap: np.ndarray = None):
+ """
+    Constrain the predicted alpha with the trimap: pixels marked 0/255 in the trimap are forced to pure background/foreground. `alpha` is a [h, w] array of values in [0, 255].
+ """
+ if isinstance(trimap, str):
+ trimap = cv2.imread(trimap, 0)
+ alpha[trimap == 0] = 0
+ alpha[trimap == 255] = 255
+ alpha = (alpha).astype('uint8')
+ return alpha
+
+
+def cv2_to_base64(image: np.ndarray):
+ """
+ Convert data from BGR to base64 format.
+ """
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+def base64_to_cv2(b64str: str):
+ """
+ Convert data from base64 to BGR format.
+ """
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
\ No newline at end of file
diff --git a/modules/image/matting/dim_vgg16_matting/requirements.py b/modules/image/matting/dim_vgg16_matting/requirements.py
new file mode 100644
index 0000000000000000000000000000000000000000..7df0ef23928361724c3fadb8d87d6a3be869e58b
--- /dev/null
+++ b/modules/image/matting/dim_vgg16_matting/requirements.py
@@ -0,0 +1 @@
+paddleseg >= 2.3.0
diff --git a/modules/image/matting/dim_vgg16_matting/vgg.py b/modules/image/matting/dim_vgg16_matting/vgg.py
new file mode 100644
index 0000000000000000000000000000000000000000..11cc9ccc51867996d2726522f0e2f1b156895cd7
--- /dev/null
+++ b/modules/image/matting/dim_vgg16_matting/vgg.py
@@ -0,0 +1,142 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import List, Tuple
+
+import paddle
+from paddle import ParamAttr
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn import Conv2D, BatchNorm, Linear, Dropout
+from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
+
+from paddleseg.utils import utils
+
+
+class ConvBlock(nn.Layer):
+ def __init__(self, input_channels: int, output_channels: int, groups: int, name: str = None):
+ super(ConvBlock, self).__init__()
+
+ self.groups = groups
+ self._conv_1 = Conv2D(
+ in_channels=input_channels,
+ out_channels=output_channels,
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ weight_attr=ParamAttr(name=name + "1_weights"),
+ bias_attr=False)
+ if groups == 2 or groups == 3 or groups == 4:
+ self._conv_2 = Conv2D(
+ in_channels=output_channels,
+ out_channels=output_channels,
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ weight_attr=ParamAttr(name=name + "2_weights"),
+ bias_attr=False)
+ if groups == 3 or groups == 4:
+ self._conv_3 = Conv2D(
+ in_channels=output_channels,
+ out_channels=output_channels,
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ weight_attr=ParamAttr(name=name + "3_weights"),
+ bias_attr=False)
+ if groups == 4:
+ self._conv_4 = Conv2D(
+ in_channels=output_channels,
+ out_channels=output_channels,
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ weight_attr=ParamAttr(name=name + "4_weights"),
+ bias_attr=False)
+
+ self._pool = MaxPool2D(
+ kernel_size=2, stride=2, padding=0, return_mask=True)
+
+ def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
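+        # Apply the stacked 3x3 conv + ReLU layers, keep the pre-pooling activation as a
+        # skip connection, and return the pooled feature map together with the pooling indices.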
+ x = self._conv_1(inputs)
+ x = F.relu(x)
+ if self.groups == 2 or self.groups == 3 or self.groups == 4:
+ x = self._conv_2(x)
+ x = F.relu(x)
+ if self.groups == 3 or self.groups == 4:
+ x = self._conv_3(x)
+ x = F.relu(x)
+ if self.groups == 4:
+ x = self._conv_4(x)
+ x = F.relu(x)
+ skip = x
+ x, max_indices = self._pool(x)
+ return x, max_indices, skip
+
+
+class VGGNet(nn.Layer):
+ def __init__(self, input_channels: int = 4, layers: int = 11, pretrained: str = None):
+ super(VGGNet, self).__init__()
+ self.pretrained = pretrained
+
+ self.layers = layers
+ self.vgg_configure = {
+ 11: [1, 1, 2, 2, 2],
+ 13: [2, 2, 2, 2, 2],
+ 16: [2, 2, 3, 3, 3],
+ 19: [2, 2, 4, 4, 4]
+ }
+ assert self.layers in self.vgg_configure.keys(), \
+ "supported layers are {} but input layer is {}".format(
+ self.vgg_configure.keys(), layers)
+ self.groups = self.vgg_configure[self.layers]
+
+        # The first conv layer of the matting network takes a 4-channel input (RGB image + trimap); it is simply initialized to zeros.
+ self._conv_block_1 = ConvBlock(
+ input_channels, 64, self.groups[0], name="conv1_")
+ self._conv_block_2 = ConvBlock(64, 128, self.groups[1], name="conv2_")
+ self._conv_block_3 = ConvBlock(128, 256, self.groups[2], name="conv3_")
+ self._conv_block_4 = ConvBlock(256, 512, self.groups[3], name="conv4_")
+ self._conv_block_5 = ConvBlock(512, 512, self.groups[4], name="conv5_")
+
+        # This layer should be initialized from the converted VGG fc6 weights; that initialization is not handled here for now.
+ self._conv_6 = Conv2D(
+ 512, 512, kernel_size=3, padding=1, bias_attr=False)
+
+    def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
+ fea_list = []
+ ids_list = []
+ x, ids, skip = self._conv_block_1(inputs)
+ fea_list.append(skip)
+ ids_list.append(ids)
+ x, ids, skip = self._conv_block_2(x)
+ fea_list.append(skip)
+ ids_list.append(ids)
+ x, ids, skip = self._conv_block_3(x)
+ fea_list.append(skip)
+ ids_list.append(ids)
+ x, ids, skip = self._conv_block_4(x)
+ fea_list.append(skip)
+ ids_list.append(ids)
+ x, ids, skip = self._conv_block_5(x)
+ fea_list.append(skip)
+ ids_list.append(ids)
+ x = F.relu(self._conv_6(x))
+ fea_list.append(x)
+ return fea_list
+
+
+def VGG16(**args):
+ model = VGGNet(layers=16, **args)
+ return model
\ No newline at end of file
diff --git a/modules/image/matting/gfm_resnet34_matting/README.md b/modules/image/matting/gfm_resnet34_matting/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..7787fddc230c59995b48f4f1bc8065517d70069b
--- /dev/null
+++ b/modules/image/matting/gfm_resnet34_matting/README.md
@@ -0,0 +1,153 @@
+# gfm_resnet34_matting
+
+|模型名称|gfm_resnet34_matting|
+| :--- | :---: |
+|类别|图像-抠图|
+|网络|gfm_resnet34|
+|数据集|AM-2k|
+|是否支持Fine-tuning|否|
+|模型大小|562MB|
+|指标|SAD10.89|
+|最新更新日期|2021-12-03|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+
+ - 样例结果示例(左为原图,右为效果图):
+
+
+
+
+
+- ### 模型介绍
+
+ - Matting(精细化分割/影像去背/抠图)是指借由计算前景的颜色和透明度,将前景从影像中撷取出来的技术,可用于替换背景、影像合成、视觉特效,在电影工业中被广泛地使用。影像中的每个像素会有代表其前景透明度的值,称作阿法值(Alpha),一张影像中所有阿法值的集合称作阿法遮罩(Alpha Matte),将影像被遮罩所涵盖的部分取出即可完成前景的分离。gfm_resnet34_matting可生成抠图结果。
+
+
+
+ - 更多详情请参考:[gfm_resnet34_matting](https://github.com/JizhiziLi/GFM)
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install gfm_resnet34_matting
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run gfm_resnet34_matting --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - 通过命令行方式实现hub模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="gfm_resnet34_matting")
+ result = model.predict(["/PATH/TO/IMAGE"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ visualization,
+ save_path):
+ ```
+
+ - 动物matting预测API,用于将输入图片中的动物分割出来。
+
+ - 参数
+
+ - image_list (list(str | numpy.ndarray)):图片输入路径或者BGR格式numpy数据。
+ - visualization (bool): 是否进行可视化,默认为False。
+ - save_path (str): 当visualization为True时,保存图片的路径,默认为"gfm_resnet34_matting_output"。
+
+ - 返回
+
+ - result (list(numpy.ndarray)):模型分割结果:
+
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署动物matting在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m gfm_resnet34_matting
+ ```
+
+ - 这样就完成了一个动物matting在线服务API的部署,默认端口号为8866。
+
+ - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/gfm_resnet34_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
diff --git a/modules/image/matting/gfm_resnet34_matting/README_en.md b/modules/image/matting/gfm_resnet34_matting/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..c16a3657b47489845ac44fcadaf99baec55b676e
--- /dev/null
+++ b/modules/image/matting/gfm_resnet34_matting/README_en.md
@@ -0,0 +1,154 @@
+# gfm_resnet34_matting
+
+|Module Name|gfm_resnet34_matting|
+| :--- | :---: |
+|Category|Image Matting|
+|Network|gfm_resnet34|
+|Dataset|AM-2k|
+|Support Fine-tuning|No|
+|Module Size|562MB|
+|Data Indicators|SAD10.89|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - Image matting is the technique of extracting the foreground from an image by estimating its color and transparency. It is widely used in the film industry for background replacement, image compositing, and visual effects. Each pixel in the image has a value that represents its foreground transparency, called alpha; the set of all alpha values in an image is called the alpha matte. The foreground can be separated by extracting the part of the image covered by the matte. gfm_resnet34_matting generates the matting result directly from the input image.
+
+
+
+ - For more information, please refer to: [gfm_resnet34_matting](https://github.com/JizhiziLi/GFM)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install gfm_resnet34_matting
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run gfm_resnet34_matting --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="gfm_resnet34_matting")
+ result = model.predict(["/PATH/TO/IMAGE"])
+ print(result)
+
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ visualization,
+ save_path):
+ ```
+
+ - Prediction API for matting.
+
+ - **Parameter**
+
+      - image_list (list(str | numpy.ndarray)): Image paths or image data with ndarray.shape in the format \[H, W, C\], BGR channel order.
+      - visualization (bool): Whether to save the recognition results as picture files, default is False.
+      - save_path (str): Save path of images, "gfm_resnet34_matting_output" by default.
+
+ - **Return**
+
+      - result (list(numpy.ndarray)): The list of model results.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of matting.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m gfm_resnet34_matting
+ ```
+
+  - The model serving API is now deployed, listening on port 8866 by default.
+
+  - **NOTE:** If a GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/gfm_resnet34_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/matting/gfm_resnet34_matting/gfm.py b/modules/image/matting/gfm_resnet34_matting/gfm.py
new file mode 100644
index 0000000000000000000000000000000000000000..4b7306c2282467ec80bbf8f1c7540afb25a1b72f
--- /dev/null
+++ b/modules/image/matting/gfm_resnet34_matting/gfm.py
@@ -0,0 +1,447 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Callable, Union, List, Tuple
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+from gfm_resnet34_matting.resnet import resnet34
+
+
+def conv3x3(in_planes: int, out_planes: int, stride: int = 1) -> Callable:
+ """3x3 convolution with padding"""
+ return nn.Conv2D(in_planes, out_planes, kernel_size=3, stride=stride,
+ padding=1, bias_attr=False)
+
+
+def conv_up_psp(in_channels: int, out_channels: int, up_sample: float) -> Callable:
+ return nn.Sequential(nn.Conv2D(in_channels, out_channels, 3, padding=1),
+ nn.BatchNorm2D(out_channels),
+ nn.ReLU(),
+ nn.Upsample(scale_factor=up_sample, mode='bilinear',align_corners = False))
+
+
+def build_bb(in_channels: int, mid_channels: int, out_channels: int) -> Callable:
+    return nn.Sequential(nn.Conv2D(in_channels, mid_channels, 3, dilation=2, padding=2),
+                         nn.BatchNorm2D(mid_channels), nn.ReLU(),
+                         nn.Conv2D(mid_channels, out_channels, 3, dilation=2, padding=2),
+                         nn.BatchNorm2D(out_channels), nn.ReLU(),
+                         nn.Conv2D(out_channels, out_channels, 3, dilation=2, padding=2),
+                         nn.BatchNorm2D(out_channels), nn.ReLU())
+
+
+def build_decoder(in_channels: int, mid_channels_1: int, mid_channels_2: int, out_channels: int,
+ last_bnrelu: bool, upsample_flag: bool) -> Callable:
+ layers = []
+    layers += [nn.Conv2D(in_channels, mid_channels_1, 3, padding=1),
+               nn.BatchNorm2D(mid_channels_1), nn.ReLU(),
+               nn.Conv2D(mid_channels_1, mid_channels_2, 3, padding=1),
+               nn.BatchNorm2D(mid_channels_2), nn.ReLU(),
+               nn.Conv2D(mid_channels_2, out_channels, 3, padding=1)]
+ if last_bnrelu:
+ layers += [nn.BatchNorm2D(out_channels), nn.ReLU()]
+
+ if upsample_flag:
+ layers += [nn.Upsample(scale_factor=2, mode='bilinear')]
+
+ sequential = nn.Sequential(*layers)
+ return sequential
+
+
+class BasicBlock(nn.Layer):
+ expansion = 1
+ def __init__(self, inplanes: int, planes: int, stride: int = 1, downsample=None):
+ super(BasicBlock, self).__init__()
+ self.conv1 = conv3x3(inplanes, planes, stride)
+ self.bn1 = nn.BatchNorm2D(planes)
+ self.relu = nn.ReLU()
+ self.conv2 = conv3x3(planes, planes)
+ self.bn2 = nn.BatchNorm2D(planes)
+ self.downsample = downsample
+ self.stride = stride
+
+ def forward(self, x: paddle.Tensor) -> Callable:
+ residual = x
+ out = self.conv1(x)
+ out = self.bn1(out)
+ out = self.relu(out)
+ out = self.conv2(out)
+ out = self.bn2(out)
+ if self.downsample is not None:
+ residual = self.downsample(x)
+ out += residual
+ out = self.relu(out)
+ return out
+
+
+class PSPModule(nn.Layer):
+
+    def __init__(self, features: int, out_features: int = 1024, sizes: List[int] = (1, 2, 3, 6)):
+        super().__init__()
+ self.stages = nn.LayerList([self._make_stage(features, size) for
+ size in sizes])
+ self.bottleneck = nn.Conv2D(features * (len(sizes) + 1),
+ out_features, kernel_size=1)
+ self.relu = nn.ReLU()
+
+    def _make_stage(self, features: int, size: int) -> Callable:
+ prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
+ conv = nn.Conv2D(features, features, kernel_size=1, bias_attr=False)
+ return nn.Sequential(prior, conv)
+
+ def forward(self, feats: paddle.Tensor) -> paddle.Tensor:
+ h, w = feats.shape[2], feats.shape[3]
+ priors = [F.upsample(stage(feats), size=(h, w), mode='bilinear',align_corners = True) for stage in self.stages] + [feats]
+ bottle = self.bottleneck(paddle.concat(priors, 1))
+ return self.relu(bottle)
+
+
+class SELayer(nn.Layer):
+
+ def __init__(self, channel: int, reduction: int = 4):
+ super(SELayer, self).__init__()
+ self.avg_pool = nn.AdaptiveAvgPool2D(1)
+        self.fc = nn.Sequential(nn.Linear(channel, channel // reduction, bias_attr=False),
+                                nn.ReLU(),
+                                nn.Linear(channel // reduction, channel, bias_attr=False),
+                                nn.Sigmoid())
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        b, c, _, _ = x.shape  # paddle tensors use .shape / .reshape rather than torch-style .size() / .view()
+        y = self.avg_pool(x).reshape([b, c])
+        y = self.fc(y).reshape([b, c, 1, 1])
+ return x * y.expand_as(x)
+
+
+class GFM(nn.Layer):
+ """
+ The GFM implementation based on PaddlePaddle.
+
+ The original article refers to:
+ Bridging Composite and Real: Towards End-to-end Deep Image Matting [IJCV-2021]
+ Main network file (GFM).
+
+ Copyright (c) 2021, Jizhizi Li (jili8515@uni.sydney.edu.au)
+ Licensed under the MIT License (see LICENSE for details)
+ Github repo: https://github.com/JizhiziLi/GFM
+ Paper link (Arxiv): https://arxiv.org/abs/2010.16188
+
+ """
+
+ def __init__(self):
+ super().__init__()
+ self.backbone = 'r34_2b'
+ self.rosta = 'TT'
+ if self.rosta == 'TT':
+ self.gd_channel = 3
+ else:
+ self.gd_channel = 2
+ if self.backbone == 'r34_2b':
+ self.resnet = resnet34()
+ self.encoder0 = nn.Sequential(nn.Conv2D(3, 64, 3, padding=1),
+ nn.BatchNorm2D(64), nn.ReLU())
+ self.encoder1 = self.resnet.layer1
+ self.encoder2 = self.resnet.layer2
+ self.encoder3 = self.resnet.layer3
+ self.encoder4 = self.resnet.layer4
+ self.encoder5 = nn.Sequential(nn.MaxPool2D(2, 2, ceil_mode=True
+ ), BasicBlock(512, 512), BasicBlock(512, 512), BasicBlock(
+ 512, 512))
+ self.encoder6 = nn.Sequential(nn.MaxPool2D(2, 2, ceil_mode=True
+ ), BasicBlock(512, 512), BasicBlock(512, 512), BasicBlock(
+ 512, 512))
+ self.psp_module = PSPModule(512, 512, (1, 3, 5))
+ self.psp6 = conv_up_psp(512, 512, 2)
+ self.psp5 = conv_up_psp(512, 512, 4)
+ self.psp4 = conv_up_psp(512, 256, 8)
+ self.psp3 = conv_up_psp(512, 128, 16)
+ self.psp2 = conv_up_psp(512, 64, 32)
+ self.psp1 = conv_up_psp(512, 64, 32)
+ self.decoder6_g = build_decoder(1024, 512, 512, 512, True, True)
+ self.decoder5_g = build_decoder(1024, 512, 512, 512, True, True)
+ self.decoder4_g = build_decoder(1024, 512, 512, 256, True, True)
+ self.decoder3_g = build_decoder(512, 256, 256, 128, True, True)
+ self.decoder2_g = build_decoder(256, 128, 128, 64, True, True)
+ self.decoder1_g = build_decoder(128, 64, 64, 64, True, False)
+ self.bridge_block = build_bb(512, 512, 512)
+ self.decoder6_f = build_decoder(1024, 512, 512, 512, True, True)
+ self.decoder5_f = build_decoder(1024, 512, 512, 512, True, True)
+ self.decoder4_f = build_decoder(1024, 512, 512, 256, True, True)
+ self.decoder3_f = build_decoder(512, 256, 256, 128, True, True)
+ self.decoder2_f = build_decoder(256, 128, 128, 64, True, True)
+ self.decoder1_f = build_decoder(128, 64, 64, 64, True, False)
+ if self.rosta == 'RIM':
+ self.decoder0_g_tt = nn.Sequential(nn.Conv2D(64, 3, 3,
+ padding=1))
+ self.decoder0_g_ft = nn.Sequential(nn.Conv2D(64, 2, 3,
+ padding=1))
+ self.decoder0_g_bt = nn.Sequential(nn.Conv2D(64, 2, 3,
+ padding=1))
+ self.decoder0_f_tt = nn.Sequential(nn.Conv2D(64, 1, 3,
+ padding=1))
+ self.decoder0_f_ft = nn.Sequential(nn.Conv2D(64, 1, 3,
+ padding=1))
+ self.decoder0_f_bt = nn.Sequential(nn.Conv2D(64, 1, 3,
+ padding=1))
+ else:
+ self.decoder0_g = nn.Sequential(nn.Conv2D(64, self.
+ gd_channel, 3, padding=1))
+ self.decoder0_f = nn.Sequential(nn.Conv2D(64, 1, 3, padding=1))
+ if self.backbone == 'r34':
+ self.encoder0 = nn.Sequential(self.resnet.conv1, self.resnet.
+ bn1, self.resnet.relu)
+
+ self.encoder1 = nn.Sequential(self.resnet.maxpool, self.resnet.
+ layer1)
+ self.encoder2 = self.resnet.layer2
+ self.encoder3 = self.resnet.layer3
+ self.encoder4 = self.resnet.layer4
+ self.psp_module = PSPModule(512, 512, (1, 3, 5))
+ self.psp4 = conv_up_psp(512, 256, 2)
+ self.psp3 = conv_up_psp(512, 128, 4)
+ self.psp2 = conv_up_psp(512, 64, 8)
+ self.psp1 = conv_up_psp(512, 64, 16)
+ self.decoder4_g = build_decoder(1024, 512, 512, 256, True, True)
+ self.decoder3_g = build_decoder(512, 256, 256, 128, True, True)
+ self.decoder2_g = build_decoder(256, 128, 128, 64, True, True)
+ self.decoder1_g = build_decoder(128, 64, 64, 64, True, True)
+ self.bridge_block = build_bb(512, 512, 512)
+ self.decoder4_f = build_decoder(1024, 512, 512, 256, True, True)
+ self.decoder3_f = build_decoder(512, 256, 256, 128, True, True)
+ self.decoder2_f = build_decoder(256, 128, 128, 64, True, True)
+ self.decoder1_f = build_decoder(128, 64, 64, 64, True, True)
+ if self.rosta == 'RIM':
+ self.decoder0_g_tt = build_decoder(128, 64, 64, 3, False, True)
+ self.decoder0_g_ft = build_decoder(128, 64, 64, 2, False, True)
+ self.decoder0_g_bt = build_decoder(128, 64, 64, 2, False, True)
+ self.decoder0_f_tt = build_decoder(128, 64, 64, 1, False, True)
+ self.decoder0_f_ft = build_decoder(128, 64, 64, 1, False, True)
+ self.decoder0_f_bt = build_decoder(128, 64, 64, 1, False, True)
+ else:
+ self.decoder0_g = build_decoder(128, 64, 64, self.
+ gd_channel, False, True)
+ self.decoder0_f = build_decoder(128, 64, 64, 1, False, True)
+ elif self.backbone == 'r101':
+ self.encoder0 = nn.Sequential(self.resnet.conv1, self.resnet.
+ bn1, self.resnet.relu)
+ self.encoder1 = nn.Sequential(self.resnet.maxpool, self.resnet.
+ layer1)
+ self.encoder2 = self.resnet.layer2
+ self.encoder3 = self.resnet.layer3
+ self.encoder4 = self.resnet.layer4
+ self.psp_module = PSPModule(2048, 2048, (1, 3, 5))
+ self.bridge_block = build_bb(2048, 2048, 2048)
+ self.psp4 = conv_up_psp(2048, 1024, 2)
+ self.psp3 = conv_up_psp(2048, 512, 4)
+ self.psp2 = conv_up_psp(2048, 256, 8)
+ self.psp1 = conv_up_psp(2048, 64, 16)
+ self.decoder4_g = build_decoder(4096, 2048, 1024, 1024, True, True)
+ self.decoder3_g = build_decoder(2048, 1024, 512, 512, True, True)
+ self.decoder2_g = build_decoder(1024, 512, 256, 256, True, True)
+ self.decoder1_g = build_decoder(512, 256, 128, 64, True, True)
+ self.decoder4_f = build_decoder(4096, 2048, 1024, 1024, True, True)
+ self.decoder3_f = build_decoder(2048, 1024, 512, 512, True, True)
+ self.decoder2_f = build_decoder(1024, 512, 256, 256, True, True)
+ self.decoder1_f = build_decoder(512, 256, 128, 64, True, True)
+ if self.rosta == 'RIM':
+ self.decoder0_g_tt = build_decoder(128, 64, 64, 3, False, True)
+ self.decoder0_g_ft = build_decoder(128, 64, 64, 2, False, True)
+ self.decoder0_g_bt = build_decoder(128, 64, 64, 2, False, True)
+ self.decoder0_f_tt = build_decoder(128, 64, 64, 1, False, True)
+ self.decoder0_f_ft = build_decoder(128, 64, 64, 1, False, True)
+ self.decoder0_f_bt = build_decoder(128, 64, 64, 1, False, True)
+ else:
+ self.decoder0_g = build_decoder(128, 64, 64, self.
+ gd_channel, False, True)
+ self.decoder0_f = build_decoder(128, 64, 64, 1, False, True)
+ elif self.backbone == 'd121':
+ self.encoder0 = nn.Sequential(self.densenet.features.conv0,
+ self.densenet.features.norm0, self.densenet.features.relu0)
+ self.encoder1 = nn.Sequential(self.densenet.features.
+ denseblock1, self.densenet.features.transition1)
+ self.encoder2 = nn.Sequential(self.densenet.features.
+ denseblock2, self.densenet.features.transition2)
+ self.encoder3 = nn.Sequential(self.densenet.features.
+ denseblock3, self.densenet.features.transition3)
+ self.encoder4 = nn.Sequential(self.densenet.features.
+ denseblock4, nn.Conv2D(1024, 512, 3, padding=1), nn.
+ BatchNorm2D(512), nn.ReLU(),
+ nn.MaxPool2D(2, 2, ceil_mode=True))
+ self.psp_module = PSPModule(512, 512, (1, 3, 5))
+ self.psp4 = conv_up_psp(512, 256, 2)
+ self.psp3 = conv_up_psp(512, 128, 4)
+ self.psp2 = conv_up_psp(512, 64, 8)
+ self.psp1 = conv_up_psp(512, 64, 16)
+ self.decoder4_g = build_decoder(1024, 512, 512, 256, True, True)
+ self.decoder3_g = build_decoder(512, 256, 256, 128, True, True)
+ self.decoder2_g = build_decoder(256, 128, 128, 64, True, True)
+ self.decoder1_g = build_decoder(128, 64, 64, 64, True, True)
+ self.bridge_block = build_bb(512, 512, 512)
+ self.decoder4_f = build_decoder(1024, 512, 512, 256, True, True)
+ self.decoder3_f = build_decoder(768, 256, 256, 128, True, True)
+ self.decoder2_f = build_decoder(384, 128, 128, 64, True, True)
+ self.decoder1_f = build_decoder(192, 64, 64, 64, True, True)
+ if self.rosta == 'RIM':
+ self.decoder0_g_tt = build_decoder(128, 64, 64, 3, False, True)
+ self.decoder0_g_ft = build_decoder(128, 64, 64, 2, False, True)
+ self.decoder0_g_bt = build_decoder(128, 64, 64, 2, False, True)
+ self.decoder0_f_tt = build_decoder(128, 64, 64, 1, False, True)
+ self.decoder0_f_ft = build_decoder(128, 64, 64, 1, False, True)
+ self.decoder0_f_bt = build_decoder(128, 64, 64, 1, False, True)
+ else:
+ self.decoder0_g = build_decoder(128, 64, 64, self.
+ gd_channel, False, True)
+ self.decoder0_f = build_decoder(128, 64, 64, 1, False, True)
+ if self.rosta == 'RIM':
+            self.rim = nn.Sequential(nn.Conv2D(3, 16, 1), SELayer(16), nn.Conv2D(16, 1, 1))
+
+ def forward(self, input: paddle.Tensor) -> List[paddle.Tensor]:
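+        # A shared encoder feeds two decoders: the glance decoder predicts a
+        # coarse global segmentation, the focus decoder predicts local alpha
+        # detail, and collaborative_matting fuses the two branches into the
+        # final alpha matte.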
+ glance_sigmoid = paddle.zeros(input.shape)
+ glance_sigmoid.stop_gradient = True
+ focus_sigmoid = paddle.zeros(input.shape)
+ focus_sigmoid.stop_gradient = True
+ fusion_sigmoid = paddle.zeros(input.shape)
+ fusion_sigmoid.stop_gradient = True
+ e0 = self.encoder0(input)
+ e1 = self.encoder1(e0)
+ e2 = self.encoder2(e1)
+ e3 = self.encoder3(e2)
+ e4 = self.encoder4(e3)
+ if self.backbone == 'r34_2b':
+ e5 = self.encoder5(e4)
+ e6 = self.encoder6(e5)
+ psp = self.psp_module(e6)
+ d6_g = self.decoder6_g(paddle.concat((psp, e6), 1))
+ d5_g = self.decoder5_g(paddle.concat((self.psp6(psp),
+ d6_g), 1))
+ d4_g = self.decoder4_g(paddle.concat((self.psp5(psp),
+ d5_g), 1))
+ else:
+ psp = self.psp_module(e4)
+ d4_g = self.decoder4_g(paddle.concat((psp, e4), 1))
+ d3_g = self.decoder3_g(paddle.concat((self.psp4(psp), d4_g), 1))
+ d2_g = self.decoder2_g(paddle.concat((self.psp3(psp), d3_g), 1))
+ d1_g = self.decoder1_g(paddle.concat((self.psp2(psp), d2_g), 1))
+ if self.backbone == 'r34_2b':
+ if self.rosta == 'RIM':
+ d0_g_tt = self.decoder0_g_tt(d1_g)
+ d0_g_ft = self.decoder0_g_ft(d1_g)
+ d0_g_bt = self.decoder0_g_bt(d1_g)
+ else:
+ d0_g = self.decoder0_g(d1_g)
+ elif self.rosta == 'RIM':
+            d0_g_tt = self.decoder0_g_tt(paddle.concat((self.psp1(psp), d1_g), 1))
+            d0_g_ft = self.decoder0_g_ft(paddle.concat((self.psp1(psp), d1_g), 1))
+            d0_g_bt = self.decoder0_g_bt(paddle.concat((self.psp1(psp), d1_g), 1))
+ else:
+ d0_g = self.decoder0_g(paddle.concat((self.psp1(psp),
+ d1_g), 1))
+ if self.rosta == 'RIM':
+ glance_sigmoid_tt = F.sigmoid(d0_g_tt)
+ glance_sigmoid_ft = F.sigmoid(d0_g_ft)
+ glance_sigmoid_bt = F.sigmoid(d0_g_bt)
+ else:
+ glance_sigmoid = F.sigmoid(d0_g)
+ if self.backbone == 'r34_2b':
+ bb = self.bridge_block(e6)
+ d6_f = self.decoder6_f(paddle.concat((bb, e6), 1))
+ d5_f = self.decoder5_f(paddle.concat((d6_f, e5), 1))
+ d4_f = self.decoder4_f(paddle.concat((d5_f, e4), 1))
+ else:
+ bb = self.bridge_block(e4)
+ d4_f = self.decoder4_f(paddle.concat((bb, e4), 1))
+ d3_f = self.decoder3_f(paddle.concat((d4_f, e3), 1))
+ d2_f = self.decoder2_f(paddle.concat((d3_f, e2), 1))
+ d1_f = self.decoder1_f(paddle.concat((d2_f, e1), 1))
+ if self.backbone == 'r34_2b':
+ if self.rosta == 'RIM':
+ d0_f_tt = self.decoder0_f_tt(d1_f)
+ d0_f_ft = self.decoder0_f_ft(d1_f)
+ d0_f_bt = self.decoder0_f_bt(d1_f)
+ else:
+ d0_f = self.decoder0_f(d1_f)
+ elif self.rosta == 'RIM':
+ d0_f_tt = self.decoder0_f_tt(paddle.concat((d1_f, e0), 1))
+ d0_f_ft = self.decoder0_f_ft(paddle.concat((d1_f, e0), 1))
+ d0_f_bt = self.decoder0_f_bt(paddle.concat((d1_f, e0), 1))
+ else:
+ d0_f = self.decoder0_f(paddle.concat((d1_f, e0), 1))
+ if self.rosta == 'RIM':
+ focus_sigmoid_tt = F.sigmoid(d0_f_tt)
+ focus_sigmoid_ft = F.sigmoid(d0_f_ft)
+ focus_sigmoid_bt = F.sigmoid(d0_f_bt)
+ else:
+ focus_sigmoid = F.sigmoid(d0_f)
+ if self.rosta == 'RIM':
+ fusion_sigmoid_tt = collaborative_matting('TT',
+ glance_sigmoid_tt, focus_sigmoid_tt)
+ fusion_sigmoid_ft = collaborative_matting('FT',
+ glance_sigmoid_ft, focus_sigmoid_ft)
+ fusion_sigmoid_bt = collaborative_matting('BT',
+ glance_sigmoid_bt, focus_sigmoid_bt)
+ fusion_sigmoid = paddle.concat((fusion_sigmoid_tt,
+ fusion_sigmoid_ft, fusion_sigmoid_bt), 1)
+ fusion_sigmoid = self.rim(fusion_sigmoid)
+            return [[glance_sigmoid_tt, focus_sigmoid_tt, fusion_sigmoid_tt],
+                    [glance_sigmoid_ft, focus_sigmoid_ft, fusion_sigmoid_ft],
+                    [glance_sigmoid_bt, focus_sigmoid_bt, fusion_sigmoid_bt],
+                    fusion_sigmoid]
+ else:
+ fusion_sigmoid = collaborative_matting(self.rosta,
+ glance_sigmoid, focus_sigmoid)
+ return glance_sigmoid, focus_sigmoid, fusion_sigmoid
+
+
+def collaborative_matting(rosta, glance_sigmoid, focus_sigmoid):
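+    # Fuse the glance (global segmentation) prediction with the focus (local
+    # alpha) prediction according to the chosen `rosta` mode:
+    #   'TT': 3-class glance (bg / transition / fg); the focus alpha is kept
+    #         only in the transition region and definite fg is set to 1.
+    #   'FT': binary foreground glance; the focus alpha is added, clipped at 1.
+    #   'BT': binary glance; the focus alpha is subtracted, clipped at 0.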
+ if rosta == 'TT':
+ values = paddle.max(glance_sigmoid, axis=1)
+ index = paddle.argmax(glance_sigmoid, axis=1)
+        index = index[:, None, :, :].astype('float32')
+ bg_mask = index.clone()
+ bg_mask[bg_mask == 2] = 1
+ bg_mask = 1 - bg_mask
+ trimap_mask = index.clone()
+ trimap_mask[trimap_mask == 2] = 0
+ fg_mask = index.clone()
+ fg_mask[fg_mask == 1] = 0
+ fg_mask[fg_mask == 2] = 1
+ focus_sigmoid = focus_sigmoid.cpu()
+ trimap_mask = trimap_mask.cpu()
+ fg_mask = fg_mask.cpu()
+ fusion_sigmoid = focus_sigmoid * trimap_mask + fg_mask
+ elif rosta == 'BT':
+ values = paddle.max(glance_sigmoid, axis=1)
+ index = paddle.argmax(glance_sigmoid, axis=1)
+        index = index[:, None, :, :].astype('float32')
+ fusion_sigmoid = index - focus_sigmoid
+ fusion_sigmoid[fusion_sigmoid < 0] = 0
+ else:
+ values = paddle.max(glance_sigmoid, axis=1)
+ index = paddle.argmax(glance_sigmoid, axis=1)
+        index = index[:, None, :, :].astype('float32')
+ fusion_sigmoid = index + focus_sigmoid
+ fusion_sigmoid[fusion_sigmoid > 1] = 1
+ return fusion_sigmoid
+
+
+if __name__ == "__main__":
+ model = GFM()
+    x = paddle.ones([1, 3, 256, 256])
+    result = model(x)
+    print(result)
\ No newline at end of file
diff --git a/modules/image/matting/gfm_resnet34_matting/module.py b/modules/image/matting/gfm_resnet34_matting/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..f78082fc46da8dadc569ab1db0b78011e4b80bc7
--- /dev/null
+++ b/modules/image/matting/gfm_resnet34_matting/module.py
@@ -0,0 +1,176 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+import os
+import time
+import argparse
+from typing import Callable, Union, List, Tuple
+
+from PIL import Image
+import numpy as np
+import cv2
+import scipy
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.transforms as T
+from paddlehub.module.module import moduleinfo, runnable, serving
+from skimage.transform import resize
+
+from gfm_resnet34_matting.gfm import GFM
+import gfm_resnet34_matting.processor as P
+
+
+@moduleinfo(
+ name="gfm_resnet34_matting",
+ type="CV/matting",
+ author="paddlepaddle",
+ author_email="",
+ summary="gfm_resnet34_matting is an animal matting model.",
+ version="1.0.0")
+class GFMResNet34(nn.Layer):
+ """
+ The GFM implementation based on PaddlePaddle.
+
+ The original article refers to:
+ Bridging Composite and Real: Towards End-to-end Deep Image Matting [IJCV-2021]
+ Main network file (GFM).
+
+ Github repo: https://github.com/JizhiziLi/GFM
+ Paper link (Arxiv): https://arxiv.org/abs/2010.16188
+ """
+
+ def __init__(self, pretrained: str=None):
+ super(GFMResNet34, self).__init__()
+
+ self.model = GFM()
+ self.resize_by_short = P.ResizeByShort(1080)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.model.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.model.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def preprocess(self, img: Union[str, np.ndarray], h: int, w: int) -> paddle.Tensor:
+ if min(h, w) > 1080:
+ img = self.resize_by_short(img)
+ tensor_img = self.scale_image(img, h, w)
+ return tensor_img
+
+ def scale_image(self, img: np.ndarray, h: int, w: int, ratio: float = 1/3):
+        # Downscale by `ratio`, then round each side down to a multiple of 32
+        # (capped at 1600) so that encoder and decoder feature maps align.
+        resize_h = int(h * ratio)
+        resize_w = int(w * ratio)
+        new_h = min(1600, resize_h - (resize_h % 32))
+        new_w = min(1600, resize_w - (resize_w % 32))
+
+ scale_img = resize(img,(new_h,new_w)) * 255
+ tensor_img = paddle.to_tensor(scale_img.astype(np.float32)[np.newaxis, :, :, :])
+ tensor_img = tensor_img.transpose([0,3,1,2])
+ return tensor_img
+
+
+ def inference_img_scale(self, input: paddle.Tensor) -> List[paddle.Tensor]:
+ pred_global, pred_local, pred_fusion = self.model(input)
+ pred_global = P.gen_trimap_from_segmap_e2e(pred_global)
+ pred_local = pred_local.numpy()[0,0,:,:]
+ pred_fusion = pred_fusion.numpy()[0,0,:,:]
+ return pred_global, pred_local, pred_fusion
+
+
+ def predict(self, image_list: list, visualization: bool =True, save_path: str = "gfm_resnet34_matting_output"):
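+        """
+        Predict the alpha matte for each image in ``image_list``.
+        """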
+ self.model.eval()
+ result = []
+ with paddle.no_grad():
+ for i, img in enumerate(image_list):
+ if isinstance(img, str):
+ img = np.array(Image.open(img))[:,:,:3]
+ else:
+ img = img[:,:,::-1]
+ h, w, _ = img.shape
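+                # Two-scale hybrid inference: take the glance (global) output
+                # from a 1/3-scale pass and the focus (local alpha) output
+                # from a 1/2-scale pass, then fuse them inside the unknown
+                # region of the glance trimap.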
+ tensor_img = self.preprocess(img, h, w)
+ pred_glance_1, pred_focus_1, pred_fusion_1 = self.inference_img_scale(tensor_img)
+ pred_glance_1 = resize(pred_glance_1,(h,w)) * 255.0
+ tensor_img = self.scale_image(img, h, w, 1/2)
+ pred_glance_2, pred_focus_2, pred_fusion_2 = self.inference_img_scale(tensor_img)
+ pred_focus_2 = resize(pred_focus_2,(h,w))
+ pred_fusion = P.get_masked_local_from_global_test(pred_glance_1, pred_focus_2)
+ pred_fusion = (pred_fusion * 255).astype(np.uint8)
+ if visualization:
+ if not os.path.exists(save_path):
+ os.makedirs(save_path)
+ img_name = str(time.time()) + '.png'
+ image_save_path = os.path.join(save_path, img_name)
+ cv2.imwrite(image_save_path, pred_fusion)
+ result.append(pred_fusion)
+ return result
+
+ @serving
+ def serving_method(self, images: str, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [P.base64_to_cv2(image) for image in images]
+ outputs = self.predict(image_list=images_decode, **kwargs)
+ serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
+ results = {'data': serving_data}
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ args = self.parser.parse_args(argvs)
+
+ results = self.predict(image_list=[args.input_path], save_path=args.output_dir, visualization=args.visualization)
+
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default="gfm_resnet34_matting_output", help="The directory to save output images.")
+ self.arg_config_group.add_argument(
+ '--visualization', type=bool, default=True, help="whether to save output as images.")
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
+
diff --git a/modules/image/matting/gfm_resnet34_matting/processor.py b/modules/image/matting/gfm_resnet34_matting/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..52969d0229111d4cc60ccc02d0d6e39a09231e95
--- /dev/null
+++ b/modules/image/matting/gfm_resnet34_matting/processor.py
@@ -0,0 +1,84 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import base64
+
+import cv2
+import numpy as np
+from paddleseg.transforms import functional
+
+
+class ResizeByLong:
+ """
+ Resize the long side of an image to given size, and then scale the other side proportionally.
+
+ Args:
+ long_size (int): The target size of long side.
+ """
+
+ def __init__(self, long_size):
+ self.long_size = long_size
+
+ def __call__(self, data):
+ data = functional.resize_long(data, self.long_size)
+ return data
+
+
+class ResizeByShort:
+ """
+ Resize the short side of an image to given size, and then scale the other side proportionally.
+
+ Args:
+ short_size (int): The target size of short side.
+ """
+
+ def __init__(self, short_size):
+ self.short_size = short_size
+
+ def __call__(self, data):
+
+ data = functional.resize_short(data, self.short_size)
+
+ return data
+
+def gen_trimap_from_segmap_e2e(segmap):
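+    # Collapse the 3-channel glance output (bg / transition / fg) into a
+    # single-channel trimap image with values 0 / 128 / 255.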
+ trimap = np.argmax(segmap, axis=1)[0]
+ trimap = trimap.astype(np.int64)
+ trimap[trimap==1]=128
+ trimap[trimap==2]=255
+ return trimap.astype(np.uint8)
+
+def get_masked_local_from_global_test(global_result, local_result):
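+    # Keep the glance prediction in the definite regions (0 and 255) and use
+    # the focus (local alpha) prediction inside the unknown (128) region.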
+ weighted_global = np.ones(global_result.shape)
+ weighted_global[global_result==255] = 0
+ weighted_global[global_result==0] = 0
+ fusion_result = global_result*(1.-weighted_global)/255+local_result*weighted_global
+ return fusion_result
+
+def cv2_to_base64(image: np.ndarray):
+ """
+ Convert data from BGR to base64 format.
+ """
+ data = cv2.imencode('.png', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+def base64_to_cv2(b64str: str):
+ """
+ Convert data from base64 to BGR format.
+ """
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
\ No newline at end of file
diff --git a/modules/image/matting/gfm_resnet34_matting/resnet.py b/modules/image/matting/gfm_resnet34_matting/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..5d2ec70cb6ccd419cdc7725cf35eb267df25dca9
--- /dev/null
+++ b/modules/image/matting/gfm_resnet34_matting/resnet.py
@@ -0,0 +1,201 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+from typing import Type, Any, Callable, Union, List, Optional
+
+
+def conv3x3(in_planes: int, out_planes: int, stride: int=1, groups: int=1,
+ dilation: int=1) ->paddle.nn.Conv2D:
+ """3x3 convolution with padding"""
+ return nn.Conv2D(in_planes, out_planes, kernel_size=3, stride=stride,
+ padding=dilation, groups=groups, dilation=dilation, bias_attr=False)
+
+
+def conv1x1(in_planes: int, out_planes: int, stride: int=1) ->paddle.nn.Conv2D:
+ """1x1 convolution"""
+ return nn.Conv2D(in_planes, out_planes, kernel_size=1, stride=stride,
+ bias_attr=False)
+
+
+class BasicBlock(nn.Layer):
+ expansion: int = 1
+
+ def __init__(self, inplanes: int, planes: int, stride: int=1,
+ downsample: Optional[nn.Layer]=None, groups: int=1, base_width:
+ int=64, dilation: int=1, norm_layer: Optional[Callable[..., paddle.
+ nn.Layer]]=None) ->None:
+ super(BasicBlock, self).__init__()
+ if norm_layer is None:
+ norm_layer = nn.BatchNorm2D
+ if groups != 1 or base_width != 64:
+ raise ValueError(
+ 'BasicBlock only supports groups=1 and base_width=64')
+ if dilation > 1:
+ raise NotImplementedError(
+ 'Dilation > 1 not supported in BasicBlock')
+ self.conv1 = conv3x3(inplanes, planes, stride)
+ self.bn1 = norm_layer(planes)
+ self.relu = paddle.nn.ReLU()
+ self.conv2 = conv3x3(planes, planes)
+ self.bn2 = norm_layer(planes)
+ self.downsample = downsample
+ self.stride = stride
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ identity = x
+ out = self.conv1(x)
+ out = self.bn1(out)
+ out = self.relu(out)
+ out = self.conv2(out)
+ out = self.bn2(out)
+ if self.downsample is not None:
+ identity = self.downsample(x)
+ out += identity
+ out = self.relu(out)
+ return out
+
+
+class Bottleneck(nn.Layer):
+ expansion: int = 4
+
+ def __init__(self, inplanes: int, planes: int, stride: int=1,
+ downsample: Optional[nn.Layer]=None, groups: int=1, base_width:
+ int=64, dilation: int=1, norm_layer: Optional[Callable[..., paddle.
+ nn.Layer]]=None) ->None:
+ super(Bottleneck, self).__init__()
+ if norm_layer is None:
+ norm_layer = nn.BatchNorm2D
+ width = int(planes * (base_width / 64.0)) * groups
+ self.conv1 = conv1x1(inplanes, width)
+ self.bn1 = norm_layer(width)
+ self.conv2 = conv3x3(width, width, stride, groups, dilation)
+ self.bn2 = norm_layer(width)
+ self.conv3 = conv1x1(width, planes * self.expansion)
+ self.bn3 = norm_layer(planes * self.expansion)
+ self.relu = paddle.nn.ReLU()
+ self.downsample = downsample
+ self.stride = stride
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ identity = x
+ out = self.conv1(x)
+ out = self.bn1(out)
+ out = self.relu(out)
+ out = self.conv2(out)
+ out = self.bn2(out)
+ out = self.relu(out)
+ out = self.conv3(out)
+ out = self.bn3(out)
+ if self.downsample is not None:
+ identity = self.downsample(x)
+ out += identity
+ out = self.relu(out)
+ return out
+
+
+class ResNet(nn.Layer):
+
+ def __init__(self, block: Type[Union[BasicBlock, Bottleneck]], layers:
+ List[int], num_classes: int=1000, zero_init_residual: bool=False,
+ groups: int=1, width_per_group: int=64,
+ replace_stride_with_dilation: Optional[List[bool]]=None, norm_layer:
+ Optional[Callable[..., paddle.nn.Layer]]=None) ->None:
+ super(ResNet, self).__init__()
+ if norm_layer is None:
+ norm_layer = nn.BatchNorm2D
+ self._norm_layer = norm_layer
+ self.inplanes = 64
+ self.dilation = 1
+ if replace_stride_with_dilation is None:
+ replace_stride_with_dilation = [False, False, False]
+ if len(replace_stride_with_dilation) != 3:
+ raise ValueError(
+ 'replace_stride_with_dilation should be None or a 3-element tuple, got {}'
+ .format(replace_stride_with_dilation))
+ self.groups = groups
+ self.base_width = width_per_group
+ self.conv1 = nn.Conv2D(3, self.inplanes, kernel_size=7, stride=2,
+ padding=3, bias_attr=False)
+ self.bn1 = norm_layer(self.inplanes)
+ self.relu = paddle.nn.ReLU()
+ self.maxpool = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+ self.layer1 = self._make_layer(block, 64, layers[0])
+ self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
+ dilate=replace_stride_with_dilation[0])
+ self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
+ dilate=replace_stride_with_dilation[1])
+ self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
+ dilate=replace_stride_with_dilation[2])
+ self.avgpool = nn.AdaptiveAvgPool2D((1, 1))
+ self.fc = nn.Linear(512 * block.expansion, num_classes)
+
+ def _make_layer(self, block: Type[Union[BasicBlock, Bottleneck]],
+ planes: int, blocks: int, stride: int=1, dilate: bool=False
+ ) ->paddle.nn.Sequential:
+ norm_layer = self._norm_layer
+ downsample = None
+ previous_dilation = self.dilation
+ if dilate:
+ self.dilation *= stride
+ stride = 1
+ if stride != 1 or self.inplanes != planes * block.expansion:
+ downsample = nn.Sequential(conv1x1(self.inplanes, planes *
+ block.expansion, stride), norm_layer(planes * block.expansion))
+ layers = []
+ layers.append(block(self.inplanes, planes, stride, downsample, self
+ .groups, self.base_width, previous_dilation, norm_layer))
+ self.inplanes = planes * block.expansion
+ for _ in range(1, blocks):
+ layers.append(block(self.inplanes, planes, groups=self.groups,
+ base_width=self.base_width, dilation=self.dilation,
+ norm_layer=norm_layer))
+ return nn.Sequential(*layers)
+
+ def _forward_impl(self, x: paddle.Tensor) ->paddle.Tensor:
+ x = self.conv1(x)
+ x = self.bn1(x)
+ x = self.relu(x)
+ x = self.maxpool(x)
+ x = self.layer1(x)
+ x = self.layer2(x)
+ x = self.layer3(x)
+ x = self.layer4(x)
+ x = self.avgpool(x)
+        x = paddle.flatten(x, 1)
+ x = self.fc(x)
+ return x
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ return self._forward_impl(x)
+
+
+def _resnet(arch: str, block: Type[Union[BasicBlock, Bottleneck]], layers:
+ List[int], pretrained: bool, progress: bool, **kwargs: Any) ->ResNet:
+ model = ResNet(block, layers, **kwargs)
+ return model
+
+
+def resnet34(pretrained: bool=False, progress: bool=True, **kwargs: Any
+ ) ->ResNet:
+ """ResNet-34 model from
+    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
+
+ Args:
+ pretrained (bool): If True, returns a model pre-trained on ImageNet
+ progress (bool): If True, displays a progress bar of the download to stderr
+ """
+ return _resnet('resnet34', BasicBlock, [3, 4, 6, 3], pretrained,
+ progress, **kwargs)
diff --git a/modules/image/matting/modnet_hrnet18_matting/README.md b/modules/image/matting/modnet_hrnet18_matting/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..704635055d6b00a81806987bbd9cd487f09e50b0
--- /dev/null
+++ b/modules/image/matting/modnet_hrnet18_matting/README.md
@@ -0,0 +1,155 @@
+# modnet_hrnet18_matting
+
+|模型名称|modnet_hrnet18_matting|
+| :--- | :---: |
+|类别|图像-抠图|
+|网络|modnet_hrnet18|
+|数据集|百度自建数据集|
+|是否支持Fine-tuning|否|
+|模型大小|60MB|
+|指标|SAD77.96|
+|最新更新日期|2021-12-03|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+
+ - 样例结果示例(左为原图,右为效果图):
+
+
+
+
+
+- ### 模型介绍
+
+ - Matting(精细化分割/影像去背/抠图)是指借由计算前景的颜色和透明度,将前景从影像中撷取出来的技术,可用于替换背景、影像合成、视觉特效,在电影工业中被广泛地使用。影像中的每个像素会有代表其前景透明度的值,称作阿法值(Alpha),一张影像中所有阿法值的集合称作阿法遮罩(Alpha Matte),将影像被遮罩所涵盖的部分取出即可完成前景的分离。modnet_hrnet18_matting可生成抠图结果。
+
+
+
+ - 更多详情请参考:[modnet_hrnet18_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install modnet_hrnet18_matting
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run modnet_hrnet18_matting --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - 通过命令行方式实现hub模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="modnet_hrnet18_matting")
+
+ result = model.predict(["/PATH/TO/IMAGE"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ trimap_list,
+ visualization,
+ save_path):
+ ```
+
+ - 人像matting预测API,用于将输入图片中的人像分割出来。
+
+ - 参数
+
+ - image_list (list(str | numpy.ndarray)):图片输入路径或者BGR格式numpy数据。
+ - trimap_list(list(str | numpy.ndarray)):trimap输入路径或者单通道灰度图格式图片。
+ - visualization (bool): 是否进行可视化,默认为False。
+ - save_path (str): 当visualization为True时,保存图片的路径,默认为"modnet_hrnet18_matting_output"。
+
+ - 返回
+
+    - result (list(numpy.ndarray)):模型抠图结果。
+
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署人像matting在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m modnet_hrnet18_matting
+ ```
+
+ - 这样就完成了一个人像matting在线服务API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/modnet_hrnet18_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
diff --git a/modules/image/matting/modnet_hrnet18_matting/README_en.md b/modules/image/matting/modnet_hrnet18_matting/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..17524b51b31174b66a01fd13fdb0165d97f46223
--- /dev/null
+++ b/modules/image/matting/modnet_hrnet18_matting/README_en.md
@@ -0,0 +1,156 @@
+# modnet_hrnet18_matting
+
+|Module Name|modnet_hrnet18_matting|
+| :--- | :---: |
+|Category|Image Matting|
+|Network|modnet_hrnet18|
+|Dataset|Baidu self-built dataset|
+|Support Fine-tuning|No|
+|Module Size|60MB|
+|Data Indicators|SAD77.96|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - Matting is the technique of extracting the foreground from an image by estimating its color and transparency. It is widely used in the film industry for background replacement, image composition, and visual effects. Each pixel in the image has a value that represents its foreground transparency, called the alpha value, and the set of all alpha values in an image is called the alpha matte. The part of the image covered by the matte can then be extracted to separate the foreground (a minimal compositing sketch is shown below).
+
+
+
+ - For more information, please refer to: [modnet_hrnet18_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
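+  - As a minimal illustrative sketch (not part of the module API), the alpha matte returned by `predict` can be used for standard alpha compositing onto a new background. It assumes the matte is a single-channel `uint8` array at the original image resolution, as described in the API section below, and `/PATH/TO/BACKGROUND` is a hypothetical placeholder:
+
+  - ```python
+    import cv2
+    import numpy as np
+    import paddlehub as hub
+
+    model = hub.Module(name="modnet_hrnet18_matting")
+    # predict returns one single-channel uint8 alpha matte per input image
+    matte = model.predict(["/PATH/TO/IMAGE"])[0]
+
+    foreground = cv2.imread("/PATH/TO/IMAGE").astype(np.float32)
+    background = cv2.imread("/PATH/TO/BACKGROUND").astype(np.float32)
+    background = cv2.resize(background, (foreground.shape[1], foreground.shape[0]))
+
+    alpha = matte.astype(np.float32)[:, :, None] / 255.0  # [H, W, 1], values in [0, 1]
+
+    # standard alpha compositing: out = alpha * fg + (1 - alpha) * bg
+    composite = alpha * foreground + (1.0 - alpha) * background
+    cv2.imwrite("composite.png", composite.astype(np.uint8))
+    ```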
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install modnet_hrnet18_matting
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run modnet_hrnet18_matting --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="modnet_hrnet18_matting")
+
+ result = model.predict(["/PATH/TO/IMAGE"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ trimap_list,
+ visualization,
+ save_path):
+ ```
+
+ - Prediction API for matting.
+
+ - **Parameter**
+
+    - image_list (list(str | numpy.ndarray)): Image path or image data, ndarray.shape is in the format \[H, W, C\], BGR.
+    - trimap_list (list(str | numpy.ndarray)): Trimap path or trimap data, ndarray.shape is in the format \[H, W\], grayscale. Default is None.
+ - visualization (bool): Whether to save the recognition results as picture files, default is False.
+ - save_path (str): Save path of images, "modnet_hrnet18_matting_output" by default.
+
+ - **Return**
+
+ - result (list(numpy.ndarray)):The list of model results.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of matting.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m modnet_hrnet18_matting
+ ```
+
+  - The serving API is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+  - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/modnet_hrnet18_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/matting/modnet_hrnet18_matting/hrnet.py b/modules/image/matting/modnet_hrnet18_matting/hrnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..22cbd377bfd2c5c789f42c273de603d89fd8a24a
--- /dev/null
+++ b/modules/image/matting/modnet_hrnet18_matting/hrnet.py
@@ -0,0 +1,652 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+from paddleseg.cvlibs import manager, param_init
+from paddleseg.models import layers
+from paddleseg.utils import utils
+
+__all__ = ["HRNet_W18"]
+
+
+class HRNet(nn.Layer):
+ """
+ The HRNet implementation based on PaddlePaddle.
+
+ The original article refers to
+    Jingdong Wang, et al. "HRNet: Deep High-Resolution Representation Learning for Visual Recognition"
+ (https://arxiv.org/pdf/1908.07919.pdf).
+
+ Args:
+ pretrained (str, optional): The path of pretrained model.
+ stage1_num_modules (int, optional): Number of modules for stage1. Default 1.
+ stage1_num_blocks (list, optional): Number of blocks per module for stage1. Default (4).
+ stage1_num_channels (list, optional): Number of channels per branch for stage1. Default (64).
+ stage2_num_modules (int, optional): Number of modules for stage2. Default 1.
+ stage2_num_blocks (list, optional): Number of blocks per module for stage2. Default (4, 4).
+ stage2_num_channels (list, optional): Number of channels per branch for stage2. Default (18, 36).
+ stage3_num_modules (int, optional): Number of modules for stage3. Default 4.
+ stage3_num_blocks (list, optional): Number of blocks per module for stage3. Default (4, 4, 4).
+        stage3_num_channels (list, optional): Number of channels per branch for stage3. Default (18, 36, 72).
+ stage4_num_modules (int, optional): Number of modules for stage4. Default 3.
+ stage4_num_blocks (list, optional): Number of blocks per module for stage4. Default (4, 4, 4, 4).
+        stage4_num_channels (list, optional): Number of channels per branch for stage4. Default (18, 36, 72, 144).
+ has_se (bool, optional): Whether to use Squeeze-and-Excitation module. Default False.
+ align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
+ e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+ """
+
+ def __init__(self,
+ input_channels: int=3,
+                 pretrained: str = None,
+ stage1_num_modules: int = 1,
+ stage1_num_blocks: list = (4, ),
+ stage1_num_channels: list = (64, ),
+ stage2_num_modules: int = 1,
+ stage2_num_blocks: list = (4, 4),
+ stage2_num_channels: list = (18, 36),
+ stage3_num_modules: int = 4,
+ stage3_num_blocks: list = (4, 4, 4),
+ stage3_num_channels: list = (18, 36, 72),
+ stage4_num_modules: int = 3,
+ stage4_num_blocks: list = (4, 4, 4, 4),
+ stage4_num_channels: list = (18, 36, 72, 144),
+ has_se: bool = False,
+ align_corners: bool = False,
+ padding_same: bool = True):
+ super(HRNet, self).__init__()
+ self.pretrained = pretrained
+ self.stage1_num_modules = stage1_num_modules
+ self.stage1_num_blocks = stage1_num_blocks
+ self.stage1_num_channels = stage1_num_channels
+ self.stage2_num_modules = stage2_num_modules
+ self.stage2_num_blocks = stage2_num_blocks
+ self.stage2_num_channels = stage2_num_channels
+ self.stage3_num_modules = stage3_num_modules
+ self.stage3_num_blocks = stage3_num_blocks
+ self.stage3_num_channels = stage3_num_channels
+ self.stage4_num_modules = stage4_num_modules
+ self.stage4_num_blocks = stage4_num_blocks
+ self.stage4_num_channels = stage4_num_channels
+ self.has_se = has_se
+ self.align_corners = align_corners
+
+ self.feat_channels = [i for i in stage4_num_channels]
+ self.feat_channels = [64] + self.feat_channels
+
+ self.conv_layer1_1 = layers.ConvBNReLU(
+ in_channels=input_channels,
+ out_channels=64,
+ kernel_size=3,
+ stride=2,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False)
+
+ self.conv_layer1_2 = layers.ConvBNReLU(
+ in_channels=64,
+ out_channels=64,
+ kernel_size=3,
+ stride=2,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False)
+
+ self.la1 = Layer1(
+ num_channels=64,
+ num_blocks=self.stage1_num_blocks[0],
+ num_filters=self.stage1_num_channels[0],
+ has_se=has_se,
+ name="layer2",
+ padding_same=padding_same)
+
+ self.tr1 = TransitionLayer(
+ in_channels=[self.stage1_num_channels[0] * 4],
+ out_channels=self.stage2_num_channels,
+ name="tr1",
+ padding_same=padding_same)
+
+ self.st2 = Stage(
+ num_channels=self.stage2_num_channels,
+ num_modules=self.stage2_num_modules,
+ num_blocks=self.stage2_num_blocks,
+ num_filters=self.stage2_num_channels,
+ has_se=self.has_se,
+ name="st2",
+ align_corners=align_corners,
+ padding_same=padding_same)
+
+ self.tr2 = TransitionLayer(
+ in_channels=self.stage2_num_channels,
+ out_channels=self.stage3_num_channels,
+ name="tr2",
+ padding_same=padding_same)
+ self.st3 = Stage(
+ num_channels=self.stage3_num_channels,
+ num_modules=self.stage3_num_modules,
+ num_blocks=self.stage3_num_blocks,
+ num_filters=self.stage3_num_channels,
+ has_se=self.has_se,
+ name="st3",
+ align_corners=align_corners,
+ padding_same=padding_same)
+
+ self.tr3 = TransitionLayer(
+ in_channels=self.stage3_num_channels,
+ out_channels=self.stage4_num_channels,
+ name="tr3",
+ padding_same=padding_same)
+ self.st4 = Stage(
+ num_channels=self.stage4_num_channels,
+ num_modules=self.stage4_num_modules,
+ num_blocks=self.stage4_num_blocks,
+ num_filters=self.stage4_num_channels,
+ has_se=self.has_se,
+ name="st4",
+ align_corners=align_corners,
+ padding_same=padding_same)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ feat_list = []
+ conv1 = self.conv_layer1_1(x)
+ feat_list.append(conv1)
+ conv2 = self.conv_layer1_2(conv1)
+
+ la1 = self.la1(conv2)
+
+ tr1 = self.tr1([la1])
+ st2 = self.st2(tr1)
+
+ tr2 = self.tr2(st2)
+ st3 = self.st3(tr2)
+
+ tr3 = self.tr3(st3)
+ st4 = self.st4(tr3)
+
+ feat_list = feat_list + st4
+
+ return feat_list
+
+
+class Layer1(nn.Layer):
+ def __init__(self,
+ num_channels: int,
+ num_filters: int,
+ num_blocks: int,
+ has_se: bool = False,
+ name: str = None,
+ padding_same: bool = True):
+ super(Layer1, self).__init__()
+
+ self.bottleneck_block_list = []
+
+ for i in range(num_blocks):
+ bottleneck_block = self.add_sublayer(
+ "bb_{}_{}".format(name, i + 1),
+ BottleneckBlock(
+ num_channels=num_channels if i == 0 else num_filters * 4,
+ num_filters=num_filters,
+ has_se=has_se,
+ stride=1,
+ downsample=True if i == 0 else False,
+ name=name + '_' + str(i + 1),
+ padding_same=padding_same))
+ self.bottleneck_block_list.append(bottleneck_block)
+
+ def forward(self, x: paddle.Tensor):
+ conv = x
+ for block_func in self.bottleneck_block_list:
+ conv = block_func(conv)
+ return conv
+
+
+class TransitionLayer(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ name: str = None,
+ padding_same: bool = True):
+ super(TransitionLayer, self).__init__()
+
+ num_in = len(in_channels)
+ num_out = len(out_channels)
+ self.conv_bn_func_list = []
+ for i in range(num_out):
+ residual = None
+ if i < num_in:
+ if in_channels[i] != out_channels[i]:
+ residual = self.add_sublayer(
+ "transition_{}_layer_{}".format(name, i + 1),
+ layers.ConvBNReLU(
+ in_channels=in_channels[i],
+ out_channels=out_channels[i],
+ kernel_size=3,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False))
+ else:
+ residual = self.add_sublayer(
+ "transition_{}_layer_{}".format(name, i + 1),
+ layers.ConvBNReLU(
+ in_channels=in_channels[-1],
+ out_channels=out_channels[i],
+ kernel_size=3,
+ stride=2,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False))
+ self.conv_bn_func_list.append(residual)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outs = []
+ for idx, conv_bn_func in enumerate(self.conv_bn_func_list):
+ if conv_bn_func is None:
+ outs.append(x[idx])
+ else:
+ if idx < len(x):
+ outs.append(conv_bn_func(x[idx]))
+ else:
+ outs.append(conv_bn_func(x[-1]))
+ return outs
+
+
+class Branches(nn.Layer):
+ def __init__(self,
+ num_blocks: int,
+ in_channels: int,
+ out_channels: int,
+ has_se: bool = False,
+ name: str = None,
+ padding_same: bool = True):
+ super(Branches, self).__init__()
+
+ self.basic_block_list = []
+
+ for i in range(len(out_channels)):
+ self.basic_block_list.append([])
+ for j in range(num_blocks[i]):
+ in_ch = in_channels[i] if j == 0 else out_channels[i]
+ basic_block_func = self.add_sublayer(
+ "bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1),
+ BasicBlock(
+ num_channels=in_ch,
+ num_filters=out_channels[i],
+ has_se=has_se,
+ name=name + '_branch_layer_' + str(i + 1) + '_' +
+ str(j + 1),
+ padding_same=padding_same))
+ self.basic_block_list[i].append(basic_block_func)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outs = []
+ for idx, input in enumerate(x):
+ conv = input
+ for basic_block_func in self.basic_block_list[idx]:
+ conv = basic_block_func(conv)
+ outs.append(conv)
+ return outs
+
+
+class BottleneckBlock(nn.Layer):
+ def __init__(self,
+ num_channels: int,
+ num_filters: int,
+ has_se: bool,
+ stride: int = 1,
+ downsample: bool = False,
+ name:str = None,
+ padding_same: bool = True):
+ super(BottleneckBlock, self).__init__()
+
+ self.has_se = has_se
+ self.downsample = downsample
+
+ self.conv1 = layers.ConvBNReLU(
+ in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=1,
+ bias_attr=False)
+
+ self.conv2 = layers.ConvBNReLU(
+ in_channels=num_filters,
+ out_channels=num_filters,
+ kernel_size=3,
+ stride=stride,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False)
+
+ self.conv3 = layers.ConvBN(
+ in_channels=num_filters,
+ out_channels=num_filters * 4,
+ kernel_size=1,
+ bias_attr=False)
+
+ if self.downsample:
+ self.conv_down = layers.ConvBN(
+ in_channels=num_channels,
+ out_channels=num_filters * 4,
+ kernel_size=1,
+ bias_attr=False)
+
+ if self.has_se:
+ self.se = SELayer(
+ num_channels=num_filters * 4,
+ num_filters=num_filters * 4,
+ reduction_ratio=16,
+ name=name + '_fc')
+
+ self.add = layers.Add()
+ self.relu = layers.Activation("relu")
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ residual = x
+ conv1 = self.conv1(x)
+ conv2 = self.conv2(conv1)
+ conv3 = self.conv3(conv2)
+
+ if self.downsample:
+ residual = self.conv_down(x)
+
+ if self.has_se:
+ conv3 = self.se(conv3)
+
+ y = self.add(conv3, residual)
+ y = self.relu(y)
+ return y
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ num_channels: int,
+ num_filters: int,
+ stride: int = 1,
+ has_se: bool = False,
+ downsample: bool = False,
+ name: str = None,
+ padding_same: bool = True):
+ super(BasicBlock, self).__init__()
+
+ self.has_se = has_se
+ self.downsample = downsample
+
+ self.conv1 = layers.ConvBNReLU(
+ in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=3,
+ stride=stride,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False)
+ self.conv2 = layers.ConvBN(
+ in_channels=num_filters,
+ out_channels=num_filters,
+ kernel_size=3,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False)
+
+ if self.downsample:
+ self.conv_down = layers.ConvBNReLU(
+ in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=1,
+ bias_attr=False)
+
+ if self.has_se:
+ self.se = SELayer(
+ num_channels=num_filters,
+ num_filters=num_filters,
+ reduction_ratio=16,
+ name=name + '_fc')
+
+ self.add = layers.Add()
+ self.relu = layers.Activation("relu")
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ residual = x
+ conv1 = self.conv1(x)
+ conv2 = self.conv2(conv1)
+
+ if self.downsample:
+ residual = self.conv_down(x)
+
+ if self.has_se:
+ conv2 = self.se(conv2)
+
+ y = self.add(conv2, residual)
+ y = self.relu(y)
+ return y
+
+
+class SELayer(nn.Layer):
+ def __init__(self, num_channels: int, num_filters: int, reduction_ratio: int, name: str = None):
+ super(SELayer, self).__init__()
+
+ self.pool2d_gap = nn.AdaptiveAvgPool2D(1)
+
+ self._num_channels = num_channels
+
+ med_ch = int(num_channels / reduction_ratio)
+ stdv = 1.0 / math.sqrt(num_channels * 1.0)
+ self.squeeze = nn.Linear(
+ num_channels,
+ med_ch,
+ weight_attr=paddle.ParamAttr(
+ initializer=nn.initializer.Uniform(-stdv, stdv)))
+
+ stdv = 1.0 / math.sqrt(med_ch * 1.0)
+ self.excitation = nn.Linear(
+ med_ch,
+ num_filters,
+ weight_attr=paddle.ParamAttr(
+ initializer=nn.initializer.Uniform(-stdv, stdv)))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ pool = self.pool2d_gap(x)
+ pool = paddle.reshape(pool, shape=[-1, self._num_channels])
+ squeeze = self.squeeze(pool)
+ squeeze = F.relu(squeeze)
+ excitation = self.excitation(squeeze)
+ excitation = F.sigmoid(excitation)
+ excitation = paddle.reshape(
+ excitation, shape=[-1, self._num_channels, 1, 1])
+ out = x * excitation
+ return out
+
+
+class Stage(nn.Layer):
+ def __init__(self,
+ num_channels: int,
+ num_modules: int,
+ num_blocks: int,
+ num_filters: int,
+ has_se: bool = False,
+ multi_scale_output: bool = True,
+ name: str = None,
+ align_corners: bool = False,
+ padding_same: bool = True):
+ super(Stage, self).__init__()
+
+ self._num_modules = num_modules
+
+ self.stage_func_list = []
+ for i in range(num_modules):
+ if i == num_modules - 1 and not multi_scale_output:
+ stage_func = self.add_sublayer(
+ "stage_{}_{}".format(name, i + 1),
+ HighResolutionModule(
+ num_channels=num_channels,
+ num_blocks=num_blocks,
+ num_filters=num_filters,
+ has_se=has_se,
+ multi_scale_output=False,
+ name=name + '_' + str(i + 1),
+ align_corners=align_corners,
+ padding_same=padding_same))
+ else:
+ stage_func = self.add_sublayer(
+ "stage_{}_{}".format(name, i + 1),
+ HighResolutionModule(
+ num_channels=num_channels,
+ num_blocks=num_blocks,
+ num_filters=num_filters,
+ has_se=has_se,
+ name=name + '_' + str(i + 1),
+ align_corners=align_corners,
+ padding_same=padding_same))
+
+ self.stage_func_list.append(stage_func)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ out = x
+ for idx in range(self._num_modules):
+ out = self.stage_func_list[idx](out)
+ return out
+
+
+class HighResolutionModule(nn.Layer):
+ def __init__(self,
+ num_channels: int,
+ num_blocks: int,
+ num_filters: int,
+ has_se: bool = False,
+ multi_scale_output: bool = True,
+ name: str = None,
+ align_corners: bool = False,
+ padding_same: bool = True):
+ super(HighResolutionModule, self).__init__()
+
+ self.branches_func = Branches(
+ num_blocks=num_blocks,
+ in_channels=num_channels,
+ out_channels=num_filters,
+ has_se=has_se,
+ name=name,
+ padding_same=padding_same)
+
+ self.fuse_func = FuseLayers(
+ in_channels=num_filters,
+ out_channels=num_filters,
+ multi_scale_output=multi_scale_output,
+ name=name,
+ align_corners=align_corners,
+ padding_same=padding_same)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ out = self.branches_func(x)
+ out = self.fuse_func(out)
+ return out
+
+
+class FuseLayers(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ multi_scale_output: bool = True,
+ name: str = None,
+ align_corners: bool = False,
+ padding_same: bool = True):
+ super(FuseLayers, self).__init__()
+
+ self._actual_ch = len(in_channels) if multi_scale_output else 1
+ self._in_channels = in_channels
+ self.align_corners = align_corners
+
+ self.residual_func_list = []
+ for i in range(self._actual_ch):
+ for j in range(len(in_channels)):
+ if j > i:
+ residual_func = self.add_sublayer(
+ "residual_{}_layer_{}_{}".format(name, i + 1, j + 1),
+ layers.ConvBN(
+ in_channels=in_channels[j],
+ out_channels=out_channels[i],
+ kernel_size=1,
+ bias_attr=False))
+ self.residual_func_list.append(residual_func)
+ elif j < i:
+ pre_num_filters = in_channels[j]
+ for k in range(i - j):
+ if k == i - j - 1:
+ residual_func = self.add_sublayer(
+ "residual_{}_layer_{}_{}_{}".format(
+ name, i + 1, j + 1, k + 1),
+ layers.ConvBN(
+ in_channels=pre_num_filters,
+ out_channels=out_channels[i],
+ kernel_size=3,
+ stride=2,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False))
+ pre_num_filters = out_channels[i]
+ else:
+ residual_func = self.add_sublayer(
+ "residual_{}_layer_{}_{}_{}".format(
+ name, i + 1, j + 1, k + 1),
+ layers.ConvBNReLU(
+ in_channels=pre_num_filters,
+ out_channels=out_channels[j],
+ kernel_size=3,
+ stride=2,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False))
+ pre_num_filters = out_channels[j]
+ self.residual_func_list.append(residual_func)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
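+        # Fuse features across branches: lower-resolution inputs (j > i) pass
+        # through a 1x1 ConvBN and are bilinearly upsampled, higher-resolution
+        # inputs (j < i) are downsampled with strided 3x3 convs, then all are
+        # summed and passed through ReLU.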
+ outs = []
+ residual_func_idx = 0
+ for i in range(self._actual_ch):
+ residual = x[i]
+ residual_shape = paddle.shape(residual)[-2:]
+ for j in range(len(self._in_channels)):
+ if j > i:
+ y = self.residual_func_list[residual_func_idx](x[j])
+ residual_func_idx += 1
+
+ y = F.interpolate(
+ y,
+ residual_shape,
+ mode='bilinear',
+ align_corners=self.align_corners)
+ residual = residual + y
+ elif j < i:
+ y = x[j]
+ for k in range(i - j):
+ y = self.residual_func_list[residual_func_idx](y)
+ residual_func_idx += 1
+
+ residual = residual + y
+
+ residual = F.relu(residual)
+ outs.append(residual)
+
+ return outs
+
+
+def HRNet_W18(**kwargs):
+ model = HRNet(
+ stage1_num_modules=1,
+ stage1_num_blocks=[4],
+ stage1_num_channels=[64],
+ stage2_num_modules=1,
+ stage2_num_blocks=[4, 4],
+ stage2_num_channels=[18, 36],
+ stage3_num_modules=4,
+ stage3_num_blocks=[4, 4, 4],
+ stage3_num_channels=[18, 36, 72],
+ stage4_num_modules=3,
+ stage4_num_blocks=[4, 4, 4, 4],
+ stage4_num_channels=[18, 36, 72, 144],
+ **kwargs)
+ return model
\ No newline at end of file
diff --git a/modules/image/matting/modnet_hrnet18_matting/module.py b/modules/image/matting/modnet_hrnet18_matting/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..dd1edbbf7931a92f2ffc03aaf51a35df8b5f2f58
--- /dev/null
+++ b/modules/image/matting/modnet_hrnet18_matting/module.py
@@ -0,0 +1,513 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import time
+import argparse
+from typing import Callable, Union, List, Tuple
+
+import numpy as np
+import cv2
+import scipy
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+from modnet_hrnet18_matting.hrnet import HRNet_W18
+import modnet_hrnet18_matting.processor as P
+
+
+@moduleinfo(
+ name="modnet_hrnet18_matting",
+ type="CV/matting",
+ author="paddlepaddle",
+ summary="modnet_hrnet18_matting is a matting model",
+ version="1.0.0"
+)
+class MODNetHRNet18(nn.Layer):
+ """
+ The MODNet implementation based on PaddlePaddle.
+
+ The original article refers to
+    Zhanghan Ke, et al. "Is a Green Screen Really Necessary for Real-Time Portrait Matting?"
+ (https://arxiv.org/pdf/2011.11961.pdf).
+
+ Args:
+        hr_channels (int, optional): The number of channels of the high-resolution branch. Default: 32.
+        pretrained (str, optional): The path of the pretrained model. Default: None.
+ """
+
+ def __init__(self, hr_channels:int = 32, pretrained=None):
+ super(MODNetHRNet18, self).__init__()
+
+ self.backbone = HRNet_W18()
+ self.pretrained = pretrained
+
+ self.head = MODNetHead(
+ hr_channels=hr_channels, backbone_channels=self.backbone.feat_channels)
+ self.blurer = GaussianBlurLayer(1, 3)
+ self.transforms = P.Compose([P.LoadImages(), P.ResizeByShort(), P.ResizeToIntMult(), P.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'modnet-hrnet_w18.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def preprocess(self, img: Union[str, np.ndarray] , transforms: Callable, trimap: Union[str, np.ndarray] = None):
+ data = {}
+ data['img'] = img
+ if trimap is not None:
+ data['trimap'] = trimap
+ data['gt_fields'] = ['trimap']
+ data['trans_info'] = []
+ data = self.transforms(data)
+ data['img'] = paddle.to_tensor(data['img'])
+ data['img'] = data['img'].unsqueeze(0)
+ if trimap is not None:
+ data['trimap'] = paddle.to_tensor(data['trimap'])
+ data['trimap'] = data['trimap'].unsqueeze((0, 1))
+
+ return data
+
+ def forward(self, inputs: dict) -> paddle.Tensor:
+ x = inputs['img']
+ feat_list = self.backbone(x)
+ y = self.head(inputs=inputs, feat_list=feat_list)
+ return y
+
+ def predict(self, image_list: list, trimap_list: list = None, visualization: bool =False, save_path: str = "modnet_hrnet18_matting_output") -> list:
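+        """
+        Predict the alpha matte for each image in ``image_list``.
+
+        Args:
+            image_list (list): Image paths or BGR ndarrays.
+            trimap_list (list, optional): Optional trimaps matching ``image_list``. Default: None.
+            visualization (bool): Whether to save the predicted mattes as images. Default: False.
+            save_path (str): Output directory used when ``visualization`` is True.
+        """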
+ self.eval()
+ result= []
+ with paddle.no_grad():
+ for i, im_path in enumerate(image_list):
+ trimap = trimap_list[i] if trimap_list is not None else None
+ data = self.preprocess(img=im_path, transforms=self.transforms, trimap=trimap)
+ alpha_pred = self.forward(data)
+ alpha_pred = P.reverse_transform(alpha_pred, data['trans_info'])
+ alpha_pred = (alpha_pred.numpy()).squeeze()
+ alpha_pred = (alpha_pred * 255).astype('uint8')
+ alpha_pred = P.save_alpha_pred(alpha_pred, trimap)
+ result.append(alpha_pred)
+ if visualization:
+ if not os.path.exists(save_path):
+ os.makedirs(save_path)
+ img_name = str(time.time()) + '.png'
+ image_save_path = os.path.join(save_path, img_name)
+ cv2.imwrite(image_save_path, alpha_pred)
+
+ return result
+
+ @serving
+ def serving_method(self, images: list, trimaps:list = None, **kwargs) -> dict:
+ """
+ Run as a service.
+ """
+ images_decode = [P.base64_to_cv2(image) for image in images]
+ if trimaps is not None:
+ trimap_decoder = [cv2.cvtColor(P.base64_to_cv2(trimap), cv2.COLOR_BGR2GRAY) for trimap in trimaps]
+ else:
+ trimap_decoder = None
+
+ outputs = self.predict(image_list=images_decode, trimap_list= trimap_decoder, **kwargs)
+ serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
+ results = {'data': serving_data}
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ args = self.parser.parse_args(argvs)
+ if args.trimap_path is not None:
+ trimap_list = [args.trimap_path]
+ else:
+ trimap_list = None
+
+ results = self.predict(image_list=[args.input_path], trimap_list=trimap_list, save_path=args.output_dir, visualization=args.visualization)
+
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default="modnet_hrnet18_matting_output", help="The directory to save output images.")
+ self.arg_config_group.add_argument(
+ '--visualization', type=bool, default=True, help="whether to save output as images.")
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
+ self.arg_input_group.add_argument('--trimap_path', type=str, default=None, help="path to image.")
+
+
+
+class MODNetHead(nn.Layer):
+ """
+    MODNet head: combines the low-resolution semantic branch, the
+    high-resolution detail branch and the fusion branch to produce the
+    final alpha matte.
+ """
+ def __init__(self, hr_channels: int, backbone_channels: int):
+ super().__init__()
+
+ self.lr_branch = LRBranch(backbone_channels)
+ self.hr_branch = HRBranch(hr_channels, backbone_channels)
+ self.f_branch = FusionBranch(hr_channels, backbone_channels)
+
+ def forward(self, inputs: paddle.Tensor, feat_list: list):
+ pred_semantic, lr8x, [enc2x, enc4x] = self.lr_branch(feat_list)
+ pred_detail, hr2x = self.hr_branch(inputs['img'], enc2x, enc4x, lr8x)
+ pred_matte = self.f_branch(inputs['img'], lr8x, hr2x)
+
+ if self.training:
+ logit_dict = {
+ 'semantic': pred_semantic,
+ 'detail': pred_detail,
+ 'matte': pred_matte
+ }
+ return logit_dict
+ else:
+ return pred_matte
+
+
+
+class FusionBranch(nn.Layer):
+ def __init__(self, hr_channels: int, enc_channels: int):
+ super().__init__()
+ self.conv_lr4x = Conv2dIBNormRelu(
+ enc_channels[2], hr_channels, 5, stride=1, padding=2)
+
+ self.conv_f2x = Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1)
+ self.conv_f = nn.Sequential(
+ Conv2dIBNormRelu(
+ hr_channels + 3, int(hr_channels / 2), 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ int(hr_channels / 2),
+ 1,
+ 1,
+ stride=1,
+ padding=0,
+ with_ibn=False,
+ with_relu=False))
+
+ def forward(self, img: paddle.Tensor, lr8x: paddle.Tensor, hr2x: paddle.Tensor):
+ lr4x = F.interpolate(
+ lr8x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr4x = self.conv_lr4x(lr4x)
+ lr2x = F.interpolate(
+ lr4x, scale_factor=2, mode='bilinear', align_corners=False)
+
+ f2x = self.conv_f2x(paddle.concat((lr2x, hr2x), axis=1))
+ f = F.interpolate(
+ f2x, scale_factor=2, mode='bilinear', align_corners=False)
+ f = self.conv_f(paddle.concat((f, img), axis=1))
+ pred_matte = F.sigmoid(f)
+
+ return pred_matte
+
+
+class HRBranch(nn.Layer):
+ """
+ High Resolution Branch of MODNet
+ """
+
+ def __init__(self, hr_channels: int, enc_channels:int):
+ super().__init__()
+
+ self.tohr_enc2x = Conv2dIBNormRelu(
+ enc_channels[0], hr_channels, 1, stride=1, padding=0)
+ self.conv_enc2x = Conv2dIBNormRelu(
+ hr_channels + 3, hr_channels, 3, stride=2, padding=1)
+
+ self.tohr_enc4x = Conv2dIBNormRelu(
+ enc_channels[1], hr_channels, 1, stride=1, padding=0)
+ self.conv_enc4x = Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1)
+
+ self.conv_hr4x = nn.Sequential(
+ Conv2dIBNormRelu(
+ 2 * hr_channels + enc_channels[2] + 3,
+ 2 * hr_channels,
+ 3,
+ stride=1,
+ padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1))
+
+ self.conv_hr2x = nn.Sequential(
+ Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1))
+
+ self.conv_hr = nn.Sequential(
+ Conv2dIBNormRelu(
+ hr_channels + 3, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ hr_channels,
+ 1,
+ 1,
+ stride=1,
+ padding=0,
+ with_ibn=False,
+ with_relu=False))
+
+ def forward(self, img: paddle.Tensor, enc2x: paddle.Tensor, enc4x: paddle.Tensor, lr8x: paddle.Tensor):
+ img2x = F.interpolate(
+ img, scale_factor=1 / 2, mode='bilinear', align_corners=False)
+ img4x = F.interpolate(
+ img, scale_factor=1 / 4, mode='bilinear', align_corners=False)
+
+ enc2x = self.tohr_enc2x(enc2x)
+ hr4x = self.conv_enc2x(paddle.concat((img2x, enc2x), axis=1))
+
+ enc4x = self.tohr_enc4x(enc4x)
+ hr4x = self.conv_enc4x(paddle.concat((hr4x, enc4x), axis=1))
+
+ lr4x = F.interpolate(
+ lr8x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr4x = self.conv_hr4x(paddle.concat((hr4x, lr4x, img4x), axis=1))
+
+ hr2x = F.interpolate(
+ hr4x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr2x = self.conv_hr2x(paddle.concat((hr2x, enc2x), axis=1))
+
+ pred_detail = None
+ if self.training:
+ hr = F.interpolate(
+ hr2x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr = self.conv_hr(paddle.concat((hr, img), axis=1))
+ pred_detail = F.sigmoid(hr)
+
+ return pred_detail, hr2x
+
+
+class LRBranch(nn.Layer):
+ """
+ Low Resolution Branch of MODNet
+ """
+ def __init__(self, backbone_channels: int):
+ super().__init__()
+ self.se_block = SEBlock(backbone_channels[4], reduction=4)
+ self.conv_lr16x = Conv2dIBNormRelu(
+ backbone_channels[4], backbone_channels[3], 5, stride=1, padding=2)
+ self.conv_lr8x = Conv2dIBNormRelu(
+ backbone_channels[3], backbone_channels[2], 5, stride=1, padding=2)
+ self.conv_lr = Conv2dIBNormRelu(
+ backbone_channels[2],
+ 1,
+ 3,
+ stride=2,
+ padding=1,
+ with_ibn=False,
+ with_relu=False)
+
+ def forward(self, feat_list: list):
+ enc2x, enc4x, enc32x = feat_list[0], feat_list[1], feat_list[4]
+
+ enc32x = self.se_block(enc32x)
+ lr16x = F.interpolate(
+ enc32x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr16x = self.conv_lr16x(lr16x)
+ lr8x = F.interpolate(
+ lr16x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr8x = self.conv_lr8x(lr8x)
+
+ pred_semantic = None
+ if self.training:
+ lr = self.conv_lr(lr8x)
+ pred_semantic = F.sigmoid(lr)
+
+ return pred_semantic, lr8x, [enc2x, enc4x]
+
+
+class IBNorm(nn.Layer):
+ """
+ Combine Instance Norm and Batch Norm into One Layer
+ """
+
+ def __init__(self, in_channels: int):
+ super().__init__()
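+        # BatchNorm is applied to the first half of the channels and InstanceNorm to the rest; forward() concatenates the two halves back together.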
+ self.bnorm_channels = in_channels // 2
+ self.inorm_channels = in_channels - self.bnorm_channels
+
+ self.bnorm = nn.BatchNorm2D(self.bnorm_channels)
+ self.inorm = nn.InstanceNorm2D(self.inorm_channels)
+
+ def forward(self, x):
+ bn_x = self.bnorm(x[:, :self.bnorm_channels, :, :])
+ in_x = self.inorm(x[:, self.bnorm_channels:, :, :])
+
+ return paddle.concat((bn_x, in_x), 1)
+
+
+class Conv2dIBNormRelu(nn.Layer):
+ """
+ Convolution + IBNorm + Relu
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ padding: int = 0,
+ dilation:int = 1,
+ groups: int = 1,
+ bias_attr: paddle.ParamAttr = None,
+ with_ibn: bool = True,
+ with_relu: bool = True):
+
+ super().__init__()
+
+ layers = [
+ nn.Conv2D(
+ in_channels,
+ out_channels,
+ kernel_size,
+ stride=stride,
+ padding=padding,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=bias_attr)
+ ]
+
+ if with_ibn:
+ layers.append(IBNorm(out_channels))
+
+ if with_relu:
+ layers.append(nn.ReLU())
+
+ self.layers = nn.Sequential(*layers)
+
+ def forward(self, x: paddle.Tensor):
+ return self.layers(x)
+
+
+class SEBlock(nn.Layer):
+ """
+ SE Block Proposed in https://arxiv.org/pdf/1709.01507.pdf
+ """
+
+ def __init__(self, num_channels: int, reduction:int = 1):
+ super().__init__()
+ self.pool = nn.AdaptiveAvgPool2D(1)
+ self.conv = nn.Sequential(
+ nn.Conv2D(
+ num_channels,
+ int(num_channels // reduction),
+ 1,
+ bias_attr=False), nn.ReLU(),
+ nn.Conv2D(
+ int(num_channels // reduction),
+ num_channels,
+ 1,
+ bias_attr=False), nn.Sigmoid())
+
+ def forward(self, x: paddle.Tensor):
+ w = self.pool(x)
+ w = self.conv(w)
+ return w * x
+
+
+class GaussianBlurLayer(nn.Layer):
+    """ Add Gaussian blur to a 4D tensor.
+    This layer takes a 4D tensor of shape (N, C, H, W) as input.
+    The Gaussian blur is applied to each of the C channels separately.
+ """
+
+ def __init__(self, channels: int, kernel_size: int):
+ """
+ Args:
+ channels (int): Channel for input tensor
+ kernel_size (int): Size of the kernel used in blurring
+ """
+
+ super(GaussianBlurLayer, self).__init__()
+ self.channels = channels
+ self.kernel_size = kernel_size
+ assert self.kernel_size % 2 != 0
+
+ self.op = nn.Sequential(
+ nn.Pad2D(int(self.kernel_size / 2), mode='reflect'),
+ nn.Conv2D(
+ channels,
+ channels,
+ self.kernel_size,
+ stride=1,
+ padding=0,
+ bias_attr=False,
+ groups=channels))
+
+ self._init_kernel()
+ self.op[1].weight.stop_gradient = True
+
+ def forward(self, x: paddle.Tensor):
+ """
+ Args:
+ x (paddle.Tensor): input 4D tensor
+ Returns:
+ paddle.Tensor: Blurred version of the input
+ """
+
+ if not len(list(x.shape)) == 4:
+ print('\'GaussianBlurLayer\' requires a 4D tensor as input\n')
+ exit()
+ elif not x.shape[1] == self.channels:
+            print('In \'GaussianBlurLayer\', the required channel ({0}) is '
+ 'not the same as input ({1})\n'.format(
+ self.channels, x.shape[1]))
+ exit()
+
+ return self.op(x)
+
+ def _init_kernel(self):
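+        # Sigma follows OpenCV's getGaussianKernel convention for the given kernel size.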
+ sigma = 0.3 * ((self.kernel_size - 1) * 0.5 - 1) + 0.8
+
+ n = np.zeros((self.kernel_size, self.kernel_size))
+ i = int(self.kernel_size / 2)
+ n[i, i] = 1
+ kernel = scipy.ndimage.gaussian_filter(n, sigma)
+ kernel = kernel.astype('float32')
+ kernel = kernel[np.newaxis, np.newaxis, :, :]
+ paddle.assign(kernel, self.op[1].weight)
\ No newline at end of file
diff --git a/modules/image/matting/modnet_hrnet18_matting/processor.py b/modules/image/matting/modnet_hrnet18_matting/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..361c955390589469625aa985f6b75d5c95ed2e33
--- /dev/null
+++ b/modules/image/matting/modnet_hrnet18_matting/processor.py
@@ -0,0 +1,208 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import random
+import base64
+from typing import Callable, Union, List, Tuple
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+from paddleseg.transforms import functional
+from PIL import Image
+
+
+class Compose:
+ """
+ Do transformation on input data with corresponding pre-processing and augmentation operations.
+ The shape of input data to all operations is [height, width, channels].
+ """
+
+ def __init__(self, transforms: Callable, to_rgb: bool = True):
+ if not isinstance(transforms, list):
+ raise TypeError('The transforms must be a list!')
+ self.transforms = transforms
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if 'trans_info' not in data:
+ data['trans_info'] = []
+ for op in self.transforms:
+ data = op(data)
+ if data is None:
+ return None
+
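+        # Convert the image (and any multi-channel ground-truth fields) from HWC to CHW layout.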
+ data['img'] = np.transpose(data['img'], (2, 0, 1))
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = np.transpose(data[key], (2, 0, 1))
+
+ return data
+
+
+class LoadImages:
+ """
+ Read images from image path.
+
+ Args:
+ to_rgb (bool, optional): If converting image to RGB color space. Default: True.
+ """
+ def __init__(self, to_rgb: bool = True):
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if isinstance(data['img'], str):
+ data['img'] = cv2.imread(data['img'])
+
+ for key in data.get('gt_fields', []):
+ if isinstance(data[key], str):
+ data[key] = cv2.imread(data[key], cv2.IMREAD_UNCHANGED)
+ # if alpha and trimap has 3 channels, extract one.
+ if key in ['alpha', 'trimap']:
+ if len(data[key].shape) > 2:
+ data[key] = data[key][:, :, 0]
+
+ if self.to_rgb:
+ data['img'] = cv2.cvtColor(data['img'], cv2.COLOR_BGR2RGB)
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = cv2.cvtColor(data[key], cv2.COLOR_BGR2RGB)
+
+ return data
+
+
+class ResizeByShort:
+ """
+ Resize the short side of an image to given size, and then scale the other side proportionally.
+
+ Args:
+ short_size (int): The target size of short side.
+ """
+
+ def __init__(self, short_size: int =512):
+ self.short_size = short_size
+
+ def __call__(self, data: dict) -> dict:
+
+ data['trans_info'].append(('resize', data['img'].shape[0:2]))
+ data['img'] = functional.resize_short(data['img'], self.short_size)
+ for key in data.get('gt_fields', []):
+ data[key] = functional.resize_short(data[key], self.short_size)
+ return data
+
+
+class ResizeToIntMult:
+ """
+    Resize so that the height and width are integer multiples of `mult_int`, e.g. 32.
+ """
+
+ def __init__(self, mult_int: int = 32):
+ self.mult_int = mult_int
+
+ def __call__(self, data: dict) -> dict:
+ data['trans_info'].append(('resize', data['img'].shape[0:2]))
+
+ h, w = data['img'].shape[0:2]
+        rw = w - w % self.mult_int
+        rh = h - h % self.mult_int
+ data['img'] = functional.resize(data['img'], (rw, rh))
+ for key in data.get('gt_fields', []):
+ data[key] = functional.resize(data[key], (rw, rh))
+
+ return data
+
+
+class Normalize:
+ """
+ Normalize an image.
+
+ Args:
+ mean (list, optional): The mean value of a data set. Default: [0.5, 0.5, 0.5].
+ std (list, optional): The standard deviation of a data set. Default: [0.5, 0.5, 0.5].
+
+ Raises:
+ ValueError: When mean/std is not list or any value in std is 0.
+ """
+
+ def __init__(self, mean: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5), std: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5)):
+ self.mean = mean
+ self.std = std
+ if not (isinstance(self.mean, (list, tuple))
+ and isinstance(self.std, (list, tuple))):
+ raise ValueError(
+ "{}: input type is invalid. It should be list or tuple".format(
+ self))
+ from functools import reduce
+ if reduce(lambda x, y: x * y, self.std) == 0:
+ raise ValueError('{}: std is invalid!'.format(self))
+
+ def __call__(self, data: dict) -> dict:
+ mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
+ std = np.array(self.std)[np.newaxis, np.newaxis, :]
+ data['img'] = functional.normalize(data['img'], mean, std)
+ if 'fg' in data.get('gt_fields', []):
+ data['fg'] = functional.normalize(data['fg'], mean, std)
+ if 'bg' in data.get('gt_fields', []):
+ data['bg'] = functional.normalize(data['bg'], mean, std)
+
+ return data
+
+
+def reverse_transform(alpha: paddle.Tensor, trans_info: List[str]):
+ """recover pred to origin shape"""
+ for item in trans_info[::-1]:
+ if item[0] == 'resize':
+ h, w = item[1][0], item[1][1]
+ alpha = F.interpolate(alpha, [h, w], mode='bilinear')
+ elif item[0] == 'padding':
+ h, w = item[1][0], item[1][1]
+ alpha = alpha[:, :, 0:h, 0:w]
+ else:
+ raise Exception("Unexpected info '{}' in im_info".format(item[0]))
+ return alpha
+
+def save_alpha_pred(alpha: np.ndarray, trimap: Union[np.ndarray, str] = None):
+    """
+    Apply trimap constraints to the predicted alpha. The alpha values are in range [0, 255] and the shape should be [h, w].
+    If trimap is None, the alpha is returned unchanged.
+    """
+    if isinstance(trimap, str):
+        trimap = cv2.imread(trimap, 0)
+
+    if trimap is not None:
+        alpha[trimap == 0] = 0
+        alpha[trimap == 255] = 255
+    alpha = alpha.astype('uint8')
+    return alpha
+
+
+def cv2_to_base64(image: np.ndarray):
+ """
+ Convert data from BGR to base64 format.
+ """
+ data = cv2.imencode('.png', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+def base64_to_cv2(b64str: str):
+ """
+ Convert data from base64 to BGR format.
+ """
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
\ No newline at end of file
diff --git a/modules/image/matting/modnet_mobilenetv2_matting/README.md b/modules/image/matting/modnet_mobilenetv2_matting/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..51b8691624e36da0648a1c5fc4f5c670b81a4cde
--- /dev/null
+++ b/modules/image/matting/modnet_mobilenetv2_matting/README.md
@@ -0,0 +1,155 @@
+# modnet_mobilenetv2_matting
+
+|模型名称|modnet_mobilenetv2_matting|
+| :--- | :---: |
+|类别|图像-抠图|
+|网络|modnet_mobilenetv2|
+|数据集|百度自建数据集|
+|是否支持Fine-tuning|否|
+|模型大小|38MB|
+|指标|SAD112.73|
+|最新更新日期|2021-12-03|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+
+ - 样例结果示例(左为原图,右为效果图):
+
+
+
+
+
+- ### 模型介绍
+
+ - Matting(精细化分割/影像去背/抠图)是指借由计算前景的颜色和透明度,将前景从影像中撷取出来的技术,可用于替换背景、影像合成、视觉特效,在电影工业中被广泛地使用。影像中的每个像素会有代表其前景透明度的值,称作阿法值(Alpha),一张影像中所有阿法值的集合称作阿法遮罩(Alpha Matte),将影像被遮罩所涵盖的部分取出即可完成前景的分离。modnet_mobilenetv2_matting可生成抠图结果。
+
+
+
+ - 更多详情请参考:[modnet_mobilenetv2_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install modnet_mobilenetv2_matting
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run modnet_mobilenetv2_matting --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - 通过命令行方式实现hub模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="modnet_mobilenetv2_matting")
+
+ result = model.predict(["/PATH/TO/IMAGE"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ trimap_list,
+ visualization,
+ save_path):
+ ```
+
+ - 人像matting预测API,用于将输入图片中的人像分割出来。
+
+ - 参数
+
+ - image_list (list(str | numpy.ndarray)):图片输入路径或者BGR格式numpy数据。
+ - trimap_list(list(str | numpy.ndarray)):trimap输入路径或者灰度图单通道格式图片。默认为None。
+ - visualization (bool): 是否进行可视化,默认为False。
+ - save_path (str): 当visualization为True时,保存图片的路径,默认为"modnet_mobilenetv2_matting_output"。
+
+ - 返回
+
+ - result (list(numpy.ndarray)):模型分割结果:
+
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署人像matting在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m modnet_mobilenetv2_matting
+ ```
+
+ - 这样就完成了一个人像matting在线服务API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/modnet_mobilenetv2_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
diff --git a/modules/image/matting/modnet_mobilenetv2_matting/README_en.md b/modules/image/matting/modnet_mobilenetv2_matting/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..a85aa07e9200e7d80756c0c67958a7f42215cf85
--- /dev/null
+++ b/modules/image/matting/modnet_mobilenetv2_matting/README_en.md
@@ -0,0 +1,156 @@
+# modnet_mobilenetv2_matting
+
+|Module Name|modnet_mobilenetv2_matting|
+| :--- | :---: |
+|Category|Image Matting|
+|Network|modnet_mobilenetv2|
+|Dataset|Baidu self-built dataset|
+|Support Fine-tuning|No|
+|Module Size|38MB|
+|Data Indicators|SAD112.73|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - Matting is the technique of extracting the foreground from an image by calculating its color and transparency. It is widely used in the film industry for background replacement, image compositing, and visual effects. Each pixel in the image has a value that represents its foreground transparency, called alpha; the set of all alpha values in an image is called the alpha matte. The part of the image covered by the matte can be extracted to complete the foreground separation.
+
+
+
+ - For more information, please refer to: [modnet_mobilenetv2_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install modnet_mobilenetv2_matting
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run modnet_mobilenetv2_matting --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="modnet_mobilenetv2_matting")
+
+ result = model.predict(image_list=["/PATH/TO/IMAGE"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ trimap_list,
+ visualization,
+ save_path):
+ ```
+
+ - Prediction API for matting.
+
+ - **Parameter**
+
+      - image_list (list(str | numpy.ndarray)): Image path or image data, ndarray.shape is in the format \[H, W, C\], BGR.
+      - trimap_list (list(str | numpy.ndarray)): Trimap path or trimap data, ndarray.shape is in the format \[H, W\], gray. Default is None.
+ - visualization (bool): Whether to save the recognition results as picture files, default is False.
+ - save_path (str): Save path of images, "modnet_mobilenetv2_matting_output" by default.
+
+ - **Return**
+
+      - result (list(numpy.ndarray)): The list of model results.
+
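+  - `trimap_list` is optional. A minimal sketch of passing a trimap along with the image (the image and trimap paths below are placeholders):
+
+  - ```python
+    import paddlehub as hub
+
+    model = hub.Module(name="modnet_mobilenetv2_matting")
+
+    # trimap_list defaults to None; when provided, each grayscale trimap constrains the predicted alpha matte.
+    result = model.predict(
+        image_list=["/PATH/TO/IMAGE"],
+        trimap_list=["/PATH/TO/TRIMAP"],
+        visualization=True,
+        save_path="modnet_mobilenetv2_matting_output")
+    print(result)
+    ```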
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of matting.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m modnet_mobilenetv2_matting
+ ```
+
+  - The serving API is now deployed; the default port number is 8866.
+
+  - **NOTE:** If you want to use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/modnet_mobilenetv2_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
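+
+  - Note (an illustrative sketch, not part of the original example): based on this module's `serving_method` signature, the request body can also carry an optional `trimaps` field with one base64-encoded grayscale trimap per image, e.g. `data = {'images': [...], 'trimaps': [cv2_to_base64(cv2.imread("/PATH/TO/TRIMAP"))]}`; the trimap path is a placeholder.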
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/matting/modnet_mobilenetv2_matting/mobilenetv2.py b/modules/image/matting/modnet_mobilenetv2_matting/mobilenetv2.py
new file mode 100644
index 0000000000000000000000000000000000000000..8895104a34073143ae17c1021519650dad022aeb
--- /dev/null
+++ b/modules/image/matting/modnet_mobilenetv2_matting/mobilenetv2.py
@@ -0,0 +1,224 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+
+import numpy as np
+import paddle
+from paddle import ParamAttr
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn import Conv2D, BatchNorm, Linear, Dropout
+from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
+
+from paddleseg import utils
+from paddleseg.cvlibs import manager
+
+
+__all__ = ["MobileNetV2"]
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+ def __init__(self,
+ num_channels: int,
+ filter_size: int,
+ num_filters: int,
+ stride: int,
+ padding: int,
+ num_groups: int=1,
+ name: str = None,
+ use_cudnn: bool = True):
+ super(ConvBNLayer, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=filter_size,
+ stride=stride,
+ padding=padding,
+ groups=num_groups,
+ weight_attr=ParamAttr(name=name + "_weights"),
+ bias_attr=False)
+
+ self._batch_norm = BatchNorm(
+ num_filters,
+ param_attr=ParamAttr(name=name + "_bn_scale"),
+ bias_attr=ParamAttr(name=name + "_bn_offset"),
+ moving_mean_name=name + "_bn_mean",
+ moving_variance_name=name + "_bn_variance")
+
+ def forward(self, inputs: paddle.Tensor, if_act: bool = True) -> paddle.Tensor:
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ if if_act:
+ y = F.relu6(y)
+ return y
+
+
+class InvertedResidualUnit(nn.Layer):
+ """Inverted residual block"""
+ def __init__(self, num_channels: int, num_in_filter: int, num_filters: int, stride: int,
+ filter_size: int, padding: int, expansion_factor: int, name: str):
+ super(InvertedResidualUnit, self).__init__()
+ num_expfilter = int(round(num_in_filter * expansion_factor))
+ self._expand_conv = ConvBNLayer(
+ num_channels=num_channels,
+ num_filters=num_expfilter,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ num_groups=1,
+ name=name + "_expand")
+
+ self._bottleneck_conv = ConvBNLayer(
+ num_channels=num_expfilter,
+ num_filters=num_expfilter,
+ filter_size=filter_size,
+ stride=stride,
+ padding=padding,
+ num_groups=num_expfilter,
+ use_cudnn=False,
+ name=name + "_dwise")
+
+ self._linear_conv = ConvBNLayer(
+ num_channels=num_expfilter,
+ num_filters=num_filters,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ num_groups=1,
+ name=name + "_linear")
+
+ def forward(self, inputs: paddle.Tensor, ifshortcut: bool) -> paddle.Tensor:
+ y = self._expand_conv(inputs, if_act=True)
+ y = self._bottleneck_conv(y, if_act=True)
+ y = self._linear_conv(y, if_act=False)
+ if ifshortcut:
+ y = paddle.add(inputs, y)
+ return y
+
+
+class InvresiBlocks(nn.Layer):
+ def __init__(self, in_c: int, t: int, c: int, n: int, s: int, name: str):
+ super(InvresiBlocks, self).__init__()
+
+ self._first_block = InvertedResidualUnit(
+ num_channels=in_c,
+ num_in_filter=in_c,
+ num_filters=c,
+ stride=s,
+ filter_size=3,
+ padding=1,
+ expansion_factor=t,
+ name=name + "_1")
+
+ self._block_list = []
+ for i in range(1, n):
+ block = self.add_sublayer(
+ name + "_" + str(i + 1),
+ sublayer=InvertedResidualUnit(
+ num_channels=c,
+ num_in_filter=c,
+ num_filters=c,
+ stride=1,
+ filter_size=3,
+ padding=1,
+ expansion_factor=t,
+ name=name + "_" + str(i + 1)))
+ self._block_list.append(block)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self._first_block(inputs, ifshortcut=False)
+ for block in self._block_list:
+ y = block(y, ifshortcut=True)
+ return y
+
+
+class MobileNet(nn.Layer):
+    """MobileNetV2 network."""
+ def __init__(self,
+ input_channels: int = 3,
+ scale: float = 1.0,
+ pretrained: str = None,
+ prefix_name: str = ""):
+ super(MobileNet, self).__init__()
+ self.scale = scale
+
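+        # Each tuple is (expansion factor t, output channels c, number of blocks n, stride s).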
+ bottleneck_params_list = [
+ (1, 16, 1, 1),
+ (6, 24, 2, 2),
+ (6, 32, 3, 2),
+ (6, 64, 4, 2),
+ (6, 96, 3, 1),
+ (6, 160, 3, 2),
+ (6, 320, 1, 1),
+ ]
+
+ self.conv1 = ConvBNLayer(
+ num_channels=input_channels,
+ num_filters=int(32 * scale),
+ filter_size=3,
+ stride=2,
+ padding=1,
+ name=prefix_name + "conv1_1")
+
+ self.block_list = []
+ i = 1
+ in_c = int(32 * scale)
+ for layer_setting in bottleneck_params_list:
+ t, c, n, s = layer_setting
+ i += 1
+ block = self.add_sublayer(
+ prefix_name + "conv" + str(i),
+ sublayer=InvresiBlocks(
+ in_c=in_c,
+ t=t,
+ c=int(c * scale),
+ n=n,
+ s=s,
+ name=prefix_name + "conv" + str(i)))
+ self.block_list.append(block)
+ in_c = int(c * scale)
+
+ self.out_c = int(1280 * scale) if scale > 1.0 else 1280
+ self.conv9 = ConvBNLayer(
+ num_channels=in_c,
+ num_filters=self.out_c,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ name=prefix_name + "conv9")
+
+ self.feat_channels = [int(i * scale) for i in [16, 24, 32, 96, 1280]]
+ self.pretrained = pretrained
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ feat_list = []
+ y = self.conv1(inputs, if_act=True)
+
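+        # Collect features after inverted-residual stages 0, 1, 2 and 4, plus the final 1x1 conv output, as the multi-scale feature list (channels 16, 24, 32, 96, 1280 at scale 1.0).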
+ block_index = 0
+ for block in self.block_list:
+ y = block(y)
+ if block_index in [0, 1, 2, 4]:
+ feat_list.append(y)
+ block_index += 1
+ y = self.conv9(y, if_act=True)
+ feat_list.append(y)
+ return feat_list
+
+
+def MobileNetV2(**kwargs):
+ model = MobileNet(scale=1.0, **kwargs)
+ return model
diff --git a/modules/image/matting/modnet_mobilenetv2_matting/module.py b/modules/image/matting/modnet_mobilenetv2_matting/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..e6a0e6cbeb4c7c60f069e2642c4593fc6a4cde93
--- /dev/null
+++ b/modules/image/matting/modnet_mobilenetv2_matting/module.py
@@ -0,0 +1,514 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import time
+import argparse
+from typing import Callable, Union, List, Tuple
+
+import numpy as np
+import cv2
+import scipy
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+from modnet_mobilenetv2_matting.mobilenetv2 import MobileNetV2
+import modnet_mobilenetv2_matting.processor as P
+
+
+@moduleinfo(
+ name="modnet_mobilenetv2_matting",
+ type="CV",
+ author="paddlepaddle",
+ summary="modnet_mobilenetv2_matting is a matting model",
+ version="1.0.0"
+)
+class MODNetMobilenetV2(nn.Layer):
+ """
+ The MODNet implementation based on PaddlePaddle.
+
+ The original article refers to
+ Zhanghan Ke, et, al. "Is a Green Screen Really Necessary for Real-Time Portrait Matting?"
+ (https://arxiv.org/pdf/2011.11961.pdf).
+
+ Args:
+        hr_channels(int, optional): The number of channels of the high-resolution branch. Default: 32.
+        pretrained(str, optional): The path of the pretrained model. Default: None.
+
+ """
+
+ def __init__(self, hr_channels:int = 32, pretrained=None):
+ super(MODNetMobilenetV2, self).__init__()
+
+ self.backbone = MobileNetV2()
+ self.pretrained = pretrained
+
+ self.head = MODNetHead(
+ hr_channels=hr_channels, backbone_channels=self.backbone.feat_channels)
+ self.blurer = GaussianBlurLayer(1, 3)
+ self.transforms = P.Compose([P.LoadImages(), P.ResizeByShort(), P.ResizeToIntMult(), P.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+            print("Loaded custom parameters successfully")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'modnet-mobilenetv2.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+            print("Loaded pretrained parameters successfully")
+
+ def preprocess(self, img: Union[str, np.ndarray] , transforms: Callable, trimap: Union[str, np.ndarray] = None):
+ data = {}
+ data['img'] = img
+ if trimap is not None:
+ data['trimap'] = trimap
+ data['gt_fields'] = ['trimap']
+ data['trans_info'] = []
+ data = self.transforms(data)
+ data['img'] = paddle.to_tensor(data['img'])
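+        # Add a batch dimension: [C, H, W] -> [1, C, H, W].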
+ data['img'] = data['img'].unsqueeze(0)
+ if trimap is not None:
+ data['trimap'] = paddle.to_tensor(data['trimap'])
+ data['trimap'] = data['trimap'].unsqueeze((0, 1))
+
+ return data
+
+ def forward(self, inputs: dict):
+ x = inputs['img']
+ feat_list = self.backbone(x)
+ y = self.head(inputs=inputs, feat_list=feat_list)
+ return y
+
+ def predict(self, image_list: list, trimap_list: list = None, visualization: bool =False, save_path: str = "modnet_mobilenetv2_matting_output"):
+ self.eval()
+ result = []
+ with paddle.no_grad():
+ for i, im_path in enumerate(image_list):
+ trimap = trimap_list[i] if trimap_list is not None else None
+ data = self.preprocess(img=im_path, transforms=self.transforms, trimap=trimap)
+ alpha_pred = self.forward(data)
+ alpha_pred = P.reverse_transform(alpha_pred, data['trans_info'])
+ alpha_pred = (alpha_pred.numpy()).squeeze()
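+                # Scale the [0, 1] alpha matte to 8-bit; trimap constraints (if any) are applied in save_alpha_pred.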
+ alpha_pred = (alpha_pred * 255).astype('uint8')
+ alpha_pred = P.save_alpha_pred(alpha_pred, trimap)
+ result.append(alpha_pred)
+ if visualization:
+ if not os.path.exists(save_path):
+ os.makedirs(save_path)
+ img_name = str(time.time()) + '.png'
+ image_save_path = os.path.join(save_path, img_name)
+ cv2.imwrite(image_save_path, alpha_pred)
+
+ return result
+
+ @serving
+ def serving_method(self, images: list, trimaps:list = None, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [P.base64_to_cv2(image) for image in images]
+ if trimaps is not None:
+ trimap_decoder = [cv2.cvtColor(P.base64_to_cv2(trimap), cv2.COLOR_BGR2GRAY) for trimap in trimaps]
+ else:
+ trimap_decoder = None
+
+ outputs = self.predict(image_list=images_decode, trimap_list= trimap_decoder, **kwargs)
+ serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
+ results = {'data': serving_data}
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ args = self.parser.parse_args(argvs)
+ if args.trimap_path is not None:
+ trimap_list = [args.trimap_path]
+ else:
+ trimap_list = None
+
+ results = self.predict(image_list=[args.input_path], trimap_list=trimap_list, save_path=args.output_dir, visualization=args.visualization)
+
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default="modnet_mobilenetv2_matting_output", help="The directory to save output images.")
+ self.arg_config_group.add_argument(
+ '--visualization', type=bool, default=True, help="whether to save output as images.")
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
+        self.arg_input_group.add_argument('--trimap_path', type=str, default=None, help="path to trimap.")
+
+
+
+class MODNetHead(nn.Layer):
+ """
+    MODNet head combining the low-resolution, high-resolution and fusion branches.
+ """
+ def __init__(self, hr_channels: int, backbone_channels: int):
+ super().__init__()
+
+ self.lr_branch = LRBranch(backbone_channels)
+ self.hr_branch = HRBranch(hr_channels, backbone_channels)
+ self.f_branch = FusionBranch(hr_channels, backbone_channels)
+
+ def forward(self, inputs: paddle.Tensor, feat_list: list):
+ pred_semantic, lr8x, [enc2x, enc4x] = self.lr_branch(feat_list)
+ pred_detail, hr2x = self.hr_branch(inputs['img'], enc2x, enc4x, lr8x)
+ pred_matte = self.f_branch(inputs['img'], lr8x, hr2x)
+
+ if self.training:
+ logit_dict = {
+ 'semantic': pred_semantic,
+ 'detail': pred_detail,
+ 'matte': pred_matte
+ }
+ return logit_dict
+ else:
+ return pred_matte
+
+
+
+class FusionBranch(nn.Layer):
+ def __init__(self, hr_channels: int, enc_channels: int):
+ super().__init__()
+ self.conv_lr4x = Conv2dIBNormRelu(
+ enc_channels[2], hr_channels, 5, stride=1, padding=2)
+
+ self.conv_f2x = Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1)
+ self.conv_f = nn.Sequential(
+ Conv2dIBNormRelu(
+ hr_channels + 3, int(hr_channels / 2), 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ int(hr_channels / 2),
+ 1,
+ 1,
+ stride=1,
+ padding=0,
+ with_ibn=False,
+ with_relu=False))
+
+ def forward(self, img: paddle.Tensor, lr8x: paddle.Tensor, hr2x: paddle.Tensor):
+ lr4x = F.interpolate(
+ lr8x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr4x = self.conv_lr4x(lr4x)
+ lr2x = F.interpolate(
+ lr4x, scale_factor=2, mode='bilinear', align_corners=False)
+
+ f2x = self.conv_f2x(paddle.concat((lr2x, hr2x), axis=1))
+ f = F.interpolate(
+ f2x, scale_factor=2, mode='bilinear', align_corners=False)
+ f = self.conv_f(paddle.concat((f, img), axis=1))
+ pred_matte = F.sigmoid(f)
+
+ return pred_matte
+
+
+class HRBranch(nn.Layer):
+ """
+ High Resolution Branch of MODNet
+ """
+
+ def __init__(self, hr_channels: int, enc_channels:int):
+ super().__init__()
+
+ self.tohr_enc2x = Conv2dIBNormRelu(
+ enc_channels[0], hr_channels, 1, stride=1, padding=0)
+ self.conv_enc2x = Conv2dIBNormRelu(
+ hr_channels + 3, hr_channels, 3, stride=2, padding=1)
+
+ self.tohr_enc4x = Conv2dIBNormRelu(
+ enc_channels[1], hr_channels, 1, stride=1, padding=0)
+ self.conv_enc4x = Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1)
+
+ self.conv_hr4x = nn.Sequential(
+ Conv2dIBNormRelu(
+ 2 * hr_channels + enc_channels[2] + 3,
+ 2 * hr_channels,
+ 3,
+ stride=1,
+ padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1))
+
+ self.conv_hr2x = nn.Sequential(
+ Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1))
+
+ self.conv_hr = nn.Sequential(
+ Conv2dIBNormRelu(
+ hr_channels + 3, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ hr_channels,
+ 1,
+ 1,
+ stride=1,
+ padding=0,
+ with_ibn=False,
+ with_relu=False))
+
+ def forward(self, img: paddle.Tensor, enc2x: paddle.Tensor, enc4x: paddle.Tensor, lr8x: paddle.Tensor):
+ img2x = F.interpolate(
+ img, scale_factor=1 / 2, mode='bilinear', align_corners=False)
+ img4x = F.interpolate(
+ img, scale_factor=1 / 4, mode='bilinear', align_corners=False)
+
+ enc2x = self.tohr_enc2x(enc2x)
+ hr4x = self.conv_enc2x(paddle.concat((img2x, enc2x), axis=1))
+
+ enc4x = self.tohr_enc4x(enc4x)
+ hr4x = self.conv_enc4x(paddle.concat((hr4x, enc4x), axis=1))
+
+ lr4x = F.interpolate(
+ lr8x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr4x = self.conv_hr4x(paddle.concat((hr4x, lr4x, img4x), axis=1))
+
+ hr2x = F.interpolate(
+ hr4x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr2x = self.conv_hr2x(paddle.concat((hr2x, enc2x), axis=1))
+
+ pred_detail = None
+ if self.training:
+ hr = F.interpolate(
+ hr2x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr = self.conv_hr(paddle.concat((hr, img), axis=1))
+ pred_detail = F.sigmoid(hr)
+
+ return pred_detail, hr2x
+
+
+class LRBranch(nn.Layer):
+ """
+ Low Resolution Branch of MODNet
+ """
+ def __init__(self, backbone_channels: int):
+ super().__init__()
+ self.se_block = SEBlock(backbone_channels[4], reduction=4)
+ self.conv_lr16x = Conv2dIBNormRelu(
+ backbone_channels[4], backbone_channels[3], 5, stride=1, padding=2)
+ self.conv_lr8x = Conv2dIBNormRelu(
+ backbone_channels[3], backbone_channels[2], 5, stride=1, padding=2)
+ self.conv_lr = Conv2dIBNormRelu(
+ backbone_channels[2],
+ 1,
+ 3,
+ stride=2,
+ padding=1,
+ with_ibn=False,
+ with_relu=False)
+
+ def forward(self, feat_list: list):
+ enc2x, enc4x, enc32x = feat_list[0], feat_list[1], feat_list[4]
+
+ enc32x = self.se_block(enc32x)
+ lr16x = F.interpolate(
+ enc32x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr16x = self.conv_lr16x(lr16x)
+ lr8x = F.interpolate(
+ lr16x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr8x = self.conv_lr8x(lr8x)
+
+ pred_semantic = None
+ if self.training:
+ lr = self.conv_lr(lr8x)
+ pred_semantic = F.sigmoid(lr)
+
+ return pred_semantic, lr8x, [enc2x, enc4x]
+
+
+class IBNorm(nn.Layer):
+ """
+ Combine Instance Norm and Batch Norm into One Layer
+ """
+
+ def __init__(self, in_channels: int):
+ super().__init__()
+ self.bnorm_channels = in_channels // 2
+ self.inorm_channels = in_channels - self.bnorm_channels
+
+ self.bnorm = nn.BatchNorm2D(self.bnorm_channels)
+ self.inorm = nn.InstanceNorm2D(self.inorm_channels)
+
+ def forward(self, x):
+ bn_x = self.bnorm(x[:, :self.bnorm_channels, :, :])
+ in_x = self.inorm(x[:, self.bnorm_channels:, :, :])
+
+ return paddle.concat((bn_x, in_x), 1)
+
+
+class Conv2dIBNormRelu(nn.Layer):
+ """
+ Convolution + IBNorm + Relu
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ padding: int = 0,
+ dilation:int = 1,
+ groups: int = 1,
+ bias_attr: paddle.ParamAttr = None,
+ with_ibn: bool = True,
+ with_relu: bool = True):
+
+ super().__init__()
+
+ layers = [
+ nn.Conv2D(
+ in_channels,
+ out_channels,
+ kernel_size,
+ stride=stride,
+ padding=padding,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=bias_attr)
+ ]
+
+ if with_ibn:
+ layers.append(IBNorm(out_channels))
+
+ if with_relu:
+ layers.append(nn.ReLU())
+
+ self.layers = nn.Sequential(*layers)
+
+ def forward(self, x: paddle.Tensor):
+ return self.layers(x)
+
+
+class SEBlock(nn.Layer):
+ """
+ SE Block Proposed in https://arxiv.org/pdf/1709.01507.pdf
+ """
+
+ def __init__(self, num_channels: int, reduction:int = 1):
+ super().__init__()
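+        # Squeeze: global average pooling; excite: two 1x1 convs ending in a sigmoid that rescales each channel.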
+ self.pool = nn.AdaptiveAvgPool2D(1)
+ self.conv = nn.Sequential(
+ nn.Conv2D(
+ num_channels,
+ int(num_channels // reduction),
+ 1,
+ bias_attr=False), nn.ReLU(),
+ nn.Conv2D(
+ int(num_channels // reduction),
+ num_channels,
+ 1,
+ bias_attr=False), nn.Sigmoid())
+
+ def forward(self, x: paddle.Tensor):
+ w = self.pool(x)
+ w = self.conv(w)
+ return w * x
+
+
+class GaussianBlurLayer(nn.Layer):
+    """ Add Gaussian blur to a 4D tensor.
+    This layer takes a 4D tensor of shape (N, C, H, W) as input.
+    The Gaussian blur is applied to each of the C channels separately.
+ """
+
+ def __init__(self, channels: int, kernel_size: int):
+ """
+ Args:
+ channels (int): Channel for input tensor
+ kernel_size (int): Size of the kernel used in blurring
+ """
+
+ super(GaussianBlurLayer, self).__init__()
+ self.channels = channels
+ self.kernel_size = kernel_size
+ assert self.kernel_size % 2 != 0
+
+ self.op = nn.Sequential(
+ nn.Pad2D(int(self.kernel_size / 2), mode='reflect'),
+ nn.Conv2D(
+ channels,
+ channels,
+ self.kernel_size,
+ stride=1,
+ padding=0,
+ bias_attr=False,
+ groups=channels))
+
+ self._init_kernel()
+ self.op[1].weight.stop_gradient = True
+
+ def forward(self, x: paddle.Tensor):
+ """
+ Args:
+ x (paddle.Tensor): input 4D tensor
+ Returns:
+ paddle.Tensor: Blurred version of the input
+ """
+
+ if not len(list(x.shape)) == 4:
+ print('\'GaussianBlurLayer\' requires a 4D tensor as input\n')
+ exit()
+ elif not x.shape[1] == self.channels:
+            print('In \'GaussianBlurLayer\', the required channel ({0}) is '
+ 'not the same as input ({1})\n'.format(
+ self.channels, x.shape[1]))
+ exit()
+
+ return self.op(x)
+
+ def _init_kernel(self):
+ sigma = 0.3 * ((self.kernel_size - 1) * 0.5 - 1) + 0.8
+
+ n = np.zeros((self.kernel_size, self.kernel_size))
+ i = int(self.kernel_size / 2)
+ n[i, i] = 1
+ kernel = scipy.ndimage.gaussian_filter(n, sigma)
+ kernel = kernel.astype('float32')
+ kernel = kernel[np.newaxis, np.newaxis, :, :]
+ paddle.assign(kernel, self.op[1].weight)
\ No newline at end of file
diff --git a/modules/image/matting/modnet_mobilenetv2_matting/processor.py b/modules/image/matting/modnet_mobilenetv2_matting/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..3ae79593f0d3dab19520c3c666ae4a06b81960dd
--- /dev/null
+++ b/modules/image/matting/modnet_mobilenetv2_matting/processor.py
@@ -0,0 +1,207 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import random
+import base64
+from typing import Callable, Union, List, Tuple
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+from paddleseg.transforms import functional
+from PIL import Image
+
+
+class Compose:
+ """
+ Do transformation on input data with corresponding pre-processing and augmentation operations.
+ The shape of input data to all operations is [height, width, channels].
+ """
+
+ def __init__(self, transforms: Callable, to_rgb: bool = True):
+ if not isinstance(transforms, list):
+ raise TypeError('The transforms must be a list!')
+ self.transforms = transforms
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if 'trans_info' not in data:
+ data['trans_info'] = []
+ for op in self.transforms:
+ data = op(data)
+ if data is None:
+ return None
+
+ data['img'] = np.transpose(data['img'], (2, 0, 1))
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = np.transpose(data[key], (2, 0, 1))
+
+ return data
+
+
+class LoadImages:
+ """
+ Read images from image path.
+
+ Args:
+ to_rgb (bool, optional): If converting image to RGB color space. Default: True.
+ """
+ def __init__(self, to_rgb: bool = True):
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if isinstance(data['img'], str):
+ data['img'] = cv2.imread(data['img'])
+
+ for key in data.get('gt_fields', []):
+ if isinstance(data[key], str):
+ data[key] = cv2.imread(data[key], cv2.IMREAD_UNCHANGED)
+ # if alpha and trimap has 3 channels, extract one.
+ if key in ['alpha', 'trimap']:
+ if len(data[key].shape) > 2:
+ data[key] = data[key][:, :, 0]
+
+ if self.to_rgb:
+ data['img'] = cv2.cvtColor(data['img'], cv2.COLOR_BGR2RGB)
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = cv2.cvtColor(data[key], cv2.COLOR_BGR2RGB)
+
+ return data
+
+
+class ResizeByShort:
+ """
+ Resize the short side of an image to given size, and then scale the other side proportionally.
+
+ Args:
+ short_size (int): The target size of short side.
+ """
+
+ def __init__(self, short_size: int =512):
+ self.short_size = short_size
+
+ def __call__(self, data: dict) -> dict:
+
+ data['trans_info'].append(('resize', data['img'].shape[0:2]))
+ data['img'] = functional.resize_short(data['img'], self.short_size)
+ for key in data.get('gt_fields', []):
+ data[key] = functional.resize_short(data[key], self.short_size)
+ return data
+
+
+class ResizeToIntMult:
+ """
+    Resize so that the height and width are integer multiples of `mult_int`, e.g. 32.
+ """
+
+ def __init__(self, mult_int: int = 32):
+ self.mult_int = mult_int
+
+ def __call__(self, data: dict) -> dict:
+ data['trans_info'].append(('resize', data['img'].shape[0:2]))
+
+ h, w = data['img'].shape[0:2]
+        rw = w - w % self.mult_int
+        rh = h - h % self.mult_int
+ data['img'] = functional.resize(data['img'], (rw, rh))
+ for key in data.get('gt_fields', []):
+ data[key] = functional.resize(data[key], (rw, rh))
+
+ return data
+
+
+class Normalize:
+ """
+ Normalize an image.
+
+ Args:
+ mean (list, optional): The mean value of a data set. Default: [0.5, 0.5, 0.5].
+ std (list, optional): The standard deviation of a data set. Default: [0.5, 0.5, 0.5].
+
+ Raises:
+ ValueError: When mean/std is not list or any value in std is 0.
+ """
+
+ def __init__(self, mean: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5), std: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5)):
+ self.mean = mean
+ self.std = std
+ if not (isinstance(self.mean, (list, tuple))
+ and isinstance(self.std, (list, tuple))):
+ raise ValueError(
+ "{}: input type is invalid. It should be list or tuple".format(
+ self))
+ from functools import reduce
+ if reduce(lambda x, y: x * y, self.std) == 0:
+ raise ValueError('{}: std is invalid!'.format(self))
+
+ def __call__(self, data: dict) -> dict:
+ mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
+ std = np.array(self.std)[np.newaxis, np.newaxis, :]
+ data['img'] = functional.normalize(data['img'], mean, std)
+ if 'fg' in data.get('gt_fields', []):
+ data['fg'] = functional.normalize(data['fg'], mean, std)
+ if 'bg' in data.get('gt_fields', []):
+ data['bg'] = functional.normalize(data['bg'], mean, std)
+
+ return data
+
+
+def reverse_transform(alpha: paddle.Tensor, trans_info: List[str]):
+ """recover pred to origin shape"""
+ for item in trans_info[::-1]:
+ if item[0] == 'resize':
+ h, w = item[1][0], item[1][1]
+ alpha = F.interpolate(alpha, [h, w], mode='bilinear')
+ elif item[0] == 'padding':
+ h, w = item[1][0], item[1][1]
+ alpha = alpha[:, :, 0:h, 0:w]
+ else:
+ raise Exception("Unexpected info '{}' in im_info".format(item[0]))
+ return alpha
+
+def save_alpha_pred(alpha: np.ndarray, trimap: Union[np.ndarray, str] = None):
+    """
+    Apply trimap constraints to the predicted alpha. The alpha values are in range [0, 255] and the shape should be [h, w].
+    If trimap is None, the alpha is returned unchanged.
+    """
+    if isinstance(trimap, str):
+        trimap = cv2.imread(trimap, 0)
+    if trimap is not None:
+        alpha[trimap == 0] = 0
+        alpha[trimap == 255] = 255
+    alpha = alpha.astype('uint8')
+    return alpha
+
+
+def cv2_to_base64(image: np.ndarray):
+ """
+ Convert data from BGR to base64 format.
+ """
+ data = cv2.imencode('.png', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+def base64_to_cv2(b64str: str):
+ """
+ Convert data from base64 to BGR format.
+ """
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
\ No newline at end of file
diff --git a/modules/image/matting/modnet_mobilenetv2_matting/requirements.py b/modules/image/matting/modnet_mobilenetv2_matting/requirements.py
new file mode 100644
index 0000000000000000000000000000000000000000..7df0ef23928361724c3fadb8d87d6a3be869e58b
--- /dev/null
+++ b/modules/image/matting/modnet_mobilenetv2_matting/requirements.py
@@ -0,0 +1 @@
+paddleseg >= 2.3.0
diff --git a/modules/image/matting/modnet_resnet50vd_matting/README.md b/modules/image/matting/modnet_resnet50vd_matting/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..03ad69e6732d545861063c85a38e872ff6e60c5d
--- /dev/null
+++ b/modules/image/matting/modnet_resnet50vd_matting/README.md
@@ -0,0 +1,157 @@
+# modnet_resnet50vd_matting
+
+|模型名称|modnet_resnet50vd_matting|
+| :--- | :---: |
+|类别|图像-抠图|
+|网络|modnet_resnet50vd|
+|数据集|百度自建数据集|
+|是否支持Fine-tuning|否|
+|模型大小|535MB|
+|指标|SAD112.73|
+|最新更新日期|2021-12-03|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+
+ - 样例结果示例(左为原图,右为效果图):
+
+
+
+
+
+- ### 模型介绍
+
+ - Matting(精细化分割/影像去背/抠图)是指借由计算前景的颜色和透明度,将前景从影像中撷取出来的技术,可用于替换背景、影像合成、视觉特效,在电影工业中被广泛地使用。影像中的每个像素会有代表其前景透明度的值,称作阿法值(Alpha),一张影像中所有阿法值的集合称作阿法遮罩(Alpha Matte),将影像被遮罩所涵盖的部分取出即可完成前景的分离。modnet_resnet50vd_matting可生成抠图结果。
+
+
+
+ - 更多详情请参考:[modnet_resnet50vd_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install modnet_resnet50vd_matting
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run modnet_resnet50vd_matting --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - 通过命令行方式实现hub模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="modnet_resnet50vd_matting")
+
+ result = model.predict(["/PATH/TO/IMAGE"])
+ print(result)
+ ```
+
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ trimap_list,
+ visualization,
+ save_path):
+ ```
+
+ - 人像matting预测API,用于将输入图片中的人像分割出来。
+
+ - 参数
+
+ - image_list (list(str | numpy.ndarray)):图片输入路径或者BGR格式numpy数据。
+    - trimap_list(list(str | numpy.ndarray)):trimap输入路径或者灰度图单通道格式图片。默认为None。
+ - visualization (bool): 是否进行可视化,默认为False。
+ - save_path (str): 当visualization为True时,保存图片的路径,默认为"modnet_resnet50vd_matting_output"。
+
+ - 返回
+
+ - result (list(numpy.ndarray)):模型分割结果:
+
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署人像matting在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m modnet_resnet50vd_matting
+ ```
+
+ - 这样就完成了一个人像matting在线服务API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/modnet_resnet50vd_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
diff --git a/modules/image/matting/modnet_resnet50vd_matting/README_en.md b/modules/image/matting/modnet_resnet50vd_matting/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..2a6d4e463d2196d3874a8b87892312cb0dc49b31
--- /dev/null
+++ b/modules/image/matting/modnet_resnet50vd_matting/README_en.md
@@ -0,0 +1,156 @@
+# modnet_resnet50vd_matting
+
+|Module Name|modnet_resnet50vd_matting|
+| :--- | :---: |
+|Category|Image Matting|
+|Network|modnet_resnet50vd|
+|Dataset|Baidu self-built dataset|
+|Support Fine-tuning|No|
+|Module Size|535MB|
+|Data Indicators|SAD104.14|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - Matting is the technique of extracting the foreground from an image by calculating its color and transparency. It is widely used in the film industry for background replacement, image compositing, and visual effects. Each pixel in the image has a value that represents its foreground transparency, called alpha; the set of all alpha values in an image is called the alpha matte. The part of the image covered by the matte can be extracted to complete the foreground separation.
+
+
+
+ - For more information, please refer to: [modnet_resnet50vd_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install modnet_resnet50vd_matting
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run modnet_resnet50vd_matting --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="modnet_resnet50vd_matting")
+
+ result = model.predict(["/PATH/TO/IMAGE"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ trimap_list,
+ visualization,
+ save_path):
+ ```
+
+ - Prediction API for matting.
+
+ - **Parameter**
+
+ - image_list (list(str | numpy.ndarray)): Image path or image data, ndarray.shape is in the format \[H, W, C\], BGR.
+ - trimap_list(list(str | numpy.ndarray)): Trimap path or trimap data, ndarray.shape is in the format \[H, W\], Gray. Default is None.
+ - visualization (bool): Whether to save the recognition results as picture files, default is False.
+ - save_path (str): Save path of images, "modnet_resnet50vd_matting_output" by default.
+
+ - **Return**
+
+      - result (list(numpy.ndarray)): The list of model results.
+
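+  - `trimap_list` is optional and defaults to None. A minimal sketch of passing a trimap together with the image (both paths are placeholders):
+
+  - ```python
+    import paddlehub as hub
+
+    model = hub.Module(name="modnet_resnet50vd_matting")
+
+    # Each grayscale trimap constrains the predicted alpha matte of the corresponding image.
+    result = model.predict(
+        image_list=["/PATH/TO/IMAGE"],
+        trimap_list=["/PATH/TO/TRIMAP"],
+        visualization=True,
+        save_path="modnet_resnet50vd_matting_output")
+    print(result)
+    ```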
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of matting.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m modnet_resnet50vd_matting
+ ```
+
+  - The serving API is now deployed; the default port number is 8866.
+
+  - **NOTE:** If you want to use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/modnet_resnet50vd_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
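+    # each element of 'data' is one base64-encoded alpha matte (PNG) per input image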
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/matting/modnet_resnet50vd_matting/module.py b/modules/image/matting/modnet_resnet50vd_matting/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..b57c170a9e281c258fbce8102a52293d93ed0a9e
--- /dev/null
+++ b/modules/image/matting/modnet_resnet50vd_matting/module.py
@@ -0,0 +1,497 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import time
+import argparse
+from typing import Callable, Union, List, Tuple
+
+import numpy as np
+import cv2
+import scipy.ndimage
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+from modnet_resnet50vd_matting.resnet import ResNet50_vd
+import modnet_resnet50vd_matting.processor as P
+
+
+@moduleinfo(
+ name="modnet_resnet50vd_matting",
+ type="CV/matting",
+ author="paddlepaddle",
+ summary="modnet_resnet50vd_matting is a matting model",
+ version="1.0.0"
+)
+class MODNetResNet50Vd(nn.Layer):
+ """
+ The MODNet implementation based on PaddlePaddle.
+
+ The original article refers to
+    Zhanghan Ke, et al. "Is a Green Screen Really Necessary for Real-Time Portrait Matting?"
+ (https://arxiv.org/pdf/2011.11961.pdf).
+
+ Args:
+        hr_channels(int, optional): The channels of the high-resolution branch. Default: 32.
+        pretrained(str, optional): The path of the pretrained model. Default: None.
+ """
+
+ def __init__(self, hr_channels:int = 32, pretrained=None):
+ super(MODNetResNet50Vd, self).__init__()
+
+ self.backbone = ResNet50_vd()
+ self.pretrained = pretrained
+
+ self.head = MODNetHead(
+ hr_channels=hr_channels, backbone_channels=self.backbone.feat_channels)
+ self.blurer = GaussianBlurLayer(1, 3)
+ self.transforms = P.Compose([P.LoadImages(), P.ResizeByShort(), P.ResizeToIntMult(), P.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'modnet-resnet50_vd.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def preprocess(self, img: Union[str, np.ndarray] , transforms: Callable, trimap: Union[str, np.ndarray] = None):
+ data = {}
+ data['img'] = img
+ if trimap is not None:
+ data['trimap'] = trimap
+ data['gt_fields'] = ['trimap']
+ data['trans_info'] = []
+ data = self.transforms(data)
+ data['img'] = paddle.to_tensor(data['img'])
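+        # add a batch dimension: [C, H, W] -> [1, C, H, W]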
+ data['img'] = data['img'].unsqueeze(0)
+ if trimap is not None:
+ data['trimap'] = paddle.to_tensor(data['trimap'])
+ data['trimap'] = data['trimap'].unsqueeze((0, 1))
+
+ return data
+
+ def forward(self, inputs: dict):
+ x = inputs['img']
+ feat_list = self.backbone(x)
+ y = self.head(inputs=inputs, feat_list=feat_list)
+ return y
+
+ def predict(self, image_list: list, trimap_list: list = None, visualization: bool =False, save_path: str = "modnet_resnet50vd_matting_output"):
+ self.eval()
+ result= []
+ with paddle.no_grad():
+ for i, im_path in enumerate(image_list):
+ trimap = trimap_list[i] if trimap_list is not None else None
+ data = self.preprocess(img=im_path, transforms=self.transforms, trimap=trimap)
+ alpha_pred = self.forward(data)
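+                # map the prediction back to the original image size using the recorded transform info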
+ alpha_pred = P.reverse_transform(alpha_pred, data['trans_info'])
+ alpha_pred = (alpha_pred.numpy()).squeeze()
+ alpha_pred = (alpha_pred * 255).astype('uint8')
+ alpha_pred = P.save_alpha_pred(alpha_pred, trimap)
+ result.append(alpha_pred)
+ if visualization:
+ if not os.path.exists(save_path):
+ os.makedirs(save_path)
+ img_name = str(time.time()) + '.png'
+ image_save_path = os.path.join(save_path, img_name)
+ cv2.imwrite(image_save_path, alpha_pred)
+
+ return result
+
+ @serving
+ def serving_method(self, images: list, trimaps:list = None, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [P.base64_to_cv2(image) for image in images]
+ if trimaps is not None:
+ trimap_decoder = [cv2.cvtColor(P.base64_to_cv2(trimap), cv2.COLOR_BGR2GRAY) for trimap in trimaps]
+ else:
+ trimap_decoder = None
+
+ outputs = self.predict(image_list=images_decode, trimap_list= trimap_decoder, **kwargs)
+ serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
+ results = {'data': serving_data}
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ args = self.parser.parse_args(argvs)
+ if args.trimap_path is not None:
+ trimap_list = [args.trimap_path]
+ else:
+ trimap_list = None
+
+ results = self.predict(image_list=[args.input_path], trimap_list=trimap_list, save_path=args.output_dir, visualization=args.visualization)
+
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default="modnet_resnet50vd_matting_output", help="The directory to save output images.")
+ self.arg_config_group.add_argument(
+ '--visualization', type=bool, default=True, help="whether to save output as images.")
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
+ self.arg_input_group.add_argument('--trimap_path', type=str, default=None, help="path to trimap.")
+
+
+
+class MODNetHead(nn.Layer):
+ """
+ Segmentation head.
+ """
+ def __init__(self, hr_channels: int, backbone_channels: int):
+ super().__init__()
+
+ self.lr_branch = LRBranch(backbone_channels)
+ self.hr_branch = HRBranch(hr_channels, backbone_channels)
+ self.f_branch = FusionBranch(hr_channels, backbone_channels)
+
+ def forward(self, inputs: paddle.Tensor, feat_list: list) -> paddle.Tensor:
+ pred_semantic, lr8x, [enc2x, enc4x] = self.lr_branch(feat_list)
+ pred_detail, hr2x = self.hr_branch(inputs['img'], enc2x, enc4x, lr8x)
+ pred_matte = self.f_branch(inputs['img'], lr8x, hr2x)
+ return pred_matte
+
+
+
+class FusionBranch(nn.Layer):
+ def __init__(self, hr_channels: int, enc_channels: int):
+ super().__init__()
+ self.conv_lr4x = Conv2dIBNormRelu(
+ enc_channels[2], hr_channels, 5, stride=1, padding=2)
+
+ self.conv_f2x = Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1)
+ self.conv_f = nn.Sequential(
+ Conv2dIBNormRelu(
+ hr_channels + 3, int(hr_channels / 2), 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ int(hr_channels / 2),
+ 1,
+ 1,
+ stride=1,
+ padding=0,
+ with_ibn=False,
+ with_relu=False))
+
+ def forward(self, img: paddle.Tensor, lr8x: paddle.Tensor, hr2x: paddle.Tensor) -> paddle.Tensor:
+ lr4x = F.interpolate(
+ lr8x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr4x = self.conv_lr4x(lr4x)
+ lr2x = F.interpolate(
+ lr4x, scale_factor=2, mode='bilinear', align_corners=False)
+
+ f2x = self.conv_f2x(paddle.concat((lr2x, hr2x), axis=1))
+ f = F.interpolate(
+ f2x, scale_factor=2, mode='bilinear', align_corners=False)
+ f = self.conv_f(paddle.concat((f, img), axis=1))
+ pred_matte = F.sigmoid(f)
+
+ return pred_matte
+
+
+class HRBranch(nn.Layer):
+ """
+ High Resolution Branch of MODNet
+ """
+
+ def __init__(self, hr_channels: int, enc_channels:int):
+ super().__init__()
+
+ self.tohr_enc2x = Conv2dIBNormRelu(
+ enc_channels[0], hr_channels, 1, stride=1, padding=0)
+ self.conv_enc2x = Conv2dIBNormRelu(
+ hr_channels + 3, hr_channels, 3, stride=2, padding=1)
+
+ self.tohr_enc4x = Conv2dIBNormRelu(
+ enc_channels[1], hr_channels, 1, stride=1, padding=0)
+ self.conv_enc4x = Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1)
+
+ self.conv_hr4x = nn.Sequential(
+ Conv2dIBNormRelu(
+ 2 * hr_channels + enc_channels[2] + 3,
+ 2 * hr_channels,
+ 3,
+ stride=1,
+ padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1))
+
+ self.conv_hr2x = nn.Sequential(
+ Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1))
+
+ self.conv_hr = nn.Sequential(
+ Conv2dIBNormRelu(
+ hr_channels + 3, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ hr_channels,
+ 1,
+ 1,
+ stride=1,
+ padding=0,
+ with_ibn=False,
+ with_relu=False))
+
+ def forward(self, img: paddle.Tensor, enc2x: paddle.Tensor, enc4x: paddle.Tensor, lr8x: paddle.Tensor) -> paddle.Tensor:
+ img2x = F.interpolate(
+ img, scale_factor=1 / 2, mode='bilinear', align_corners=False)
+ img4x = F.interpolate(
+ img, scale_factor=1 / 4, mode='bilinear', align_corners=False)
+
+ enc2x = self.tohr_enc2x(enc2x)
+ hr4x = self.conv_enc2x(paddle.concat((img2x, enc2x), axis=1))
+
+ enc4x = self.tohr_enc4x(enc4x)
+ hr4x = self.conv_enc4x(paddle.concat((hr4x, enc4x), axis=1))
+
+ lr4x = F.interpolate(
+ lr8x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr4x = self.conv_hr4x(paddle.concat((hr4x, lr4x, img4x), axis=1))
+
+ hr2x = F.interpolate(
+ hr4x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr2x = self.conv_hr2x(paddle.concat((hr2x, enc2x), axis=1))
+ pred_detail = None
+ return pred_detail, hr2x
+
+
+class LRBranch(nn.Layer):
+ """
+ Low Resolution Branch of MODNet
+ """
+ def __init__(self, backbone_channels: int):
+ super().__init__()
+ self.se_block = SEBlock(backbone_channels[4], reduction=4)
+ self.conv_lr16x = Conv2dIBNormRelu(
+ backbone_channels[4], backbone_channels[3], 5, stride=1, padding=2)
+ self.conv_lr8x = Conv2dIBNormRelu(
+ backbone_channels[3], backbone_channels[2], 5, stride=1, padding=2)
+ self.conv_lr = Conv2dIBNormRelu(
+ backbone_channels[2],
+ 1,
+ 3,
+ stride=2,
+ padding=1,
+ with_ibn=False,
+ with_relu=False)
+
+ def forward(self, feat_list: list) -> List[paddle.Tensor]:
+ enc2x, enc4x, enc32x = feat_list[0], feat_list[1], feat_list[4]
+
+ enc32x = self.se_block(enc32x)
+ lr16x = F.interpolate(
+ enc32x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr16x = self.conv_lr16x(lr16x)
+ lr8x = F.interpolate(
+ lr16x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr8x = self.conv_lr8x(lr8x)
+
+ pred_semantic = None
+ if self.training:
+ lr = self.conv_lr(lr8x)
+ pred_semantic = F.sigmoid(lr)
+
+ return pred_semantic, lr8x, [enc2x, enc4x]
+
+
+class IBNorm(nn.Layer):
+ """
+ Combine Instance Norm and Batch Norm into One Layer
+ """
+
+ def __init__(self, in_channels: int):
+ super().__init__()
+ self.bnorm_channels = in_channels // 2
+ self.inorm_channels = in_channels - self.bnorm_channels
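+        # the first half of the channels goes through BatchNorm, the remainder through InstanceNorm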
+
+ self.bnorm = nn.BatchNorm2D(self.bnorm_channels)
+ self.inorm = nn.InstanceNorm2D(self.inorm_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ bn_x = self.bnorm(x[:, :self.bnorm_channels, :, :])
+ in_x = self.inorm(x[:, self.bnorm_channels:, :, :])
+
+ return paddle.concat((bn_x, in_x), 1)
+
+
+class Conv2dIBNormRelu(nn.Layer):
+ """
+ Convolution + IBNorm + Relu
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ padding: int = 0,
+ dilation:int = 1,
+ groups: int = 1,
+ bias_attr: paddle.ParamAttr = None,
+ with_ibn: bool = True,
+ with_relu: bool = True):
+
+ super().__init__()
+
+ layers = [
+ nn.Conv2D(
+ in_channels,
+ out_channels,
+ kernel_size,
+ stride=stride,
+ padding=padding,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=bias_attr)
+ ]
+
+ if with_ibn:
+ layers.append(IBNorm(out_channels))
+
+ if with_relu:
+ layers.append(nn.ReLU())
+
+ self.layers = nn.Sequential(*layers)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ return self.layers(x)
+
+
+class SEBlock(nn.Layer):
+ """
+ SE Block Proposed in https://arxiv.org/pdf/1709.01507.pdf
+ """
+
+ def __init__(self, num_channels: int, reduction:int = 1):
+ super().__init__()
+ self.pool = nn.AdaptiveAvgPool2D(1)
+ self.conv = nn.Sequential(
+ nn.Conv2D(
+ num_channels,
+ int(num_channels // reduction),
+ 1,
+ bias_attr=False), nn.ReLU(),
+ nn.Conv2D(
+ int(num_channels // reduction),
+ num_channels,
+ 1,
+ bias_attr=False), nn.Sigmoid())
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ w = self.pool(x)
+ w = self.conv(w)
+ return w * x
+
+
+class GaussianBlurLayer(nn.Layer):
+ """ Add Gaussian Blur to a 4D tensors
+ This layer takes a 4D tensor of {N, C, H, W} as input.
+    The Gaussian blur is performed on each of the given channels (C) separately.
+ """
+
+ def __init__(self, channels: int, kernel_size: int):
+ """
+ Args:
+ channels (int): Channel for input tensor
+ kernel_size (int): Size of the kernel used in blurring
+ """
+
+ super(GaussianBlurLayer, self).__init__()
+ self.channels = channels
+ self.kernel_size = kernel_size
+ assert self.kernel_size % 2 != 0
+
+ self.op = nn.Sequential(
+ nn.Pad2D(int(self.kernel_size / 2), mode='reflect'),
+ nn.Conv2D(
+ channels,
+ channels,
+ self.kernel_size,
+ stride=1,
+ padding=0,
+ bias_attr=False,
+ groups=channels))
+
+ self._init_kernel()
+ self.op[1].weight.stop_gradient = True
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ """
+ Args:
+ x (paddle.Tensor): input 4D tensor
+ Returns:
+ paddle.Tensor: Blurred version of the input
+ """
+
+ if not len(list(x.shape)) == 4:
+ print('\'GaussianBlurLayer\' requires a 4D tensor as input\n')
+ exit()
+ elif not x.shape[1] == self.channels:
+            print('In \'GaussianBlurLayer\', the required channel ({0}) is '
+ 'not the same as input ({1})\n'.format(
+ self.channels, x.shape[1]))
+ exit()
+
+ return self.op(x)
+
+ def _init_kernel(self):
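+        # sigma follows OpenCV's default rule for a given kernel size: 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8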
+ sigma = 0.3 * ((self.kernel_size - 1) * 0.5 - 1) + 0.8
+
+ n = np.zeros((self.kernel_size, self.kernel_size))
+ i = int(self.kernel_size / 2)
+ n[i, i] = 1
+ kernel = scipy.ndimage.gaussian_filter(n, sigma)
+ kernel = kernel.astype('float32')
+ kernel = kernel[np.newaxis, np.newaxis, :, :]
+ paddle.assign(kernel, self.op[1].weight)
\ No newline at end of file
diff --git a/modules/image/matting/modnet_resnet50vd_matting/processor.py b/modules/image/matting/modnet_resnet50vd_matting/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..3ae79593f0d3dab19520c3c666ae4a06b81960dd
--- /dev/null
+++ b/modules/image/matting/modnet_resnet50vd_matting/processor.py
@@ -0,0 +1,207 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import random
+import base64
+from typing import Callable, Union, List, Tuple
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+from paddleseg.transforms import functional
+from PIL import Image
+
+
+class Compose:
+ """
+ Do transformation on input data with corresponding pre-processing and augmentation operations.
+ The shape of input data to all operations is [height, width, channels].
+ """
+
+ def __init__(self, transforms: Callable, to_rgb: bool = True):
+ if not isinstance(transforms, list):
+ raise TypeError('The transforms must be a list!')
+ self.transforms = transforms
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if 'trans_info' not in data:
+ data['trans_info'] = []
+ for op in self.transforms:
+ data = op(data)
+ if data is None:
+ return None
+
+ data['img'] = np.transpose(data['img'], (2, 0, 1))
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = np.transpose(data[key], (2, 0, 1))
+
+ return data
+
+
+class LoadImages:
+ """
+ Read images from image path.
+
+ Args:
+ to_rgb (bool, optional): If converting image to RGB color space. Default: True.
+ """
+ def __init__(self, to_rgb: bool = True):
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if isinstance(data['img'], str):
+ data['img'] = cv2.imread(data['img'])
+
+ for key in data.get('gt_fields', []):
+ if isinstance(data[key], str):
+ data[key] = cv2.imread(data[key], cv2.IMREAD_UNCHANGED)
+ # if alpha and trimap has 3 channels, extract one.
+ if key in ['alpha', 'trimap']:
+ if len(data[key].shape) > 2:
+ data[key] = data[key][:, :, 0]
+
+ if self.to_rgb:
+ data['img'] = cv2.cvtColor(data['img'], cv2.COLOR_BGR2RGB)
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = cv2.cvtColor(data[key], cv2.COLOR_BGR2RGB)
+
+ return data
+
+
+class ResizeByShort:
+ """
+ Resize the short side of an image to given size, and then scale the other side proportionally.
+
+ Args:
+ short_size (int): The target size of short side.
+ """
+
+ def __init__(self, short_size: int =512):
+ self.short_size = short_size
+
+ def __call__(self, data: dict) -> dict:
+
+ data['trans_info'].append(('resize', data['img'].shape[0:2]))
+ data['img'] = functional.resize_short(data['img'], self.short_size)
+ for key in data.get('gt_fields', []):
+ data[key] = functional.resize_short(data[key], self.short_size)
+ return data
+
+
+class ResizeToIntMult:
+ """
+    Resize the image to an integer multiple of a given value, e.g. 32.
+ """
+
+ def __init__(self, mult_int: int = 32):
+ self.mult_int = mult_int
+
+ def __call__(self, data: dict) -> dict:
+ data['trans_info'].append(('resize', data['img'].shape[0:2]))
+
+ h, w = data['img'].shape[0:2]
+        rw = w - w % self.mult_int
+        rh = h - h % self.mult_int
+ data['img'] = functional.resize(data['img'], (rw, rh))
+ for key in data.get('gt_fields', []):
+ data[key] = functional.resize(data[key], (rw, rh))
+
+ return data
+
+
+class Normalize:
+ """
+ Normalize an image.
+
+ Args:
+ mean (list, optional): The mean value of a data set. Default: [0.5, 0.5, 0.5].
+ std (list, optional): The standard deviation of a data set. Default: [0.5, 0.5, 0.5].
+
+ Raises:
+ ValueError: When mean/std is not list or any value in std is 0.
+ """
+
+ def __init__(self, mean: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5), std: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5)):
+ self.mean = mean
+ self.std = std
+ if not (isinstance(self.mean, (list, tuple))
+ and isinstance(self.std, (list, tuple))):
+ raise ValueError(
+ "{}: input type is invalid. It should be list or tuple".format(
+ self))
+ from functools import reduce
+ if reduce(lambda x, y: x * y, self.std) == 0:
+ raise ValueError('{}: std is invalid!'.format(self))
+
+ def __call__(self, data: dict) -> dict:
+ mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
+ std = np.array(self.std)[np.newaxis, np.newaxis, :]
+ data['img'] = functional.normalize(data['img'], mean, std)
+ if 'fg' in data.get('gt_fields', []):
+ data['fg'] = functional.normalize(data['fg'], mean, std)
+ if 'bg' in data.get('gt_fields', []):
+ data['bg'] = functional.normalize(data['bg'], mean, std)
+
+ return data
+
+
+def reverse_transform(alpha: paddle.Tensor, trans_info: List[str]):
+ """recover pred to origin shape"""
+ for item in trans_info[::-1]:
+ if item[0] == 'resize':
+ h, w = item[1][0], item[1][1]
+ alpha = F.interpolate(alpha, [h, w], mode='bilinear')
+ elif item[0] == 'padding':
+ h, w = item[1][0], item[1][1]
+ alpha = alpha[:, :, 0:h, 0:w]
+ else:
+ raise Exception("Unexpected info '{}' in im_info".format(item[0]))
+ return alpha
+
+def save_alpha_pred(alpha: np.ndarray, trimap: np.ndarray = None):
+ """
+ The value of alpha is range [0, 1], shape should be [h,w]
+ """
+ if isinstance(trimap, str):
+ trimap = cv2.imread(trimap, 0)
+ alpha[trimap == 0] = 0
+ alpha[trimap == 255] = 255
+ alpha = (alpha).astype('uint8')
+ return alpha
+
+
+def cv2_to_base64(image: np.ndarray):
+ """
+ Convert data from BGR to base64 format.
+ """
+ data = cv2.imencode('.png', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+def base64_to_cv2(b64str: str):
+ """
+ Convert data from base64 to BGR format.
+ """
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
\ No newline at end of file
diff --git a/modules/image/matting/modnet_resnet50vd_matting/resnet.py b/modules/image/matting/modnet_resnet50vd_matting/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..19abe41c8e47ca297941eb44e7ffc49e63b996da
--- /dev/null
+++ b/modules/image/matting/modnet_resnet50vd_matting/resnet.py
@@ -0,0 +1,332 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+from paddleseg.models import layers
+from paddleseg.utils import utils
+
+__all__ = ["ResNet50_vd"]
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ ):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = nn.AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = nn.Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = layers.SyncBatchNorm(out_channels)
+ self._act_op = layers.Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu')
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation)
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True)
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+
+ ####################################################################
+ # If given dilation rate > 1, using corresponding padding.
+ # The performance drops down without the follow padding.
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+ #####################################################################
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class BasicBlock(nn.Layer):
+ """Basic residual block"""
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False):
+ super(BasicBlock, self).__init__()
+ self.stride = stride
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu')
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ act=None)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first else True)
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+ y = paddle.add(x=short, y=conv1)
+ y = F.relu(y)
+
+ return y
+
+
+class ResNet_vd(nn.Layer):
+ """
+ The ResNet_vd implementation based on PaddlePaddle.
+
+    The original article refers to
+    Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
+ (https://arxiv.org/pdf/1812.01187.pdf).
+
+ """
+
+ def __init__(self,
+ input_channels: int = 3,
+ layers: int = 50,
+ output_stride: int = 32,
+ multi_grid: tuple = (1, 1, 1),
+ pretrained: str = None):
+ super(ResNet_vd, self).__init__()
+
+ self.conv1_logit = None # for gscnn shape stream
+ self.layers = layers
+ supported_layers = [18, 34, 50, 101, 152, 200]
+ assert layers in supported_layers, \
+ "supported layers are {} but input layer is {}".format(
+ supported_layers, layers)
+
+ if layers == 18:
+ depth = [2, 2, 2, 2]
+ elif layers == 34 or layers == 50:
+ depth = [3, 4, 6, 3]
+ elif layers == 101:
+ depth = [3, 4, 23, 3]
+ elif layers == 152:
+ depth = [3, 8, 36, 3]
+ elif layers == 200:
+ depth = [3, 12, 48, 3]
+ num_channels = [64, 256, 512, 1024
+ ] if layers >= 50 else [64, 64, 128, 256]
+ num_filters = [64, 128, 256, 512]
+
+ # for channels of four returned stages
+ self.feat_channels = [c * 4 for c in num_filters
+ ] if layers >= 50 else num_filters
+ self.feat_channels = [64] + self.feat_channels
+
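+        # For dense prediction, the stages listed in dilation_dict replace stride
+        # with dilation so the overall output stride stays at 8 or 16 instead of 32.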
+ dilation_dict = None
+ if output_stride == 8:
+ dilation_dict = {2: 2, 3: 4}
+ elif output_stride == 16:
+ dilation_dict = {3: 2}
+
+ self.conv1_1 = ConvBNLayer(
+ in_channels=input_channels,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu')
+ self.conv1_2 = ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu')
+ self.conv1_3 = ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu')
+ self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+
+ # self.block_list = []
+ self.stage_list = []
+ if layers >= 50:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ if layers in [101, 152] and block == 2:
+ if i == 0:
+ conv_name = "res" + str(block + 2) + "a"
+ else:
+ conv_name = "res" + str(block + 2) + "b" + str(i)
+ else:
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+
+ ###############################################################################
+ # Add dilation rate for some segmentation tasks, if dilation_dict is not None.
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+
+ # Actually block here is 'stage', and i is 'block' in 'stage'
+                    # At stage 4, expand the dilation_rate if multi_grid is given
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ ###############################################################################
+
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ dilation=dilation_rate))
+
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+ else:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+ basic_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BasicBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block],
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0))
+ block_list.append(basic_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ self.pretrained = pretrained
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ feat_list = []
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ feat_list.append(y)
+
+ y = self.pool2d_max(y)
+
+ # A feature list saves the output feature map of each stage.
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+
+ return feat_list
+
+
+def ResNet50_vd(**args):
+ model = ResNet_vd(layers=50, **args)
+ return model
diff --git a/modules/image/semantic_segmentation/ExtremeC3_Portrait_Segmentation/README.md b/modules/image/semantic_segmentation/ExtremeC3_Portrait_Segmentation/README.md
index 17d2979a19b9df5963b341e42347921e40c94c40..b413d0315d1f34e4f076ae2814c9d9ab62730544 100644
--- a/modules/image/semantic_segmentation/ExtremeC3_Portrait_Segmentation/README.md
+++ b/modules/image/semantic_segmentation/ExtremeC3_Portrait_Segmentation/README.md
@@ -44,7 +44,7 @@
## 三、模型API预测
-- ### 1、代码示例
+- ### 1、预测代码示例
```python
import cv2
@@ -60,27 +60,27 @@
visualization=False)
```
- - ### 2、API
-
- ```python
- def Segmentation(
- images=None,
- paths=None,
- batch_size=1,
- output_dir='output',
- visualization=False):
- ```
- - 人像分割 API
-
- - **参数**
- * images (list[np.ndarray]) : 输入图像数据列表(BGR)
- * paths (list[str]) : 输入图像路径列表
- * batch_size (int) : 数据批大小
- * output_dir (str) : 可视化图像输出目录
- * visualization (bool) : 是否可视化
-
- - **返回**
- * results (list[dict{"mask":np.ndarray,"result":np.ndarray}]): 输出图像数据列表
+- ### 2、API
+
+```python
+def Segmentation(
+ images=None,
+ paths=None,
+ batch_size=1,
+ output_dir='output',
+ visualization=False):
+```
+- 人像分割 API
+
+- **参数**
+ * images (list[np.ndarray]) : 输入图像数据列表(BGR)
+ * paths (list[str]) : 输入图像路径列表
+ * batch_size (int) : 数据批大小
+ * output_dir (str) : 可视化图像输出目录
+ * visualization (bool) : 是否可视化
+
+- **返回**
+ * results (list[dict{"mask":np.ndarray,"result":np.ndarray}]): 输出图像数据列表
## 四、更新历史
diff --git a/modules/image/semantic_segmentation/ExtremeC3_Portrait_Segmentation/README_en.md b/modules/image/semantic_segmentation/ExtremeC3_Portrait_Segmentation/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..15ac80e05b615611a75591b5e1e6d42a66521564
--- /dev/null
+++ b/modules/image/semantic_segmentation/ExtremeC3_Portrait_Segmentation/README_en.md
@@ -0,0 +1,89 @@
+# ExtremeC3_Portrait_Segmentation
+
+|Module Name|ExtremeC3_Portrait_Segmentation|
+| :--- | :---: |
+|Category|image segmentation|
+|Network |ExtremeC3|
+|Dataset|EG1800, Baidu fashion dataset|
+|Fine-tuning supported or not|No|
+|Module Size|0.038MB|
+|Data indicators|-|
+|Latest update date|2021-02-26|
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+  * ExtremeC3_Portrait_Segmentation is a lightweight module based on ExtremeC3 for portrait segmentation.
+
+ * For more information, please refer to: [ExtremeC3_Portrait_Segmentation](https://github.com/clovaai/ext_portrait_segmentation).
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ExtremeC3_Portrait_Segmentation
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ ```python
+ import cv2
+ import paddlehub as hub
+
+ model = hub.Module(name='ExtremeC3_Portrait_Segmentation')
+
+ result = model.Segmentation(
+ images=[cv2.imread('/PATH/TO/IMAGE')],
+ paths=None,
+ batch_size=1,
+ output_dir='output',
+ visualization=False)
+ ```
+
+- ### 2、API
+
+ ```python
+ def Segmentation(
+ images=None,
+ paths=None,
+ batch_size=1,
+ output_dir='output',
+ visualization=False):
+ ```
+ - Prediction API, used for portrait segmentation.
+
+ - **Parameter**
+ * images (list[np.ndarray]) : image data, ndarray.shape is in the format [H, W, C], BGR;
+ * paths (list[str]) :image path
+ * batch_size (int) : batch size
+ * output_dir (str) : save path of images, 'output' by default.
+ * visualization (bool) : whether to save the segmentation results as picture files.
+ - **Return**
+ * results (list[dict{"mask":np.ndarray,"result":np.ndarray}]): list of recognition results.
+
+## IV. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/Pneumonia_CT_LKM_PP/README_en.md b/modules/image/semantic_segmentation/Pneumonia_CT_LKM_PP/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..397441dfd52a074a9e9ca9775a0b54f98d02027e
--- /dev/null
+++ b/modules/image/semantic_segmentation/Pneumonia_CT_LKM_PP/README_en.md
@@ -0,0 +1,91 @@
+# Pneumonia_CT_LKM_PP
+
+|Module Name|Pneumonia_CT_LKM_PP|
+| :--- | :---: |
+|Category|Image segmentation|
+|Network |-|
+|Dataset|-|
+|Fine-tuning supported or not|No|
+|Module Size|35M|
+|Data indicators|-|
+|Latest update date|2021-02-26|
+
+
+## I. Basic Information
+
+
+- ### Module Introduction
+
+  - The pneumonia CT analysis model (Pneumonia-CT-LKM-PP) can efficiently detect and outline lesions in a patient's CT images. With the accompanying post-processing code, the number, volume, and proportion of lung lesions can be analyzed. The model has been fully trained on both high-resolution and low-resolution CT image data, so it can adapt to examination data collected by CT imaging equipment of different levels.
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install Pneumonia_CT_LKM_PP==1.0.0
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ ```python
+ import paddlehub as hub
+
+ pneumonia = hub.Module(name="Pneumonia_CT_LKM_PP")
+
+ input_only_lesion_np_path = "/PATH/TO/ONLY_LESION_NP"
+ input_both_lesion_np_path = "/PATH/TO/LESION_NP"
+ input_both_lung_np_path = "/PATH/TO/LUNG_NP"
+
+ # set input dict
+ input_dict = {"image_np_path": [
+ [input_only_lesion_np_path],
+ [input_both_lesion_np_path, input_both_lung_np_path],
+ ]}
+
+ # execute predict and print the result
+ results = pneumonia.segmentation(data=input_dict)
+ for result in results:
+ print(result)
+
+ ```
+
+
+- ### 2、API
+
+ - ```python
+ def segmentation(data)
+ ```
+
+ - Prediction API, used for CT analysis of pneumonia.
+
+ - **Parameter**
+
+    * data (dict): Key is "image_np_path", value is a list of input path groups; each group contains the path of a lesion np file and, optionally, the path of the corresponding lung np file (see the example above).
+
+
+ - **Return**
+
+ * result (list\[dict\]): the list of recognition results, where each element is dict and each field is:
+ * input_lesion_np_path: input path of lesion.
+ * output_lesion_np: segmentation result path of lesion.
+ * input_lung_np_path: input path of lung.
+ * output_lung_np:segmentation result path of lung.
+
+
+## IV. Release Note
+
+* 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/Pneumonia_CT_LKM_PP_lung/README_en.md b/modules/image/semantic_segmentation/Pneumonia_CT_LKM_PP_lung/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..067ab57f3f189b02df2f578d5e58d77acb9e9620
--- /dev/null
+++ b/modules/image/semantic_segmentation/Pneumonia_CT_LKM_PP_lung/README_en.md
@@ -0,0 +1,91 @@
+# Pneumonia_CT_LKM_PP_lung
+
+|Module Name|Pneumonia_CT_LKM_PP_lung|
+| :--- | :---: |
+|Category|Image segmentation|
+|Network |-|
+|Dataset|-|
+|Fine-tuning supported or not|No|
+|Module Size|35M|
+|Data indicators|-|
+|Latest update date|2021-02-26|
+
+
+## I. Basic Information
+
+
+- ### Module Introduction
+
+  - The pneumonia CT analysis model (Pneumonia-CT-LKM-PP) can efficiently detect and outline lesions in a patient's CT images. With the accompanying post-processing code, the number, volume, and proportion of lung lesions can be analyzed. The model has been fully trained on both high-resolution and low-resolution CT image data, so it can adapt to examination data collected by CT imaging equipment of different levels. (This module is a submodule of Pneumonia_CT_LKM_PP.)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install Pneumonia_CT_LKM_PP_lung==1.0.0
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ ```python
+ import paddlehub as hub
+
+ pneumonia = hub.Module(name="Pneumonia_CT_LKM_PP_lung")
+
+ input_only_lesion_np_path = "/PATH/TO/ONLY_LESION_NP"
+ input_both_lesion_np_path = "/PATH/TO/LESION_NP"
+ input_both_lung_np_path = "/PATH/TO/LUNG_NP"
+
+ # set input dict
+ input_dict = {"image_np_path": [
+ [input_only_lesion_np_path],
+ [input_both_lesion_np_path, input_both_lung_np_path],
+ ]}
+
+ # execute predict and print the result
+ results = pneumonia.segmentation(data=input_dict)
+ for result in results:
+ print(result)
+
+ ```
+
+
+- ### 2、API
+
+ - ```python
+ def segmentation(data)
+ ```
+
+ - Prediction API, used for CT analysis of pneumonia.
+
+ - **Parameter**
+
+    * data (dict): Key is "image_np_path", value is a list of input path groups; each group contains the path of a lesion np file and, optionally, the path of the corresponding lung np file (see the example above).
+
+
+ - **Return**
+
+ * result (list\[dict\]): The list of recognition results, where each element is dict and each field is:
+ * input_lesion_np_path: Input path of lesion.
+ * output_lesion_np: Segmentation result path of lesion.
+ * input_lung_np_path: Input path of lung.
+ * output_lung_np: Segmentation result path of lung.
+
+
+## IV. Release Note
+
+* 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/U2Net/README.md b/modules/image/semantic_segmentation/U2Net/README.md
index bedd1cc65feebb68d754814eecbbbb03d35397bf..535b8fc426f37cdc87fe0168ab4025cd44a150b7 100644
--- a/modules/image/semantic_segmentation/U2Net/README.md
+++ b/modules/image/semantic_segmentation/U2Net/README.md
@@ -43,7 +43,7 @@
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
-- ### 1、代码示例
+- ### 1、预测代码示例
```python
import cv2
diff --git a/modules/image/semantic_segmentation/U2Net/README_en.md b/modules/image/semantic_segmentation/U2Net/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..4cea82d051779812790eb09f8571fc8d7a5b8d01
--- /dev/null
+++ b/modules/image/semantic_segmentation/U2Net/README_en.md
@@ -0,0 +1,96 @@
+# U2Net
+
+|Module Name |U2Net|
+| :--- | :---: |
+|Category |Image segmentation|
+|Network |U^2Net|
+|Dataset|-|
+|Fine-tuning supported or not|No|
+|Module Size |254MB|
+|Data indicators|-|
+|Latest update date|2021-02-26|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+
+- ### Module Introduction
+
+ - Network architecture:
+
+
+
+
+ - For more information, please refer to: [U2Net](https://github.com/xuebinqin/U-2-Net)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+ - ```shell
+ $ hub install U2Net
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ ```python
+ import cv2
+ import paddlehub as hub
+
+ model = hub.Module(name='U2Net')
+
+ result = model.Segmentation(
+ images=[cv2.imread('/PATH/TO/IMAGE')],
+ paths=None,
+ batch_size=1,
+ input_size=320,
+ output_dir='output',
+ visualization=True)
+ ```
+ - ### 2、API
+
+ ```python
+ def Segmentation(
+ images=None,
+ paths=None,
+ batch_size=1,
+ input_size=320,
+ output_dir='output',
+ visualization=False):
+ ```
+ - Prediction API, obtaining segmentation result.
+
+ - **Parameter**
+ * images (list[np.ndarray]) : Image data, ndarray.shape is in the format [H, W, C], BGR.
+ * paths (list[str]) : Image path.
+ * batch_size (int) : Batch size.
+ * input_size (int) : Input image size, default is 320.
+ * output_dir (str) : Save path of images, 'output' by default.
+ * visualization (bool) : Whether to save the results as picture files.
+
+ - **Return**
+ * results (list[np.ndarray]): The list of segmentation results.
+
+## IV. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/U2Netp/README.md b/modules/image/semantic_segmentation/U2Netp/README.md
index b476a9f35007e4a74398d95998c4998f6d2c2c13..267409a5d414a59eee45b04e6d5ef63d430607a1 100644
--- a/modules/image/semantic_segmentation/U2Netp/README.md
+++ b/modules/image/semantic_segmentation/U2Netp/README.md
@@ -47,7 +47,7 @@
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
-- ### 1、代码示例
+- ### 1、预测代码示例
```python
import cv2
diff --git a/modules/image/semantic_segmentation/U2Netp/README_en.md b/modules/image/semantic_segmentation/U2Netp/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..ffb0bac24f0d46294b94b7e65fad784c73a43854
--- /dev/null
+++ b/modules/image/semantic_segmentation/U2Netp/README_en.md
@@ -0,0 +1,96 @@
+# U2Netp
+
+|Module Name |U2Netp|
+| :--- | :---: |
+|Category |Image segmentation|
+|Network |U^2Net|
+|Dataset|-|
+|Fine-tuning supported or not|No|
+|Module Size |6.7MB|
+|Data indicators|-|
+|Latest update date|2021-02-26|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+
+- ### Module Introduction
+
+ - Network architecture:
+
+
+
+
+ - For more information, please refer to: [U2Net](https://github.com/xuebinqin/U-2-Net)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+ - ```shell
+ $ hub install U2Netp
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ ```python
+ import cv2
+ import paddlehub as hub
+
+ model = hub.Module(name='U2Netp')
+
+ result = model.Segmentation(
+ images=[cv2.imread('/PATH/TO/IMAGE')],
+ paths=None,
+ batch_size=1,
+ input_size=320,
+ output_dir='output',
+ visualization=True)
+ ```
+ - ### 2、API
+
+ ```python
+ def Segmentation(
+ images=None,
+ paths=None,
+ batch_size=1,
+ input_size=320,
+ output_dir='output',
+ visualization=False):
+ ```
+ - Prediction API, obtaining segmentation result.
+
+ - **Parameter**
+ * images (list[np.ndarray]) : Image data, ndarray.shape is in the format [H, W, C], BGR.
+ * paths (list[str]) : Image path.
+ * batch_size (int) : Batch size.
+ * input_size (int) : Input image size, default is 320.
+ * output_dir (str) : Save path of images, 'output' by default.
+ * visualization (bool) : Whether to save the results as picture files.
+
+ - **Return**
+ * results (list[np.ndarray]): The list of segmentation results.
+
+## IV. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/ace2p/README.md b/modules/image/semantic_segmentation/ace2p/README.md
index 710c2424a45298d86b1486afbf751eb874ae4764..12b23cf4f1beed338058a89e64a0ac1d854e3892 100644
--- a/modules/image/semantic_segmentation/ace2p/README.md
+++ b/modules/image/semantic_segmentation/ace2p/README.md
@@ -57,10 +57,10 @@
- ### 1、命令行预测
```shell
- $ hub install ace2p==1.1.0
+ $ hub run ace2p --input_path "/PATH/TO/IMAGE"
```
- - ### 2、代码示例
+ - ### 2、预测代码示例
```python
import paddlehub as hub
@@ -70,49 +70,49 @@
result = human_parser.segmentation(images=[cv2.imread('/PATH/TO/IMAGE')])
```
- - ### 3、API
+ - ### 3、API
- ```python
- def segmentation(images=None,
- paths=None,
- batch_size=1,
- use_gpu=False,
- output_dir='ace2p_output',
- visualization=False):
- ```
+ ```python
+ def segmentation(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ output_dir='ace2p_output',
+ visualization=False):
+ ```
- - 预测API,用于图像分割得到人体解析。
+ - 预测API,用于图像分割得到人体解析。
- - **参数**
+ - **参数**
- * images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
- * paths (list\[str\]): 图片的路径;
- * batch\_size (int): batch 的大小;
- * use\_gpu (bool): 是否使用 GPU;
- * output\_dir (str): 保存处理结果的文件目录;
- * visualization (bool): 是否将识别结果保存为图片文件。
+ * images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
+ * paths (list\[str\]): 图片的路径;
+ * batch\_size (int): batch 的大小;
+ * use\_gpu (bool): 是否使用 GPU;
+ * output\_dir (str): 保存处理结果的文件目录;
+ * visualization (bool): 是否将识别结果保存为图片文件。
- - **返回**
+ - **返回**
- * res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,关键字有'path', 'data',相应的取值为:
- * path (str): 原输入图片的路径;
- * data (numpy.ndarray): 图像分割得到的结果,shape 为`H * W`,元素的取值为0-19,表示每个像素的分类结果,映射顺序与下面的调色板相同。
+ * res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,关键字有'path', 'data',相应的取值为:
+ * path (str): 原输入图片的路径;
+ * data (numpy.ndarray): 图像分割得到的结果,shape 为`H * W`,元素的取值为0-19,表示每个像素的分类结果,映射顺序与下面的调色板相同。
- ```python
- def save_inference_model(dirname,
- model_filename=None,
- params_filename=None,
- combined=True)
- ```
+ ```python
+ def save_inference_model(dirname,
+ model_filename=None,
+ params_filename=None,
+ combined=True)
+ ```
- - 将模型保存到指定路径。
+ - 将模型保存到指定路径。
- - **参数**
+ - **参数**
- * dirname: 存在模型的目录名称
- * model\_filename: 模型文件名称,默认为\_\_model\_\_
- * params\_filename: 参数文件名称,默认为\_\_params\_\_(仅当`combined`为True时生效)
- * combined: 是否将参数保存到统一的一个文件中。
+ * dirname: 存在模型的目录名称
+ * model\_filename: 模型文件名称,默认为\_\_model\_\_
+ * params\_filename: 参数文件名称,默认为\_\_params\_\_(仅当`combined`为True时生效)
+ * combined: 是否将参数保存到统一的一个文件中。
## 四、服务部署
diff --git a/modules/image/semantic_segmentation/ace2p/README_en.md b/modules/image/semantic_segmentation/ace2p/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..3fa0c273e3b3095ce8ba7b8abf97543e3be6ca48
--- /dev/null
+++ b/modules/image/semantic_segmentation/ace2p/README_en.md
@@ -0,0 +1,184 @@
+# ace2p
+
+|Module Name|ace2p|
+| :--- | :---: |
+|Category|Image segmentation|
+|Network|ACE2P|
+|Dataset|LIP|
+|Fine-tuning supported or not|No|
+|Module Size|259MB|
+|Data indicators|-|
+|Latest update date |2021-02-26|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Network architecture:
+
+
+
+
+ - Color palette
+
+
+
+
+
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - Human Parsing is a fine-grained semantic segmentation task that aims to identify the components (for example, body parts and clothing) of a human image at the pixel level. The PaddleHub Module uses ResNet101 as the backbone network, and accepts input image sizes of 473x473x3.
+
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ace2p
+ ```
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run ace2p --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ human_parser = hub.Module(name="ace2p")
+ result = human_parser.segmentation(images=[cv2.imread('/PATH/TO/IMAGE')])
+ ```
+
+- ### 3、API
+
+ - ```python
+ def segmentation(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ output_dir='ace2p_output',
+ visualization=False):
+ ```
+
+ - Prediction API, used for human parsing.
+
+ - **Parameter**
+
+ * images (list\[numpy.ndarray\]): Image data, ndarray.shape is in the format [H, W, C], BGR.
+ * paths (list\[str\]): Image path.
+ * batch\_size (int): Batch size.
+ * use\_gpu (bool): Use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**
+ * output\_dir (str): Save path of output, default is 'ace2p_output'.
+ * visualization (bool): Whether to save the recognition results as picture files.
+
+ - **Return**
+
+ * res (list\[dict\]): The list of recognition results, where each element is dict and each field is:
+ * save\_path (str, optional): Save path of the result.
+        * data (numpy.ndarray): The human parsing result, with shape `H * W`; each element is an integer in [0, 19] giving the category of the pixel, in the same order as the palette shown above (see the visualization sketch below).
+
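+  - A possible way to visualize the returned parsing map (a sketch; the 20 colors below are illustrative rather than the module's exact palette, and the parsing map is assumed to be available under the `data` key as described above):
+
+  - ```python
+    import cv2
+    import numpy as np
+    import paddlehub as hub
+
+    human_parser = hub.Module(name="ace2p")
+    res = human_parser.segmentation(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+    parsing = res[0]['data'].astype(np.int64)      # label map, one class id in [0, 19] per pixel
+    palette = np.random.RandomState(0).randint(
+        0, 256, size=(20, 3), dtype=np.uint8)      # illustrative colors only
+    color_map = palette[parsing]                   # [H, W] labels -> [H, W, 3] colors
+    cv2.imwrite('ace2p_parsing_vis.png', color_map)
+    ```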
+
+ - ```python
+ def save_inference_model(dirname,
+ model_filename=None,
+ params_filename=None,
+ combined=True)
+ ```
+
+ - Save the model to the specified path.
+
+ - **Parameters**
+ * dirname: Save path.
+        * model\_filename: Model file name, default is \_\_model\_\_
+        * params\_filename: Parameter file name, default is \_\_params\_\_ (only takes effect when `combined` is True)
+ * combined: Whether to save the parameters to a unified file.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of human parsing
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ace2p
+ ```
+
+  - The serving API is now deployed and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+ # Send an HTTP request
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ace2p"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # print prediction results
+ print(base64_to_cv2(r.json()["results"][0]['data']))
+ ```
+
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
+
+* 1.1.0
+
+ Adapt to paddlehub2.0
diff --git a/modules/image/semantic_segmentation/bisenet_lane_segmentation/README.md b/modules/image/semantic_segmentation/bisenet_lane_segmentation/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..b9814fe7bb98ca34f13b0a94741a57d365ed035c
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenet_lane_segmentation/README.md
@@ -0,0 +1,151 @@
+# bisenet_lane_segmentation
+
+|模型名称|bisenet_lane_segmentation|
+| :--- | :---: |
+|类别|图像-图像分割|
+|网络|bisenet|
+|数据集|TuSimple|
+|是否支持Fine-tuning|否|
+|模型大小|9.7MB|
+|指标|ACC96.09%|
+|最新更新日期|2021-12-03|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+
+ - 样例结果示例(左为原图,右为效果图):
+
+
+
+
+
+- ### 模型介绍
+
+ - 车道线分割是自动驾驶算法的一个范畴,可以用来辅助进行车辆定位和进行决策,早期已有基于传统图像处理的车道线检测方法,但是随着技术的演进,车道线检测任务所应对的场景越来越多样化,目前更多的方式是寻求在语义上对车道线存在位置的检测。bisenet_lane_segmentation是一个轻量化车道线分割模型。
+
+ - 更多详情请参考:[bisenet_lane_segmentation](https://github.com/PaddlePaddle/PaddleSeg)
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+ - Python >= 3.7+
+
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install bisenet_lane_segmentation
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run bisenet_lane_segmentation --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - 通过命令行方式实现hub模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="bisenet_lane_segmentation")
+ result = model.predict(image_list=["/PATH/TO/IMAGE"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ visualization,
+ save_path):
+ ```
+
+ - 车道线分割预测API,用于将输入图片中的车道线分割出来。
+
+ - 参数
+
+ - image_list (list(str | numpy.ndarray)):图片输入路径或者BGR格式numpy数据。
+ - visualization (bool): 是否进行可视化,默认为False。
+ - save_path (str): 当visualization为True时,保存图片的路径,默认为"bisenet_lane_segmentation_output"。
+
+ - 返回
+
+ - result (list(numpy.ndarray)):模型分割结果:
+
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署车道线分割在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m bisenet_lane_segmentation
+ ```
+
+ - 这样就完成了一个车道线分割在线服务API的部署,默认端口号为8866。
+
+ - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/bisenet_lane_segmentation"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ #print(r.json())
+ mask = base64_to_cv2(r.json()["results"]['data'][0])
+ print(mask)
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
diff --git a/modules/image/semantic_segmentation/bisenet_lane_segmentation/README_en.md b/modules/image/semantic_segmentation/bisenet_lane_segmentation/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..8e6364bc34e44465d6ece095184f7eb1d8cedcd4
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenet_lane_segmentation/README_en.md
@@ -0,0 +1,154 @@
+# bisenet_lane_segmentation
+
+|Module Name|bisenet_lane_segmentation|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|bisenet|
+|Dataset|TuSimple|
+|Support Fine-tuning|No|
+|Module Size|9.7MB|
+|Data Indicators|ACC 96.09%|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - Lane segmentation is a category of autonomous driving algorithms that can be used to assist vehicle positioning and decision-making. Early approaches relied on traditional image processing, but as the scenes that lane detection has to handle become more and more diverse, current methods increasingly detect the location of lanes at the semantic level. bisenet_lane_segmentation is a lightweight model for lane segmentation.
+
+
+
+ - For more information, please refer to: [bisenet_lane_segmentation](https://github.com/PaddlePaddle/PaddleSeg)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+  - Python >= 3.7
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install bisenet_lane_segmentation
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run bisenet_lane_segmentation --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="bisenet_lane_segmentation")
+ result = model.predict(image_list=["/PATH/TO/IMAGE"])
+ print(result)
+
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ visualization,
+ save_path):
+ ```
+
+ - Prediction API for lane segmentation.
+
+ - **Parameter**
+
+    - image_list (list(str | numpy.ndarray)): Image path or image data, ndarray.shape is in the format \[H, W, C\], BGR.
+ - visualization (bool): Whether to save the recognition results as picture files, default is False.
+ - save_path (str): Save path of images, "bisenet_lane_segmentation_output" by default.
+
+ - **Return**
+
+    - result (list(numpy.ndarray)): The list of segmentation masks, one per input image (a brief usage sketch is shown below).
+
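+  - A minimal usage sketch (for illustration only; it assumes the returned mask uses 0 for background and non-zero values for lane pixels) that overlays the predicted lanes on the input image:
+
+    ```python
+    import cv2
+    import paddlehub as hub
+
+    model = hub.Module(name="bisenet_lane_segmentation")
+    mask = model.predict(image_list=["/PATH/TO/IMAGE"])[0]
+
+    img = cv2.imread("/PATH/TO/IMAGE")
+    overlay = img.copy()
+    overlay[mask > 0] = (0, 0, 255)  # mark lane pixels in red (BGR)
+    vis = cv2.addWeighted(img, 0.6, overlay, 0.4, 0)
+    cv2.imwrite("lane_overlay.png", vis)
+    ```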
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of lane segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m bisenet_lane_segmentation
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/bisenet_lane_segmentation"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ #print(r.json())
+ mask = base64_to_cv2(r.json()["results"]['data'][0])
+ print(mask)
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/get_lane_coords.py b/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/get_lane_coords.py
new file mode 100644
index 0000000000000000000000000000000000000000..868f0bcc37ed850c90c6bec0616ac4e0b929b30f
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/get_lane_coords.py
@@ -0,0 +1,156 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# this code is based on
+# https://github.com/ZJULearning/resa/blob/main/datasets/tusimple.py
+
+import cv2
+import numpy as np
+
+
+class LaneProcessor:
+ def __init__(self,
+ num_classes=2,
+ ori_shape=(720, 1280),
+ cut_height=0,
+ y_pixel_gap=10,
+ points_nums=56,
+ thresh=0.6,
+ smooth=True):
+ super(LaneProcessor, self).__init__()
+ self.num_classes = num_classes
+ self.ori_shape = ori_shape
+ self.cut_height = cut_height
+ self.y_pixel_gap = y_pixel_gap
+ self.points_nums = points_nums
+ self.thresh = thresh
+ self.smooth = smooth
+
+ def get_lane_coords(self, seg_pred):
+ lane_coords_list = []
+ for batch in range(len(seg_pred)):
+ seg = seg_pred[batch]
+ lane_coords = self.heatmap2coords(seg)
+ for i in range(len(lane_coords)):
+ lane_coords[i] = sorted(
+ lane_coords[i], key=lambda pair: pair[1])
+ lane_coords_list.append(lane_coords)
+ return lane_coords_list
+
+ def process_gap(self, coordinate):
+ if any(x > 0 for x in coordinate):
+ start = [i for i, x in enumerate(coordinate) if x > 0][0]
+ end = [
+ i for i, x in reversed(list(enumerate(coordinate))) if x > 0
+ ][0]
+ lane = coordinate[start:end + 1]
+ # The line segment is not continuous
+ if any(x < 0 for x in lane):
+ gap_start = [
+ i for i, x in enumerate(lane[:-1])
+ if x > 0 and lane[i + 1] < 0
+ ]
+ gap_end = [
+ i + 1 for i, x in enumerate(lane[:-1])
+ if x < 0 and lane[i + 1] > 0
+ ]
+ gap_id = [i for i, x in enumerate(lane) if x < 0]
+ if len(gap_start) == 0 or len(gap_end) == 0:
+ return coordinate
+ for id in gap_id:
+ for i in range(len(gap_start)):
+ if i >= len(gap_end):
+ return coordinate
+ if id > gap_start[i] and id < gap_end[i]:
+ gap_width = float(gap_end[i] - gap_start[i])
+ # line interpolation
+ lane[id] = int((id - gap_start[i]) / gap_width *
+ lane[gap_end[i]] +
+ (gap_end[i] - id) / gap_width *
+ lane[gap_start[i]])
+ if not all(x > 0 for x in lane):
+ print("Gaps still exist!")
+ coordinate[start:end + 1] = lane
+ return coordinate
+
+ def get_coords(self, heat_map):
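+        # Scan the probability heat map from the bottom of the (cropped) image upwards,
+        # sampling one row every `y_pixel_gap` pixels. For each sampled row, take the
+        # column with the highest probability; if it exceeds `thresh`, map the column
+        # index back to the original image width. Lanes with fewer than two confident
+        # points are discarded, and remaining gaps are interpolated by `process_gap`.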
+ dst_height = self.ori_shape[0] - self.cut_height
+ coords = np.zeros(self.points_nums)
+ coords[:] = -2
+ pointCount = 0
+ for i in range(self.points_nums):
+ y_coord = dst_height - 10 - i * self.y_pixel_gap
+ y = int(y_coord / dst_height * heat_map.shape[0])
+ if y < 0:
+ break
+ prob_line = heat_map[y, :]
+ x = np.argmax(prob_line)
+ prob = prob_line[x]
+ if prob > self.thresh:
+ coords[i] = int(x / heat_map.shape[1] * self.ori_shape[1])
+ pointCount = pointCount + 1
+ if pointCount < 2:
+ coords[:] = -2
+ self.process_gap(coords)
+ return coords
+
+ def fix_outliers(self, coords):
+ data = [x for i, x in enumerate(coords) if x > 0]
+ index = [i for i, x in enumerate(coords) if x > 0]
+ if len(data) == 0:
+ return coords
+ diff = []
+ is_outlier = False
+ n = 1
+ x_gap = abs((data[-1] - data[0]) / (1.0 * (len(data) - 1)))
+ for idx, dt in enumerate(data):
+ if is_outlier == False:
+ t = idx - 1
+ n = 1
+ if idx == 0:
+ diff.append(0)
+ else:
+ diff.append(abs(data[idx] - data[t]))
+ if abs(data[idx] - data[t]) > n * (x_gap * 1.5):
+ n = n + 1
+ is_outlier = True
+ ind = index[idx]
+ coords[ind] = -1
+ else:
+ is_outlier = False
+
+ def heatmap2coords(self, seg_pred):
+ coordinates = []
+ for i in range(self.num_classes - 1):
+ heat_map = seg_pred[i + 1]
+ if self.smooth:
+ heat_map = cv2.blur(
+ heat_map, (9, 9), borderType=cv2.BORDER_REPLICATE)
+ coords = self.get_coords(heat_map)
+ indexes = [i for i, x in enumerate(coords) if x > 0]
+ if not indexes:
+ continue
+ self.add_coords(coordinates, coords)
+
+ if len(coordinates) == 0:
+ coords = np.zeros(self.points_nums)
+ self.add_coords(coordinates, coords)
+ return coordinates
+
+ def add_coords(self, coordinates, coords):
+ sub_lanes = []
+ for j in range(self.points_nums):
+ y_lane = self.ori_shape[0] - 10 - j * self.y_pixel_gap
+ x_lane = coords[j] if coords[j] > 0 else -2
+ sub_lanes.append([x_lane, y_lane])
+ coordinates.append(sub_lanes)
diff --git a/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/lane.py b/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/lane.py
new file mode 100644
index 0000000000000000000000000000000000000000..8a7a481570e993810079445a7f54a70bd2e41c57
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/lane.py
@@ -0,0 +1,141 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# this code is from https://github.com/TuSimple/tusimple-benchmark/blob/master/evaluate/lane.py
+
+import json as json
+import numpy as np
+from sklearn.linear_model import LinearRegression
+
+
+class LaneEval(object):
+ lr = LinearRegression()
+ pixel_thresh = 20
+ pt_thresh = 0.85
+
+ @staticmethod
+ def get_angle(xs, y_samples):
+ xs, ys = xs[xs >= 0], y_samples[xs >= 0]
+ if len(xs) > 1:
+ LaneEval.lr.fit(ys[:, None], xs)
+ k = LaneEval.lr.coef_[0]
+ theta = np.arctan(k)
+ else:
+ theta = 0
+ return theta
+
+ @staticmethod
+ def line_accuracy(pred, gt, thresh):
+ pred = np.array([p if p >= 0 else -100 for p in pred])
+ gt = np.array([g if g >= 0 else -100 for g in gt])
+ return np.sum(np.where(np.abs(pred - gt) < thresh, 1., 0.)) / len(gt)
+
+ @staticmethod
+ def bench(pred, gt, y_samples, running_time):
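+        # TuSimple-style evaluation of a single image: each ground-truth lane is matched
+        # to the best-scoring predicted lane using a pixel threshold widened by 1/cos(angle)
+        # for slanted lanes. Returns (accuracy, false-positive rate, false-negative rate);
+        # predictions slower than 200 ms or with far too many lanes score as a full miss.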
+ if any(len(p) != len(y_samples) for p in pred):
+ raise Exception('Format of lanes error.')
+ if running_time > 200 or len(gt) + 2 < len(pred):
+ return 0., 0., 1.
+ angles = [
+ LaneEval.get_angle(np.array(x_gts), np.array(y_samples))
+ for x_gts in gt
+ ]
+ threshs = [LaneEval.pixel_thresh / np.cos(angle) for angle in angles]
+ line_accs = []
+ fp, fn = 0., 0.
+ matched = 0.
+ for x_gts, thresh in zip(gt, threshs):
+ accs = [
+ LaneEval.line_accuracy(
+ np.array(x_preds), np.array(x_gts), thresh)
+ for x_preds in pred
+ ]
+ max_acc = np.max(accs) if len(accs) > 0 else 0.
+ if max_acc < LaneEval.pt_thresh:
+ fn += 1
+ else:
+ matched += 1
+ line_accs.append(max_acc)
+ fp = len(pred) - matched
+ if len(gt) > 4 and fn > 0:
+ fn -= 1
+ s = sum(line_accs)
+ if len(gt) > 4:
+ s -= min(line_accs)
+ return s / max(min(4.0, len(gt)),
+ 1.), fp / len(pred) if len(pred) > 0 else 0., fn / max(
+ min(len(gt), 4.), 1.)
+
+ @staticmethod
+ def bench_one_submit(pred_file, gt_file):
+ try:
+ json_pred = [
+ json.loads(line) for line in open(pred_file).readlines()
+ ]
+ except BaseException as e:
+ raise Exception('Fail to load json file of the prediction.')
+ json_gt = [json.loads(line) for line in open(gt_file).readlines()]
+ if len(json_gt) != len(json_pred):
+ raise Exception(
+ 'We do not get the predictions of all the test tasks')
+ gts = {l['raw_file']: l for l in json_gt}
+ accuracy, fp, fn = 0., 0., 0.
+ for pred in json_pred:
+ if 'raw_file' not in pred or 'lanes' not in pred or 'run_time' not in pred:
+ raise Exception(
+ 'raw_file or lanes or run_time not in some predictions.')
+ raw_file = pred['raw_file']
+ pred_lanes = pred['lanes']
+ run_time = pred['run_time']
+ if raw_file not in gts:
+ raise Exception(
+ 'Some raw_file from your predictions do not exist in the test tasks.'
+ )
+ gt = gts[raw_file]
+ gt_lanes = gt['lanes']
+ y_samples = gt['h_samples']
+ try:
+ a, p, n = LaneEval.bench(pred_lanes, gt_lanes, y_samples,
+ run_time)
+ except BaseException as e:
+ raise Exception('Format of lanes error.')
+ accuracy += a
+ fp += p
+ fn += n
+ num = len(gts)
+ # the first return parameter is the default ranking parameter
+ return json.dumps([{
+ 'name': 'Accuracy',
+ 'value': accuracy / num,
+ 'order': 'desc'
+ }, {
+ 'name': 'FP',
+ 'value': fp / num,
+ 'order': 'asc'
+ }, {
+ 'name': 'FN',
+ 'value': fn / num,
+ 'order': 'asc'
+ }]), accuracy / num, fp / num, fn / num
+
+
+if __name__ == '__main__':
+ import sys
+
+ try:
+ if len(sys.argv) != 3:
+ raise Exception('Invalid input arguments')
+ print(LaneEval.bench_one_submit(sys.argv[1], sys.argv[2]))
+ except Exception as e:
+        # `Exception` objects have no `.message` attribute in Python 3, so report the exception itself.
+        print(e)
+        sys.exit(str(e))
diff --git a/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/tusimple_processor.py b/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/tusimple_processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..6fa7fc55d2513e5bd2c4edeb78f761a8882466b2
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/tusimple_processor.py
@@ -0,0 +1,125 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+import cv2
+import json
+import paddle.nn as nn
+
+from .lane import LaneEval
+from .get_lane_coords import LaneProcessor
+
+
+def mkdir(path):
+ sub_dir = os.path.dirname(path)
+ if not os.path.exists(sub_dir):
+ os.makedirs(sub_dir)
+
+
+class TusimpleProcessor:
+ def __init__(self,
+ num_classes=2,
+ ori_shape=(720, 1280),
+ cut_height=0,
+ thresh=0.6,
+ test_gt_json=None,
+ save_dir='output/'):
+ super(TusimpleProcessor, self).__init__()
+ self.num_classes = num_classes
+ self.dump_to_json = []
+ self.save_dir = save_dir
+ self.test_gt_json = test_gt_json
+ self.color_map = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0),
+ (255, 0, 255), (0, 255, 125), (50, 100, 50),
+ (100, 50, 100)]
+ self.laneProcessor = LaneProcessor(
+ num_classes=self.num_classes,
+ ori_shape=ori_shape,
+ cut_height=cut_height,
+ y_pixel_gap=10,
+ points_nums=56,
+ thresh=thresh,
+ smooth=True)
+
+ def dump_data_to_json(self,
+ output,
+ im_path,
+ run_time=0,
+ is_dump_json=True,
+ is_view=False):
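+        # Convert the network output (per-pixel class logits) into lane point coordinates.
+        # When `is_dump_json` is True, a TuSimple-format JSON record is accumulated for
+        # later evaluation; when `is_view` is True, the detected points are drawn onto the
+        # source image and written under `self.save_dir`.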
+ seg_pred = output[0]
+ seg_pred = nn.functional.softmax(seg_pred, axis=1)
+ seg_pred = seg_pred.numpy()
+ lane_coords_list = self.laneProcessor.get_lane_coords(seg_pred)
+
+ for batch in range(len(seg_pred)):
+ lane_coords = lane_coords_list[batch]
+ path_list = im_path[batch].split("/")
+ if is_dump_json:
+ json_pred = {}
+ json_pred['lanes'] = []
+ json_pred['run_time'] = run_time * 1000
+ json_pred['h_sample'] = []
+
+ json_pred['raw_file'] = os.path.join(*path_list[-4:])
+ for l in lane_coords:
+ if len(l) == 0:
+ continue
+ json_pred['lanes'].append([])
+ for (x, y) in l:
+ json_pred['lanes'][-1].append(int(x))
+ for (x, y) in lane_coords[0]:
+ json_pred['h_sample'].append(y)
+ self.dump_to_json.append(json.dumps(json_pred))
+
+ if is_view:
+ img = cv2.imread(im_path[batch])
+ if is_dump_json:
+ img_name = '_'.join([x for x in path_list[-4:]])
+ sub_dir = 'visual_eval'
+ else:
+ img_name = os.path.basename(im_path[batch])
+ sub_dir = 'visual_points'
+ saved_path = os.path.join(self.save_dir, sub_dir, img_name)
+ self.draw(img, lane_coords, saved_path)
+
+ def predict(self, output, im_path):
+ self.dump_data_to_json(
+ output, [im_path], is_dump_json=False, is_view=True)
+
+ def bench_one_submit(self):
+ output_file = os.path.join(self.save_dir, 'pred.json')
+ if output_file is not None:
+ mkdir(output_file)
+ with open(output_file, "w+") as f:
+ for line in self.dump_to_json:
+ print(line, end="\n", file=f)
+
+ eval_rst, acc, fp, fn = LaneEval.bench_one_submit(
+ output_file, self.test_gt_json)
+ self.dump_to_json = []
+ return acc, fp, fn, eval_rst
+
+ def draw(self, img, coords, file_path=None):
+ for i, coord in enumerate(coords):
+ for x, y in coord:
+ if x <= 0 or y <= 0:
+ continue
+ cv2.circle(img, (int(x), int(y)), 4,
+ self.color_map[i % self.num_classes], 2)
+
+ if file_path is not None:
+ mkdir(file_path)
+ cv2.imwrite(file_path, img)
diff --git a/modules/image/semantic_segmentation/bisenet_lane_segmentation/module.py b/modules/image/semantic_segmentation/bisenet_lane_segmentation/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..29dcb93d36f994c831e5ee5a982bb06affc8193f
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenet_lane_segmentation/module.py
@@ -0,0 +1,165 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import time
+import argparse
+import os
+from typing import Union, List, Tuple
+
+import cv2
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo, runnable, serving
+import paddleseg.transforms as T
+from paddleseg.utils import logger, progbar, visualize
+from paddlehub.module.cv_module import ImageSegmentationModule
+import paddleseg.utils as utils
+from paddleseg.models import layers
+from paddleseg.models import BiSeNetV2
+
+from bisenet_lane_segmentation.processor import Crop, reverse_transform, cv2_to_base64, base64_to_cv2
+from bisenet_lane_segmentation.lane_processor.tusimple_processor import TusimpleProcessor
+
+@moduleinfo(
+ name="bisenet_lane_segmentation",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="BiSeNetLane is a lane segmentation model.",
+ version="1.0.0")
+class BiSeNetLane(nn.Layer):
+ """
+    The BiSeNetLane model uses BiSeNetV2 to perform lane segmentation.
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ lambd (float, optional): A factor for controlling the size of semantic branch channels. Default: 0.25.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 7,
+ lambd: float = 0.25,
+ align_corners: bool = False,
+ pretrained: str = None):
+ super(BiSeNetLane, self).__init__()
+
+ self.net = BiSeNetV2(
+ num_classes=num_classes,
+ lambd=lambd,
+ align_corners=align_corners,
+ pretrained=None)
+
+ self.transforms = [Crop(up_h_off=160), T.Resize([640, 368]), T.Normalize()]
+ self.cut_height = 160
+ self.postprocessor = TusimpleProcessor(num_classes=7, cut_height=160,)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ logit_list = self.net(x)
+ return logit_list
+
+ def predict(self, image_list: list, visualization: bool = False, save_path: str = "bisenet_lane_segmentation_output") -> List[np.ndarray]:
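+        # Inference pipeline: read each image, apply the crop/resize/normalize transforms,
+        # run the network, reverse the transforms so the prediction matches the original
+        # resolution, then take the per-pixel argmax as the lane class mask. When
+        # `visualization` is True, a pseudo-color mask is additionally written to `save_path`.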
+ self.eval()
+ result = []
+ with paddle.no_grad():
+ for i, im in enumerate(image_list):
+ if isinstance(im, str):
+ im = cv2.imread(im)
+
+ ori_shape = im.shape[:2]
+ for op in self.transforms:
+ outputs = op(im)
+ im = outputs[0]
+
+ im = np.transpose(im, (2, 0, 1))
+ im = im[np.newaxis, ...]
+ im = paddle.to_tensor(im)
+ logit = self.forward(im)[0]
+ pred = reverse_transform(logit, ori_shape, self.transforms, mode='bilinear')
+ pred = paddle.argmax(pred, axis=1, keepdim=True, dtype='int32')
+ pred = paddle.squeeze(pred[0])
+ pred = pred.numpy().astype('uint8')
+ if visualization:
+ color_map = visualize.get_color_map_list(256)
+ pred_mask = visualize.get_pseudo_color_map(pred, color_map)
+ if not os.path.exists(save_path):
+ os.makedirs(save_path)
+ img_name = str(time.time()) + '.png'
+ image_save_path = os.path.join(save_path, img_name)
+ pred_mask.save(image_save_path)
+ result.append(pred)
+ return result
+
+ @serving
+ def serving_method(self, images: str, **kwargs) -> dict:
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ outputs = self.predict(image_list=images_decode, **kwargs)
+ serving_data = [cv2_to_base64(outputs[i]) for i in range(len(outputs))]
+ results = {'data': serving_data}
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list) -> List[np.ndarray]:
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ args = self.parser.parse_args(argvs)
+
+ results = self.predict(image_list=[args.input_path], save_path=args.output_dir, visualization=args.visualization)
+
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default="bisenet_lane_segmentation_output", help="The directory to save output images.")
+ self.arg_config_group.add_argument(
+ '--visualization', type=bool, default=True, help="whether to save output as images.")
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
+
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/bisenet_lane_segmentation/processor.py b/modules/image/semantic_segmentation/bisenet_lane_segmentation/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..dc1cf08804a03cef641f7620a5fa2262713cce54
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenet_lane_segmentation/processor.py
@@ -0,0 +1,185 @@
+import base64
+import collections.abc
+from itertools import combinations
+from typing import Union, List, Tuple, Callable
+
+import numpy as np
+import cv2
+import paddle
+import paddle.nn.functional as F
+
+
+def get_reverse_list(ori_shape: list, transforms: Callable) -> list:
+ """
+ get reverse list of transform.
+
+ Args:
+ ori_shape (list): Origin shape of image.
+ transforms (list): List of transform.
+
+ Returns:
+        list: List of tuples, which may take one of three formats:
+            ('resize', (h, w)) The image shape before resize,
+            ('crop', (up_h_off, down_h_off), (left_w_off, right_w_off)) The crop offsets that were applied,
+            ('padding', (h, w)) The image shape before padding.
+ """
+ reverse_list = []
+ h, w = ori_shape[0], ori_shape[1]
+ for op in transforms:
+ if op.__class__.__name__ in ['Resize']:
+ reverse_list.append(('resize', (h, w)))
+ h, w = op.target_size[0], op.target_size[1]
+ if op.__class__.__name__ in ['Crop']:
+ reverse_list.append(('crop', (op.up_h_off, op.down_h_off),
+ (op.left_w_off, op.right_w_off)))
+ h = h - op.up_h_off
+ h = h - op.down_h_off
+ w = w - op.left_w_off
+ w = w - op.right_w_off
+ if op.__class__.__name__ in ['ResizeByLong']:
+ reverse_list.append(('resize', (h, w)))
+ long_edge = max(h, w)
+ short_edge = min(h, w)
+ short_edge = int(round(short_edge * op.long_size / long_edge))
+ long_edge = op.long_size
+ if h > w:
+ h = long_edge
+ w = short_edge
+ else:
+ w = long_edge
+ h = short_edge
+ if op.__class__.__name__ in ['ResizeByShort']:
+ reverse_list.append(('resize', (h, w)))
+ long_edge = max(h, w)
+ short_edge = min(h, w)
+ long_edge = int(round(long_edge * op.short_size / short_edge))
+ short_edge = op.short_size
+ if h > w:
+ h = long_edge
+ w = short_edge
+ else:
+ w = long_edge
+ h = short_edge
+ if op.__class__.__name__ in ['Padding']:
+ reverse_list.append(('padding', (h, w)))
+ w, h = op.target_size[0], op.target_size[1]
+ if op.__class__.__name__ in ['PaddingByAspectRatio']:
+ reverse_list.append(('padding', (h, w)))
+ ratio = w / h
+ if ratio == op.aspect_ratio:
+ pass
+ elif ratio > op.aspect_ratio:
+ h = int(w / op.aspect_ratio)
+ else:
+ w = int(h * op.aspect_ratio)
+ if op.__class__.__name__ in ['LimitLong']:
+ long_edge = max(h, w)
+ short_edge = min(h, w)
+ if ((op.max_long is not None) and (long_edge > op.max_long)):
+ reverse_list.append(('resize', (h, w)))
+ long_edge = op.max_long
+ short_edge = int(round(short_edge * op.max_long / long_edge))
+ elif ((op.min_long is not None) and (long_edge < op.min_long)):
+ reverse_list.append(('resize', (h, w)))
+ long_edge = op.min_long
+ short_edge = int(round(short_edge * op.min_long / long_edge))
+ if h > w:
+ h = long_edge
+ w = short_edge
+ else:
+ w = long_edge
+ h = short_edge
+ return reverse_list
+
+
+def reverse_transform(pred: paddle.Tensor, ori_shape: list, transforms: Callable, mode: str = 'nearest') -> paddle.Tensor:
+ """recover pred to origin shape"""
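+    # Undo the preprocessing transforms in reverse order: a resize is reverted by
+    # interpolating back to the recorded size, a crop is reverted by zero-padding the
+    # removed borders, and padding is reverted by slicing the prediction back down.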
+ reverse_list = get_reverse_list(ori_shape, transforms)
+ for item in reverse_list[::-1]:
+ if item[0] == 'resize':
+ h, w = item[1][0], item[1][1]
+ # if paddle.get_device() == 'cpu':
+ # pred = paddle.cast(pred, 'uint8')
+ # pred = F.interpolate(pred, (h, w), mode=mode)
+ # pred = paddle.cast(pred, 'int32')
+ # else:
+ pred = F.interpolate(pred, (h, w), mode=mode)
+ elif item[0] == 'crop':
+ up_h_off, down_h_off = item[1][0], item[1][1]
+ left_w_off, right_w_off = item[2][0], item[2][1]
+ pred = F.pad(
+ pred, [left_w_off, right_w_off, up_h_off, down_h_off],
+ value=0,
+ mode='constant',
+ data_format="NCHW")
+ elif item[0] == 'padding':
+ h, w = item[1][0], item[1][1]
+ pred = pred[:, :, 0:h, 0:w]
+ else:
+ raise Exception("Unexpected info '{}' in im_info".format(item[0]))
+ return pred
+
+
+class Crop:
+ """
+    Crop an image from its four borders.
+
+    Args:
+        up_h_off (int, optional): The number of pixel rows cropped from the top. Default: 0.
+        down_h_off (int, optional): The number of pixel rows cropped from the bottom. Default: 0.
+        left_w_off (int, optional): The number of pixel columns cropped from the left. Default: 0.
+        right_w_off (int, optional): The number of pixel columns cropped from the right. Default: 0.
+ """
+
+ def __init__(self, up_h_off: int = 0, down_h_off: int = 0, left_w_off: int = 0, right_w_off: int = 0):
+ self.up_h_off = up_h_off
+ self.down_h_off = down_h_off
+ self.left_w_off = left_w_off
+ self.right_w_off = right_w_off
+
+ def __call__(self, im: np.ndarray, label: np.ndarray = None) -> Tuple[np.ndarray]:
+ if self.up_h_off < 0 or self.down_h_off < 0 or self.left_w_off < 0 or self.right_w_off < 0:
+            raise Exception(
+                "up_h_off, down_h_off, left_w_off and right_w_off must be equal to or greater than zero"
+            )
+
+ if self.up_h_off > 0 and self.up_h_off < im.shape[0]:
+ im = im[self.up_h_off:, :, :]
+ if label is not None:
+ label = label[self.up_h_off:, :]
+
+ if self.down_h_off > 0 and self.down_h_off < im.shape[0]:
+ im = im[:-self.down_h_off, :, :]
+ if label is not None:
+ label = label[:-self.down_h_off, :]
+
+ if self.left_w_off > 0 and self.left_w_off < im.shape[1]:
+ im = im[:, self.left_w_off:, :]
+ if label is not None:
+ label = label[:, self.left_w_off:]
+
+ if self.right_w_off > 0 and self.right_w_off < im.shape[1]:
+ im = im[:, :-self.right_w_off, :]
+ if label is not None:
+ label = label[:, :-self.right_w_off]
+
+ if label is None:
+ return (im, )
+ else:
+ return (im, label)
+
+def cv2_to_base64(image: np.ndarray) -> str:
+ """
+ Convert data from BGR to base64 format.
+ """
+ data = cv2.imencode('.png', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+def base64_to_cv2(b64str: str) -> np.ndarray:
+ """
+ Convert data from base64 to BGR format.
+ """
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
diff --git a/modules/image/semantic_segmentation/deeplabv3p_xception65_humanseg/README_en.md b/modules/image/semantic_segmentation/deeplabv3p_xception65_humanseg/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..1afa20b09c0da5a9b051fcbfc59f8d43c52ce908
--- /dev/null
+++ b/modules/image/semantic_segmentation/deeplabv3p_xception65_humanseg/README_en.md
@@ -0,0 +1,175 @@
+# deeplabv3p_xception65_humanseg
+
+|Module Name |deeplabv3p_xception65_humanseg|
+| :--- | :---: |
+|Category|Image segmentation|
+|Network|deeplabv3p|
+|Dataset|Baidu self-built dataset|
+|Fine-tuning supported or not|No|
+|Module Size|162MB|
+|Data indicators |-|
+|Latest update date|2021-02-26|
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - The DeepLabv3+ model is trained on a Baidu self-built dataset and can be used for portrait segmentation.
+
+
+
+
+- For more information, please refer to: [deeplabv3p](https://github.com/PaddlePaddle/PaddleSeg)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install deeplabv3p_xception65_humanseg
+ ```
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ hub run deeplabv3p_xception65_humanseg --input_path "/PATH/TO/IMAGE"
+ ```
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ human_seg = hub.Module(name="deeplabv3p_xception65_humanseg")
+ result = human_seg.segmentation(images=[cv2.imread('/PATH/TO/IMAGE')])
+ ```
+
+- ### 3.API
+
+ - ```python
+ def segmentation(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ visualization=False,
+ output_dir='humanseg_output')
+ ```
+
+ - Prediction API, generating segmentation result.
+
+ - **Parameter**
+ * images (list\[numpy.ndarray\]): Image data, ndarray.shape is in the format [H, W, C], BGR.
+ * paths (list\[str\]): Image path.
+ * batch\_size (int): Batch size.
+ * use\_gpu (bool): Use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**
+ * visualization (bool): Whether to save the recognition results as picture files.
+ * output\_dir (str): Save path of images.
+
+ - **Return**
+
+    * res (list\[dict\]): The list of recognition results, where each element is a dict with the following fields (a brief usage sketch is shown below):
+ * save\_path (str, optional): Save path of the result.
+ * data (numpy.ndarray): The result of portrait segmentation.
+
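+  - A minimal usage sketch (for illustration only; it assumes `res[0]['data']` is a single-channel matte with the same height and width as the input image) that uses the result as an alpha channel to save a cut-out portrait:
+
+    ```python
+    import cv2
+    import numpy as np
+    import paddlehub as hub
+
+    human_seg = hub.Module(name="deeplabv3p_xception65_humanseg")
+    org_im = cv2.imread("/PATH/TO/IMAGE")
+    res = human_seg.segmentation(images=[org_im])
+
+    # Append the matte as an alpha channel and write an RGBA cut-out.
+    mask = res[0]['data'].astype(np.uint8)
+    rgba = np.concatenate((org_im, np.expand_dims(mask, axis=2)), axis=2)
+    cv2.imwrite("segment_human.png", rgba)
+    ```
+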
+ - ```python
+ def save_inference_model(dirname,
+ model_filename=None,
+ params_filename=None,
+ combined=True)
+ ```
+
+ - Save the model to the specified path.
+
+ - **Parameters**
+ * dirname: Save path.
+      * model\_filename: Model file name, default is \_\_model\_\_
+      * params\_filename: Parameter file name, default is \_\_params\_\_ (only takes effect when `combined` is True)
+ * combined: Whether to save the parameters to a unified file.
+
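+  - For example (a minimal sketch; the directory name below is just a placeholder), the model can be exported for inference deployment with:
+
+    ```python
+    human_seg.save_inference_model(dirname="deeplabv3p_humanseg_inference")
+    ```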
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service for human segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m deeplabv3p_xception65_humanseg
+ ```
+
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # Send an HTTP request
+      org_im = cv2.imread("/PATH/TO/IMAGE")
+      data = {'images':[cv2_to_base64(org_im)]}
+      headers = {"Content-type": "application/json"}
+      url = "http://127.0.0.1:8866/predict/deeplabv3p_xception65_humanseg"
+      r = requests.post(url=url, headers=headers, data=json.dumps(data))
+      mask = cv2.cvtColor(base64_to_cv2(r.json()["results"][0]['data']), cv2.COLOR_BGR2GRAY)
+      rgba = np.concatenate((org_im, np.expand_dims(mask, axis=2)), axis=2)
+ cv2.imwrite("segment_human_server.png", rgba)
+ ```
+## V. Release Note
+
+- 1.0.0
+
+ First release
+
+* 1.1.0
+
+ Improve prediction performance
+
+* 1.1.1
+
+ Fix the bug of image value out of range
+
+* 1.1.2
+
+  Fix the memory leak problem on cudnn 8.0.4
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/README.md b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..8a3951ac11aed63c93fdb383f47537813ef5ea69
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/README.md
@@ -0,0 +1,186 @@
+# ginet_resnet101vd_ade20k
+
+|模型名称|ginet_resnet101vd_ade20k|
+| :--- | :---: |
+|类别|图像-图像分割|
+|网络|ginet_resnet101vd|
+|数据集|ADE20K|
+|是否支持Fine-tuning|是|
+|模型大小|287MB|
+|指标|-|
+|最新更新日期|2021-12-14|
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+
+  - 样例结果示例:
+
+
+
+
+- ### 模型介绍
+
+ - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+ - 更多详情请参考:[ginet](https://arxiv.org/pdf/2009.06160)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install ginet_resnet101vd_ade20k
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_ade20k')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.如何开始Fine-tune
+
+ - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用ginet_resnet101vd_ade20k模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下:
+
+ - 代码步骤
+
+ - Step1: 定义数据预处理方式
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
+
+ - Step2: 下载数据集并使用
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ - `transforms`: 数据预处理方式。
+     - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`,默认为 `train`。
+
+ - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+ - Step3: 加载预训练模型
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet101vd_ade20k', num_classes=2, pretrained=None)
+ ```
+ - `name`: 选择预训练模型的名字。
+     - `pretrained`: 是否加载自己训练的模型参数,若为None,则加载提供的模型默认参数。
+
+ - Step4: 选择优化策略和运行配置
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - 模型预测
+
+ - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_ade20k', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - 参数配置正确后,请执行脚本`python predict.py`。
+
+ - **Args**
+ * `images`:原始图像路径或BGR格式图片;
+ * `visualization`: 是否可视化,默认为True;
+ * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+ **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet101vd_ade20k
+ ```
+
+ - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+ - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_ade20k"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/README_en.md b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..b7d0b3e0fd095c589edfbe29fbb2a19cc3524d2e
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/README_en.md
@@ -0,0 +1,185 @@
+# ginet_resnet101vd_ade20k
+
+|Module Name|ginet_resnet101vd_ade20k|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ginet_resnet101vd|
+|Dataset|ADE20K|
+|Fine-tuning supported or not|Yes|
+|Module Size|287MB|
+|Data indicators|-|
+|Latest update date|2021-12-14|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ginet_resnet101vd_ade20k
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_ade20k')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+  - After installing PaddlePaddle and PaddleHub, you can start fine-tuning the ginet_resnet101vd_ade20k model on datasets such as OpticDiscSeg.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - `segmentation_transforms`: The data augmentation module defines a rich set of preprocessing methods for segmentation data. Users can replace them with their own preprocessing as needed.
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+    * The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet101vd_ade20k', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+     - `pretrained`: Whether to load self-trained model parameters; if None, the provided pretrained parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - Model prediction
+
+  - When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_ade20k', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+ * `visualization`: Whether to save the recognition results as picture files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet101vd_ade20k
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_ade20k"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/layers.py b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..7e46219fd671ed9834795c9881292eed787b990d
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/layers.py
@@ -0,0 +1,345 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ name: str = None):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = SyncBatchNorm(out_channels)
+ self._act_op = Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ name: str = None):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ name=name + "_branch2a")
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ name=name + "_branch2b")
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ name=name + "_branch2c")
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+ self.piontwise_conv = ConvBNReLU(
+ in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+ x = self.piontwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+
+ Returns:
+ A callable object of Activation.
+
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+
+ Examples:
+
+ from paddleseg.models.common.activation import Activation
+
+ relu = Activation("relu")
+ print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+ self.act_func = eval("activation.{}()".format(act_name))
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+ use_sep_conv: bool= False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
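+        # Apply each (possibly dilated or separable) conv branch, resize every branch
+        # output back to the input's spatial size, optionally append the image-level
+        # pooled feature, then fuse all branches with a 1x1 conv followed by dropout.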
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/module.py b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..4a7aff27e9b964b069c0c2be44ab719d2298591d
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/module.py
@@ -0,0 +1,309 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+from paddleseg.utils import utils
+from paddleseg.models import layers
+
+from ginet_resnet101vd_ade20k.resnet import ResNet101_vd
+
+
+@moduleinfo(
+ name="ginet_resnet101vd_ade20k",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="GINetResnet101 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class GINetResNet101(nn.Layer):
+ """
+ The GINetResNet101 implementation based on PaddlePaddle.
+ The original article refers to
+ Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
+ (https://arxiv.org/pdf/2009.06160).
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple, optional): Values in the tuple indicate the indices of output of backbone.
+        enable_auxiliary_loss (bool, optional): A bool value that indicates whether to add an auxiliary loss.
+            If True, an auxiliary loss head is applied to the c3 backbone feature. Default: True.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of the feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: True.
+        jpu (bool, optional): Whether to use the JPU unit in the base forward. Default: True.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 150,
+ backbone_indices: Tuple[int]=(0, 1, 2, 3),
+ enable_auxiliary_loss: bool = True,
+ align_corners: bool = True,
+ jpu: bool = True,
+ pretrained: str = None):
+ super(GINetResNet101, self).__init__()
+ self.nclass = num_classes
+ self.aux = enable_auxiliary_loss
+ self.jpu = jpu
+
+ self.backbone = ResNet101_vd()
+ self.backbone_indices = backbone_indices
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
+ self.head = GIHead(in_channels=2048, nclass=num_classes)
+
+ if self.aux:
+ self.auxlayer = layers.AuxLayer(
+ 1024, 1024 // 4, num_classes, bias_attr=False)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feat_list = self.backbone(x)
+ c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
+
+ if self.jpu:
+ return self.jpu(c1, c2, c3, c4)
+ else:
+ return c1, c2, c3, c4
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ _, _, h, w = x.shape
+ _, _, c3, c4 = self.base_forward(x)
+
+ logit_list = []
+ x, _ = self.head(c4)
+ logit_list.append(x)
+
+ if self.aux:
+ auxout = self.auxlayer(c3)
+
+ logit_list.append(auxout)
+
+ return [
+ F.interpolate(
+ logit, (h, w),
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
+
+
+class GIHead(nn.Layer):
+ """The Graph Interaction Network head."""
+
+ def __init__(self, in_channels: int, nclass: int):
+ super().__init__()
+ self.nclass = nclass
+ inter_channels = in_channels // 4
+ self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
+ self.inp = paddle.create_parameter(
+ shape=self.inp.shape,
+ dtype=str(self.inp.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.inp))
+
+ self.fc1 = nn.Sequential(
+ nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
+ self.fc2 = nn.Sequential(
+ nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
+ self.conv5 = layers.ConvBNReLU(
+ in_channels,
+ inter_channels,
+ 3,
+ padding=1,
+ bias_attr=False,
+ stride=1)
+
+ self.gloru = GlobalReasonUnit(
+ in_channels=inter_channels,
+ num_state=256,
+ num_node=84,
+ nclass=nclass)
+ self.conv6 = nn.Sequential(
+ nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
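+        # The learnable (nclass, 300) word embeddings are mapped by two FC layers to
+        # 256-d class-node features and broadcast over the batch; the visual feature x
+        # is reduced by conv5 before the graph interaction in gloru.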
+ B, C, H, W = x.shape
+ inp = self.inp.detach()
+
+ inp = self.fc1(inp)
+ inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
+ .expand((B, 256, self.nclass))
+
+ out = self.conv5(x)
+
+ out, se_out = self.gloru(out, inp)
+ out = self.conv6(out)
+ return out, se_out
+
+
+class GlobalReasonUnit(nn.Layer):
+ """
+ The original paper refers to:
+ Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
+ """
+
+ def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
+ super().__init__()
+ self.num_state = num_state
+ self.conv_theta = nn.Conv2D(
+ in_channels, num_node, kernel_size=1, stride=1, padding=0)
+ self.conv_phi = nn.Conv2D(
+ in_channels, num_state, kernel_size=1, stride=1, padding=0)
+ self.graph = GraphLayer(num_state, num_node, nclass)
+ self.extend_dim = nn.Conv2D(
+ num_state, in_channels, kernel_size=1, bias_attr=False)
+
+ self.bn = layers.SyncBatchNorm(in_channels)
+
+ def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> List[paddle.Tensor]:
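+        # Graph reasoning: conv_theta assigns each pixel to num_node graph nodes (B),
+        # conv_phi reduces pixel features to num_state dims; their product, averaged
+        # over H*W, gives the visual node features V. After graph interaction, the
+        # updated node features are projected back to the pixel grid and added to x
+        # as a residual.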
+ B = self.conv_theta(x)
+ sizeB = B.shape
+ B = B.reshape((sizeB[0], sizeB[1], -1))
+
+ sizex = x.shape
+ x_reduce = self.conv_phi(x)
+ x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
+ .transpose((0, 2, 1))
+
+ V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
+ V = paddle.divide(
+ V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
+
+ class_node, new_V = self.graph(inp, V)
+ D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
+ Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
+ Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
+ sizex[2], -1))
+ Y = self.extend_dim(Y)
+ Y = self.bn(Y)
+ out = Y + x
+
+ return out, class_node
+
+
+class GraphLayer(nn.Layer):
+ def __init__(self, num_state: int, num_node: int, num_class: int):
+ super().__init__()
+ self.vis_gcn = GCN(num_state, num_node)
+ self.word_gcn = GCN(num_state, num_class)
+ self.transfer = GraphTransfer(num_state)
+ self.gamma_vis = paddle.zeros([num_node])
+ self.gamma_word = paddle.zeros([num_class])
+ self.gamma_vis = paddle.create_parameter(
+ shape=self.gamma_vis.shape,
+ dtype=str(self.gamma_vis.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
+ self.gamma_word = paddle.create_parameter(
+ shape=self.gamma_word.shape,
+ dtype=str(self.gamma_word.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
+
+ def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ inp = self.word_gcn(inp)
+ new_V = self.vis_gcn(vis_node)
+ class_node, vis_node = self.transfer(inp, new_V)
+
+ class_node = self.gamma_word * inp + class_node
+ new_V = self.gamma_vis * vis_node + new_V
+ return class_node, new_V
+
+
+class GCN(nn.Layer):
+ def __init__(self, num_state: int = 128, num_node: int = 64, bias=False):
+ super().__init__()
+ self.conv1 = nn.Conv1D(
+ num_node,
+ num_node,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ )
+ self.relu = nn.ReLU()
+ self.conv2 = nn.Conv1D(
+ num_state,
+ num_state,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ bias_attr=bias)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
+ h = h + x
+ h = self.relu(h)
+ h = self.conv2(h)
+ return h
+
+
+class GraphTransfer(nn.Layer):
+ """Transfer vis graph to class node, transfer class node to vis feature"""
+
+ def __init__(self, in_dim: int):
+ super().__init__()
+ self.channle_in = in_dim
+ self.query_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.key_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.value_conv_vis = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.value_conv_word = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.softmax_vis = nn.Softmax(axis=-1)
+ self.softmax_word = nn.Softmax(axis=-2)
+
+ def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
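+        # Cross-graph attention between class (word) nodes and visual nodes:
+        # class_out updates the class nodes from visual-node values, while
+        # node_out updates the visual nodes from class-node values.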
+ m_batchsize, C, Nc = word.shape
+ m_batchsize, C, Nn = vis_node.shape
+
+ proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
+ .transpose((0, 2, 1))
+ proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
+
+ energy = paddle.bmm(proj_query, proj_key)
+ attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
+ attention_word = self.softmax_word(energy)
+
+ proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
+ Nn))
+ proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
+ Nc))
+
+ class_out = paddle.bmm(proj_value_vis, attention_vis)
+ node_out = paddle.bmm(proj_value_word, attention_word)
+ return class_out, node_out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/resnet.py b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..e3e031f0e239a2d8e965596579ed16a5501b324f
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/resnet.py
@@ -0,0 +1,136 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import ginet_resnet101vd_ade20k.layers as L
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ name: str = None):
+ super(BasicBlock, self).__init__()
+ self.stride = stride
+ self.conv0 = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ name=name + "_branch2a")
+ self.conv1 = L.ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ act=None,
+ name=name + "_branch2b")
+
+ if not shortcut:
+ self.short = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+        y = paddle.add(x=short, y=conv1)
+        y = F.relu(y)
+
+ return y
+
+
+class ResNet101_vd(nn.Layer):
+ def __init__(self,
+ multi_grid: tuple = (1, 2, 4)):
+ super(ResNet101_vd, self).__init__()
+ depth = [3, 4, 23, 3]
+ num_channels = [64, 256, 512, 1024]
+ num_filters = [64, 128, 256, 512]
+ self.feat_channels = [c * 4 for c in num_filters]
+ dilation_dict = {2: 2, 3: 4}
+ self.conv1_1 = L.ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ name="conv1_1")
+ self.conv1_2 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_2")
+ self.conv1_3 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_3")
+ self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+ self.stage_list = []
+
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ L.BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ name=conv_name,
+ dilation=dilation_rate))
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ y = self.pool2d_max(y)
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+ return feat_list
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/README.md b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..faa1a537b2e96f2af75ac81a9d6e5247fbe84379
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/README.md
@@ -0,0 +1,185 @@
+# ginet_resnet101vd_cityscapes
+
+|模型名称|ginet_resnet101vd_cityscapes|
+| :--- | :---: |
+|类别|图像-图像分割|
+|网络|ginet_resnet101vd|
+|数据集|Cityscapes|
+|是否支持Fine-tuning|是|
+|模型大小|286MB|
+|指标|-|
+|最新更新日期|2021-12-14|
+
+## 一、模型基本信息
+
+ - 样例结果示例:
+
+
+
+
+- ### 模型介绍
+
+ - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+ - 更多详情请参考:[ginet](https://arxiv.org/pdf/2009.06160)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install ginet_resnet101vd_cityscapes
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.如何开始Fine-tune
+
+ - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用ginet_resnet101vd_cityscapes模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下:
+
+ - 代码步骤
+
+ - Step1: 定义数据预处理方式
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
+
+ - Step2: 下载数据集并使用
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ - `transforms`: 数据预处理方式。
+    - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。
+
+ - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+ - Step3: 加载预训练模型
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet101vd_cityscapes', num_classes=2, pretrained=None)
+ ```
+ - `name`: 选择预训练模型的名字。
+    - `pretrained`: 自己训练的模型参数路径,若为None,则加载提供的模型默认参数。
+
+ - Step4: 选择优化策略和运行配置
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - 模型预测
+
+ - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - 参数配置正确后,请执行脚本`python predict.py`。
+
+ - **Args**
+ * `images`:原始图像路径或BGR格式图片;
+ * `visualization`: 是否可视化,默认为True;
+ * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+ **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet101vd_cityscapes
+ ```
+
+ - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
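+
+  - 例如(此处假设使用0号GPU,请按实际环境调整编号):
+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0
+    $ hub serving start -m ginet_resnet101vd_cityscapes
+    ```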
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/README_en.md b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..2e09ff0c9121c1531b8f4892a3ae8b492b87019b
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/README_en.md
@@ -0,0 +1,185 @@
+# ginet_resnet101vd_cityscapes
+
+|Module Name|ginet_resnet101vd_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ginet_resnet101vd|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|286MB|
+|Data indicators|-|
+|Latest update date|2021-12-14|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ginet_resnet101vd_cityscapes
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the ginet_resnet101vd_cityscapes model on datasets such as OpticDiscSeg.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+    - `segmentation_transforms`: The data augmentation module defines many preprocessing methods for image segmentation data. Users can replace them according to their needs.
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+        * For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset from the network and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user's home directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet101vd_cityscapes', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+    - `pretrained`: Path of self-trained model parameters; if it is None, the provided pre-trained parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - Model prediction
+
+    - When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+        * `visualization`: Whether to save the segmentation results as image files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet101vd_cityscapes
+ ```
+
+  - The image segmentation service API is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
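+
+  - For example, assuming GPU card 0 is used (an illustrative id; adjust it to your environment):
+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0
+    $ hub serving start -m ginet_resnet101vd_cityscapes
+    ```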
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/layers.py b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..7e46219fd671ed9834795c9881292eed787b990d
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/layers.py
@@ -0,0 +1,345 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ name: str = None):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = SyncBatchNorm(out_channels)
+ self._act_op = Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ name: str = None):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ name=name + "_branch2a")
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ name=name + "_branch2b")
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ name=name + "_branch2c")
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+            in_channels, out_channels, kernel_size=1, groups=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+
+ Returns:
+ A callable object of Activation.
+
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+
+ Examples:
+
+ from paddleseg.models.common.activation import Activation
+
+ relu = Activation("relu")
+ print(relu)
+ #
+
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+ #
+
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+ self.act_func = eval("activation.{}()".format(act_name))
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+                 use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
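+
+
+# A minimal usage sketch of ASPPModule (the dilation rates and shapes below are
+# illustrative assumptions, e.g. DeepLabV3-style rates on a 2048-channel backbone feature):
+# aspp = ASPPModule(aspp_ratios=(1, 6, 12, 18), in_channels=2048, out_channels=256,
+#                   align_corners=False, use_sep_conv=True, image_pooling=True)
+# out = aspp(paddle.rand([1, 2048, 64, 64]))  # -> shape [1, 256, 64, 64]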
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/module.py b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..e135d4ab484a4bd9c7c81e6905d527680fe69a04
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/module.py
@@ -0,0 +1,308 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+from paddleseg.utils import utils
+from paddleseg.models import layers
+
+from ginet_resnet101vd_cityscapes.resnet import ResNet101_vd
+
+
+@moduleinfo(
+ name="ginet_resnet101vd_cityscapes",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="GINetResnet101 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class GINetResNet101(nn.Layer):
+ """
+ The GINetResNet101 implementation based on PaddlePaddle.
+ The original article refers to
+ Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
+ (https://arxiv.org/pdf/2009.06160).
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple, optional): Values in the tuple indicate the indices of output of backbone.
+        enable_auxiliary_loss (bool, optional): A bool value that indicates whether to add an auxiliary loss.
+            If True, an auxiliary loss head is applied to the c3 backbone feature. Default: True.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of the feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: True.
+        jpu (bool, optional): Whether to use the JPU unit in the base forward. Default: True.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 19,
+ backbone_indices: Tuple[int]=(0, 1, 2, 3),
+ enable_auxiliary_loss: bool = True,
+ align_corners: bool = True,
+ jpu: bool = True,
+ pretrained: str = None):
+ super(GINetResNet101, self).__init__()
+ self.nclass = num_classes
+ self.aux = enable_auxiliary_loss
+ self.jpu = jpu
+
+ self.backbone = ResNet101_vd()
+ self.backbone_indices = backbone_indices
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
+ self.head = GIHead(in_channels=2048, nclass=num_classes)
+
+ if self.aux:
+ self.auxlayer = layers.AuxLayer(
+ 1024, 1024 // 4, num_classes, bias_attr=False)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feat_list = self.backbone(x)
+ c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
+
+ if self.jpu:
+ return self.jpu(c1, c2, c3, c4)
+ else:
+ return c1, c2, c3, c4
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ _, _, h, w = x.shape
+ _, _, c3, c4 = self.base_forward(x)
+
+ logit_list = []
+ x, _ = self.head(c4)
+ logit_list.append(x)
+
+ if self.aux:
+ auxout = self.auxlayer(c3)
+
+ logit_list.append(auxout)
+
+ return [
+ F.interpolate(
+ logit, (h, w),
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
+
+
+class GIHead(nn.Layer):
+ """The Graph Interaction Network head."""
+
+ def __init__(self, in_channels: int, nclass: int):
+ super().__init__()
+ self.nclass = nclass
+ inter_channels = in_channels // 4
+ self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
+ self.inp = paddle.create_parameter(
+ shape=self.inp.shape,
+ dtype=str(self.inp.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.inp))
+
+ self.fc1 = nn.Sequential(
+ nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
+ self.fc2 = nn.Sequential(
+ nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
+ self.conv5 = layers.ConvBNReLU(
+ in_channels,
+ inter_channels,
+ 3,
+ padding=1,
+ bias_attr=False,
+ stride=1)
+
+ self.gloru = GlobalReasonUnit(
+ in_channels=inter_channels,
+ num_state=256,
+ num_node=84,
+ nclass=nclass)
+ self.conv6 = nn.Sequential(
+ nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
+
+    def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
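+        # The learnable (nclass, 300) word embeddings are mapped by two FC layers to
+        # 256-d class-node features and broadcast over the batch; the visual feature x
+        # is reduced by conv5 before the graph interaction in gloru.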
+ B, C, H, W = x.shape
+ inp = self.inp.detach()
+
+ inp = self.fc1(inp)
+ inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
+ .expand((B, 256, self.nclass))
+
+ out = self.conv5(x)
+
+ out, se_out = self.gloru(out, inp)
+ out = self.conv6(out)
+ return out, se_out
+
+
+class GlobalReasonUnit(nn.Layer):
+ """
+ The original paper refers to:
+ Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
+ """
+
+ def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
+ super().__init__()
+ self.num_state = num_state
+ self.conv_theta = nn.Conv2D(
+ in_channels, num_node, kernel_size=1, stride=1, padding=0)
+ self.conv_phi = nn.Conv2D(
+ in_channels, num_state, kernel_size=1, stride=1, padding=0)
+ self.graph = GraphLayer(num_state, num_node, nclass)
+ self.extend_dim = nn.Conv2D(
+ num_state, in_channels, kernel_size=1, bias_attr=False)
+
+ self.bn = layers.SyncBatchNorm(in_channels)
+
+    def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> List[paddle.Tensor]:
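+        # Graph reasoning: conv_theta assigns each pixel to num_node graph nodes (B),
+        # conv_phi reduces pixel features to num_state dims; their product, averaged
+        # over H*W, gives the visual node features V. After graph interaction, the
+        # updated node features are projected back to the pixel grid and added to x
+        # as a residual.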
+ B = self.conv_theta(x)
+ sizeB = B.shape
+ B = B.reshape((sizeB[0], sizeB[1], -1))
+
+ sizex = x.shape
+ x_reduce = self.conv_phi(x)
+ x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
+ .transpose((0, 2, 1))
+
+ V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
+ V = paddle.divide(
+ V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
+
+ class_node, new_V = self.graph(inp, V)
+ D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
+ Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
+ Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
+ sizex[2], -1))
+ Y = self.extend_dim(Y)
+ Y = self.bn(Y)
+ out = Y + x
+
+ return out, class_node
+
+
+class GraphLayer(nn.Layer):
+ def __init__(self, num_state: int, num_node: int, num_class: int):
+ super().__init__()
+ self.vis_gcn = GCN(num_state, num_node)
+ self.word_gcn = GCN(num_state, num_class)
+ self.transfer = GraphTransfer(num_state)
+ self.gamma_vis = paddle.zeros([num_node])
+ self.gamma_word = paddle.zeros([num_class])
+ self.gamma_vis = paddle.create_parameter(
+ shape=self.gamma_vis.shape,
+ dtype=str(self.gamma_vis.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
+ self.gamma_word = paddle.create_parameter(
+ shape=self.gamma_word.shape,
+ dtype=str(self.gamma_word.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
+
+ def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ inp = self.word_gcn(inp)
+ new_V = self.vis_gcn(vis_node)
+ class_node, vis_node = self.transfer(inp, new_V)
+
+ class_node = self.gamma_word * inp + class_node
+ new_V = self.gamma_vis * vis_node + new_V
+ return class_node, new_V
+
+
+class GCN(nn.Layer):
+ def __init__(self, num_state: int = 128, num_node: int = 64, bias=False):
+ super().__init__()
+ self.conv1 = nn.Conv1D(
+ num_node,
+ num_node,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ )
+ self.relu = nn.ReLU()
+ self.conv2 = nn.Conv1D(
+ num_state,
+ num_state,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ bias_attr=bias)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
+ h = h + x
+ h = self.relu(h)
+ h = self.conv2(h)
+ return h
+
+
+class GraphTransfer(nn.Layer):
+ """Transfer vis graph to class node, transfer class node to vis feature"""
+
+ def __init__(self, in_dim: int):
+ super().__init__()
+ self.channle_in = in_dim
+ self.query_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.key_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.value_conv_vis = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.value_conv_word = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.softmax_vis = nn.Softmax(axis=-1)
+ self.softmax_word = nn.Softmax(axis=-2)
+
+ def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
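+        # Cross-graph attention between class (word) nodes and visual nodes:
+        # class_out updates the class nodes from visual-node values, while
+        # node_out updates the visual nodes from class-node values.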
+ m_batchsize, C, Nc = word.shape
+ m_batchsize, C, Nn = vis_node.shape
+
+ proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
+ .transpose((0, 2, 1))
+ proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
+
+ energy = paddle.bmm(proj_query, proj_key)
+ attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
+ attention_word = self.softmax_word(energy)
+ proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
+ Nn))
+ proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
+ Nc))
+
+ class_out = paddle.bmm(proj_value_vis, attention_vis)
+ node_out = paddle.bmm(proj_value_word, attention_word)
+ return class_out, node_out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/resnet.py b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..6104fa44ac2286e3636960631768599e2467c336
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/resnet.py
@@ -0,0 +1,136 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import ginet_resnet101vd_cityscapes.layers as L
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ name: str = None):
+ super(BasicBlock, self).__init__()
+ self.stride = stride
+ self.conv0 = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ name=name + "_branch2a")
+ self.conv1 = L.ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ act=None,
+ name=name + "_branch2b")
+
+ if not shortcut:
+ self.short = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+        y = paddle.add(x=short, y=conv1)
+        y = F.relu(y)
+
+ return y
+
+
+class ResNet101_vd(nn.Layer):
+ def __init__(self,
+ multi_grid: tuple = (1, 2, 4)):
+ super(ResNet101_vd, self).__init__()
+ depth = [3, 4, 23, 3]
+ num_channels = [64, 256, 512, 1024]
+ num_filters = [64, 128, 256, 512]
+ self.feat_channels = [c * 4 for c in num_filters]
+ dilation_dict = {2: 2, 3: 4}
+ self.conv1_1 = L.ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ name="conv1_1")
+ self.conv1_2 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_2")
+ self.conv1_3 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_3")
+ self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+ self.stage_list = []
+
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ L.BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ name=conv_name,
+ dilation=dilation_rate))
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ y = self.pool2d_max(y)
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+ return feat_list
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_voc/README.md b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..41f95d112f885e3e5decb5854b35a71a99eba452
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/README.md
@@ -0,0 +1,185 @@
+# ginet_resnet101vd_voc
+
+|模型名称|ginet_resnet101vd_voc|
+| :--- | :---: |
+|类别|图像-图像分割|
+|网络|ginet_resnet101vd|
+|数据集|PascalVOC2012|
+|是否支持Fine-tuning|是|
+|模型大小|286MB|
+|指标|-|
+|最新更新日期|2021-12-14|
+
+## 一、模型基本信息
+
+ - 样例结果示例:
+
+
+
+
+- ### 模型介绍
+
+ - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+ - 更多详情请参考:[ginet](https://arxiv.org/pdf/2009.06160)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install ginet_resnet101vd_voc
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.如何开始Fine-tune
+
+ - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用ginet_resnet101vd_voc模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下:
+
+ - 代码步骤
+
+ - Step1: 定义数据预处理方式
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
+
+ - Step2: 下载数据集并使用
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ - `transforms`: 数据预处理方式。
+    - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。
+
+ - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+ - Step3: 加载预训练模型
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet101vd_voc', num_classes=2, pretrained=None)
+ ```
+ - `name`: 选择预训练模型的名字。
+    - `pretrained`: 自己训练的模型参数路径,若为None,则加载提供的模型默认参数。
+
+ - Step4: 选择优化策略和运行配置
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - 模型预测
+
+ - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - 参数配置正确后,请执行脚本`python predict.py`。
+
+ - **Args**
+ * `images`:原始图像路径或BGR格式图片;
+ * `visualization`: 是否可视化,默认为True;
+ * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+ **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet101vd_voc
+ ```
+
+ - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
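+
+  - 例如(此处假设使用0号GPU,请按实际环境调整编号):
+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0
+    $ hub serving start -m ginet_resnet101vd_voc
+    ```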
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_voc/README_en.md b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..1bfc41ddd29da74e1df9da24cc23e0c65cf2a02f
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/README_en.md
@@ -0,0 +1,185 @@
+# ginet_resnet101vd_voc
+
+|Module Name|ginet_resnet101vd_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ginet_resnet101vd|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|286MB|
+|Data indicators|-|
+|Latest update date|2021-12-14|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ginet_resnet101vd_voc
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the ginet_resnet101vd_voc model on datasets such as OpticDiscSeg.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+    - `segmentation_transforms`: The data augmentation module defines many preprocessing methods for image segmentation data. Users can replace them according to their needs.
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+        * For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset from the network and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user's home directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet101vd_voc', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+    - `pretrained`: Path of self-trained model parameters; if it is None, the provided pre-trained parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+      trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - Model prediction
+
+    - When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+        * `visualization`: Whether to save the segmentation results as image files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet101vd_voc
+ ```
+
+  - This completes the deployment of the image segmentation service API, and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
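+
+    - The decoded `mask` is an OpenCV-compatible ndarray; it can, for example, be saved locally with `cv2.imwrite('seg_mask.png', mask)` (the file name here is only an illustration).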
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_voc/layers.py b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..7e46219fd671ed9834795c9881292eed787b990d
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/layers.py
@@ -0,0 +1,345 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ name: str = None):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = SyncBatchNorm(out_channels)
+ self._act_op = Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ name: str = None):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ name=name + "_branch2a")
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ name=name + "_branch2b")
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ name=name + "_branch2c")
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+ self.piontwise_conv = ConvBNReLU(
+ in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+ x = self.piontwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+
+ Returns:
+ A callable object of Activation.
+
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+
+ Examples:
+
+ from paddleseg.models.common.activation import Activation
+
+ relu = Activation("relu")
+ print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+ self.act_func = eval("activation.{}()".format(act_name))
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+
+ Args:
+ aspp_ratios (tuple): The dilation rate using in ASSP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+ use_sep_conv: bool= False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_voc/module.py b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..19422e3e70d829be67d62256403812df93811e7e
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/module.py
@@ -0,0 +1,309 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+from paddleseg.utils import utils
+from paddleseg.models import layers
+
+from ginet_resnet101vd_voc.resnet import ResNet101_vd
+
+
+@moduleinfo(
+ name="ginet_resnet101vd_voc",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="GINetResnet101 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class GINetResNet101(nn.Layer):
+ """
+ The GINetResNet101 implementation based on PaddlePaddle.
+ The original article refers to
+ Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
+ (https://arxiv.org/pdf/2009.06160).
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple, optional): Values in the tuple indicate the indices of output of backbone.
+        enable_auxiliary_loss (bool, optional): A bool value that indicates whether to add the auxiliary loss.
+            If True, an auxiliary segmentation loss is added. Default: True.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of the
+            feature is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+        jpu (bool, optional): Whether to use the JPU unit in the base forward. Default: True.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 21,
+ backbone_indices: Tuple[int]=(0, 1, 2, 3),
+ enable_auxiliary_loss: bool = True,
+ align_corners: bool = True,
+ jpu: bool = True,
+ pretrained: str = None):
+ super(GINetResNet101, self).__init__()
+ self.nclass = num_classes
+ self.aux = enable_auxiliary_loss
+ self.jpu = jpu
+
+ self.backbone = ResNet101_vd()
+ self.backbone_indices = backbone_indices
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
+ self.head = GIHead(in_channels=2048, nclass=num_classes)
+
+ if self.aux:
+ self.auxlayer = layers.AuxLayer(
+ 1024, 1024 // 4, num_classes, bias_attr=False)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feat_list = self.backbone(x)
+ c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
+
+ if self.jpu:
+ return self.jpu(c1, c2, c3, c4)
+ else:
+ return c1, c2, c3, c4
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ _, _, h, w = x.shape
+ _, _, c3, c4 = self.base_forward(x)
+
+ logit_list = []
+ x, _ = self.head(c4)
+ logit_list.append(x)
+
+ if self.aux:
+ auxout = self.auxlayer(c3)
+
+ logit_list.append(auxout)
+
+ return [
+ F.interpolate(
+ logit, (h, w),
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
+
+
+class GIHead(nn.Layer):
+ """The Graph Interaction Network head."""
+
+ def __init__(self, in_channels: int, nclass: int):
+ super().__init__()
+ self.nclass = nclass
+ inter_channels = in_channels // 4
+ self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
+ self.inp = paddle.create_parameter(
+ shape=self.inp.shape,
+ dtype=str(self.inp.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.inp))
+
+ self.fc1 = nn.Sequential(
+ nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
+ self.fc2 = nn.Sequential(
+ nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
+ self.conv5 = layers.ConvBNReLU(
+ in_channels,
+ inter_channels,
+ 3,
+ padding=1,
+ bias_attr=False,
+ stride=1)
+
+ self.gloru = GlobalReasonUnit(
+ in_channels=inter_channels,
+ num_state=256,
+ num_node=84,
+ nclass=nclass)
+ self.conv6 = nn.Sequential(
+ nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ B, C, H, W = x.shape
+ inp = self.inp.detach()
+
+ inp = self.fc1(inp)
+ inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
+ .expand((B, 256, self.nclass))
+
+ out = self.conv5(x)
+
+ out, se_out = self.gloru(out, inp)
+ out = self.conv6(out)
+ return out, se_out
+
+
+class GlobalReasonUnit(nn.Layer):
+ """
+ The original paper refers to:
+ Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
+ """
+
+ def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
+ super().__init__()
+ self.num_state = num_state
+ self.conv_theta = nn.Conv2D(
+ in_channels, num_node, kernel_size=1, stride=1, padding=0)
+ self.conv_phi = nn.Conv2D(
+ in_channels, num_state, kernel_size=1, stride=1, padding=0)
+ self.graph = GraphLayer(num_state, num_node, nclass)
+ self.extend_dim = nn.Conv2D(
+ num_state, in_channels, kernel_size=1, bias_attr=False)
+
+ self.bn = layers.SyncBatchNorm(in_channels)
+
+ def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> List[paddle.Tensor]:
+ B = self.conv_theta(x)
+ sizeB = B.shape
+ B = B.reshape((sizeB[0], sizeB[1], -1))
+
+ sizex = x.shape
+ x_reduce = self.conv_phi(x)
+ x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
+ .transpose((0, 2, 1))
+
+ V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
+ V = paddle.divide(
+ V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
+
+ class_node, new_V = self.graph(inp, V)
+ D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
+ Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
+ Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
+ sizex[2], -1))
+ Y = self.extend_dim(Y)
+ Y = self.bn(Y)
+ out = Y + x
+
+ return out, class_node
+
+
+class GraphLayer(nn.Layer):
+ def __init__(self, num_state: int, num_node: int, num_class: int):
+ super().__init__()
+ self.vis_gcn = GCN(num_state, num_node)
+ self.word_gcn = GCN(num_state, num_class)
+ self.transfer = GraphTransfer(num_state)
+ self.gamma_vis = paddle.zeros([num_node])
+ self.gamma_word = paddle.zeros([num_class])
+ self.gamma_vis = paddle.create_parameter(
+ shape=self.gamma_vis.shape,
+ dtype=str(self.gamma_vis.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
+ self.gamma_word = paddle.create_parameter(
+ shape=self.gamma_word.shape,
+ dtype=str(self.gamma_word.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
+
+ def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ inp = self.word_gcn(inp)
+ new_V = self.vis_gcn(vis_node)
+ class_node, vis_node = self.transfer(inp, new_V)
+
+ class_node = self.gamma_word * inp + class_node
+ new_V = self.gamma_vis * vis_node + new_V
+ return class_node, new_V
+
+
+class GCN(nn.Layer):
+ def __init__(self, num_state: int = 128, num_node: int = 64, bias=False):
+ super().__init__()
+ self.conv1 = nn.Conv1D(
+ num_node,
+ num_node,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ )
+ self.relu = nn.ReLU()
+ self.conv2 = nn.Conv1D(
+ num_state,
+ num_state,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ bias_attr=bias)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
+ h = h + x
+ h = self.relu(h)
+ h = self.conv2(h)
+ return h
+
+
+class GraphTransfer(nn.Layer):
+ """Transfer vis graph to class node, transfer class node to vis feature"""
+
+ def __init__(self, in_dim: int):
+ super().__init__()
+ self.channle_in = in_dim
+ self.query_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.key_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.value_conv_vis = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.value_conv_word = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.softmax_vis = nn.Softmax(axis=-1)
+ self.softmax_word = nn.Softmax(axis=-2)
+
+ def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ m_batchsize, C, Nc = word.shape
+ m_batchsize, C, Nn = vis_node.shape
+
+ proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
+ .transpose((0, 2, 1))
+ proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
+
+ energy = paddle.bmm(proj_query, proj_key)
+ attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
+ attention_word = self.softmax_word(energy)
+
+ proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
+ Nn))
+ proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
+ Nc))
+
+ class_out = paddle.bmm(proj_value_vis, attention_vis)
+ node_out = paddle.bmm(proj_value_word, attention_word)
+ return class_out, node_out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_voc/resnet.py b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..4014d4f8932ba9e81cd5afb8ca81a73863197151
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/resnet.py
@@ -0,0 +1,136 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import ginet_resnet101vd_voc.layers as L
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ name: str = None):
+ super(BasicBlock, self).__init__()
+ self.stride = stride
+ self.conv0 = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ name=name + "_branch2a")
+ self.conv1 = L.ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ act=None,
+ name=name + "_branch2b")
+
+ if not shortcut:
+ self.short = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+        y = paddle.add(x=short, y=conv1)
+        y = F.relu(y)
+
+ return y
+
+
+class ResNet101_vd(nn.Layer):
+ def __init__(self,
+ multi_grid: tuple = (1, 2, 4)):
+ super(ResNet101_vd, self).__init__()
+ depth = [3, 4, 23, 3]
+ num_channels = [64, 256, 512, 1024]
+ num_filters = [64, 128, 256, 512]
+ self.feat_channels = [c * 4 for c in num_filters]
+ dilation_dict = {2: 2, 3: 4}
+ self.conv1_1 = L.ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ name="conv1_1")
+ self.conv1_2 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_2")
+ self.conv1_3 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_3")
+ self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+ self.stage_list = []
+
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ L.BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ name=conv_name,
+ dilation=dilation_rate))
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ y = self.pool2d_max(y)
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+ return feat_list
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/README.md b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..341563f32cf13647472b2c0e7a8fd38f4d83adaa
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/README.md
@@ -0,0 +1,186 @@
+# ginet_resnet50vd_ade20k
+
+|模型名称|ginet_resnet50vd_ade20k|
+| :--- | :---: |
+|类别|图像-图像分割|
+|网络|ginet_resnet50vd|
+|数据集|ADE20K|
+|是否支持Fine-tuning|是|
+|模型大小|214MB|
+|指标|-|
+|最新更新日期|2021-12-14|
+
+## 一、模型基本信息
+
+ - 样例结果示例:
+
+
+
+
+- ### 模型介绍
+
+ - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+ - 更多详情请参考:[ginet](https://arxiv.org/pdf/2009.06160)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install ginet_resnet50vd_ade20k
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_ade20k')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.如何开始Fine-tune
+
+    - 在完成PaddlePaddle与PaddleHub的安装后,执行`python train.py`即可使用ginet_resnet50vd_ade20k模型对OpticDiscSeg数据集进行Fine-tune,完整的`train.py`示例见Step4之后。
+
+ - 代码步骤
+
+ - Step1: 定义数据预处理方式
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
+
+ - Step2: 下载数据集并使用
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ - `transforms`: 数据预处理方式。
+      - `mode`: 选择数据模式,可选项有 `train`、`test`、`val`,默认为`train`。
+
+ - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+ - Step3: 加载预训练模型
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet50vd_ade20k', num_classes=2, pretrained=None)
+ ```
+ - `name`: 选择预训练模型的名字。
+      - `num_classes`: 分割类别数。
+      - `pretrained`: 自己训练的模型参数路径,若为None,则加载提供的模型默认参数。
+
+ - Step4: 选择优化策略和运行配置
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
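+
+    - 为方便使用,上述四个步骤可以合并为一个完整的训练脚本。以下是仅由上文代码片段整理出的`train.py`最小示例(checkpoint目录名仅为示例):
+
+      - ```python
+        import paddle
+        import paddlehub as hub
+        from paddlehub.finetune.trainer import Trainer
+        from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+        from paddlehub.datasets import OpticDiscSeg
+
+        if __name__ == '__main__':
+            # Step1: 定义数据预处理方式
+            transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+            # Step2: 下载数据集并使用
+            train_reader = OpticDiscSeg(transform, mode='train')
+            # Step3: 加载预训练模型(OpticDiscSeg为2分类)
+            model = hub.Module(name='ginet_resnet50vd_ade20k', num_classes=2, pretrained=None)
+            # Step4: 优化策略和运行配置
+            scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+            optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+            trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+            trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+        ```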
+
+
+ - 模型预测
+
+ - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_ade20k', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - 参数配置正确后,请执行脚本`python predict.py`。
+
+ - **Args**
+ * `images`:原始图像路径或BGR格式图片;
+ * `visualization`: 是否可视化,默认为True;
+ * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+ **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet50vd_ade20k
+ ```
+
+ - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_ade20k"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
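+
+    - 解码得到的`mask`为OpenCV可直接处理的ndarray,例如可通过`cv2.imwrite('seg_mask.png', mask)`将结果保存到本地(文件名仅为示例)。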
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/README_en.md b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..d9c1a26daaecc5b22e622146d67b2664700fca74
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/README_en.md
@@ -0,0 +1,185 @@
+# ginet_resnet50vd_ade20k
+
+|Module Name|ginet_resnet50vd_ade20k|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ginet_resnet50vd|
+|Dataset|ADE20K|
+|Fine-tuning supported or not|Yes|
+|Module Size|214MB|
+|Data indicators|-|
+|Latest update date|2021-12-14|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
+
+## II. Installation
+
+- ### 1、Environment Dependencies
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ginet_resnet50vd_ade20k
+ ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+    | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_ade20k')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+    - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the ginet_resnet50vd_ade20k model on datasets such as OpticDiscSeg. The steps are listed below, and a combined training script is sketched after Step4.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - `segmentation_transforms`: this data augmentation module defines a rich set of preprocessing methods for segmentation data. Users can replace them according to their needs.
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+        * For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset from the network and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet50vd_ade20k', num_classes=2, pretrained=None)
+ ```
+      - `name`: model name.
+      - `num_classes`: the number of segmentation classes.
+      - `pretrained`: path of a self-trained checkpoint to load; if it is None, the provided pretrained parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
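+
+    - For convenience, the four steps above can be assembled into a single training script. The following is a minimal sketch built only from the snippets above (the checkpoint directory name is an arbitrary example):
+
+      - ```python
+        import paddle
+        import paddlehub as hub
+        from paddlehub.finetune.trainer import Trainer
+        from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+        from paddlehub.datasets import OpticDiscSeg
+
+        if __name__ == '__main__':
+            # Step1: define the data preprocessing method
+            transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+            # Step2: download and load the dataset
+            train_reader = OpticDiscSeg(transform, mode='train')
+            # Step3: load the pre-trained model (OpticDiscSeg has 2 classes)
+            model = hub.Module(name='ginet_resnet50vd_ade20k', num_classes=2, pretrained=None)
+            # Step4: optimization strategy and training
+            scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+            optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+            trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+            trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+        ```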
+
+
+ - Model prediction
+
+    - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_ade20k', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+ * `visualization`: Whether to save the recognition results as picture files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet50vd_ade20k
+ ```
+
+  - This completes the deployment of the image segmentation service API, and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_ade20k"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
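+
+    - The decoded `mask` is an OpenCV-compatible ndarray; it can, for example, be saved locally with `cv2.imwrite('seg_mask.png', mask)` (the file name here is only an illustration).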
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/layers.py b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..7e46219fd671ed9834795c9881292eed787b990d
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/layers.py
@@ -0,0 +1,345 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ name: str = None):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = SyncBatchNorm(out_channels)
+ self._act_op = Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ name: str = None):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ name=name + "_branch2a")
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ name=name + "_branch2b")
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ name=name + "_branch2c")
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+ self.piontwise_conv = ConvBNReLU(
+ in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+ x = self.piontwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+
+ Returns:
+ A callable object of Activation.
+
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+
+ Examples:
+
+ from paddleseg.models.common.activation import Activation
+
+ relu = Activation("relu")
+ print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+ self.act_func = eval("activation.{}()".format(act_name))
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+
+ Args:
+ aspp_ratios (tuple): The dilation rate using in ASSP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+ use_sep_conv: bool= False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/module.py b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..79ce4d0f070472b989c5a83b6f2542bd66f550fc
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/module.py
@@ -0,0 +1,309 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+from paddleseg.utils import utils
+from paddleseg.models import layers
+
+from ginet_resnet50vd_ade20k.resnet import ResNet50_vd
+
+
+@moduleinfo(
+ name="ginet_resnet50vd_ade20k",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="GINetResnet50 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class GINetResNet50(nn.Layer):
+ """
+ The GINetResNet50 implementation based on PaddlePaddle.
+ The original article refers to
+ Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
+ (https://arxiv.org/pdf/2009.06160).
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple, optional): Values in the tuple indicate the indices of output of backbone.
+        enable_auxiliary_loss (bool, optional): A bool value that indicates whether to add the auxiliary loss.
+            If True, an auxiliary segmentation loss is added. Default: True.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of the
+            feature is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+        jpu (bool, optional): Whether to use the JPU unit in the base forward. Default: True.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 150,
+ backbone_indices: Tuple[int]=(0, 1, 2, 3),
+ enable_auxiliary_loss: bool = True,
+ align_corners: bool = True,
+ jpu: bool = True,
+ pretrained: str = None):
+ super(GINetResNet50, self).__init__()
+ self.nclass = num_classes
+ self.aux = enable_auxiliary_loss
+ self.jpu = jpu
+
+ self.backbone = ResNet50_vd()
+ self.backbone_indices = backbone_indices
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
+ self.head = GIHead(in_channels=2048, nclass=num_classes)
+
+ if self.aux:
+ self.auxlayer = layers.AuxLayer(
+ 1024, 1024 // 4, num_classes, bias_attr=False)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feat_list = self.backbone(x)
+ c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
+
+ if self.jpu:
+ return self.jpu(c1, c2, c3, c4)
+ else:
+ return c1, c2, c3, c4
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ _, _, h, w = x.shape
+ _, _, c3, c4 = self.base_forward(x)
+
+ logit_list = []
+ x, _ = self.head(c4)
+ logit_list.append(x)
+
+ if self.aux:
+ auxout = self.auxlayer(c3)
+
+ logit_list.append(auxout)
+
+ return [
+ F.interpolate(
+ logit, (h, w),
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
+
+
+class GIHead(nn.Layer):
+ """The Graph Interaction Network head."""
+
+ def __init__(self, in_channels: int, nclass: int):
+ super().__init__()
+ self.nclass = nclass
+ inter_channels = in_channels // 4
+ self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
+ self.inp = paddle.create_parameter(
+ shape=self.inp.shape,
+ dtype=str(self.inp.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.inp))
+
+ self.fc1 = nn.Sequential(
+ nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
+ self.fc2 = nn.Sequential(
+ nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
+ self.conv5 = layers.ConvBNReLU(
+ in_channels,
+ inter_channels,
+ 3,
+ padding=1,
+ bias_attr=False,
+ stride=1)
+
+ self.gloru = GlobalReasonUnit(
+ in_channels=inter_channels,
+ num_state=256,
+ num_node=84,
+ nclass=nclass)
+ self.conv6 = nn.Sequential(
+ nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ B, C, H, W = x.shape
+ inp = self.inp.detach()
+
+ inp = self.fc1(inp)
+ inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
+ .expand((B, 256, self.nclass))
+
+ out = self.conv5(x)
+
+ out, se_out = self.gloru(out, inp)
+ out = self.conv6(out)
+ return out, se_out
+
+
+class GlobalReasonUnit(nn.Layer):
+ """
+ The original paper refers to:
+ Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
+ """
+
+ def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
+ super().__init__()
+ self.num_state = num_state
+ self.conv_theta = nn.Conv2D(
+ in_channels, num_node, kernel_size=1, stride=1, padding=0)
+ self.conv_phi = nn.Conv2D(
+ in_channels, num_state, kernel_size=1, stride=1, padding=0)
+ self.graph = GraphLayer(num_state, num_node, nclass)
+ self.extend_dim = nn.Conv2D(
+ num_state, in_channels, kernel_size=1, bias_attr=False)
+
+ self.bn = layers.SyncBatchNorm(in_channels)
+
+ def forward(self, x: paddle.Tensor, inp:paddle.Tensor) -> List[paddle.Tensor]:
+ B = self.conv_theta(x)
+ sizeB = B.shape
+ B = B.reshape((sizeB[0], sizeB[1], -1))
+
+ sizex = x.shape
+ x_reduce = self.conv_phi(x)
+ x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
+ .transpose((0, 2, 1))
+
+ V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
+ V = paddle.divide(
+ V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
+
+ class_node, new_V = self.graph(inp, V)
+ D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
+ Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
+ Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
+ sizex[2], -1))
+ Y = self.extend_dim(Y)
+ Y = self.bn(Y)
+ out = Y + x
+
+ return out, class_node
+
+
+class GraphLayer(nn.Layer):
+ def __init__(self, num_state: int, num_node: int, num_class: int):
+ super().__init__()
+ self.vis_gcn = GCN(num_state, num_node)
+ self.word_gcn = GCN(num_state, num_class)
+ self.transfer = GraphTransfer(num_state)
+ self.gamma_vis = paddle.zeros([num_node])
+ self.gamma_word = paddle.zeros([num_class])
+ self.gamma_vis = paddle.create_parameter(
+ shape=self.gamma_vis.shape,
+ dtype=str(self.gamma_vis.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
+ self.gamma_word = paddle.create_parameter(
+ shape=self.gamma_word.shape,
+ dtype=str(self.gamma_word.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
+
+ def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ inp = self.word_gcn(inp)
+ new_V = self.vis_gcn(vis_node)
+ class_node, vis_node = self.transfer(inp, new_V)
+
+ class_node = self.gamma_word * inp + class_node
+ new_V = self.gamma_vis * vis_node + new_V
+ return class_node, new_V
+
+
+class GCN(nn.Layer):
+ def __init__(self, num_state: int = 128, num_node: int = 64, bias: bool = False):
+ super().__init__()
+ self.conv1 = nn.Conv1D(
+ num_node,
+ num_node,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ )
+ self.relu = nn.ReLU()
+ self.conv2 = nn.Conv1D(
+ num_state,
+ num_state,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ bias_attr=bias)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
+ h = h + x
+ h = self.relu(h)
+ h = self.conv2(h)
+ return h
+
+
+class GraphTransfer(nn.Layer):
+ """Transfer vis graph to class node, transfer class node to vis feature"""
+
+ def __init__(self, in_dim: int):
+ super().__init__()
+ self.channle_in = in_dim
+ self.query_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.key_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.value_conv_vis = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.value_conv_word = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.softmax_vis = nn.Softmax(axis=-1)
+ self.softmax_word = nn.Softmax(axis=-2)
+
+ def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ m_batchsize, C, Nc = word.shape
+ m_batchsize, C, Nn = vis_node.shape
+
+ proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
+ .transpose((0, 2, 1))
+ proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
+
+ energy = paddle.bmm(proj_query, proj_key)
+ attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
+ attention_word = self.softmax_word(energy)
+
+ proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
+ Nn))
+ proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
+ Nc))
+
+ class_out = paddle.bmm(proj_value_vis, attention_vis)
+ node_out = paddle.bmm(proj_value_word, attention_word)
+ return class_out, node_out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/resnet.py b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..d6e376ddca8c01569f1f20d0e25ec3e9fa513922
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/resnet.py
@@ -0,0 +1,137 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import ginet_resnet50vd_ade20k.layers as L
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ name: str = None):
+ super(BasicBlock, self).__init__()
+ self.stride = stride
+ self.conv0 = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ name=name + "_branch2a")
+ self.conv1 = L.ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ act=None,
+ name=name + "_branch2b")
+
+ if not shortcut:
+ self.short = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+        # paddle.elementwise_add with an `act` argument is a Paddle 1.x API; use paddle.add + F.relu in 2.x
+        y = F.relu(paddle.add(x=short, y=conv1))
+
+ return y
+
+
+class ResNet50_vd(nn.Layer):
+ def __init__(self,
+ multi_grid: tuple = (1, 2, 4)):
+ super(ResNet50_vd, self).__init__()
+ depth = [3, 4, 6, 3]
+ num_channels = [64, 256, 512, 1024]
+ num_filters = [64, 128, 256, 512]
+ self.feat_channels = [c * 4 for c in num_filters]
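+        # Replace striding with dilated convolutions in the last two stages (block indices 2 and 3),
+        # keeping an output stride of 8; the multi_grid rates are applied on top of this in the final stage.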
+ dilation_dict = {2: 2, 3: 4}
+ self.conv1_1 = L.ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ name="conv1_1")
+ self.conv1_2 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_2")
+ self.conv1_3 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_3")
+ self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+ self.stage_list = []
+
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ L.BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ name=conv_name,
+ dilation=dilation_rate))
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ y = self.pool2d_max(y)
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+ return feat_list
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/README.md b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..849f47627fa1e5c3c2150188981e9aff32737ae8
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/README.md
@@ -0,0 +1,185 @@
+# ginet_resnet50vd_cityscapes
+
+|模型名称|ginet_resnet50vd_cityscapes|
+| :--- | :---: |
+|类别|图像-图像分割|
+|网络|ginet_resnet50vd|
+|数据集|Cityscapes|
+|是否支持Fine-tuning|是|
+|模型大小|214MB|
+|指标|-|
+|最新更新日期|2021-12-14|
+
+## 一、模型基本信息
+
+ - 样例结果示例:
+
+
+
+
+- ### 模型介绍
+
+ - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+ - 更多详情请参考:[ginet](https://arxiv.org/pdf/2009.06160)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install ginet_resnet50vd_cityscapes
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.如何开始Fine-tune
+
+ - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用ginet_resnet50vd_cityscapes模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下:
+
+ - 代码步骤
+
+ - Step1: 定义数据预处理方式
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
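+
+      - 例如,训练时还可以加入随机水平翻转等数据增强。下面是一段示意代码(假设 `segmentation_transforms` 模块提供 `RandomHorizontalFlip`,如不可用请替换为该模块实际提供的增强方式):
+
+      - ```python
+        from paddlehub.vision.segmentation_transforms import Compose, Resize, RandomHorizontalFlip, Normalize
+
+        # 假设 RandomHorizontalFlip 可用:先随机水平翻转,再缩放到 512x512 并归一化
+        train_transform = Compose([RandomHorizontalFlip(), Resize(target_size=(512, 512)), Normalize()])
+        ```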
+
+ - Step2: 下载数据集并使用
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ - `transforms`: 数据预处理方式。
+      - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。
+
+ - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+ - Step3: 加载预训练模型
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet50vd_cityscapes', num_classes=2, pretrained=None)
+ ```
+ - `name`: 选择预训练模型的名字。
+      - `pretrained`: 自己训练好的模型参数路径,若为None,则加载模块默认提供的预训练参数。
+
+ - Step4: 选择优化策略和运行配置
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - 模型预测
+
+ - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - 参数配置正确后,请执行脚本`python predict.py`。
+
+ - **Args**
+ * `images`:原始图像路径或BGR格式图片;
+ * `visualization`: 是否可视化,默认为True;
+ * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+ **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet50vd_cityscapes
+ ```
+
+ - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量(例如 `export CUDA_VISIBLE_DEVICES=0`),否则无需设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
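+
+  - 返回的 `mask` 为解码后的分割结果(numpy 数组)。下面给出一个保存结果的示意写法(文件名仅为示例):
+
+    ```python
+    import cv2
+
+    cv2.imwrite('ginet_seg_mask.png', mask)  # 将服务端返回的分割结果保存为图片
+    ```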
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/README_en.md b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..b265ee908f2476008405d2f548f8f029a81775a0
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/README_en.md
@@ -0,0 +1,185 @@
+# ginet_resnet50vd_cityscapes
+
+|Module Name|ginet_resnet50vd_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ginet_resnet50vd|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|214MB|
+|Data indicators|-|
+|Latest update date|2021-12-14|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ginet_resnet50vd_cityscapes
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+    - After installing PaddlePaddle and PaddleHub, you can fine-tune the ginet_resnet50vd_cityscapes model on datasets such as OpticDiscSeg by running `python train.py`.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - `segmentation_transforms`: The data augmentation module provides a rich set of preprocessing methods for image segmentation data. Users can replace them with their own preprocessing pipeline as needed.
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+      * For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` will automatically download the dataset and decompress it to the `$HOME/.paddlehub/dataset` directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet50vd_cityscapes', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+      - `pretrained`: Path of your self-trained model parameters; if it is None, the provided pre-trained parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - Model prediction
+
+ - When Fine-tune is completed, the model with the best performance on the verification set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+        * `visualization`: Whether to save the segmentation results as image files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet50vd_cityscapes
+ ```
+
+  - The serving API is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If a GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable (e.g. `export CUDA_VISIBLE_DEVICES=0`) before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
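+
+  - The returned `mask` is the decoded segmentation result as a numpy array. As a minimal sketch (the file name below is only an example), it can be written to disk with OpenCV:
+
+    ```python
+    import cv2
+
+    cv2.imwrite('ginet_seg_mask.png', mask)  # save the segmentation result returned by the service
+    ```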
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/layers.py b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..7e46219fd671ed9834795c9881292eed787b990d
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/layers.py
@@ -0,0 +1,345 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ name: str = None):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = SyncBatchNorm(out_channels)
+ self._act_op = Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ name: str = None):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ name=name + "_branch2a")
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ name=name + "_branch2b")
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ name=name + "_branch2c")
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+ self.piontwise_conv = ConvBNReLU(
+ in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+ x = self.piontwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+
+ Returns:
+ A callable object of Activation.
+
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+
+ Examples:
+
+ from paddleseg.models.common.activation import Activation
+
+ relu = Activation("relu")
+ print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+ self.act_func = eval("activation.{}()".format(act_name))
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+ use_sep_conv: bool= False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/module.py b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..1dac751bca852b3ee9ae247248b19c878d44365e
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/module.py
@@ -0,0 +1,309 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+from paddleseg.utils import utils
+from paddleseg.models import layers
+
+from ginet_resnet50vd_cityscapes.resnet import ResNet50_vd
+
+
+@moduleinfo(
+ name="ginet_resnet50vd_cityscapes",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="GINetResnet50 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class GINetResNet50(nn.Layer):
+ """
+ The GINetResNet50 implementation based on PaddlePaddle.
+ The original article refers to
+ Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
+ (https://arxiv.org/pdf/2009.06160).
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple, optional): Values in the tuple indicate the indices of output of backbone.
+        enable_auxiliary_loss (bool, optional): A bool value that indicates whether to add an auxiliary loss
+            computed on an intermediate backbone feature. Default: True.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of the feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: True.
+        jpu (bool, optional): Whether to use the JPU unit in the base forward. Default: True.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 19,
+ backbone_indices: Tuple[int]=(0, 1, 2, 3),
+ enable_auxiliary_loss: bool = True,
+ align_corners: bool = True,
+ jpu: bool = True,
+ pretrained: str = None):
+ super(GINetResNet50, self).__init__()
+ self.nclass = num_classes
+ self.aux = enable_auxiliary_loss
+ self.jpu = jpu
+
+ self.backbone = ResNet50_vd()
+ self.backbone_indices = backbone_indices
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
+ self.head = GIHead(in_channels=2048, nclass=num_classes)
+
+ if self.aux:
+ self.auxlayer = layers.AuxLayer(
+ 1024, 1024 // 4, num_classes, bias_attr=False)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def base_forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ feat_list = self.backbone(x)
+ c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
+
+ if self.jpu:
+ return self.jpu(c1, c2, c3, c4)
+ else:
+ return c1, c2, c3, c4
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ _, _, h, w = x.shape
+ _, _, c3, c4 = self.base_forward(x)
+
+ logit_list = []
+ x, _ = self.head(c4)
+ logit_list.append(x)
+
+ if self.aux:
+ auxout = self.auxlayer(c3)
+
+ logit_list.append(auxout)
+
+ return [
+ F.interpolate(
+ logit, (h, w),
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
+
+
+class GIHead(nn.Layer):
+ """The Graph Interaction Network head."""
+
+ def __init__(self, in_channels: int, nclass: int):
+ super().__init__()
+ self.nclass = nclass
+ inter_channels = in_channels // 4
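+        # Learnable class-node embeddings: one 300-d vector per class, playing the role of the
+        # semantic (word) nodes in GINet; they are projected to 256-d by fc1/fc2 before interacting
+        # with the visual graph nodes.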
+ self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
+ self.inp = paddle.create_parameter(
+ shape=self.inp.shape,
+ dtype=str(self.inp.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.inp))
+
+ self.fc1 = nn.Sequential(
+ nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
+ self.fc2 = nn.Sequential(
+ nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
+ self.conv5 = layers.ConvBNReLU(
+ in_channels,
+ inter_channels,
+ 3,
+ padding=1,
+ bias_attr=False,
+ stride=1)
+
+ self.gloru = GlobalReasonUnit(
+ in_channels=inter_channels,
+ num_state=256,
+ num_node=84,
+ nclass=nclass)
+ self.conv6 = nn.Sequential(
+ nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ B, C, H, W = x.shape
+ inp = self.inp.detach()
+
+ inp = self.fc1(inp)
+ inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
+ .expand((B, 256, self.nclass))
+
+ out = self.conv5(x)
+
+ out, se_out = self.gloru(out, inp)
+ out = self.conv6(out)
+ return out, se_out
+
+
+class GlobalReasonUnit(nn.Layer):
+ """
+ The original paper refers to:
+ Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
+ """
+
+ def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
+ super().__init__()
+ self.num_state = num_state
+ self.conv_theta = nn.Conv2D(
+ in_channels, num_node, kernel_size=1, stride=1, padding=0)
+ self.conv_phi = nn.Conv2D(
+ in_channels, num_state, kernel_size=1, stride=1, padding=0)
+ self.graph = GraphLayer(num_state, num_node, nclass)
+ self.extend_dim = nn.Conv2D(
+ num_state, in_channels, kernel_size=1, bias_attr=False)
+
+ self.bn = layers.SyncBatchNorm(in_channels)
+
+ def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> paddle.Tensor:
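+        # 1) conv_theta predicts a per-node assignment map B over the H*W pixel positions;
+        # 2) conv_phi reduces the input to num_state channels, and B @ x_reduce aggregates pixel
+        #    features into num_node visual graph nodes (normalized by the number of pixels);
+        # 3) the graph layer lets these visual nodes interact with the class nodes `inp`;
+        # 4) the updated nodes are projected back to the pixel grid via B^T, mapped back to
+        #    in_channels, and added to the input as a residual.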
+ B = self.conv_theta(x)
+ sizeB = B.shape
+ B = B.reshape((sizeB[0], sizeB[1], -1))
+
+ sizex = x.shape
+ x_reduce = self.conv_phi(x)
+ x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
+ .transpose((0, 2, 1))
+
+ V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
+ V = paddle.divide(
+ V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
+
+ class_node, new_V = self.graph(inp, V)
+ D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
+ Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
+ Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
+ sizex[2], -1))
+ Y = self.extend_dim(Y)
+ Y = self.bn(Y)
+ out = Y + x
+
+ return out, class_node
+
+
+class GraphLayer(nn.Layer):
+ def __init__(self, num_state: int, num_node: int, num_class: int):
+ super().__init__()
+ self.vis_gcn = GCN(num_state, num_node)
+ self.word_gcn = GCN(num_state, num_class)
+ self.transfer = GraphTransfer(num_state)
+ self.gamma_vis = paddle.zeros([num_node])
+ self.gamma_word = paddle.zeros([num_class])
+ self.gamma_vis = paddle.create_parameter(
+ shape=self.gamma_vis.shape,
+ dtype=str(self.gamma_vis.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
+ self.gamma_word = paddle.create_parameter(
+ shape=self.gamma_word.shape,
+ dtype=str(self.gamma_word.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
+
+ def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ inp = self.word_gcn(inp)
+ new_V = self.vis_gcn(vis_node)
+ class_node, vis_node = self.transfer(inp, new_V)
+
+ class_node = self.gamma_word * inp + class_node
+ new_V = self.gamma_vis * vis_node + new_V
+ return class_node, new_V
+
+
+class GCN(nn.Layer):
+ def __init__(self, num_state: int = 128, num_node: int = 64, bias: bool = False):
+ super().__init__()
+ self.conv1 = nn.Conv1D(
+ num_node,
+ num_node,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ )
+ self.relu = nn.ReLU()
+ self.conv2 = nn.Conv1D(
+ num_state,
+ num_state,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ bias_attr=bias)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
+ h = h + x
+ h = self.relu(h)
+ h = self.conv2(h)
+ return h
+
+
+class GraphTransfer(nn.Layer):
+ """Transfer vis graph to class node, transfer class node to vis feature"""
+
+ def __init__(self, in_dim: int):
+ super().__init__()
+ self.channle_in = in_dim
+ self.query_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.key_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.value_conv_vis = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.value_conv_word = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.softmax_vis = nn.Softmax(axis=-1)
+ self.softmax_word = nn.Softmax(axis=-2)
+
+ def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ m_batchsize, C, Nc = word.shape
+ m_batchsize, C, Nn = vis_node.shape
+
+ proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
+ .transpose((0, 2, 1))
+ proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
+
+ energy = paddle.bmm(proj_query, proj_key)
+ attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
+ attention_word = self.softmax_word(energy)
+
+ proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
+ Nn))
+ proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
+ Nc))
+
+ class_out = paddle.bmm(proj_value_vis, attention_vis)
+ node_out = paddle.bmm(proj_value_word, attention_word)
+ return class_out, node_out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/resnet.py b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..d526b26991ff72083d7431971608b8a489f60df9
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/resnet.py
@@ -0,0 +1,137 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import ginet_resnet50vd_cityscapes.layers as L
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ name: str = None):
+ super(BasicBlock, self).__init__()
+ self.stride = stride
+ self.conv0 = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ name=name + "_branch2a")
+ self.conv1 = L.ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ act=None,
+ name=name + "_branch2b")
+
+ if not shortcut:
+ self.short = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+        # paddle.elementwise_add with an `act` argument is a Paddle 1.x API; use paddle.add + F.relu in 2.x
+        y = F.relu(paddle.add(x=short, y=conv1))
+
+ return y
+
+
+class ResNet50_vd(nn.Layer):
+ def __init__(self,
+ multi_grid: tuple = (1, 2, 4)):
+ super(ResNet50_vd, self).__init__()
+ depth = [3, 4, 6, 3]
+ num_channels = [64, 256, 512, 1024]
+ num_filters = [64, 128, 256, 512]
+ self.feat_channels = [c * 4 for c in num_filters]
+ dilation_dict = {2: 2, 3: 4}
+ self.conv1_1 = L.ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ name="conv1_1")
+ self.conv1_2 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_2")
+ self.conv1_3 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_3")
+ self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+ self.stage_list = []
+
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ L.BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ name=conv_name,
+ dilation=dilation_rate))
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ y = self.pool2d_max(y)
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+ return feat_list
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_voc/README.md b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..e0f1d605c5f8f87c1ad56d6c12b3a1384a514720
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/README.md
@@ -0,0 +1,185 @@
+# ginet_resnet50vd_voc
+
+|模型名称|ginet_resnet50vd_voc|
+| :--- | :---: |
+|类别|图像-图像分割|
+|网络|ginet_resnet50vd|
+|数据集|PascalVOC2012|
+|是否支持Fine-tuning|是|
+|模型大小|214MB|
+|指标|-|
+|最新更新日期|2021-12-14|
+
+## 一、模型基本信息
+
+ - 样例结果示例:
+
+
+
+
+- ### 模型介绍
+
+ - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+ - 更多详情请参考:[ginet](https://arxiv.org/pdf/2009.06160)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install ginet_resnet50vd_voc
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.如何开始Fine-tune
+
+ - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用ginet_resnet50vd_voc模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下:
+
+ - 代码步骤
+
+ - Step1: 定义数据预处理方式
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
+
+ - Step2: 下载数据集并使用
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ - `transforms`: 数据预处理方式。
+      - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。
+
+ - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+ - Step3: 加载预训练模型
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet50vd_voc', num_classes=2, pretrained=None)
+ ```
+ - `name`: 选择预训练模型的名字。
+      - `pretrained`: 自己训练好的模型参数路径,若为None,则加载模块默认提供的预训练参数。
+
+ - Step4: 选择优化策略和运行配置
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - 模型预测
+
+ - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - 参数配置正确后,请执行脚本`python predict.py`。
+
+ - **Args**
+ * `images`:原始图像路径或BGR格式图片;
+ * `visualization`: 是否可视化,默认为True;
+ * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+ **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet50vd_voc
+ ```
+
+ - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量(例如 `export CUDA_VISIBLE_DEVICES=0`),否则无需设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_voc/README_en.md b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..71bba22353984fa84150ed687c9432db6ba0da65
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/README_en.md
@@ -0,0 +1,185 @@
+# ginet_resnet50vd_voc
+
+|Module Name|ginet_resnet50vd_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ginet_resnet50vd|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|214MB|
+|Data indicators|-|
+|Latest update date|2021-12-14|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ginet_resnet50vd_voc
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+    - After installing PaddlePaddle and PaddleHub, you can fine-tune the ginet_resnet50vd_voc model on datasets such as OpticDiscSeg by running `python train.py`.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - `segmentation_transforms`: The data augmentation module provides a rich set of preprocessing methods for image segmentation data. Users can replace them with their own preprocessing pipeline as needed.
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+      * For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` will automatically download the dataset and decompress it to the `$HOME/.paddlehub/dataset` directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet50vd_voc', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+      - `pretrained`: Path of your self-trained model parameters; if it is None, the provided pre-trained parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - Model prediction
+
+ - When Fine-tune is completed, the model with the best performance on the verification set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+        * `visualization`: Whether to save the segmentation results as image files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet50vd_voc
+ ```
+
+  - The serving API is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If a GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable (e.g. `export CUDA_VISIBLE_DEVICES=0`) before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_voc/layers.py b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..7e46219fd671ed9834795c9881292eed787b990d
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/layers.py
@@ -0,0 +1,345 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ name: str = None):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = SyncBatchNorm(out_channels)
+ self._act_op = Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ name: str = None):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ name=name + "_branch2a")
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ name=name + "_branch2b")
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ name=name + "_branch2c")
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+ self.piontwise_conv = ConvBNReLU(
+ in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+ x = self.piontwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+
+ Returns:
+ A callable object of Activation.
+
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+
+ Examples:
+
+ from paddleseg.models.common.activation import Activation
+
+ relu = Activation("relu")
+ print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+ self.act_func = eval("activation.{}()".format(act_name))
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+ use_sep_conv: bool= False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
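+
+
+# Illustrative usage sketch (not part of the module): an ASPP head with the
+# common DeepLabV3 dilation rates could be assembled roughly as follows,
+# assuming a backbone feature map of shape [1, 2048, 32, 32]:
+#
+#   aspp = ASPPModule(aspp_ratios=(1, 6, 12, 18), in_channels=2048,
+#                     out_channels=256, align_corners=False, image_pooling=True)
+#   feat = paddle.rand([1, 2048, 32, 32])
+#   out = aspp(feat)   # -> [1, 256, 32, 32]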
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_voc/module.py b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..fed27ebf3a07794343c5841dc5c31b51e46f6544
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/module.py
@@ -0,0 +1,309 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+from paddleseg.utils import utils
+from paddleseg.models import layers
+
+from ginet_resnet50vd_voc.resnet import ResNet50_vd
+
+
+@moduleinfo(
+ name="ginet_resnet50vd_voc",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="GINetResnet50 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class GINetResNet50(nn.Layer):
+ """
+ The GINetResNet50 implementation based on PaddlePaddle.
+ The original article refers to
+ Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
+ (https://arxiv.org/pdf/2009.06160).
+ Args:
+ num_classes (int): The unique number of target classes.
+        backbone_indices (tuple, optional): Values in the tuple indicate the indices of the backbone outputs.
+        enable_auxiliary_loss (bool, optional): Whether to add an auxiliary loss.
+            If True, an auxiliary head is applied to the c3 backbone feature. Default: True.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: True.
+        jpu (bool, optional): Whether to use the JPU unit in the base forward. Default: True.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 21,
+ backbone_indices: Tuple[int]=(0, 1, 2, 3),
+ enable_auxiliary_loss:bool = True,
+ align_corners: bool = True,
+ jpu: bool = True,
+ pretrained: str = None):
+ super(GINetResNet50, self).__init__()
+ self.nclass = num_classes
+ self.aux = enable_auxiliary_loss
+ self.jpu = jpu
+
+ self.backbone = ResNet50_vd()
+ self.backbone_indices = backbone_indices
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
+ self.head = GIHead(in_channels=2048, nclass=num_classes)
+
+ if self.aux:
+ self.auxlayer = layers.AuxLayer(
+ 1024, 1024 // 4, num_classes, bias_attr=False)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feat_list = self.backbone(x)
+ c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
+
+ if self.jpu:
+ return self.jpu(c1, c2, c3, c4)
+ else:
+ return c1, c2, c3, c4
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ _, _, h, w = x.shape
+ _, _, c3, c4 = self.base_forward(x)
+
+ logit_list = []
+ x, _ = self.head(c4)
+ logit_list.append(x)
+
+ if self.aux:
+ auxout = self.auxlayer(c3)
+
+ logit_list.append(auxout)
+
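+        # The list below holds the main logit and, when enable_auxiliary_loss is
+        # True, an auxiliary logit; each is upsampled to the input size [h, w].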
+ return [
+ F.interpolate(
+ logit, (h, w),
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
+
+
+class GIHead(nn.Layer):
+ """The Graph Interaction Network head."""
+
+ def __init__(self, in_channels: int, nclass: int):
+ super().__init__()
+ self.nclass = nclass
+ inter_channels = in_channels // 4
+ self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
+ self.inp = paddle.create_parameter(
+ shape=self.inp.shape,
+ dtype=str(self.inp.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.inp))
+
+ self.fc1 = nn.Sequential(
+ nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
+ self.fc2 = nn.Sequential(
+ nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
+ self.conv5 = layers.ConvBNReLU(
+ in_channels,
+ inter_channels,
+ 3,
+ padding=1,
+ bias_attr=False,
+ stride=1)
+
+ self.gloru = GlobalReasonUnit(
+ in_channels=inter_channels,
+ num_state=256,
+ num_node=84,
+ nclass=nclass)
+ self.conv6 = nn.Sequential(
+ nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ B, C, H, W = x.shape
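+        # Illustrative note: self.inp is a learnable class embedding of shape
+        # [nclass, 300]; fc1/fc2 project it to [B, 256, nclass] below so it can
+        # serve as the class-node input of the graph reasoning unit.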
+ inp = self.inp.detach()
+
+ inp = self.fc1(inp)
+ inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
+ .expand((B, 256, self.nclass))
+
+ out = self.conv5(x)
+
+ out, se_out = self.gloru(out, inp)
+ out = self.conv6(out)
+ return out, se_out
+
+
+class GlobalReasonUnit(nn.Layer):
+ """
+ The original paper refers to:
+ Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
+ """
+
+ def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
+ super().__init__()
+ self.num_state = num_state
+ self.conv_theta = nn.Conv2D(
+ in_channels, num_node, kernel_size=1, stride=1, padding=0)
+ self.conv_phi = nn.Conv2D(
+ in_channels, num_state, kernel_size=1, stride=1, padding=0)
+ self.graph = GraphLayer(num_state, num_node, nclass)
+ self.extend_dim = nn.Conv2D(
+ num_state, in_channels, kernel_size=1, bias_attr=False)
+
+ self.bn = layers.SyncBatchNorm(in_channels)
+
+ def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> List[paddle.Tensor]:
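+        # Tensor-shape sketch (illustrative, assuming x: [N, C, H, W] and
+        # inp: [N, num_state, nclass]):
+        #   B        -> [N, num_node, H*W]        pixels projected onto graph nodes
+        #   x_reduce -> [N, H*W, num_state]       reduced pixel features
+        #   V        -> [N, num_state, num_node]  visual graph nodes
+        #   new_V    -> [N, num_state, num_node]  nodes after graph interaction
+        #   Y        -> [N, C, H, W]              nodes projected back to the feature map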
+ B = self.conv_theta(x)
+ sizeB = B.shape
+ B = B.reshape((sizeB[0], sizeB[1], -1))
+
+ sizex = x.shape
+ x_reduce = self.conv_phi(x)
+ x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
+ .transpose((0, 2, 1))
+
+ V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
+ V = paddle.divide(
+ V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
+
+ class_node, new_V = self.graph(inp, V)
+ D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
+ Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
+ Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
+ sizex[2], -1))
+ Y = self.extend_dim(Y)
+ Y = self.bn(Y)
+ out = Y + x
+
+ return out, class_node
+
+
+class GraphLayer(nn.Layer):
+ def __init__(self, num_state: int, num_node: int, num_class: int):
+ super().__init__()
+ self.vis_gcn = GCN(num_state, num_node)
+ self.word_gcn = GCN(num_state, num_class)
+ self.transfer = GraphTransfer(num_state)
+ self.gamma_vis = paddle.zeros([num_node])
+ self.gamma_word = paddle.zeros([num_class])
+ self.gamma_vis = paddle.create_parameter(
+ shape=self.gamma_vis.shape,
+ dtype=str(self.gamma_vis.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
+ self.gamma_word = paddle.create_parameter(
+ shape=self.gamma_word.shape,
+ dtype=str(self.gamma_word.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
+
+ def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ inp = self.word_gcn(inp)
+ new_V = self.vis_gcn(vis_node)
+ class_node, vis_node = self.transfer(inp, new_V)
+
+ class_node = self.gamma_word * inp + class_node
+ new_V = self.gamma_vis * vis_node + new_V
+ return class_node, new_V
+
+
+class GCN(nn.Layer):
+ def __init__(self, num_state: int = 128, num_node: int = 64, bias: bool = False):
+ super().__init__()
+ self.conv1 = nn.Conv1D(
+ num_node,
+ num_node,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ )
+ self.relu = nn.ReLU()
+ self.conv2 = nn.Conv1D(
+ num_state,
+ num_state,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ bias_attr=bias)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
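+        # x: [N, num_state, num_node]. conv1 mixes information across nodes
+        # (it operates on the transposed [N, num_node, num_state] layout);
+        # after the residual add and ReLU, conv2 updates each node's state vector.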
+ h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
+ h = h + x
+ h = self.relu(h)
+ h = self.conv2(h)
+ return h
+
+
+class GraphTransfer(nn.Layer):
+ """Transfer vis graph to class node, transfer class node to vis feature"""
+
+ def __init__(self, in_dim: int):
+ super().__init__()
+ self.channle_in = in_dim
+ self.query_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.key_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.value_conv_vis = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.value_conv_word = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.softmax_vis = nn.Softmax(axis=-1)
+ self.softmax_word = nn.Softmax(axis=-2)
+
+ def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ m_batchsize, C, Nc = word.shape
+ m_batchsize, C, Nn = vis_node.shape
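+        # word: [N, C, Nc] class nodes, vis_node: [N, C, Nn] visual nodes.
+        # The two softmax-normalised attention maps computed below transfer
+        # information in both directions: visual -> class (class_out) and
+        # class -> visual (node_out).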
+
+ proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
+ .transpose((0, 2, 1))
+ proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
+
+ energy = paddle.bmm(proj_query, proj_key)
+ attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
+ attention_word = self.softmax_word(energy)
+
+ proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
+ Nn))
+ proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
+ Nc))
+
+ class_out = paddle.bmm(proj_value_vis, attention_vis)
+ node_out = paddle.bmm(proj_value_word, attention_word)
+ return class_out, node_out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_voc/resnet.py b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..79f648ef9f3381b41852a8010381a6087d6b7f72
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/resnet.py
@@ -0,0 +1,137 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import ginet_resnet50vd_voc.layers as L
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ name: str = None):
+ super(BasicBlock, self).__init__()
+ self.stride = stride
+ self.conv0 = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ name=name + "_branch2a")
+ self.conv1 = L.ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ act=None,
+ name=name + "_branch2b")
+
+ if not shortcut:
+ self.short = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+        y = F.relu(paddle.add(x=short, y=conv1))
+
+ return y
+
+
+class ResNet50_vd(nn.Layer):
+ def __init__(self,
+ multi_grid: tuple = (1, 2, 4)):
+ super(ResNet50_vd, self).__init__()
+ depth = [3, 4, 6, 3]
+ num_channels = [64, 256, 512, 1024]
+ num_filters = [64, 128, 256, 512]
+ self.feat_channels = [c * 4 for c in num_filters]
+ dilation_dict = {2: 2, 3: 4}
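+        # Stages with index 2 and 3 keep stride 1 and use dilated convolutions
+        # (rates 2 and 4), so the backbone's output stride stays at 8; the last
+        # stage additionally multiplies its rate by the multi_grid factors.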
+ self.conv1_1 = L.ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ name="conv1_1")
+ self.conv1_2 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_2")
+ self.conv1_3 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_3")
+ self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+ self.stage_list = []
+
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ L.BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ name=conv_name,
+ dilation=dilation_rate))
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ y = self.pool2d_max(y)
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+ return feat_list
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/humanseg_lite/README.md b/modules/image/semantic_segmentation/humanseg_lite/README.md
index effab0ff515694b2e376711a097c76ab564fdcbe..67472e1818aae31ef2d78b09410b1646a7bc388f 100644
--- a/modules/image/semantic_segmentation/humanseg_lite/README.md
+++ b/modules/image/semantic_segmentation/humanseg_lite/README.md
@@ -48,7 +48,7 @@
```
hub run humanseg_lite --input_path "/PATH/TO/IMAGE"
```
-- ### 2、代码示例
+- ### 2、预测代码示例
- 图片分割及视频分割代码示例:
@@ -72,7 +72,7 @@
import numpy as np
import paddlehub as hub
- human_seg = hub.Module('humanseg_lite')
+ human_seg = hub.Module(name='humanseg_lite')
cap_video = cv2.VideoCapture('\PATH\TO\VIDEO')
fps = cap_video.get(cv2.CAP_PROP_FPS)
save_path = 'humanseg_lite_video.avi'
diff --git a/modules/image/semantic_segmentation/humanseg_lite/README_en.md b/modules/image/semantic_segmentation/humanseg_lite/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..e37ba0123129939cd84601293d5d8b1e536b93ad
--- /dev/null
+++ b/modules/image/semantic_segmentation/humanseg_lite/README_en.md
@@ -0,0 +1,255 @@
+# humanseg_lite
+
+|Module Name |humanseg_lite|
+| :--- | :---: |
+|Category |Image segmentation|
+|Network|shufflenet|
+|Dataset|Baidu self-built dataset|
+|Fine-tuning supported or not|No|
+|Module Size|541k|
+|Data indicators|-|
+|Latest update date|2021-02-26|
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - HumanSeg_lite is based on the ShuffleNetV2 network. The model size is only 541K. It is suitable for selfie portrait segmentation and can run in real time on mobile devices.
+
+ - For more information, please refer to:[humanseg_lite](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.2/contrib/HumanSeg)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install humanseg_lite
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```
+ hub run humanseg_lite --input_path "/PATH/TO/IMAGE"
+
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+ - Image segmentation and video segmentation example:
+ - ```python
+ import cv2
+ import paddlehub as hub
+
+ human_seg = hub.Module(name='humanseg_lite')
+ im = cv2.imread('/PATH/TO/IMAGE')
+ res = human_seg.segment(images=[im],visualization=True)
+ print(res[0]['data'])
+ human_seg.video_segment('/PATH/TO/VIDEO')
+ human_seg.save_inference_model('/PATH/TO/SAVE/MODEL')
+
+ ```
+ - Video prediction example:
+
+ - ```python
+ import cv2
+ import numpy as np
+ import paddlehub as hub
+
+    human_seg = hub.Module(name='humanseg_lite')
+ cap_video = cv2.VideoCapture('\PATH\TO\VIDEO')
+ fps = cap_video.get(cv2.CAP_PROP_FPS)
+ save_path = 'humanseg_lite_video.avi'
+ width = int(cap_video.get(cv2.CAP_PROP_FRAME_WIDTH))
+ height = int(cap_video.get(cv2.CAP_PROP_FRAME_HEIGHT))
+ cap_out = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), fps, (width, height))
+ prev_gray = None
+ prev_cfd = None
+ while cap_video.isOpened():
+ ret, frame_org = cap_video.read()
+ if ret:
+ [img_matting, prev_gray, prev_cfd] = human_seg.video_stream_segment(frame_org=frame_org, frame_id=cap_video.get(1), prev_gray=prev_gray, prev_cfd=prev_cfd)
+ img_matting = np.repeat(img_matting[:, :, np.newaxis], 3, axis=2)
+ bg_im = np.ones_like(img_matting) * 255
+ comb = (img_matting * frame_org + (1 - img_matting) * bg_im).astype(np.uint8)
+ cap_out.write(comb)
+ else:
+ break
+
+ cap_video.release()
+ cap_out.release()
+
+ ```
+
+- ### 3、API
+
+ - ```python
+ def segment(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ visualization=False,
+ output_dir='humanseg_lite_output')
+ ```
+
+ - Prediction API, generating segmentation result.
+
+ - **Parameter**
+
+ * images (list\[numpy.ndarray\]): image data, ndarray.shape is in the format [H, W, C], BGR.
+ * paths (list\[str\]): image path.
+ * batch\_size (int): batch size.
+ * use\_gpu (bool): use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**
+ * visualization (bool): Whether to save the results as picture files.
+ * output\_dir (str): save path of images, humanseg_lite_output by default.
+
+ - **Return**
+
+ * res (list\[dict\]): The list of recognition results, where each element is dict and each field is:
+ * save\_path (str, optional): Save path of the result.
+ * data (numpy.ndarray): The result of portrait segmentation.
+
+ - ```python
+ def video_stream_segment(self,
+ frame_org,
+ frame_id,
+ prev_gray,
+ prev_cfd,
+ use_gpu=False):
+ ```
+ - Prediction API, used to segment video portraits frame by frame.
+
+ - **Parameter**
+
+ * frame_org (numpy.ndarray): single frame for prediction,ndarray.shape is in the format [H, W, C], BGR.
+ * frame_id (int): The number of the current frame.
+ * prev_gray (numpy.ndarray): Grayscale image of the previous network input.
+ * prev_cfd (numpy.ndarray): The fusion image from optical flow and the prediction result from previous frame.
+ * use\_gpu (bool): use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**
+
+
+ - **Return**
+
+ * img_matting (numpy.ndarray): The result of portrait segmentation.
+ * cur_gray (numpy.ndarray): Grayscale image of the current network input.
+ * optflow_map (numpy.ndarray): The fusion image from optical flow and the prediction result from current frame.
+
+
+ - ```python
+ def video_segment(self,
+ video_path=None,
+ use_gpu=False,
+ save_dir='humanseg_lite_video_result'):
+ ```
+
+ - Prediction API to produce video segmentation result.
+
+ - **Parameter**
+
+    * video\_path (str): Video path for segmentation. If None, the video will be obtained from the local camera, and a window will display the online segmentation result.
+ * use\_gpu (bool): use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**
+ * save\_dir (str): save path of video.
+
+
+ - ```python
+ def save_inference_model(dirname='humanseg_lite_model',
+ model_filename=None,
+ params_filename=None,
+ combined=True)
+ ```
+
+
+ - Save the model to the specified path.
+
+ - **Parameters**
+
+ * dirname: Save path.
+    * model\_filename: Model file name, default is \_\_model\_\_
+    * params\_filename: Parameter file name, default is \_\_params\_\_ (only takes effect when `combined` is True)
+ * combined: Whether to save the parameters to a unified file.
+
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service for human segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ hub serving start -m humanseg_lite
+ ```
+
+  - The serving API is now deployed; the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import base64
+
+ import cv2
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # Send an HTTP request
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/humanseg_lite"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ mask =cv2.cvtColor(base64_to_cv2(r.json()["results"][0]['data']), cv2.COLOR_BGR2GRAY)
+ rgba = np.concatenate((org_im, np.expand_dims(mask, axis=2)), axis=2)
+ cv2.imwrite("segment_human_lite.png", rgba)
+ ```
+
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
+
+- 1.1.0
+
+ Added video portrait segmentation interface
+
+ Added video stream portrait segmentation interface
+- 1.1.1
+
+  Fix a memory leak on cuDNN 8.0.4
diff --git a/modules/image/semantic_segmentation/humanseg_mobile/README.md b/modules/image/semantic_segmentation/humanseg_mobile/README.md
index 2e65c49b47a6c8751c4581bef5a7258e872cd078..188234ed27f826c9f1bf99454616237a3e102fb6 100644
--- a/modules/image/semantic_segmentation/humanseg_mobile/README.md
+++ b/modules/image/semantic_segmentation/humanseg_mobile/README.md
@@ -52,7 +52,7 @@
```
hub run humanseg_mobile --input_path "/PATH/TO/IMAGE"
```
-- ### 2、代码示例
+- ### 2、预测代码示例
- 图片分割及视频分割代码示例:
@@ -76,7 +76,7 @@
import numpy as np
import paddlehub as hub
- human_seg = hub.Module('humanseg_mobile')
+ human_seg = hub.Module(name='humanseg_mobile')
cap_video = cv2.VideoCapture('\PATH\TO\VIDEO')
fps = cap_video.get(cv2.CAP_PROP_FPS)
save_path = 'humanseg_mobile_video.avi'
diff --git a/modules/image/semantic_segmentation/humanseg_mobile/README_en.md b/modules/image/semantic_segmentation/humanseg_mobile/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..7af902ceda26f503c00311a2d9da445ea500cbeb
--- /dev/null
+++ b/modules/image/semantic_segmentation/humanseg_mobile/README_en.md
@@ -0,0 +1,256 @@
+# humanseg_mobile
+
+|Module Name |humanseg_mobile|
+| :--- | :---: |
+|Category |Image segmentation|
+|Network|hrnet|
+|Dataset|Baidu self-built dataset|
+|Fine-tuning supported or not|No|
+|Module Size|5.8M|
+|Data indicators|-|
+|Latest update date|2021-02-26|
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - HumanSeg_mobile is based on the HRNet_w18_small_v1 network. The model size is only 5.8M. It is suitable for selfie portrait segmentation and can run in real time on mobile devices.
+
+ - For more information, please refer to:[humanseg_mobile](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.2/contrib/HumanSeg)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install humanseg_mobile
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```
+ hub run humanseg_mobile --input_path "/PATH/TO/IMAGE"
+
+ ```
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+- ### 2、Prediction Code Example
+ - Image segmentation and video segmentation example:
+ ```python
+ import cv2
+ import paddlehub as hub
+
+ human_seg = hub.Module(name='humanseg_mobile')
+ im = cv2.imread('/PATH/TO/IMAGE')
+ res = human_seg.segment(images=[im],visualization=True)
+ print(res[0]['data'])
+ human_seg.video_segment('/PATH/TO/VIDEO')
+ human_seg.save_inference_model('/PATH/TO/SAVE/MODEL')
+
+ ```
+ - Video prediction example:
+
+ ```python
+ import cv2
+ import numpy as np
+ import paddlehub as hub
+
+    human_seg = hub.Module(name='humanseg_mobile')
+ cap_video = cv2.VideoCapture('\PATH\TO\VIDEO')
+ fps = cap_video.get(cv2.CAP_PROP_FPS)
+ save_path = 'humanseg_mobile_video.avi'
+ width = int(cap_video.get(cv2.CAP_PROP_FRAME_WIDTH))
+ height = int(cap_video.get(cv2.CAP_PROP_FRAME_HEIGHT))
+ cap_out = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), fps, (width, height))
+ prev_gray = None
+ prev_cfd = None
+ while cap_video.isOpened():
+ ret, frame_org = cap_video.read()
+ if ret:
+ [img_matting, prev_gray, prev_cfd] = human_seg.video_stream_segment(frame_org=frame_org, frame_id=cap_video.get(1), prev_gray=prev_gray, prev_cfd=prev_cfd)
+ img_matting = np.repeat(img_matting[:, :, np.newaxis], 3, axis=2)
+ bg_im = np.ones_like(img_matting) * 255
+ comb = (img_matting * frame_org + (1 - img_matting) * bg_im).astype(np.uint8)
+ cap_out.write(comb)
+ else:
+ break
+
+ cap_video.release()
+ cap_out.release()
+
+ ```
+
+- ### 3、API
+
+ ```python
+ def segment(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ visualization=False,
+ output_dir='humanseg_mobile_output')
+ ```
+
+ - Prediction API, generating segmentation result.
+
+ - **Parameter**
+
+ * images (list\[numpy.ndarray\]): image data, ndarray.shape is in the format [H, W, C], BGR.
+ * paths (list\[str\]): image path.
+ * batch\_size (int): batch size.
+ * use\_gpu (bool): use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**
+ * visualization (bool): Whether to save the results as picture files.
+ * output\_dir (str): save path of images, humanseg_mobile_output by default.
+
+ - **Return**
+
+ * res (list\[dict\]): The list of recognition results, where each element is dict and each field is:
+ * save\_path (str, optional): Save path of the result.
+ * data (numpy.ndarray): The result of portrait segmentation.
+
+ ```python
+ def video_stream_segment(self,
+ frame_org,
+ frame_id,
+ prev_gray,
+ prev_cfd,
+ use_gpu=False):
+ ```
+
+ - Prediction API, used to segment video portraits frame by frame.
+
+ - **Parameter**
+
+ * frame_org (numpy.ndarray): single frame for prediction,ndarray.shape is in the format [H, W, C], BGR.
+ * frame_id (int): The number of the current frame.
+ * prev_gray (numpy.ndarray): Grayscale image of the previous network input.
+ * prev_cfd (numpy.ndarray): The fusion image from optical flow and the prediction result from previous frame.
+ * use\_gpu (bool): Use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**
+
+
+ - **Return**
+
+ * img_matting (numpy.ndarray): The result of portrait segmentation.
+ * cur_gray (numpy.ndarray): Grayscale image of the current network input.
+ * optflow_map (numpy.ndarray): The fusion image from optical flow and the prediction result from current frame.
+
+
+ ```python
+ def video_segment(self,
+ video_path=None,
+ use_gpu=False,
+ save_dir='humanseg_mobile_video_result'):
+ ```
+
+ - Prediction API to produce video segmentation result.
+
+ - **Parameter**
+
+    * video\_path (str): Video path for segmentation. If None, the video will be obtained from the local camera, and a window will display the online segmentation result.
+ * use\_gpu (bool): Use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**
+ * save\_dir (str): save path of video.
+
+
+ ```python
+ def save_inference_model(dirname='humanseg_mobile_model',
+ model_filename=None,
+ params_filename=None,
+ combined=True)
+ ```
+
+
+ - Save the model to the specified path.
+
+ - **Parameters**
+
+ * dirname: Save path.
+    * model\_filename: Model file name, default is \_\_model\_\_
+    * params\_filename: Parameter file name, default is \_\_params\_\_ (only takes effect when `combined` is True)
+ * combined: Whether to save the parameters to a unified file.
+
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service for human segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m humanseg_mobile
+ ```
+
+  - The serving API is now deployed; the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ ```python
+ import requests
+ import json
+ import base64
+
+ import cv2
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # Send an HTTP request
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/humanseg_mobile"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ mask =cv2.cvtColor(base64_to_cv2(r.json()["results"][0]['data']), cv2.COLOR_BGR2GRAY)
+ rgba = np.concatenate((org_im, np.expand_dims(mask, axis=2)), axis=2)
+ cv2.imwrite("segment_human_mobile.png", rgba)
+ ```
+
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
+
+- 1.1.0
+
+  Added video portrait segmentation interface
+
+  Added video stream portrait segmentation interface
+- 1.1.1
+
+  Fix a video memory leak on cuDNN 8.0.4
diff --git a/modules/image/semantic_segmentation/humanseg_server/README.md b/modules/image/semantic_segmentation/humanseg_server/README.md
index 8845cb82cd109e6ddfb7b92f01f607333dada588..35e19365cc9f0b6c034ab6012faf5f7355fceaa3 100644
--- a/modules/image/semantic_segmentation/humanseg_server/README.md
+++ b/modules/image/semantic_segmentation/humanseg_server/README.md
@@ -51,7 +51,7 @@
```
hub run humanseg_server --input_path "/PATH/TO/IMAGE"
```
-- ### 2、代码示例
+- ### 2、预测代码示例
- 图片分割及视频分割代码示例:
@@ -75,7 +75,7 @@
import numpy as np
import paddlehub as hub
- human_seg = hub.Module('humanseg_server')
+ human_seg = hub.Module(name='humanseg_server')
cap_video = cv2.VideoCapture('\PATH\TO\VIDEO')
fps = cap_video.get(cv2.CAP_PROP_FPS)
save_path = 'humanseg_server_video.avi'
diff --git a/modules/image/semantic_segmentation/humanseg_server/README_en.md b/modules/image/semantic_segmentation/humanseg_server/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..052b37e2af72d9090de2e2950ee2284695cba695
--- /dev/null
+++ b/modules/image/semantic_segmentation/humanseg_server/README_en.md
@@ -0,0 +1,255 @@
+# humanseg_server
+
+|Module Name |humanseg_server|
+| :--- | :---: |
+|Category |Image segmentation|
+|Network|hrnet|
+|Dataset|Baidu self-built dataset|
+|Fine-tuning supported or not|No|
+|Module Size|159MB|
+|Data indicators|-|
+|Latest update date|2021-02-26|
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - The HumanSeg-server model is trained on a Baidu self-built dataset and can be used for portrait segmentation.
+
+ - For more information, please refer to:[humanseg_server](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.2/contrib/HumanSeg)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install humanseg_server
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```
+ hub run humanseg_server --input_path "/PATH/TO/IMAGE"
+ ```
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+ - Image segmentation and video segmentation example:
+ ```python
+ import cv2
+ import paddlehub as hub
+
+ human_seg = hub.Module(name='humanseg_server')
+ im = cv2.imread('/PATH/TO/IMAGE')
+ res = human_seg.segment(images=[im],visualization=True)
+ print(res[0]['data'])
+ human_seg.video_segment('/PATH/TO/VIDEO')
+ human_seg.save_inference_model('/PATH/TO/SAVE/MODEL')
+
+ ```
+ - Video prediction example:
+
+ ```python
+ import cv2
+ import numpy as np
+ import paddlehub as hub
+
+    human_seg = hub.Module(name='humanseg_server')
+ cap_video = cv2.VideoCapture('\PATH\TO\VIDEO')
+ fps = cap_video.get(cv2.CAP_PROP_FPS)
+ save_path = 'humanseg_server_video.avi'
+ width = int(cap_video.get(cv2.CAP_PROP_FRAME_WIDTH))
+ height = int(cap_video.get(cv2.CAP_PROP_FRAME_HEIGHT))
+ cap_out = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), fps, (width, height))
+ prev_gray = None
+ prev_cfd = None
+ while cap_video.isOpened():
+ ret, frame_org = cap_video.read()
+ if ret:
+ [img_matting, prev_gray, prev_cfd] = human_seg.video_stream_segment(frame_org=frame_org, frame_id=cap_video.get(1), prev_gray=prev_gray, prev_cfd=prev_cfd)
+ img_matting = np.repeat(img_matting[:, :, np.newaxis], 3, axis=2)
+ bg_im = np.ones_like(img_matting) * 255
+ comb = (img_matting * frame_org + (1 - img_matting) * bg_im).astype(np.uint8)
+ cap_out.write(comb)
+ else:
+ break
+
+ cap_video.release()
+ cap_out.release()
+
+ ```
+
+- ### 3、API
+
+ ```python
+ def segment(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ visualization=False,
+ output_dir='humanseg_server_output')
+ ```
+
+ - Prediction API, generating segmentation result.
+
+ - **Parameter**
+
+ * images (list\[numpy.ndarray\]): Image data, ndarray.shape is in the format [H, W, C], BGR.
+ * paths (list\[str\]): Image path.
+ * batch\_size (int): Batch size.
+ * use\_gpu (bool): Use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**
+ * visualization (bool): Whether to save the results as picture files.
+ * output\_dir (str): Save path of images, humanseg_server_output by default.
+
+ - **Return**
+
+ * res (list\[dict\]): The list of recognition results, where each element is dict and each field is:
+ * save\_path (str, optional): Save path of the result.
+ * data (numpy.ndarray): The result of portrait segmentation.
+
+ ```python
+ def video_stream_segment(self,
+ frame_org,
+ frame_id,
+ prev_gray,
+ prev_cfd,
+ use_gpu=False):
+ ```
+
+ - Prediction API, used to segment video portraits frame by frame.
+
+ - **Parameter**
+
+ * frame_org (numpy.ndarray): Single frame for prediction,ndarray.shape is in the format [H, W, C], BGR.
+ * frame_id (int): The number of the current frame.
+ * prev_gray (numpy.ndarray): Grayscale image of the previous network input.
+ * prev_cfd (numpy.ndarray): The fusion image from optical flow and the prediction result from previous frame.
+ * use\_gpu (bool): Use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**
+
+
+ - **Return**
+
+ * img_matting (numpy.ndarray): The result of portrait segmentation.
+ * cur_gray (numpy.ndarray): Grayscale image of the current network input.
+ * optflow_map (numpy.ndarray): The fusion image from optical flow and the prediction result from current frame.
+
+
+ ```python
+ def video_segment(self,
+ video_path=None,
+ use_gpu=False,
+ save_dir='humanseg_server_video_result'):
+ ```
+
+ - Prediction API to produce video segmentation result.
+
+ - **Parameter**
+
+    * video\_path (str): Video path for segmentation. If None, the video will be obtained from the local camera, and a window will display the online segmentation result.
+ * use\_gpu (bool): Use GPU or not. **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**
+ * save\_dir (str): Save path of video.
+
+
+ ```python
+ def save_inference_model(dirname='humanseg_server_model',
+ model_filename=None,
+ params_filename=None,
+ combined=True)
+ ```
+
+
+ - Save the model to the specified path.
+
+ - **Parameters**
+
+ * dirname: Save path.
+    * model\_filename: Model file name, default is \_\_model\_\_
+    * params\_filename: Parameter file name, default is \_\_params\_\_ (only takes effect when `combined` is True)
+ * combined: Whether to save the parameters to a unified file.
+
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service for human segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m humanseg_server
+ ```
+
+  - The serving API is now deployed; the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import base64
+
+ import cv2
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.fromstring(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # Send an HTTP request
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/humanseg_server"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ mask =cv2.cvtColor(base64_to_cv2(r.json()["results"][0]['data']), cv2.COLOR_BGR2GRAY)
+ rgba = np.concatenate((org_im, np.expand_dims(mask, axis=2)), axis=2)
+ cv2.imwrite("segment_human_server.png", rgba)
+ ```
+
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
+
+- 1.1.0
+
+ Added video portrait segmentation interface
+
+ Added video stream portrait segmentation interface
+
+- 1.1.1
+
+  Fix a memory leak on cuDNN 8.0.4
diff --git a/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..84180ef991177b07ba1e9d652743de294449caa3
--- /dev/null
+++ b/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/README.md
@@ -0,0 +1,165 @@
+# arabic_ocr_db_crnn_mobile
+
+|模型名称|arabic_ocr_db_crnn_mobile|
+| :--- | :---: |
+|类别|图像-文字识别|
+|网络|Differentiable Binarization+CRNN|
+|数据集|icdar2015数据集|
+|是否支持Fine-tuning|否|
+|最新更新日期|2021-12-2|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - arabic_ocr_db_crnn_mobile Module用于识别图片当中的阿拉伯文字,包括阿拉伯文、波斯文、维吾尔文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的阿拉伯文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别阿拉伯文的轻量级OCR模型,支持直接预测。
+
+ - 更多详情参考:
+ - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
+ - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.2
+
+ - paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install arabic_ocr_db_crnn_mobile
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run arabic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
+ $ hub run arabic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
+ ```
+ - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ ocr = hub.Module(name="arabic_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效
+ result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+ # or
+ # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3、API
+
+ - ```python
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9)
+ ```
+
+ - 构造ArabicOCRDBCRNNMobile对象
+
+ - **参数**
+ - det(bool): 是否开启文字检测。默认为True。
+ - rec(bool): 是否开启文字识别。默认为True。
+ - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
+ - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
+ - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
+ - box\_thresh (float): 检测文本框置信度的阈值;
+ - angle_classification_thresh(float): 文本方向分类置信度的阈值
+
+
+ - ```python
+ def recognize_text(images=[],
+ paths=[],
+ output_dir='ocr_result',
+ visualization=False)
+ ```
+
+ - 预测API,检测输入图片中的所有文本的位置和识别文本结果。
+
+ - **参数**
+
+ - paths (list\[str\]): 图片的路径;
+ - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
+ - output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
+ - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
+
+ - **返回**
+
+ - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
+ - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
+ - text(str): 识别得到的文本
+ - confidence(float): 识别文本结果置信度
+ - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
+ - orientation(str): 分类的方向,仅在只有方向分类开启时输出
+ - score(float): 分类的得分,仅在只有方向分类开启时输出
+ - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
+
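+  - 返回结果的遍历示例(仅为示意代码,字段含义以上述返回说明为准):
+
+  - ```python
+    import cv2
+    import paddlehub as hub
+
+    ocr = hub.Module(name="arabic_ocr_db_crnn_mobile")
+    result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+    for item in result:
+        print('结果保存路径:', item['save_path'])
+        for text_info in item['data']:
+            print(text_info['text'], text_info['confidence'], text_info['text_box_position'])
+    ```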
+
+## 四、服务部署
+
+- PaddleHub Serving 可以部署一个文字识别的在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m arabic_ocr_db_crnn_mobile
+ ```
+
+  - 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
+
+ - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/arabic_ocr_db_crnn_mobile"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(r.json()["results"])
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+ - ```shell
+ $ hub install arabic_ocr_db_crnn_mobile==1.0.0
+ ```
diff --git a/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..e1d603f6eabdb622b5cf58b9a5b645e991d3889a
--- /dev/null
+++ b/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/module.py
@@ -0,0 +1,87 @@
+import paddlehub as hub
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+
+@moduleinfo(
+ name="arabic_ocr_db_crnn_mobile",
+ version="1.1.0",
+ summary="ocr service",
+ author="PaddlePaddle",
+ type="cv/text_recognition")
+class ArabicOCRDBCRNNMobile:
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9):
+ """
+ initialize with the necessary elements
+ Args:
+ det(bool): Whether to use text detector.
+ rec(bool): Whether to use text recognizer.
+ use_angle_cls(bool): Whether to use text orientation classifier.
+ enable_mkldnn(bool): Whether to enable mkldnn.
+ use_gpu (bool): Whether to use gpu.
+ box_thresh(float): the threshold of the detected text box's confidence
+ angle_classification_thresh(float): the threshold of the angle classification confidence
+ """
+ self.logger = get_logger()
+ self.model = hub.Module(
+ name="multi_languages_ocr_db_crnn",
+ lang="arabic",
+ det=det,
+ rec=rec,
+ use_angle_cls=use_angle_cls,
+ enable_mkldnn=enable_mkldnn,
+ use_gpu=use_gpu,
+ box_thresh=box_thresh,
+ angle_classification_thresh=angle_classification_thresh)
+ self.model.name = self.name
+
+ def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+ """
+ Get the text in the predicted images.
+ Args:
+            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; use either images or paths.
+            paths (list[str]): The paths of images; use either paths or images.
+ output_dir (str): The directory to store output images.
+ visualization (bool): Whether to save image or not.
+ Returns:
+ res (list): The result of text detection box and save path of images.
+ """
+ all_results = self.model.recognize_text(
+ images=images, paths=paths, output_dir=output_dir, visualization=visualization)
+ return all_results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.recognize_text(images_decode, **kwargs)
+ return results
+
+ @runnable
+ def run_cmd(self, argvs):
+ """
+ Run as a command
+ """
+ results = self.model.run_cmd(argvs)
+ return results
+
+ def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+ '''
+ Export the model to ONNX format.
+
+ Args:
+ dirname(str): The directory to save the onnx model.
+ input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
+            opset_version(int): operator set version
+ '''
+ self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
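+
+
+# Hypothetical usage sketch (illustrative only): the module is normally loaded
+# through PaddleHub and can then be exported to ONNX, e.g.
+#
+#   import paddlehub as hub
+#   model = hub.Module(name='arabic_ocr_db_crnn_mobile')
+#   model.export_onnx_model(dirname='./onnx_model',
+#                           input_shape_dict={'x': [-1, 3, -1, -1]},
+#                           opset_version=11)
+#
+# The input name 'x' follows the docstring example above; the actual input names
+# depend on the underlying multi_languages_ocr_db_crnn model.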
diff --git a/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
diff --git a/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..8a234985b78d0bf05a89ed42a6d27b1117f0b924
--- /dev/null
+++ b/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/README.md
@@ -0,0 +1,165 @@
+# chinese_cht_ocr_db_crnn_mobile
+
+|模型名称|chinese_cht_ocr_db_crnn_mobile|
+| :--- | :---: |
+|类别|图像-文字识别|
+|网络|Differentiable Binarization+CRNN|
+|数据集|icdar2015数据集|
+|是否支持Fine-tuning|否|
+|最新更新日期|2021-12-2|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - chinese_cht_ocr_db_crnn_mobile Module用于识别图片当中的繁体中文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的繁体中文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别繁体中文的轻量级OCR模型,支持直接预测。
+
+ - 更多详情参考:
+ - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
+ - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.2
+
+ - paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install chinese_cht_ocr_db_crnn_mobile
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run chinese_cht_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
+ $ hub run chinese_cht_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
+ ```
+ - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ ocr = hub.Module(name="chinese_cht_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效
+ result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+ # or
+ # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3、API
+
+ - ```python
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9)
+ ```
+
+ - 构造ChineseChtOCRDBCRNNMobile对象
+
+ - **参数**
+ - det(bool): 是否开启文字检测。默认为True。
+ - rec(bool): 是否开启文字识别。默认为True。
+ - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
+ - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
+ - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
+ - box\_thresh (float): 检测文本框置信度的阈值;
+ - angle_classification_thresh(float): 文本方向分类置信度的阈值
+
+
+ - ```python
+ def recognize_text(images=[],
+ paths=[],
+ output_dir='ocr_result',
+ visualization=False)
+ ```
+
+ - 预测API,检测输入图片中的所有文本的位置和识别文本结果。
+
+ - **参数**
+
+ - paths (list\[str\]): 图片的路径;
+ - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
+ - output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
+ - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
+
+ - **返回**
+
+ - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
+ - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
+ - text(str): 识别得到的文本
+ - confidence(float): 识别文本结果置信度
+ - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
+ - orientation(str): 分类的方向,仅在只有方向分类开启时输出
+ - score(float): 分类的得分,仅在只有方向分类开启时输出
+ - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
+
+
+## 四、服务部署
+
+- PaddleHub Serving 可以部署一个文字识别的在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m chinese_cht_ocr_db_crnn_mobile
+ ```
+
+  - 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
+
+ - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/chinese_cht_ocr_db_crnn_mobile"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(r.json()["results"])
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+ - ```shell
+ $ hub install chinese_cht_ocr_db_crnn_mobile==1.0.0
+ ```
diff --git a/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..b1c10a8feab26bb3a00e235c00de56d7476476bb
--- /dev/null
+++ b/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/module.py
@@ -0,0 +1,87 @@
+import paddlehub as hub
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+
+@moduleinfo(
+ name="chinese_cht_ocr_db_crnn_mobile",
+ version="1.0.0",
+ summary="ocr service",
+ author="PaddlePaddle",
+ type="cv/text_recognition")
+class ChineseChtOCRDBCRNNMobile:
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9):
+ """
+ initialize with the necessary elements
+ Args:
+ det(bool): Whether to use text detector.
+ rec(bool): Whether to use text recognizer.
+ use_angle_cls(bool): Whether to use text orientation classifier.
+ enable_mkldnn(bool): Whether to enable mkldnn.
+ use_gpu (bool): Whether to use gpu.
+ box_thresh(float): the threshold of the detected text box's confidence
+ angle_classification_thresh(float): the threshold of the angle classification confidence
+ """
+ self.logger = get_logger()
+ self.model = hub.Module(
+ name="multi_languages_ocr_db_crnn",
+ lang="chinese_cht",
+ det=det,
+ rec=rec,
+ use_angle_cls=use_angle_cls,
+ enable_mkldnn=enable_mkldnn,
+ use_gpu=use_gpu,
+ box_thresh=box_thresh,
+ angle_classification_thresh=angle_classification_thresh)
+ self.model.name = self.name
+
+ def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+ """
+ Get the text in the predicted images.
+ Args:
+            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; use either images or paths.
+            paths (list[str]): The paths of images; use either paths or images.
+ output_dir (str): The directory to store output images.
+ visualization (bool): Whether to save image or not.
+ Returns:
+ res (list): The result of text detection box and save path of images.
+ """
+ all_results = self.model.recognize_text(
+ images=images, paths=paths, output_dir=output_dir, visualization=visualization)
+ return all_results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.recognize_text(images_decode, **kwargs)
+ return results
+
+ @runnable
+ def run_cmd(self, argvs):
+ """
+ Run as a command
+ """
+ results = self.model.run_cmd(argvs)
+ return results
+
+ def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+ '''
+ Export the model to ONNX format.
+
+ Args:
+ dirname(str): The directory to save the onnx model.
+ input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
+ opset_version(int): operator set
+ '''
+ self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
diff --git a/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
diff --git a/modules/image/text_recognition/chinese_ocr_db_crnn_mobile/README_en.md b/modules/image/text_recognition/chinese_ocr_db_crnn_mobile/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..679b2a0598933d4c5450adca1c997e1a4c323ef4
--- /dev/null
+++ b/modules/image/text_recognition/chinese_ocr_db_crnn_mobile/README_en.md
@@ -0,0 +1,202 @@
+# chinese_ocr_db_crnn_mobile
+
+| Module Name | chinese_ocr_db_crnn_mobile |
+| :------------------ | :------------: |
+| Category | image-text_recognition |
+| Network | Differentiable Binarization+CRNN |
+| Dataset | icdar2015 |
+| Fine-tuning supported or not | No |
+| Module Size | 16M |
+| Latest update date | 2021-02-26 |
+| Data indicators | - |
+
+
+## I. Basic Information of Module
+
+- ### Application Effect Display
+ - [Online experience in OCR text recognition scenarios](https://www.paddlepaddle.org.cn/hub/scene/ocr)
+ - Example result:
+
+
+
+
+- ### Module Introduction
+
+  - chinese_ocr_db_crnn_mobile Module is used to recognize Chinese characters in images. It first obtains text boxes with the [chinese_text_detection_db_mobile Module](../chinese_text_detection_db_mobile/), performs angle classification on the detected text boxes, and then recognizes the Chinese characters inside them. CRNN (Convolutional Recurrent Neural Network) is adopted as the final recognition algorithm. This Module is an ultra-lightweight Chinese OCR model that supports direct prediction.
+
+
+
+
+
+
+ - For more information, please refer to:[An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
+
+
+
+## II. Installation
+
+- ### 1、Environmental dependence
+
+ - paddlepaddle >= 1.7.2
+
+ - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+ - shapely
+
+ - pyclipper
+
+ - ```shell
+ $ pip install shapely pyclipper
+ ```
+ - **This Module relies on the third-party libraries shapely and pyclipper. Please install shapely and pyclipper before using this Module.**
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install chinese_ocr_db_crnn_mobile
+ ```
+ - If you have problems during installation, please refer to:[windows_quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [linux_quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [mac_quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API and Prediction
+
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run chinese_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
+ ```
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command line instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ ocr = hub.Module(name="chinese_ocr_db_crnn_mobile", enable_mkldnn=True) # MKLDNN acceleration is only available on CPU
+ result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+ # or
+ # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3、API
+
+ - ```python
+ __init__(text_detector_module=None, enable_mkldnn=False)
+ ```
+
+ - Construct the ChineseOCRDBCRNN object
+
+ - **Parameter**
+
+      - text_detector_module(str): name of the PaddleHub Module used for text detection; if set to None, the [chinese_text_detection_db_mobile Module](../chinese_text_detection_db_mobile/) is used by default. It detects the text regions in the image.
+ - enable_mkldnn(bool): Whether to enable MKLDNN to accelerate CPU computing. This parameter is valid only when the CPU is running. The default is False.
+
+
+ - ```python
+ def recognize_text(images=[],
+ paths=[],
+ use_gpu=False,
+ output_dir='ocr_result',
+ visualization=False,
+ box_thresh=0.5,
+ text_thresh=0.5,
+ angle_classification_thresh=0.9)
+ ```
+
+ - Prediction API, detecting the position of all Chinese text in the input image.
+
+ - **Parameter**
+
+      - paths (list\[str\]): paths of the images;
+      - images (list\[numpy.ndarray\]): image data, with each ndarray.shape in the format \[H, W, C\], BGR;
+      - use\_gpu (bool): whether to use GPU; **if GPU is used, set the CUDA_VISIBLE_DEVICES environment variable first**;
+      - box\_thresh (float): the confidence threshold of text box detection;
+      - text\_thresh (float): the confidence threshold of Chinese text recognition;
+      - angle_classification_thresh(float): the confidence threshold of text angle classification;
+      - visualization (bool): whether to save the recognition results as image files;
+      - output\_dir (str): path to save the images, ocr\_result by default.
+
+ - **Return**
+
+ - res (list\[dict\]): The list of recognition results, where each element is dict and each field is:
+ - data (list\[dict\]): recognition result, each element in the list is dict and each field is:
+ - text(str): The result text of recognition
+ - confidence(float): The confidence of the results
+          - text_box_position(list): the pixel coordinates of the text box in the original image, a 4*2 matrix representing, in turn, the lower-left, lower-right, upper-right and upper-left vertices of the text box; data is \[\] if there is no recognition result
+ - save_path (str, optional): Path to save the result, save_path is '' if no image is saved.
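+
+  - For illustration, a returned result may look like the following (hypothetical values; the actual text, confidence and coordinates depend on the input image):
+
+  - ```python
+    [
+        {
+            'save_path': '',
+            'data': [
+                {
+                    'text': '土地整治与土壤修复研究中心',
+                    'confidence': 0.98,
+                    'text_box_position': [[24, 92], [441, 92], [441, 58], [24, 58]]
+                }
+            ]
+        }
+    ]
+    ```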
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online text recognition service.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+ - ```shell
+ $ hub serving start -m chinese_ocr_db_crnn_mobile
+ ```
+
+  - The service API is now deployed, with the default port number 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+
+- ### Step 2: Send a prediction request
+
+ - After configuring the server, the following lines of code can be used to send the prediction request and obtain the prediction result
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ # Send an HTTP request
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/chinese_ocr_db_crnn_mobile"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # print prediction result
+ print(r.json()["results"])
+ ```
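+
+  - The JSON returned by the service mirrors the return value of `recognize_text`. A minimal sketch for iterating over it (assuming the request above succeeded):
+
+  - ```python
+    # r is the response object obtained from the request above
+    for image_result in r.json()["results"]:
+        for item in image_result["data"]:
+            print(item["text"], item["confidence"], item["text_box_position"])
+    ```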
+
+## V. Release Note
+
+* 1.0.0
+
+ First release
+
+* 1.0.1
+
+  Fixed a failure when invoking the model via the online service
+
+* 1.0.2
+
+ Supports MKLDNN to speed up CPU computing
+
+* 1.1.0
+
+ An ultra-lightweight three-stage model (text box detection - angle classification - text recognition) is used to identify text in images.
+
+* 1.1.1
+
+ Supports recognition of spaces in text.
+
+* 1.1.2
+
+  Fixed an issue where only 30 fields could be detected.
+
+ - ```shell
+ $ hub install chinese_ocr_db_crnn_mobile==1.1.2
+ ```
diff --git a/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..24eb4f6d9bc7d02963519457d0b1bdcb657ca330
--- /dev/null
+++ b/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/README.md
@@ -0,0 +1,165 @@
+# cyrillic_ocr_db_crnn_mobile
+
+|模型名称|cyrillic_ocr_db_crnn_mobile|
+| :--- | :---: |
+|类别|图像-文字识别|
+|网络|Differentiable Binarization+CRNN|
+|数据集|icdar2015数据集|
+|是否支持Fine-tuning|否|
+|最新更新日期|2021-12-2|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - cyrillic_ocr_db_crnn_mobile Module用于识别图片当中的斯拉夫文,包括俄罗斯文、塞尔维亚文、白俄罗斯文、保加利亚文、乌克兰文、蒙古文、阿迪赫文、阿瓦尔文、达尔瓦文、因古什文、拉克文、莱兹甘文、塔巴萨兰文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的斯拉夫文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别斯拉夫文的轻量级OCR模型,支持直接预测。
+
+ - 更多详情参考:
+ - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
+ - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.2
+
+ - paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install cyrillic_ocr_db_crnn_mobile
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run cyrillic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
+ $ hub run cyrillic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
+ ```
+ - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ ocr = hub.Module(name="cyrillic_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效
+ result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+ # or
+ # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3、API
+
+ - ```python
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9)
+ ```
+
+ - 构造CyrillicOCRDBCRNNMobile对象
+
+ - **参数**
+ - det(bool): 是否开启文字检测。默认为True。
+ - rec(bool): 是否开启文字识别。默认为True。
+ - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
+ - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
+ - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
+ - box\_thresh (float): 检测文本框置信度的阈值;
+ - angle_classification_thresh(float): 文本方向分类置信度的阈值
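+
+  - 构造参数的一个使用示例(仅作示意,阈值取值请按实际场景调整):开启方向分类器并调高检测框阈值。
+
+  - ```python
+    import paddlehub as hub
+
+    # 开启方向分类,并将检测框置信度阈值调整为0.7
+    ocr = hub.Module(name="cyrillic_ocr_db_crnn_mobile", use_angle_cls=True, box_thresh=0.7)
+    ```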
+
+
+ - ```python
+ def recognize_text(images=[],
+ paths=[],
+ output_dir='ocr_result',
+ visualization=False)
+ ```
+
+ - 预测API,检测输入图片中的所有文本的位置和识别文本结果。
+
+ - **参数**
+
+ - paths (list\[str\]): 图片的路径;
+ - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
+ - output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
+ - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
+
+ - **返回**
+
+ - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
+ - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
+ - text(str): 识别得到的文本
+ - confidence(float): 识别文本结果置信度
+ - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
+ - orientation(str): 分类的方向,仅在只有方向分类开启时输出
+ - score(float): 分类的得分,仅在只有方向分类开启时输出
+ - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
+
+
+## 四、服务部署
+
+- PaddleHub Serving 可以部署一个文字识别的在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m cyrillic_ocr_db_crnn_mobile
+ ```
+
+  - 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;否则无需设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/cyrillic_ocr_db_crnn_mobile"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(r.json()["results"])
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+ - ```shell
+ $ hub install cyrillic_ocr_db_crnn_mobile==1.0.0
+ ```
diff --git a/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..bd182e6693ddb72059fbb3a5cc28a96e3f27c1e6
--- /dev/null
+++ b/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/module.py
@@ -0,0 +1,87 @@
+import paddlehub as hub
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+
+@moduleinfo(
+ name="cyrillic_ocr_db_crnn_mobile",
+ version="1.0.0",
+ summary="ocr service",
+ author="PaddlePaddle",
+ type="cv/text_recognition")
+class CyrillicOCRDBCRNNMobile:
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9):
+ """
+ initialize with the necessary elements
+ Args:
+ det(bool): Whether to use text detector.
+ rec(bool): Whether to use text recognizer.
+ use_angle_cls(bool): Whether to use text orientation classifier.
+ enable_mkldnn(bool): Whether to enable mkldnn.
+ use_gpu (bool): Whether to use gpu.
+ box_thresh(float): the threshold of the detected text box's confidence
+ angle_classification_thresh(float): the threshold of the angle classification confidence
+ """
+ self.logger = get_logger()
+ self.model = hub.Module(
+ name="multi_languages_ocr_db_crnn",
+ lang="cyrillic",
+ det=det,
+ rec=rec,
+ use_angle_cls=use_angle_cls,
+ enable_mkldnn=enable_mkldnn,
+ use_gpu=use_gpu,
+ box_thresh=box_thresh,
+ angle_classification_thresh=angle_classification_thresh)
+ self.model.name = self.name
+
+ def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+ """
+ Get the text in the predicted images.
+ Args:
+            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; provide either images or paths
+            paths (list[str]): the paths of images; provide either paths or images
+ output_dir (str): The directory to store output images.
+ visualization (bool): Whether to save image or not.
+ Returns:
+ res (list): The result of text detection box and save path of images.
+ """
+ all_results = self.model.recognize_text(
+ images=images, paths=paths, output_dir=output_dir, visualization=visualization)
+ return all_results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.recognize_text(images_decode, **kwargs)
+ return results
+
+ @runnable
+ def run_cmd(self, argvs):
+ """
+ Run as a command
+ """
+ results = self.model.run_cmd(argvs)
+ return results
+
+ def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+ '''
+ Export the model to ONNX format.
+
+ Args:
+ dirname(str): The directory to save the onnx model.
+ input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
+ opset_version(int): operator set
+ '''
+ self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
diff --git a/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
diff --git a/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a47c2dd12e04d55f116fd52a3008470ef6fe94b8
--- /dev/null
+++ b/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/README.md
@@ -0,0 +1,165 @@
+# devanagari_ocr_db_crnn_mobile
+
+|模型名称|devanagari_ocr_db_crnn_mobile|
+| :--- | :---: |
+|类别|图像-文字识别|
+|网络|Differentiable Binarization+CRNN|
+|数据集|icdar2015数据集|
+|是否支持Fine-tuning|否|
+|最新更新日期|2021-12-2|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - devanagari_ocr_db_crnn_mobile Module用于识别图片当中的梵文,包括印地文、马拉地文、尼泊尔文、比尔哈文、迈蒂利文、昂加文、孟加拉文、摩揭陀文、那格浦尔文、尼瓦尔文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的梵文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别梵文的轻量级OCR模型,支持直接预测。
+
+ - 更多详情参考:
+ - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
+ - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.2
+
+ - paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install devanagari_ocr_db_crnn_mobile
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run devanagari_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
+ $ hub run devanagari_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
+ ```
+ - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ ocr = hub.Module(name="devanagari_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效
+ result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+ # or
+ # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3、API
+
+ - ```python
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9)
+ ```
+
+ - 构造DevanagariOCRDBCRNNMobile对象
+
+ - **参数**
+ - det(bool): 是否开启文字检测。默认为True。
+ - rec(bool): 是否开启文字识别。默认为True。
+ - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
+ - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
+ - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
+ - box\_thresh (float): 检测文本框置信度的阈值;
+ - angle_classification_thresh(float): 文本方向分类置信度的阈值
+
+
+ - ```python
+ def recognize_text(images=[],
+ paths=[],
+ output_dir='ocr_result',
+ visualization=False)
+ ```
+
+ - 预测API,检测输入图片中的所有文本的位置和识别文本结果。
+
+ - **参数**
+
+ - paths (list\[str\]): 图片的路径;
+ - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
+ - output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
+ - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
+
+ - **返回**
+
+ - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
+ - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
+ - text(str): 识别得到的文本
+ - confidence(float): 识别文本结果置信度
+ - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
+ - orientation(str): 分类的方向,仅在只有方向分类开启时输出
+ - score(float): 分类的得分,仅在只有方向分类开启时输出
+ - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
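+
+  - 此外,该 Module 还提供 export_onnx_model 接口(见 module.py),可将模型导出为 ONNX 格式,示意用法如下(保存目录名仅为示例):
+
+  - ```python
+    import paddlehub as hub
+
+    ocr = hub.Module(name="devanagari_ocr_db_crnn_mobile")
+    # 将模型导出到 onnx_model 目录,opset_version 默认为10
+    ocr.export_onnx_model(dirname="onnx_model")
+    ```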
+
+
+## 四、服务部署
+
+- PaddleHub Serving 可以部署一个文字识别的在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m devanagari_ocr_db_crnn_mobile
+ ```
+
+  - 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;否则无需设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/devanagari_ocr_db_crnn_mobile"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(r.json()["results"])
+ ```
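+
+  - 服务返回的 JSON 结构与 recognize_text 的返回值一致,可按如下方式遍历(示意,假设上述请求已成功返回):
+
+  - ```python
+    # r 为上述请求得到的响应对象
+    for image_result in r.json()["results"]:
+        for item in image_result["data"]:
+            print(item["text"], item["confidence"])
+    ```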
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+ - ```shell
+ $ hub install devanagari_ocr_db_crnn_mobile==1.0.0
+ ```
diff --git a/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..a165f934188d9d0df9fd9f18378e141330ff4b38
--- /dev/null
+++ b/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/module.py
@@ -0,0 +1,87 @@
+import paddlehub as hub
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+
+@moduleinfo(
+ name="devanagari_ocr_db_crnn_mobile",
+ version="1.0.0",
+ summary="ocr service",
+ author="PaddlePaddle",
+ type="cv/text_recognition")
+class DevanagariOCRDBCRNNMobile:
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9):
+ """
+ initialize with the necessary elements
+ Args:
+ det(bool): Whether to use text detector.
+ rec(bool): Whether to use text recognizer.
+ use_angle_cls(bool): Whether to use text orientation classifier.
+ enable_mkldnn(bool): Whether to enable mkldnn.
+ use_gpu (bool): Whether to use gpu.
+ box_thresh(float): the threshold of the detected text box's confidence
+ angle_classification_thresh(float): the threshold of the angle classification confidence
+ """
+ self.logger = get_logger()
+ self.model = hub.Module(
+ name="multi_languages_ocr_db_crnn",
+ lang="devanagari",
+ det=det,
+ rec=rec,
+ use_angle_cls=use_angle_cls,
+ enable_mkldnn=enable_mkldnn,
+ use_gpu=use_gpu,
+ box_thresh=box_thresh,
+ angle_classification_thresh=angle_classification_thresh)
+ self.model.name = self.name
+
+ def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+ """
+ Get the text in the predicted images.
+ Args:
+            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; provide either images or paths
+            paths (list[str]): the paths of images; provide either paths or images
+ output_dir (str): The directory to store output images.
+ visualization (bool): Whether to save image or not.
+ Returns:
+ res (list): The result of text detection box and save path of images.
+ """
+ all_results = self.model.recognize_text(
+ images=images, paths=paths, output_dir=output_dir, visualization=visualization)
+ return all_results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.recognize_text(images_decode, **kwargs)
+ return results
+
+ @runnable
+ def run_cmd(self, argvs):
+ """
+ Run as a command
+ """
+ results = self.model.run_cmd(argvs)
+ return results
+
+ def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+ '''
+ Export the model to ONNX format.
+
+ Args:
+ dirname(str): The directory to save the onnx model.
+ input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
+ opset_version(int): operator set
+ '''
+ self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
diff --git a/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
diff --git a/modules/image/text_recognition/french_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/french_ocr_db_crnn_mobile/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..2b7feb555d1547e869a55ba9ed4bda38a7244398
--- /dev/null
+++ b/modules/image/text_recognition/french_ocr_db_crnn_mobile/README.md
@@ -0,0 +1,169 @@
+# french_ocr_db_crnn_mobile
+
+|模型名称|french_ocr_db_crnn_mobile|
+| :--- | :---: |
+|类别|图像-文字识别|
+|网络|Differentiable Binarization+CRNN|
+|数据集|icdar2015数据集|
+|是否支持Fine-tuning|否|
+|最新更新日期|2021-12-2|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - french_ocr_db_crnn_mobile Module用于识别图片当中的法文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的法文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别法文的轻量级OCR模型,支持直接预测。
+
+ - 更多详情参考:
+ - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
+ - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.2
+
+ - paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install french_ocr_db_crnn_mobile
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run french_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
+ $ hub run french_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
+ ```
+ - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ ocr = hub.Module(name="french_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效
+ result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+ # or
+ # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3、API
+
+ - ```python
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9)
+ ```
+
+  - 构造FrenchOCRDBCRNNMobile对象
+
+ - **参数**
+ - det(bool): 是否开启文字检测。默认为True。
+ - rec(bool): 是否开启文字识别。默认为True。
+ - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
+ - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
+ - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
+ - box\_thresh (float): 检测文本框置信度的阈值;
+ - angle_classification_thresh(float): 文本方向分类置信度的阈值
+
+
+ - ```python
+ def recognize_text(images=[],
+ paths=[],
+ output_dir='ocr_result',
+ visualization=False)
+ ```
+
+ - 预测API,检测输入图片中的所有文本的位置和识别文本结果。
+
+ - **参数**
+
+ - paths (list\[str\]): 图片的路径;
+ - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
+ - output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
+ - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
+
+ - **返回**
+
+ - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
+ - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
+ - text(str): 识别得到的文本
+ - confidence(float): 识别文本结果置信度
+ - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
+ - orientation(str): 分类的方向,仅在只有方向分类开启时输出
+ - score(float): 分类的得分,仅在只有方向分类开启时输出
+ - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
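+
+  - 如需保存可视化结果,可在调用时开启 visualization,并通过返回结果中的 save_path 获取保存路径(示意用法):
+
+  - ```python
+    import paddlehub as hub
+    import cv2
+
+    ocr = hub.Module(name="french_ocr_db_crnn_mobile")
+    results = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')], visualization=True, output_dir='ocr_result')
+    print(results[0]['save_path'])  # 可视化结果图片的保存路径
+    ```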
+
+
+## 四、服务部署
+
+- PaddleHub Serving 可以部署一个文字识别的在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m french_ocr_db_crnn_mobile
+ ```
+
+  - 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;否则无需设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/french_ocr_db_crnn_mobile"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(r.json()["results"])
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+* 1.1.0
+
+ 优化模型
+ - ```shell
+ $ hub install french_ocr_db_crnn_mobile==1.1.0
+ ```
diff --git a/modules/image/text_recognition/french_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/french_ocr_db_crnn_mobile/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/french_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/french_ocr_db_crnn_mobile/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..674c2db8b5d4bfae94b800e202f72692bda33f97
--- /dev/null
+++ b/modules/image/text_recognition/french_ocr_db_crnn_mobile/module.py
@@ -0,0 +1,87 @@
+import paddlehub as hub
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+
+@moduleinfo(
+ name="french_ocr_db_crnn_mobile",
+ version="1.1.0",
+ summary="ocr service",
+ author="PaddlePaddle",
+ type="cv/text_recognition")
+class FrenchOCRDBCRNNMobile:
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9):
+ """
+ initialize with the necessary elements
+ Args:
+ det(bool): Whether to use text detector.
+ rec(bool): Whether to use text recognizer.
+ use_angle_cls(bool): Whether to use text orientation classifier.
+ enable_mkldnn(bool): Whether to enable mkldnn.
+ use_gpu (bool): Whether to use gpu.
+ box_thresh(float): the threshold of the detected text box's confidence
+ angle_classification_thresh(float): the threshold of the angle classification confidence
+ """
+ self.logger = get_logger()
+ self.model = hub.Module(
+ name="multi_languages_ocr_db_crnn",
+ lang="fr",
+ det=det,
+ rec=rec,
+ use_angle_cls=use_angle_cls,
+ enable_mkldnn=enable_mkldnn,
+ use_gpu=use_gpu,
+ box_thresh=box_thresh,
+ angle_classification_thresh=angle_classification_thresh)
+ self.model.name = self.name
+
+ def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+ """
+ Get the text in the predicted images.
+ Args:
+            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; provide either images or paths
+            paths (list[str]): the paths of images; provide either paths or images
+ output_dir (str): The directory to store output images.
+ visualization (bool): Whether to save image or not.
+ Returns:
+ res (list): The result of text detection box and save path of images.
+ """
+ all_results = self.model.recognize_text(
+ images=images, paths=paths, output_dir=output_dir, visualization=visualization)
+ return all_results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.recognize_text(images_decode, **kwargs)
+ return results
+
+ @runnable
+ def run_cmd(self, argvs):
+ """
+ Run as a command
+ """
+ results = self.model.run_cmd(argvs)
+ return results
+
+ def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+ '''
+ Export the model to ONNX format.
+
+ Args:
+ dirname(str): The directory to save the onnx model.
+ input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
+ opset_version(int): operator set
+ '''
+ self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
diff --git a/modules/image/text_recognition/french_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/french_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/french_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
diff --git a/modules/image/text_recognition/german_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/german_ocr_db_crnn_mobile/README.md
index d5cfe848f7c27281e82789787ffc2688f643af52..813355649c664a4f1ebf4dc62d9f899e3177aa45 100644
--- a/modules/image/text_recognition/german_ocr_db_crnn_mobile/README.md
+++ b/modules/image/text_recognition/german_ocr_db_crnn_mobile/README.md
@@ -27,18 +27,9 @@
- ### 1、环境依赖
- - paddlepaddle >= 1.8.0
+ - paddlepaddle >= 2.0.2
- - paddlehub >= 1.8.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
-
- - shapely
-
- - pyclipper
-
- - ```shell
- $ pip install shapely pyclipper
- ```
- - **该Module依赖于第三方库shapely和pyclipper,使用该Module之前,请先安装shapely和pyclipper。**
+ - paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
@@ -58,7 +49,7 @@
- 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
-- ### 2、代码示例
+- ### 2、预测代码示例
- ```python
import paddlehub as hub
@@ -159,13 +150,15 @@
print(r.json()["results"])
```
-
## 五、更新历史
* 1.0.0
初始发布
+* 1.1.0
+
+ 优化模型
- ```shell
- $ hub install german_ocr_db_crnn_mobile==1.0.0
+ $ hub install german_ocr_db_crnn_mobile==1.1.0
```
diff --git a/modules/image/text_recognition/german_ocr_db_crnn_mobile/assets/german_dict.txt b/modules/image/text_recognition/german_ocr_db_crnn_mobile/assets/german_dict.txt
deleted file mode 100644
index 30c4d4218e8a77386db912e24117b1f197466e83..0000000000000000000000000000000000000000
--- a/modules/image/text_recognition/german_ocr_db_crnn_mobile/assets/german_dict.txt
+++ /dev/null
@@ -1,131 +0,0 @@
-!
-"
-$
-%
-&
-'
-(
-)
-+
-,
--
-.
-/
-0
-1
-2
-3
-4
-5
-6
-7
-8
-9
-:
-;
->
-?
-A
-B
-C
-D
-E
-F
-G
-H
-I
-J
-K
-L
-M
-N
-O
-P
-Q
-R
-S
-T
-U
-V
-W
-X
-Y
-Z
-[
-]
-a
-b
-c
-d
-e
-f
-g
-h
-i
-j
-k
-l
-m
-n
-o
-p
-q
-r
-s
-t
-u
-v
-w
-x
-y
-z
-£
-§
-
-²
-´
-µ
-·
-º
-¼
-½
-¿
-À
-Á
-Ä
-Å
-Ç
-É
-Í
-Ï
-Ô
-Ö
-Ø
-Ù
-Ü
-ß
-à
-á
-â
-ã
-ä
-å
-æ
-ç
-è
-é
-ê
-ë
-í
-ï
-ñ
-ò
-ó
-ô
-ö
-ø
-ù
-ú
-û
-ü
-
diff --git a/modules/image/text_recognition/german_ocr_db_crnn_mobile/character.py b/modules/image/text_recognition/german_ocr_db_crnn_mobile/character.py
deleted file mode 100644
index 21dbbd9dc790e3d009f45c1ef1b68c001e9f0e0b..0000000000000000000000000000000000000000
--- a/modules/image/text_recognition/german_ocr_db_crnn_mobile/character.py
+++ /dev/null
@@ -1,213 +0,0 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import numpy as np
-import string
-
-class CharacterOps(object):
- """ Convert between text-label and text-index """
-
- def __init__(self, config):
- self.character_type = config['character_type']
- self.loss_type = config['loss_type']
- self.max_text_len = config['max_text_length']
- if self.character_type == "en":
- self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
- dict_character = list(self.character_str)
- elif self.character_type in [
- "ch", 'japan', 'korean', 'french', 'german'
- ]:
- character_dict_path = config['character_dict_path']
- add_space = False
- if 'use_space_char' in config:
- add_space = config['use_space_char']
- self.character_str = ""
- with open(character_dict_path, "rb") as fin:
- lines = fin.readlines()
- for line in lines:
- line = line.decode('utf-8').strip("\n").strip("\r\n")
- self.character_str += line
- if add_space:
- self.character_str += " "
- dict_character = list(self.character_str)
- elif self.character_type == "en_sensitive":
- # same with ASTER setting (use 94 char).
- self.character_str = string.printable[:-6]
- dict_character = list(self.character_str)
- else:
- self.character_str = None
- assert self.character_str is not None, \
- "Nonsupport type of the character: {}".format(self.character_str)
- self.beg_str = "sos"
- self.end_str = "eos"
- if self.loss_type == "attention":
- dict_character = [self.beg_str, self.end_str] + dict_character
- elif self.loss_type == "srn":
- dict_character = dict_character + [self.beg_str, self.end_str]
- self.dict = {}
- for i, char in enumerate(dict_character):
- self.dict[char] = i
- self.character = dict_character
-
- def encode(self, text):
- """convert text-label into text-index.
- input:
- text: text labels of each image. [batch_size]
-
- output:
- text: concatenated text index for CTCLoss.
- [sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)]
- length: length of each text. [batch_size]
- """
- if self.character_type == "en":
- text = text.lower()
-
- text_list = []
- for char in text:
- if char not in self.dict:
- continue
- text_list.append(self.dict[char])
- text = np.array(text_list)
- return text
-
- def decode(self, text_index, is_remove_duplicate=False):
- """ convert text-index into text-label. """
- char_list = []
- char_num = self.get_char_num()
-
- if self.loss_type == "attention":
- beg_idx = self.get_beg_end_flag_idx("beg")
- end_idx = self.get_beg_end_flag_idx("end")
- ignored_tokens = [beg_idx, end_idx]
- else:
- ignored_tokens = [char_num]
-
- for idx in range(len(text_index)):
- if text_index[idx] in ignored_tokens:
- continue
- if is_remove_duplicate:
- if idx > 0 and text_index[idx - 1] == text_index[idx]:
- continue
- char_list.append(self.character[int(text_index[idx])])
- text = ''.join(char_list)
- return text
-
- def get_char_num(self):
- return len(self.character)
-
- def get_beg_end_flag_idx(self, beg_or_end):
- if self.loss_type == "attention":
- if beg_or_end == "beg":
- idx = np.array(self.dict[self.beg_str])
- elif beg_or_end == "end":
- idx = np.array(self.dict[self.end_str])
- else:
- assert False, "Unsupport type %s in get_beg_end_flag_idx"\
- % beg_or_end
- return idx
- else:
- err = "error in get_beg_end_flag_idx when using the loss %s"\
- % (self.loss_type)
- assert False, err
-
-
-def cal_predicts_accuracy(char_ops,
- preds,
- preds_lod,
- labels,
- labels_lod,
- is_remove_duplicate=False):
- acc_num = 0
- img_num = 0
- for ino in range(len(labels_lod) - 1):
- beg_no = preds_lod[ino]
- end_no = preds_lod[ino + 1]
- preds_text = preds[beg_no:end_no].reshape(-1)
- preds_text = char_ops.decode(preds_text, is_remove_duplicate)
-
- beg_no = labels_lod[ino]
- end_no = labels_lod[ino + 1]
- labels_text = labels[beg_no:end_no].reshape(-1)
- labels_text = char_ops.decode(labels_text, is_remove_duplicate)
- img_num += 1
-
- if preds_text == labels_text:
- acc_num += 1
- acc = acc_num * 1.0 / img_num
- return acc, acc_num, img_num
-
-
-def cal_predicts_accuracy_srn(char_ops,
- preds,
- labels,
- max_text_len,
- is_debug=False):
- acc_num = 0
- img_num = 0
-
- char_num = char_ops.get_char_num()
-
- total_len = preds.shape[0]
- img_num = int(total_len / max_text_len)
- for i in range(img_num):
- cur_label = []
- cur_pred = []
- for j in range(max_text_len):
- if labels[j + i * max_text_len] != int(char_num - 1): #0
- cur_label.append(labels[j + i * max_text_len][0])
- else:
- break
-
- for j in range(max_text_len + 1):
- if j < len(cur_label) and preds[j + i * max_text_len][
- 0] != cur_label[j]:
- break
- elif j == len(cur_label) and j == max_text_len:
- acc_num += 1
- break
- elif j == len(cur_label) and preds[j + i * max_text_len][0] == int(
- char_num - 1):
- acc_num += 1
- break
- acc = acc_num * 1.0 / img_num
- return acc, acc_num, img_num
-
-
-def convert_rec_attention_infer_res(preds):
- img_num = preds.shape[0]
- target_lod = [0]
- convert_ids = []
- for ino in range(img_num):
- end_pos = np.where(preds[ino, :] == 1)[0]
- if len(end_pos) <= 1:
- text_list = preds[ino, 1:]
- else:
- text_list = preds[ino, 1:end_pos[1]]
- target_lod.append(target_lod[ino] + len(text_list))
- convert_ids = convert_ids + list(text_list)
- convert_ids = np.array(convert_ids)
- convert_ids = convert_ids.reshape((-1, 1))
- return convert_ids, target_lod
-
-
-def convert_rec_label_to_lod(ori_labels):
- img_num = len(ori_labels)
- target_lod = [0]
- convert_ids = []
- for ino in range(img_num):
- target_lod.append(target_lod[ino] + len(ori_labels[ino]))
- convert_ids = convert_ids + list(ori_labels[ino])
- convert_ids = np.array(convert_ids)
- convert_ids = convert_ids.reshape((-1, 1))
- return convert_ids, target_lod
diff --git a/modules/image/text_recognition/german_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/german_ocr_db_crnn_mobile/module.py
index 6b59d274faa7a583851369a38fb73756dfcbcebe..569cc14817d85313037a60463f0115fb0a65deaf 100644
--- a/modules/image/text_recognition/german_ocr_db_crnn_mobile/module.py
+++ b/modules/image/text_recognition/german_ocr_db_crnn_mobile/module.py
@@ -1,304 +1,61 @@
-# -*- coding:utf-8 -*-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import argparse
-import ast
-import copy
-import math
-import os
-import time
-
-from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor
-from paddlehub.common.logger import logger
-from paddlehub.module.module import moduleinfo, runnable, serving
-from PIL import Image
-import cv2
-import numpy as np
-import paddle.fluid as fluid
import paddlehub as hub
-
-from german_ocr_db_crnn_mobile.character import CharacterOps
-from german_ocr_db_crnn_mobile.utils import base64_to_cv2, draw_ocr, get_image_ext, sorted_boxes
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="german_ocr_db_crnn_mobile",
- version="1.0.0",
- summary=
- "The module can recognize the german texts in an image. Firstly, it will detect the text box positions based on the differentiable_binarization module. Then it recognizes the german texts. ",
- author="paddle-dev",
- author_email="paddle-dev@baidu.com",
+ version="1.1.0",
+ summary="ocr service",
+ author="PaddlePaddle",
type="cv/text_recognition")
-class GermanOCRDBCRNNMobile(hub.Module):
- def _initialize(self, text_detector_module=None, enable_mkldnn=False, use_angle_classification=False):
+class GermanOCRDBCRNNMobile:
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9):
"""
initialize with the necessary elements
+ Args:
+ det(bool): Whether to use text detector.
+ rec(bool): Whether to use text recognizer.
+ use_angle_cls(bool): Whether to use text orientation classifier.
+ enable_mkldnn(bool): Whether to enable mkldnn.
+ use_gpu (bool): Whether to use gpu.
+ box_thresh(float): the threshold of the detected text box's confidence
+ angle_classification_thresh(float): the threshold of the angle classification confidence
"""
- self.character_dict_path = os.path.join(self.directory, 'assets',
- 'german_dict.txt')
- char_ops_params = {
- 'character_type': 'german',
- 'character_dict_path': self.character_dict_path,
- 'loss_type': 'ctc',
- 'max_text_length': 25,
- 'use_space_char': True
- }
- self.char_ops = CharacterOps(char_ops_params)
- self.rec_image_shape = [3, 32, 320]
- self._text_detector_module = text_detector_module
- self.font_file = os.path.join(self.directory, 'assets', 'german.ttf')
- self.enable_mkldnn = enable_mkldnn
- self.use_angle_classification = use_angle_classification
-
- self.rec_pretrained_model_path = os.path.join(
- self.directory, 'inference_model', 'character_rec')
- self.rec_predictor, self.rec_input_tensor, self.rec_output_tensors = self._set_config(
- self.rec_pretrained_model_path)
-
- if self.use_angle_classification:
- self.cls_pretrained_model_path = os.path.join(
- self.directory, 'inference_model', 'angle_cls')
-
- self.cls_predictor, self.cls_input_tensor, self.cls_output_tensors = self._set_config(
- self.cls_pretrained_model_path)
-
- def _set_config(self, pretrained_model_path):
- """
- predictor config path
- """
- model_file_path = os.path.join(pretrained_model_path, 'model')
- params_file_path = os.path.join(pretrained_model_path, 'params')
-
- config = AnalysisConfig(model_file_path, params_file_path)
- try:
- _places = os.environ["CUDA_VISIBLE_DEVICES"]
- int(_places[0])
- use_gpu = True
- except:
- use_gpu = False
-
- if use_gpu:
- config.enable_use_gpu(8000, 0)
- else:
- config.disable_gpu()
- if self.enable_mkldnn:
- # cache 10 different shapes for mkldnn to avoid memory leak
- config.set_mkldnn_cache_capacity(10)
- config.enable_mkldnn()
-
- config.disable_glog_info()
- config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass")
- config.switch_use_feed_fetch_ops(False)
-
- predictor = create_paddle_predictor(config)
-
- input_names = predictor.get_input_names()
- input_tensor = predictor.get_input_tensor(input_names[0])
- output_names = predictor.get_output_names()
- output_tensors = []
- for output_name in output_names:
- output_tensor = predictor.get_output_tensor(output_name)
- output_tensors.append(output_tensor)
-
- return predictor, input_tensor, output_tensors
-
- @property
- def text_detector_module(self):
- """
- text detect module
- """
- if not self._text_detector_module:
- self._text_detector_module = hub.Module(
- name='chinese_text_detection_db_mobile',
- enable_mkldnn=self.enable_mkldnn,
- version='1.0.4')
- return self._text_detector_module
-
- def read_images(self, paths=[]):
- images = []
- for img_path in paths:
- assert os.path.isfile(
- img_path), "The {} isn't a valid file.".format(img_path)
- img = cv2.imread(img_path)
- if img is None:
- logger.info("error in loading image:{}".format(img_path))
- continue
- images.append(img)
- return images
-
- def get_rotate_crop_image(self, img, points):
- '''
- img_height, img_width = img.shape[0:2]
- left = int(np.min(points[:, 0]))
- right = int(np.max(points[:, 0]))
- top = int(np.min(points[:, 1]))
- bottom = int(np.max(points[:, 1]))
- img_crop = img[top:bottom, left:right, :].copy()
- points[:, 0] = points[:, 0] - left
- points[:, 1] = points[:, 1] - top
- '''
- img_crop_width = int(
- max(
- np.linalg.norm(points[0] - points[1]),
- np.linalg.norm(points[2] - points[3])))
- img_crop_height = int(
- max(
- np.linalg.norm(points[0] - points[3]),
- np.linalg.norm(points[1] - points[2])))
- pts_std = np.float32([[0, 0], [img_crop_width, 0],
- [img_crop_width, img_crop_height],
- [0, img_crop_height]])
- M = cv2.getPerspectiveTransform(points, pts_std)
- dst_img = cv2.warpPerspective(
- img,
- M, (img_crop_width, img_crop_height),
- borderMode=cv2.BORDER_REPLICATE,
- flags=cv2.INTER_CUBIC)
- dst_img_height, dst_img_width = dst_img.shape[0:2]
- if dst_img_height * 1.0 / dst_img_width >= 1.5:
- dst_img = np.rot90(dst_img)
- return dst_img
-
- def resize_norm_img_rec(self, img, max_wh_ratio):
- imgC, imgH, imgW = self.rec_image_shape
- assert imgC == img.shape[2]
- h, w = img.shape[:2]
- ratio = w / float(h)
- if math.ceil(imgH * ratio) > imgW:
- resized_w = imgW
- else:
- resized_w = int(math.ceil(imgH * ratio))
- resized_image = cv2.resize(img, (resized_w, imgH))
- resized_image = resized_image.astype('float32')
- resized_image = resized_image.transpose((2, 0, 1)) / 255
- resized_image -= 0.5
- resized_image /= 0.5
- padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
- padding_im[:, :, 0:resized_w] = resized_image
- return padding_im
-
- def resize_norm_img_cls(self, img):
- cls_image_shape = [3, 48, 192]
- imgC, imgH, imgW = cls_image_shape
- h = img.shape[0]
- w = img.shape[1]
- ratio = w / float(h)
- if math.ceil(imgH * ratio) > imgW:
- resized_w = imgW
- else:
- resized_w = int(math.ceil(imgH * ratio))
- resized_image = cv2.resize(img, (resized_w, imgH))
- resized_image = resized_image.astype('float32')
- if cls_image_shape[0] == 1:
- resized_image = resized_image / 255
- resized_image = resized_image[np.newaxis, :]
- else:
- resized_image = resized_image.transpose((2, 0, 1)) / 255
- resized_image -= 0.5
- resized_image /= 0.5
- padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
- padding_im[:, :, 0:resized_w] = resized_image
- return padding_im
-
- def recognize_text(self,
- images=[],
- paths=[],
- use_gpu=False,
- output_dir='ocr_result',
- visualization=False,
- box_thresh=0.5,
- text_thresh=0.5,
- angle_classification_thresh=0.9):
- """
- Get the chinese texts in the predicted images.
+ self.logger = get_logger()
+ self.model = hub.Module(
+ name="multi_languages_ocr_db_crnn",
+ lang="german",
+ det=det,
+ rec=rec,
+ use_angle_cls=use_angle_cls,
+ enable_mkldnn=enable_mkldnn,
+ use_gpu=use_gpu,
+ box_thresh=box_thresh,
+ angle_classification_thresh=angle_classification_thresh)
+ self.model.name = self.name
+
+ def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+ """
+ Get the text in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths
paths (list[str]): The paths of images. If paths not images
- use_gpu (bool): Whether to use gpu.
- batch_size(int): the program deals once with one
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
- box_thresh(float): the threshold of the detected text box's confidence
- text_thresh(float): the threshold of the chinese text recognition confidence
- angle_classification_thresh(float): the threshold of the angle classification confidence
-
Returns:
- res (list): The result of chinese texts and save path of images.
+ res (list): The result of text detection box and save path of images.
"""
- if use_gpu:
- try:
- _places = os.environ["CUDA_VISIBLE_DEVICES"]
- int(_places[0])
- except:
- raise RuntimeError(
- "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
- )
-
- self.use_gpu = use_gpu
-
- if images != [] and isinstance(images, list) and paths == []:
- predicted_data = images
- elif images == [] and isinstance(paths, list) and paths != []:
- predicted_data = self.read_images(paths)
- else:
- raise TypeError("The input data is inconsistent with expectations.")
-
- assert predicted_data != [], "There is not any image to be predicted. Please check the input data."
-
- detection_results = self.text_detector_module.detect_text(
- images=predicted_data, use_gpu=self.use_gpu, box_thresh=box_thresh)
- print('*'*10)
- print(detection_results)
-
- boxes = [
- np.array(item['data']).astype(np.float32)
- for item in detection_results
- ]
- all_results = []
- for index, img_boxes in enumerate(boxes):
- original_image = predicted_data[index].copy()
- result = {'save_path': ''}
- if img_boxes.size == 0:
- result['data'] = []
- else:
- img_crop_list = []
- boxes = sorted_boxes(img_boxes)
- for num_box in range(len(boxes)):
- tmp_box = copy.deepcopy(boxes[num_box])
- img_crop = self.get_rotate_crop_image(
- original_image, tmp_box)
- img_crop_list.append(img_crop)
-
- if self.use_angle_classification:
- img_crop_list, angle_list = self._classify_text(
- img_crop_list,
- angle_classification_thresh=angle_classification_thresh)
-
- rec_results = self._recognize_text(img_crop_list)
-
- # if the recognized text confidence score is lower than text_thresh, then drop it
- rec_res_final = []
- for index, res in enumerate(rec_results):
- text, score = res
- if score >= text_thresh:
- rec_res_final.append({
- 'text':
- text,
- 'confidence':
- float(score),
- 'text_box_position':
- boxes[index].astype(np.int).tolist()
- })
- result['data'] = rec_res_final
-
- if visualization and result['data']:
- result['save_path'] = self.save_result_image(
- original_image, boxes, rec_results, output_dir,
- text_thresh)
- all_results.append(result)
-
+ all_results = self.model.recognize_text(
+ images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
@@ -310,282 +67,21 @@ class GermanOCRDBCRNNMobile(hub.Module):
results = self.recognize_text(images_decode, **kwargs)
return results
- def save_result_image(
- self,
- original_image,
- detection_boxes,
- rec_results,
- output_dir='ocr_result',
- text_thresh=0.5,
- ):
- image = Image.fromarray(cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB))
- txts = [item[0] for item in rec_results]
- scores = [item[1] for item in rec_results]
- draw_img = draw_ocr(
- image,
- detection_boxes,
- txts,
- scores,
- font_file=self.font_file,
- draw_txt=True,
- drop_score=text_thresh)
-
- if not os.path.exists(output_dir):
- os.makedirs(output_dir)
- ext = get_image_ext(original_image)
- saved_name = 'ndarray_{}{}'.format(time.time(), ext)
- save_file_path = os.path.join(output_dir, saved_name)
- cv2.imwrite(save_file_path, draw_img[:, :, ::-1])
- return save_file_path
-
- def _classify_text(self, image_list, angle_classification_thresh=0.9):
- img_list = copy.deepcopy(image_list)
- img_num = len(img_list)
- # Calculate the aspect ratio of all text bars
- width_list = []
- for img in img_list:
- width_list.append(img.shape[1] / float(img.shape[0]))
- # Sorting can speed up the cls process
- indices = np.argsort(np.array(width_list))
-
- cls_res = [['', 0.0]] * img_num
- batch_num = 30
- for beg_img_no in range(0, img_num, batch_num):
- end_img_no = min(img_num, beg_img_no + batch_num)
- norm_img_batch = []
- max_wh_ratio = 0
- for ino in range(beg_img_no, end_img_no):
- h, w = img_list[indices[ino]].shape[0:2]
- wh_ratio = w * 1.0 / h
- max_wh_ratio = max(max_wh_ratio, wh_ratio)
- for ino in range(beg_img_no, end_img_no):
- norm_img = self.resize_norm_img_cls(img_list[indices[ino]])
- norm_img = norm_img[np.newaxis, :]
- norm_img_batch.append(norm_img)
- norm_img_batch = np.concatenate(norm_img_batch)
- norm_img_batch = norm_img_batch.copy()
-
- self.cls_input_tensor.copy_from_cpu(norm_img_batch)
- self.cls_predictor.zero_copy_run()
-
- prob_out = self.cls_output_tensors[0].copy_to_cpu()
- label_out = self.cls_output_tensors[1].copy_to_cpu()
- if len(label_out.shape) != 1:
- prob_out, label_out = label_out, prob_out
- label_list = ['0', '180']
- for rno in range(len(label_out)):
- label_idx = label_out[rno]
- score = prob_out[rno][label_idx]
- label = label_list[label_idx]
- cls_res[indices[beg_img_no + rno]] = [label, score]
- if '180' in label and score > angle_classification_thresh:
- img_list[indices[beg_img_no + rno]] = cv2.rotate(
- img_list[indices[beg_img_no + rno]], 1)
- return img_list, cls_res
-
- def _recognize_text(self, img_list):
- img_num = len(img_list)
- # Calculate the aspect ratio of all text bars
- width_list = []
- for img in img_list:
- width_list.append(img.shape[1] / float(img.shape[0]))
- # Sorting can speed up the recognition process
- indices = np.argsort(np.array(width_list))
-
- rec_res = [['', 0.0]] * img_num
- batch_num = 30
- for beg_img_no in range(0, img_num, batch_num):
- end_img_no = min(img_num, beg_img_no + batch_num)
- norm_img_batch = []
- max_wh_ratio = 0
- for ino in range(beg_img_no, end_img_no):
- h, w = img_list[indices[ino]].shape[0:2]
- wh_ratio = w * 1.0 / h
- max_wh_ratio = max(max_wh_ratio, wh_ratio)
- for ino in range(beg_img_no, end_img_no):
- norm_img = self.resize_norm_img_rec(img_list[indices[ino]],
- max_wh_ratio)
- norm_img = norm_img[np.newaxis, :]
- norm_img_batch.append(norm_img)
-
- norm_img_batch = np.concatenate(norm_img_batch, axis=0)
- norm_img_batch = norm_img_batch.copy()
-
- self.rec_input_tensor.copy_from_cpu(norm_img_batch)
- self.rec_predictor.zero_copy_run()
-
- rec_idx_batch = self.rec_output_tensors[0].copy_to_cpu()
- rec_idx_lod = self.rec_output_tensors[0].lod()[0]
- predict_batch = self.rec_output_tensors[1].copy_to_cpu()
- predict_lod = self.rec_output_tensors[1].lod()[0]
- for rno in range(len(rec_idx_lod) - 1):
- beg = rec_idx_lod[rno]
- end = rec_idx_lod[rno + 1]
- rec_idx_tmp = rec_idx_batch[beg:end, 0]
- preds_text = self.char_ops.decode(rec_idx_tmp)
- beg = predict_lod[rno]
- end = predict_lod[rno + 1]
- probs = predict_batch[beg:end, :]
- ind = np.argmax(probs, axis=1)
- blank = probs.shape[1]
- valid_ind = np.where(ind != (blank - 1))[0]
- if len(valid_ind) == 0:
- continue
- score = np.mean(probs[valid_ind, ind[valid_ind]])
- # rec_res.append([preds_text, score])
- rec_res[indices[beg_img_no + rno]] = [preds_text, score]
-
- return rec_res
-
- def save_inference_model(self,
- dirname,
- model_filename=None,
- params_filename=None,
- combined=True):
- detector_dir = os.path.join(dirname, 'text_detector')
- classifier_dir = os.path.join(dirname, 'angle_classifier')
- recognizer_dir = os.path.join(dirname, 'text_recognizer')
- self._save_detector_model(detector_dir, model_filename, params_filename,
- combined)
- if self.use_angle_classification:
- self._save_classifier_model(classifier_dir, model_filename,
- params_filename, combined)
-
- self._save_recognizer_model(recognizer_dir, model_filename,
- params_filename, combined)
- logger.info("The inference model has been saved in the path {}".format(
- os.path.realpath(dirname)))
-
- def _save_detector_model(self,
- dirname,
- model_filename=None,
- params_filename=None,
- combined=True):
- self.text_detector_module.save_inference_model(
- dirname, model_filename, params_filename, combined)
-
- def _save_recognizer_model(self,
- dirname,
- model_filename=None,
- params_filename=None,
- combined=True):
- if combined:
- model_filename = "__model__" if not model_filename else model_filename
- params_filename = "__params__" if not params_filename else params_filename
- place = fluid.CPUPlace()
- exe = fluid.Executor(place)
-
- model_file_path = os.path.join(self.rec_pretrained_model_path, 'model')
- params_file_path = os.path.join(self.rec_pretrained_model_path,
- 'params')
- program, feeded_var_names, target_vars = fluid.io.load_inference_model(
- dirname=self.rec_pretrained_model_path,
- model_filename=model_file_path,
- params_filename=params_file_path,
- executor=exe)
-
- fluid.io.save_inference_model(
- dirname=dirname,
- main_program=program,
- executor=exe,
- feeded_var_names=feeded_var_names,
- target_vars=target_vars,
- model_filename=model_filename,
- params_filename=params_filename)
-
- def _save_classifier_model(self,
- dirname,
- model_filename=None,
- params_filename=None,
- combined=True):
- if combined:
- model_filename = "__model__" if not model_filename else model_filename
- params_filename = "__params__" if not params_filename else params_filename
- place = fluid.CPUPlace()
- exe = fluid.Executor(place)
-
- model_file_path = os.path.join(self.cls_pretrained_model_path, 'model')
- params_file_path = os.path.join(self.cls_pretrained_model_path,
- 'params')
- program, feeded_var_names, target_vars = fluid.io.load_inference_model(
- dirname=self.cls_pretrained_model_path,
- model_filename=model_file_path,
- params_filename=params_file_path,
- executor=exe)
-
- fluid.io.save_inference_model(
- dirname=dirname,
- main_program=program,
- executor=exe,
- feeded_var_names=feeded_var_names,
- target_vars=target_vars,
- model_filename=model_filename,
- params_filename=params_filename)
-
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
- self.parser = argparse.ArgumentParser(
- description="Run the %s module." % self.name,
- prog='hub run %s' % self.name,
- usage='%(prog)s',
- add_help=True)
-
- self.arg_input_group = self.parser.add_argument_group(
- title="Input options", description="Input data. Required")
- self.arg_config_group = self.parser.add_argument_group(
- title="Config options",
- description=
- "Run configuration for controlling module behavior, not required.")
-
- self.add_module_config_arg()
- self.add_module_input_arg()
-
- args = self.parser.parse_args(argvs)
- results = self.recognize_text(
- paths=[args.input_path],
- use_gpu=args.use_gpu,
- output_dir=args.output_dir,
- visualization=args.visualization)
+ results = self.model.run_cmd(argvs)
return results
- def add_module_config_arg(self):
- """
- Add the command config options
- """
- self.arg_config_group.add_argument(
- '--use_gpu',
- type=ast.literal_eval,
- default=False,
- help="whether use GPU or not")
- self.arg_config_group.add_argument(
- '--output_dir',
- type=str,
- default='ocr_result',
- help="The directory to save output images.")
- self.arg_config_group.add_argument(
- '--visualization',
- type=ast.literal_eval,
- default=False,
- help="whether to save output as images.")
-
- def add_module_input_arg(self):
- """
- Add the command input options
- """
- self.arg_input_group.add_argument(
- '--input_path', type=str, default=None, help="diretory to image")
-
+ def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+ '''
+ Export the model to ONNX format.
-if __name__ == '__main__':
- ocr = GermanOCRDBCRNNMobile(enable_mkldnn=False, use_angle_classification=True)
- image_path = [
- '/mnt/zhangxuefei/PaddleOCR/doc/imgs/ger_1.jpg',
- '/mnt/zhangxuefei/PaddleOCR/doc/imgs/12.jpg',
- '/mnt/zhangxuefei/PaddleOCR/doc/imgs/test_image.jpg'
- ]
- res = ocr.recognize_text(paths=image_path, visualization=True)
- ocr.save_inference_model('save')
- print(res)
+ Args:
+ dirname(str): The directory to save the onnx model.
+        input_shape_dict(dict): mapping from input name to input shape, e.g. ``{'x': [-1, 3, -1, -1]}``
+        opset_version(int): operator set version of the exported ONNX model
+ '''
+ self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
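
For reference, a minimal usage sketch of the refactored `german_ocr_db_crnn_mobile` module after this change. It assumes the module has been installed with `hub install german_ocr_db_crnn_mobile` and that `./test_image.jpg` is a hypothetical local image path; the call signature follows the new `recognize_text` shown above.

```python
# Minimal usage sketch: prediction with the refactored module.
# Assumes `hub install german_ocr_db_crnn_mobile` has been run and that
# './test_image.jpg' is a hypothetical local image path.
import paddlehub as hub

ocr = hub.Module(name="german_ocr_db_crnn_mobile")

# recognize_text now delegates to the shared multi_languages_ocr_db_crnn model;
# the return value is still a list with one result dict per input image.
results = ocr.recognize_text(paths=['./test_image.jpg'],
                             output_dir='ocr_result',
                             visualization=True)
print(results)
```
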
diff --git a/modules/image/text_recognition/german_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/german_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/german_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
diff --git a/modules/image/text_recognition/german_ocr_db_crnn_mobile/utils.py b/modules/image/text_recognition/german_ocr_db_crnn_mobile/utils.py
deleted file mode 100644
index 8c41af300cc91de369a473cb7327b794b6cf5715..0000000000000000000000000000000000000000
--- a/modules/image/text_recognition/german_ocr_db_crnn_mobile/utils.py
+++ /dev/null
@@ -1,190 +0,0 @@
-# -*- coding:utf-8 -*-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import math
-
-from PIL import Image, ImageDraw, ImageFont
-import base64
-import cv2
-import numpy as np
-
-
-def draw_ocr(image,
- boxes,
- txts,
- scores,
- font_file,
- draw_txt=True,
- drop_score=0.5):
- """
- Visualize the results of OCR detection and recognition
- args:
- image(Image|array): RGB image
- boxes(list): boxes with shape(N, 4, 2)
- txts(list): the texts
- scores(list): txxs corresponding scores
- draw_txt(bool): whether draw text or not
- drop_score(float): only scores greater than drop_threshold will be visualized
- return(array):
- the visualized img
- """
- if scores is None:
- scores = [1] * len(boxes)
- for (box, score) in zip(boxes, scores):
- if score < drop_score or math.isnan(score):
- continue
- box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64)
- image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
-
- if draw_txt:
- img = np.array(resize_img(image, input_size=600))
- txt_img = text_visual(
- txts,
- scores,
- font_file,
- img_h=img.shape[0],
- img_w=600,
- threshold=drop_score)
- img = np.concatenate([np.array(img), np.array(txt_img)], axis=1)
- return img
- return image
-
-
-def text_visual(texts, scores, font_file, img_h=400, img_w=600, threshold=0.):
- """
- create new blank img and draw txt on it
- args:
- texts(list): the text will be draw
- scores(list|None): corresponding score of each txt
- img_h(int): the height of blank img
- img_w(int): the width of blank img
- return(array):
- """
- if scores is not None:
- assert len(texts) == len(
- scores), "The number of txts and corresponding scores must match"
-
- def create_blank_img():
- blank_img = np.ones(shape=[img_h, img_w], dtype=np.int8) * 255
- blank_img[:, img_w - 1:] = 0
- blank_img = Image.fromarray(blank_img).convert("RGB")
- draw_txt = ImageDraw.Draw(blank_img)
- return blank_img, draw_txt
-
- blank_img, draw_txt = create_blank_img()
-
- font_size = 20
- txt_color = (0, 0, 0)
- font = ImageFont.truetype(font_file, font_size, encoding="utf-8")
-
- gap = font_size + 5
- txt_img_list = []
- count, index = 1, 0
- for idx, txt in enumerate(texts):
- index += 1
- if scores[idx] < threshold or math.isnan(scores[idx]):
- index -= 1
- continue
- first_line = True
- while str_count(txt) >= img_w // font_size - 4:
- tmp = txt
- txt = tmp[:img_w // font_size - 4]
- if first_line:
- new_txt = str(index) + ': ' + txt
- first_line = False
- else:
- new_txt = ' ' + txt
- draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
- txt = tmp[img_w // font_size - 4:]
- if count >= img_h // gap - 1:
- txt_img_list.append(np.array(blank_img))
- blank_img, draw_txt = create_blank_img()
- count = 0
- count += 1
- if first_line:
- new_txt = str(index) + ': ' + txt + ' ' + '%.3f' % (scores[idx])
- else:
- new_txt = " " + txt + " " + '%.3f' % (scores[idx])
- draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
- # whether add new blank img or not
- if count >= img_h // gap - 1 and idx + 1 < len(texts):
- txt_img_list.append(np.array(blank_img))
- blank_img, draw_txt = create_blank_img()
- count = 0
- count += 1
- txt_img_list.append(np.array(blank_img))
- if len(txt_img_list) == 1:
- blank_img = np.array(txt_img_list[0])
- else:
- blank_img = np.concatenate(txt_img_list, axis=1)
- return np.array(blank_img)
-
-
-def str_count(s):
- """
- Count the number of Chinese characters,
- a single English character and a single number
- equal to half the length of Chinese characters.
- args:
- s(string): the input of string
- return(int):
- the number of Chinese characters
- """
- import string
- count_zh = count_pu = 0
- s_len = len(s)
- en_dg_count = 0
- for c in s:
- if c in string.ascii_letters or c.isdigit() or c.isspace():
- en_dg_count += 1
- elif c.isalpha():
- count_zh += 1
- else:
- count_pu += 1
- return s_len - math.ceil(en_dg_count / 2)
-
-
-def resize_img(img, input_size=600):
- img = np.array(img)
- im_shape = img.shape
- im_size_min = np.min(im_shape[0:2])
- im_size_max = np.max(im_shape[0:2])
- im_scale = float(input_size) / float(im_size_max)
- im = cv2.resize(img, None, None, fx=im_scale, fy=im_scale)
- return im
-
-
-def get_image_ext(image):
- if image.shape[2] == 4:
- return ".png"
- return ".jpg"
-
-
-def sorted_boxes(dt_boxes):
- """
- Sort text boxes in order from top to bottom, left to right
- args:
- dt_boxes(array):detected text boxes with shape [4, 2]
- return:
- sorted boxes(array) with shape [4, 2]
- """
- num_boxes = dt_boxes.shape[0]
- sorted_boxes = sorted(dt_boxes, key=lambda x: (x[0][1], x[0][0]))
- _boxes = list(sorted_boxes)
-
- for i in range(num_boxes - 1):
- if abs(_boxes[i + 1][0][1] - _boxes[i][0][1]) < 10 and \
- (_boxes[i + 1][0][0] < _boxes[i][0][0]):
- tmp = _boxes[i]
- _boxes[i] = _boxes[i + 1]
- _boxes[i + 1] = tmp
- return _boxes
-
-
-def base64_to_cv2(b64str):
- data = base64.b64decode(b64str.encode('utf8'))
- data = np.fromstring(data, np.uint8)
- data = cv2.imdecode(data, cv2.IMREAD_COLOR)
- return data
diff --git a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/README.md
index 05f32a6621b4d81b5b14e1f1550449d22ad0f359..66a87dc54c14170c3ee8e9985c5d23e81fd03e91 100644
--- a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/README.md
+++ b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/README.md
@@ -27,18 +27,9 @@
- ### 1、环境依赖
- - paddlepaddle >= 1.8.0
+ - paddlepaddle >= 2.0.2
- - paddlehub >= 1.8.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
-
- - shapely
-
- - pyclipper
-
- - ```shell
- $ pip install shapely pyclipper
- ```
- - **该Module依赖于第三方库shapely和pyclipper,使用该Module之前,请先安装shapely和pyclipper。**
+ - paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
@@ -58,7 +49,7 @@
```
- 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
-- ### 2、代码示例
+- ### 2、预测代码示例
- ```python
import paddlehub as hub
@@ -160,13 +151,15 @@
print(r.json()["results"])
```
-
## 五、更新历史
* 1.0.0
初始发布
+* 1.1.0
+
+ 优化模型
- ```shell
- $ hub install japan_ocr_db_crnn_mobile==1.0.0
+ $ hub install japan_ocr_db_crnn_mobile==1.1.0
```
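
The README section above also covers calling the module as a service. A hedged sketch of that flow follows; it assumes a server was started with `hub serving start -m japan_ocr_db_crnn_mobile` on the default port 8866, and `./test.jpg` is a hypothetical local image path.

```python
# Hedged sketch: calling the module through PaddleHub Serving.
# Assumes a server was started with `hub serving start -m japan_ocr_db_crnn_mobile`
# on the default port 8866, and that './test.jpg' is a hypothetical local image.
import base64
import json

import cv2
import requests


def cv2_to_base64(image):
    # Encode an OpenCV BGR image as a base64 string, the format the serving API expects.
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')


data = {'images': [cv2_to_base64(cv2.imread('./test.jpg'))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/japan_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json()["results"])
```
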
diff --git a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan.ttc b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan.ttc
deleted file mode 100644
index ad68243b968fc87b207928594c585039859b75a9..0000000000000000000000000000000000000000
Binary files a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan.ttc and /dev/null differ
diff --git a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan_dict.txt b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan_dict.txt
deleted file mode 100644
index 339d4b89e5159a346636641a0814874faa59754a..0000000000000000000000000000000000000000
--- a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan_dict.txt
+++ /dev/null
@@ -1,4399 +0,0 @@
-!
-"
-#
-$
-%
-&
-'
-(
-)
-*
-+
-,
--
-.
-/
-0
-1
-2
-3
-4
-5
-6
-7
-8
-9
-:
-;
-<
-=
->
-?
-A
-B
-C
-D
-E
-F
-G
-H
-I
-J
-K
-L
-M
-N
-O
-P
-Q
-R
-S
-T
-U
-V
-W
-X
-Y
-Z
-[
-]
-_
-`
-a
-b
-c
-d
-e
-f
-g
-h
-i
-j
-k
-l
-m
-n
-o
-p
-q
-r
-s
-t
-u
-v
-w
-x
-y
-z
-©
-°
-²
-´
-½
-Á
-Ä
-Å
-Ç
-È
-É
-Í
-Ó
-Ö
-×
-Ü
-ß
-à
-á
-â
-ã
-ä
-å
-æ
-ç
-è
-é
-ê
-ë
-í
-ð
-ñ
-ò
-ó
-ô
-õ
-ö
-ø
-ú
-û
-ü
-ý
-ā
-ă
-ą
-ć
-Č
-č
-đ
-ē
-ė
-ę
-ğ
-ī
-ı
-Ł
-ł
-ń
-ň
-ō
-ř
-Ş
-ş
-Š
-š
-ţ
-ū
-ż
-Ž
-ž
-Ș
-ș
-ț
-Δ
-α
-λ
-μ
-φ
-Г
-О
-а
-в
-л
-о
-р
-с
-т
-я
-ồ
-
-—
-―
-’
-“
-”
-…
-℃
-→
-∇
-−
-■
-☆
-
-、
-。
-々
-〆
-〈
-〉
-「
-」
-『
-』
-〔
-〕
-〜
-ぁ
-あ
-ぃ
-い
-う
-ぇ
-え
-ぉ
-お
-か
-が
-き
-ぎ
-く
-ぐ
-け
-げ
-こ
-ご
-さ
-ざ
-し
-じ
-す
-ず
-せ
-ぜ
-そ
-ぞ
-た
-だ
-ち
-ぢ
-っ
-つ
-づ
-て
-で
-と
-ど
-な
-に
-ぬ
-ね
-の
-は
-ば
-ぱ
-ひ
-び
-ぴ
-ふ
-ぶ
-ぷ
-へ
-べ
-ぺ
-ほ
-ぼ
-ぽ
-ま
-み
-む
-め
-も
-ゃ
-や
-ゅ
-ゆ
-ょ
-よ
-ら
-り
-る
-れ
-ろ
-わ
-ゑ
-を
-ん
-ゝ
-ゞ
-ァ
-ア
-ィ
-イ
-ゥ
-ウ
-ェ
-エ
-ォ
-オ
-カ
-ガ
-キ
-ギ
-ク
-グ
-ケ
-ゲ
-コ
-ゴ
-サ
-ザ
-シ
-ジ
-ス
-ズ
-セ
-ゼ
-ソ
-ゾ
-タ
-ダ
-チ
-ヂ
-ッ
-ツ
-ヅ
-テ
-デ
-ト
-ド
-ナ
-ニ
-ヌ
-ネ
-ノ
-ハ
-バ
-パ
-ヒ
-ビ
-ピ
-フ
-ブ
-プ
-ヘ
-ベ
-ペ
-ホ
-ボ
-ポ
-マ
-ミ
-ム
-メ
-モ
-ャ
-ヤ
-ュ
-ユ
-ョ
-ヨ
-ラ
-リ
-ル
-レ
-ロ
-ワ
-ヰ
-ン
-ヴ
-ヵ
-ヶ
-・
-ー
-㈱
-一
-丁
-七
-万
-丈
-三
-上
-下
-不
-与
-丑
-且
-世
-丘
-丙
-丞
-両
-並
-中
-串
-丸
-丹
-主
-丼
-丿
-乃
-久
-之
-乎
-乏
-乗
-乘
-乙
-九
-乞
-也
-乱
-乳
-乾
-亀
-了
-予
-争
-事
-二
-于
-互
-五
-井
-亘
-亙
-些
-亜
-亟
-亡
-交
-亥
-亦
-亨
-享
-京
-亭
-亮
-人
-什
-仁
-仇
-今
-介
-仍
-仏
-仔
-仕
-他
-仗
-付
-仙
-代
-令
-以
-仮
-仰
-仲
-件
-任
-企
-伊
-伍
-伎
-伏
-伐
-休
-会
-伝
-伯
-估
-伴
-伶
-伸
-伺
-似
-伽
-佃
-但
-位
-低
-住
-佐
-佑
-体
-何
-余
-佚
-佛
-作
-佩
-佳
-併
-佶
-使
-侈
-例
-侍
-侏
-侑
-侘
-供
-依
-侠
-価
-侮
-侯
-侵
-侶
-便
-係
-促
-俄
-俊
-俔
-俗
-俘
-保
-信
-俣
-俤
-修
-俯
-俳
-俵
-俸
-俺
-倉
-個
-倍
-倒
-候
-借
-倣
-値
-倫
-倭
-倶
-倹
-偃
-假
-偈
-偉
-偏
-偐
-偕
-停
-健
-側
-偵
-偶
-偽
-傀
-傅
-傍
-傑
-傘
-備
-催
-傭
-傲
-傳
-債
-傷
-傾
-僊
-働
-像
-僑
-僕
-僚
-僧
-僭
-僮
-儀
-億
-儇
-儒
-儛
-償
-儡
-優
-儲
-儺
-儼
-兀
-允
-元
-兄
-充
-兆
-先
-光
-克
-兌
-免
-兎
-児
-党
-兜
-入
-全
-八
-公
-六
-共
-兵
-其
-具
-典
-兼
-内
-円
-冊
-再
-冑
-冒
-冗
-写
-冠
-冤
-冥
-冨
-冬
-冲
-决
-冶
-冷
-准
-凉
-凋
-凌
-凍
-凛
-凝
-凞
-几
-凡
-処
-凪
-凰
-凱
-凶
-凸
-凹
-出
-函
-刀
-刃
-分
-切
-刈
-刊
-刎
-刑
-列
-初
-判
-別
-利
-刪
-到
-制
-刷
-券
-刹
-刺
-刻
-剃
-則
-削
-剋
-前
-剖
-剛
-剣
-剤
-剥
-剪
-副
-剰
-割
-創
-剽
-劇
-劉
-劔
-力
-功
-加
-劣
-助
-努
-劫
-劭
-励
-労
-効
-劾
-勃
-勅
-勇
-勉
-勒
-動
-勘
-務
-勝
-募
-勢
-勤
-勧
-勲
-勺
-勾
-勿
-匁
-匂
-包
-匏
-化
-北
-匙
-匝
-匠
-匡
-匣
-匯
-匲
-匹
-区
-医
-匿
-十
-千
-升
-午
-卉
-半
-卍
-卑
-卒
-卓
-協
-南
-単
-博
-卜
-占
-卦
-卯
-印
-危
-即
-却
-卵
-卸
-卿
-厄
-厚
-原
-厠
-厨
-厩
-厭
-厳
-去
-参
-又
-叉
-及
-友
-双
-反
-収
-叔
-取
-受
-叙
-叛
-叟
-叡
-叢
-口
-古
-句
-叩
-只
-叫
-召
-可
-台
-叱
-史
-右
-叶
-号
-司
-吃
-各
-合
-吉
-吊
-同
-名
-后
-吏
-吐
-向
-君
-吝
-吟
-吠
-否
-含
-吸
-吹
-吻
-吽
-吾
-呂
-呆
-呈
-呉
-告
-呑
-周
-呪
-呰
-味
-呼
-命
-咀
-咄
-咋
-和
-咒
-咫
-咲
-咳
-咸
-哀
-品
-哇
-哉
-員
-哨
-哩
-哭
-哲
-哺
-唄
-唆
-唇
-唐
-唖
-唯
-唱
-唳
-唸
-唾
-啄
-商
-問
-啓
-啼
-善
-喋
-喚
-喜
-喝
-喧
-喩
-喪
-喫
-喬
-單
-喰
-営
-嗅
-嗇
-嗔
-嗚
-嗜
-嗣
-嘆
-嘉
-嘗
-嘘
-嘩
-嘯
-嘱
-嘲
-嘴
-噂
-噌
-噛
-器
-噴
-噺
-嚆
-嚢
-囀
-囃
-囉
-囚
-四
-回
-因
-団
-困
-囲
-図
-固
-国
-圀
-圃
-國
-圏
-園
-圓
-團
-圜
-土
-圧
-在
-圭
-地
-址
-坂
-均
-坊
-坐
-坑
-坡
-坤
-坦
-坪
-垂
-型
-垢
-垣
-埃
-埋
-城
-埒
-埔
-域
-埠
-埴
-埵
-執
-培
-基
-埼
-堀
-堂
-堅
-堆
-堕
-堤
-堪
-堯
-堰
-報
-場
-堵
-堺
-塀
-塁
-塊
-塑
-塔
-塗
-塘
-塙
-塚
-塞
-塩
-填
-塵
-塾
-境
-墉
-墓
-増
-墜
-墟
-墨
-墳
-墺
-墻
-墾
-壁
-壇
-壊
-壌
-壕
-士
-壬
-壮
-声
-壱
-売
-壷
-壹
-壺
-壽
-変
-夏
-夕
-外
-夙
-多
-夜
-夢
-夥
-大
-天
-太
-夫
-夬
-夭
-央
-失
-夷
-夾
-奄
-奇
-奈
-奉
-奎
-奏
-契
-奔
-奕
-套
-奘
-奠
-奢
-奥
-奨
-奪
-奮
-女
-奴
-奸
-好
-如
-妃
-妄
-妊
-妍
-妓
-妖
-妙
-妥
-妨
-妬
-妲
-妹
-妻
-妾
-姉
-始
-姐
-姓
-委
-姚
-姜
-姞
-姥
-姦
-姨
-姪
-姫
-姶
-姻
-姿
-威
-娑
-娘
-娟
-娠
-娩
-娯
-娼
-婆
-婉
-婚
-婢
-婦
-婬
-婿
-媄
-媒
-媓
-媚
-媛
-媞
-媽
-嫁
-嫄
-嫉
-嫌
-嫐
-嫗
-嫡
-嬉
-嬌
-嬢
-嬪
-嬬
-嬾
-孁
-子
-孔
-字
-存
-孚
-孝
-孟
-季
-孤
-学
-孫
-孵
-學
-宅
-宇
-守
-安
-宋
-完
-宍
-宏
-宕
-宗
-官
-宙
-定
-宛
-宜
-宝
-実
-客
-宣
-室
-宥
-宮
-宰
-害
-宴
-宵
-家
-宸
-容
-宿
-寂
-寄
-寅
-密
-寇
-富
-寒
-寓
-寔
-寛
-寝
-察
-寡
-實
-寧
-審
-寮
-寵
-寶
-寸
-寺
-対
-寿
-封
-専
-射
-将
-尉
-尊
-尋
-對
-導
-小
-少
-尖
-尚
-尤
-尪
-尭
-就
-尹
-尺
-尻
-尼
-尽
-尾
-尿
-局
-居
-屈
-届
-屋
-屍
-屎
-屏
-屑
-屓
-展
-属
-屠
-層
-履
-屯
-山
-岐
-岑
-岡
-岩
-岫
-岬
-岳
-岷
-岸
-峠
-峡
-峨
-峯
-峰
-島
-峻
-崇
-崋
-崎
-崑
-崖
-崗
-崛
-崩
-嵌
-嵐
-嵩
-嵯
-嶂
-嶋
-嶠
-嶺
-嶼
-嶽
-巀
-巌
-巒
-巖
-川
-州
-巡
-巣
-工
-左
-巧
-巨
-巫
-差
-己
-巳
-巴
-巷
-巻
-巽
-巾
-市
-布
-帆
-希
-帖
-帚
-帛
-帝
-帥
-師
-席
-帯
-帰
-帳
-帷
-常
-帽
-幄
-幅
-幇
-幌
-幔
-幕
-幟
-幡
-幢
-幣
-干
-平
-年
-并
-幸
-幹
-幻
-幼
-幽
-幾
-庁
-広
-庄
-庇
-床
-序
-底
-庖
-店
-庚
-府
-度
-座
-庫
-庭
-庵
-庶
-康
-庸
-廂
-廃
-廉
-廊
-廓
-廟
-廠
-廣
-廬
-延
-廷
-建
-廻
-廼
-廿
-弁
-弄
-弉
-弊
-弌
-式
-弐
-弓
-弔
-引
-弖
-弗
-弘
-弛
-弟
-弥
-弦
-弧
-弱
-張
-強
-弼
-弾
-彈
-彊
-彌
-彎
-当
-彗
-彙
-彝
-形
-彦
-彩
-彫
-彬
-彭
-彰
-影
-彷
-役
-彼
-往
-征
-徂
-径
-待
-律
-後
-徐
-徑
-徒
-従
-得
-徠
-御
-徧
-徨
-復
-循
-徭
-微
-徳
-徴
-德
-徹
-徽
-心
-必
-忉
-忌
-忍
-志
-忘
-忙
-応
-忠
-快
-忯
-念
-忻
-忽
-忿
-怒
-怖
-思
-怠
-怡
-急
-性
-怨
-怪
-怯
-恂
-恋
-恐
-恒
-恕
-恣
-恤
-恥
-恨
-恩
-恬
-恭
-息
-恵
-悉
-悌
-悍
-悔
-悟
-悠
-患
-悦
-悩
-悪
-悲
-悼
-情
-惇
-惑
-惚
-惜
-惟
-惠
-惣
-惧
-惨
-惰
-想
-惹
-惺
-愈
-愉
-愍
-意
-愔
-愚
-愛
-感
-愷
-愿
-慈
-態
-慌
-慎
-慕
-慢
-慣
-慧
-慨
-慮
-慰
-慶
-憂
-憎
-憐
-憑
-憙
-憤
-憧
-憩
-憬
-憲
-憶
-憾
-懇
-應
-懌
-懐
-懲
-懸
-懺
-懽
-懿
-戈
-戊
-戌
-戎
-成
-我
-戒
-戔
-或
-戚
-戟
-戦
-截
-戮
-戯
-戴
-戸
-戻
-房
-所
-扁
-扇
-扈
-扉
-手
-才
-打
-払
-托
-扮
-扱
-扶
-批
-承
-技
-抄
-把
-抑
-抓
-投
-抗
-折
-抜
-択
-披
-抱
-抵
-抹
-押
-抽
-担
-拇
-拈
-拉
-拍
-拏
-拐
-拒
-拓
-拘
-拙
-招
-拝
-拠
-拡
-括
-拭
-拳
-拵
-拶
-拾
-拿
-持
-挂
-指
-按
-挑
-挙
-挟
-挨
-振
-挺
-挽
-挿
-捉
-捕
-捗
-捜
-捧
-捨
-据
-捺
-捻
-掃
-掄
-授
-掌
-排
-掖
-掘
-掛
-掟
-採
-探
-掣
-接
-控
-推
-掩
-措
-掬
-掲
-掴
-掻
-掾
-揃
-揄
-揆
-揉
-描
-提
-揖
-揚
-換
-握
-揮
-援
-揶
-揺
-損
-搦
-搬
-搭
-携
-搾
-摂
-摘
-摩
-摸
-摺
-撃
-撒
-撞
-撤
-撥
-撫
-播
-撮
-撰
-撲
-撹
-擁
-操
-擔
-擦
-擬
-擾
-攘
-攝
-攣
-支
-收
-改
-攻
-放
-政
-故
-敏
-救
-敗
-教
-敢
-散
-敦
-敬
-数
-整
-敵
-敷
-斂
-文
-斉
-斎
-斐
-斑
-斗
-料
-斜
-斟
-斤
-斥
-斧
-斬
-断
-斯
-新
-方
-於
-施
-旁
-旅
-旋
-旌
-族
-旗
-旛
-无
-旡
-既
-日
-旦
-旧
-旨
-早
-旬
-旭
-旺
-旻
-昂
-昆
-昇
-昉
-昌
-明
-昏
-易
-昔
-星
-映
-春
-昧
-昨
-昪
-昭
-是
-昵
-昼
-晁
-時
-晃
-晋
-晏
-晒
-晟
-晦
-晧
-晩
-普
-景
-晴
-晶
-智
-暁
-暇
-暈
-暉
-暑
-暖
-暗
-暘
-暢
-暦
-暫
-暮
-暲
-暴
-暹
-暾
-曄
-曇
-曉
-曖
-曙
-曜
-曝
-曠
-曰
-曲
-曳
-更
-書
-曹
-曼
-曽
-曾
-替
-最
-會
-月
-有
-朋
-服
-朏
-朔
-朕
-朗
-望
-朝
-期
-朧
-木
-未
-末
-本
-札
-朱
-朴
-机
-朽
-杁
-杉
-李
-杏
-材
-村
-杓
-杖
-杜
-杞
-束
-条
-杢
-杣
-来
-杭
-杮
-杯
-東
-杲
-杵
-杷
-杼
-松
-板
-枅
-枇
-析
-枓
-枕
-林
-枚
-果
-枝
-枠
-枡
-枢
-枯
-枳
-架
-柄
-柊
-柏
-某
-柑
-染
-柔
-柘
-柚
-柯
-柱
-柳
-柴
-柵
-査
-柾
-柿
-栂
-栃
-栄
-栖
-栗
-校
-株
-栲
-栴
-核
-根
-栻
-格
-栽
-桁
-桂
-桃
-框
-案
-桐
-桑
-桓
-桔
-桜
-桝
-桟
-桧
-桴
-桶
-桾
-梁
-梅
-梆
-梓
-梔
-梗
-梛
-條
-梟
-梢
-梧
-梨
-械
-梱
-梲
-梵
-梶
-棄
-棋
-棒
-棗
-棘
-棚
-棟
-棠
-森
-棲
-棹
-棺
-椀
-椅
-椋
-植
-椎
-椏
-椒
-椙
-検
-椥
-椹
-椿
-楊
-楓
-楕
-楚
-楞
-楠
-楡
-楢
-楨
-楪
-楫
-業
-楮
-楯
-楳
-極
-楷
-楼
-楽
-概
-榊
-榎
-榕
-榛
-榜
-榮
-榱
-榴
-槃
-槇
-槊
-構
-槌
-槍
-槐
-様
-槙
-槻
-槽
-槿
-樂
-樋
-樓
-樗
-標
-樟
-模
-権
-横
-樫
-樵
-樹
-樺
-樽
-橇
-橋
-橘
-機
-橿
-檀
-檄
-檎
-檐
-檗
-檜
-檣
-檥
-檬
-檮
-檸
-檻
-櫃
-櫓
-櫛
-櫟
-櫨
-櫻
-欄
-欅
-欠
-次
-欣
-欧
-欲
-欺
-欽
-款
-歌
-歎
-歓
-止
-正
-此
-武
-歩
-歪
-歯
-歳
-歴
-死
-殆
-殉
-殊
-残
-殖
-殯
-殴
-段
-殷
-殺
-殻
-殿
-毀
-毅
-母
-毎
-毒
-比
-毘
-毛
-毫
-毬
-氈
-氏
-民
-気
-水
-氷
-永
-氾
-汀
-汁
-求
-汎
-汐
-汗
-汚
-汝
-江
-池
-汪
-汰
-汲
-決
-汽
-沂
-沃
-沅
-沆
-沈
-沌
-沐
-沓
-沖
-沙
-没
-沢
-沱
-河
-沸
-油
-治
-沼
-沽
-沿
-況
-泉
-泊
-泌
-法
-泗
-泡
-波
-泣
-泥
-注
-泯
-泰
-泳
-洋
-洒
-洗
-洛
-洞
-津
-洩
-洪
-洲
-洸
-洹
-活
-洽
-派
-流
-浄
-浅
-浙
-浚
-浜
-浣
-浦
-浩
-浪
-浮
-浴
-海
-浸
-涅
-消
-涌
-涙
-涛
-涯
-液
-涵
-涼
-淀
-淄
-淆
-淇
-淋
-淑
-淘
-淡
-淤
-淨
-淫
-深
-淳
-淵
-混
-淹
-添
-清
-済
-渉
-渋
-渓
-渕
-渚
-減
-渟
-渠
-渡
-渤
-渥
-渦
-温
-渫
-測
-港
-游
-渾
-湊
-湖
-湘
-湛
-湧
-湫
-湯
-湾
-湿
-満
-源
-準
-溜
-溝
-溢
-溥
-溪
-溶
-溺
-滄
-滅
-滋
-滌
-滑
-滕
-滝
-滞
-滴
-滸
-滹
-滿
-漁
-漂
-漆
-漉
-漏
-漑
-演
-漕
-漠
-漢
-漣
-漫
-漬
-漱
-漸
-漿
-潅
-潔
-潙
-潜
-潟
-潤
-潭
-潮
-潰
-潴
-澁
-澂
-澄
-澎
-澗
-澤
-澪
-澱
-澳
-激
-濁
-濃
-濟
-濠
-濡
-濤
-濫
-濯
-濱
-濾
-瀉
-瀋
-瀑
-瀕
-瀞
-瀟
-瀧
-瀬
-瀾
-灌
-灑
-灘
-火
-灯
-灰
-灸
-災
-炉
-炊
-炎
-炒
-炭
-炮
-炷
-点
-為
-烈
-烏
-烙
-烝
-烹
-焔
-焙
-焚
-無
-焦
-然
-焼
-煇
-煉
-煌
-煎
-煕
-煙
-煤
-煥
-照
-煩
-煬
-煮
-煽
-熈
-熊
-熙
-熟
-熨
-熱
-熹
-熾
-燃
-燈
-燎
-燔
-燕
-燗
-燥
-燭
-燻
-爆
-爐
-爪
-爬
-爲
-爵
-父
-爺
-爼
-爽
-爾
-片
-版
-牌
-牒
-牘
-牙
-牛
-牝
-牟
-牡
-牢
-牧
-物
-牲
-特
-牽
-犂
-犠
-犬
-犯
-状
-狂
-狄
-狐
-狗
-狙
-狛
-狡
-狩
-独
-狭
-狷
-狸
-狼
-猊
-猛
-猟
-猥
-猨
-猩
-猪
-猫
-献
-猴
-猶
-猷
-猾
-猿
-獄
-獅
-獏
-獣
-獲
-玄
-玅
-率
-玉
-王
-玖
-玩
-玲
-珀
-珂
-珈
-珉
-珊
-珍
-珎
-珞
-珠
-珣
-珥
-珪
-班
-現
-球
-理
-琉
-琢
-琥
-琦
-琮
-琲
-琳
-琴
-琵
-琶
-瑁
-瑋
-瑙
-瑚
-瑛
-瑜
-瑞
-瑠
-瑤
-瑩
-瑪
-瑳
-瑾
-璃
-璋
-璜
-璞
-璧
-璨
-環
-璵
-璽
-璿
-瓊
-瓔
-瓜
-瓢
-瓦
-瓶
-甍
-甑
-甕
-甘
-甚
-甞
-生
-産
-甥
-用
-甫
-田
-由
-甲
-申
-男
-町
-画
-界
-畏
-畑
-畔
-留
-畜
-畝
-畠
-畢
-略
-番
-異
-畳
-當
-畷
-畸
-畺
-畿
-疆
-疇
-疋
-疎
-疏
-疑
-疫
-疱
-疲
-疹
-疼
-疾
-病
-症
-痒
-痔
-痕
-痘
-痙
-痛
-痢
-痩
-痴
-痺
-瘍
-瘡
-瘧
-療
-癇
-癌
-癒
-癖
-癡
-癪
-発
-登
-白
-百
-的
-皆
-皇
-皋
-皐
-皓
-皮
-皺
-皿
-盂
-盃
-盆
-盈
-益
-盒
-盗
-盛
-盞
-盟
-盡
-監
-盤
-盥
-盧
-目
-盲
-直
-相
-盾
-省
-眉
-看
-県
-眞
-真
-眠
-眷
-眺
-眼
-着
-睡
-督
-睦
-睨
-睿
-瞋
-瞑
-瞞
-瞬
-瞭
-瞰
-瞳
-瞻
-瞼
-瞿
-矍
-矛
-矜
-矢
-知
-矧
-矩
-短
-矮
-矯
-石
-砂
-砌
-研
-砕
-砥
-砦
-砧
-砲
-破
-砺
-硝
-硫
-硬
-硯
-碁
-碇
-碌
-碑
-碓
-碕
-碗
-碣
-碧
-碩
-確
-碾
-磁
-磐
-磔
-磧
-磨
-磬
-磯
-礁
-礎
-礒
-礙
-礫
-礬
-示
-礼
-社
-祀
-祁
-祇
-祈
-祉
-祐
-祓
-祕
-祖
-祗
-祚
-祝
-神
-祟
-祠
-祢
-祥
-票
-祭
-祷
-祺
-禁
-禄
-禅
-禊
-禍
-禎
-福
-禔
-禖
-禛
-禦
-禧
-禮
-禰
-禹
-禽
-禿
-秀
-私
-秋
-科
-秒
-秘
-租
-秤
-秦
-秩
-称
-移
-稀
-程
-税
-稔
-稗
-稙
-稚
-稜
-稠
-種
-稱
-稲
-稷
-稻
-稼
-稽
-稿
-穀
-穂
-穆
-積
-穎
-穏
-穗
-穜
-穢
-穣
-穫
-穴
-究
-空
-突
-窃
-窄
-窒
-窓
-窟
-窠
-窩
-窪
-窮
-窯
-竃
-竄
-竈
-立
-站
-竜
-竝
-竟
-章
-童
-竪
-竭
-端
-竴
-競
-竹
-竺
-竽
-竿
-笄
-笈
-笏
-笑
-笙
-笛
-笞
-笠
-笥
-符
-第
-笹
-筅
-筆
-筇
-筈
-等
-筋
-筌
-筍
-筏
-筐
-筑
-筒
-答
-策
-筝
-筥
-筧
-筬
-筮
-筯
-筰
-筵
-箆
-箇
-箋
-箏
-箒
-箔
-箕
-算
-箙
-箜
-管
-箪
-箭
-箱
-箸
-節
-篁
-範
-篆
-篇
-築
-篋
-篌
-篝
-篠
-篤
-篥
-篦
-篩
-篭
-篳
-篷
-簀
-簒
-簡
-簧
-簪
-簫
-簺
-簾
-簿
-籀
-籃
-籌
-籍
-籐
-籟
-籠
-籤
-籬
-米
-籾
-粂
-粉
-粋
-粒
-粕
-粗
-粘
-粛
-粟
-粥
-粧
-粮
-粳
-精
-糊
-糖
-糜
-糞
-糟
-糠
-糧
-糯
-糸
-糺
-系
-糾
-紀
-約
-紅
-紋
-納
-紐
-純
-紗
-紘
-紙
-級
-紛
-素
-紡
-索
-紫
-紬
-累
-細
-紳
-紵
-紹
-紺
-絁
-終
-絃
-組
-絅
-経
-結
-絖
-絞
-絡
-絣
-給
-統
-絲
-絵
-絶
-絹
-絽
-綏
-經
-継
-続
-綜
-綟
-綬
-維
-綱
-網
-綴
-綸
-綺
-綽
-綾
-綿
-緊
-緋
-総
-緑
-緒
-線
-締
-緥
-編
-緩
-緬
-緯
-練
-緻
-縁
-縄
-縅
-縒
-縛
-縞
-縢
-縣
-縦
-縫
-縮
-縹
-總
-績
-繁
-繊
-繋
-繍
-織
-繕
-繝
-繦
-繧
-繰
-繹
-繼
-纂
-纈
-纏
-纐
-纒
-纛
-缶
-罔
-罠
-罧
-罪
-置
-罰
-署
-罵
-罷
-罹
-羂
-羅
-羆
-羇
-羈
-羊
-羌
-美
-群
-羨
-義
-羯
-羲
-羹
-羽
-翁
-翅
-翌
-習
-翔
-翛
-翠
-翡
-翫
-翰
-翺
-翻
-翼
-耀
-老
-考
-者
-耆
-而
-耐
-耕
-耗
-耨
-耳
-耶
-耽
-聊
-聖
-聘
-聚
-聞
-聟
-聡
-聨
-聯
-聰
-聲
-聴
-職
-聾
-肄
-肆
-肇
-肉
-肋
-肌
-肖
-肘
-肛
-肝
-股
-肢
-肥
-肩
-肪
-肯
-肱
-育
-肴
-肺
-胃
-胆
-背
-胎
-胖
-胚
-胝
-胞
-胡
-胤
-胱
-胴
-胸
-能
-脂
-脅
-脆
-脇
-脈
-脊
-脚
-脛
-脩
-脱
-脳
-腋
-腎
-腐
-腑
-腔
-腕
-腫
-腰
-腱
-腸
-腹
-腺
-腿
-膀
-膏
-膚
-膜
-膝
-膠
-膣
-膨
-膩
-膳
-膵
-膾
-膿
-臂
-臆
-臈
-臍
-臓
-臘
-臚
-臣
-臥
-臨
-自
-臭
-至
-致
-臺
-臼
-舂
-舅
-與
-興
-舌
-舍
-舎
-舒
-舖
-舗
-舘
-舜
-舞
-舟
-舩
-航
-般
-舳
-舶
-船
-艇
-艘
-艦
-艮
-良
-色
-艶
-芋
-芒
-芙
-芝
-芥
-芦
-芬
-芭
-芯
-花
-芳
-芸
-芹
-芻
-芽
-芿
-苅
-苑
-苔
-苗
-苛
-苞
-苡
-若
-苦
-苧
-苫
-英
-苴
-苻
-茂
-范
-茄
-茅
-茎
-茗
-茘
-茜
-茨
-茲
-茵
-茶
-茸
-茹
-草
-荊
-荏
-荒
-荘
-荷
-荻
-荼
-莞
-莪
-莫
-莬
-莱
-莵
-莽
-菅
-菊
-菌
-菓
-菖
-菘
-菜
-菟
-菩
-菫
-華
-菱
-菴
-萄
-萊
-萌
-萍
-萎
-萠
-萩
-萬
-萱
-落
-葉
-著
-葛
-葡
-董
-葦
-葩
-葬
-葭
-葱
-葵
-葺
-蒋
-蒐
-蒔
-蒙
-蒟
-蒡
-蒲
-蒸
-蒻
-蒼
-蒿
-蓄
-蓆
-蓉
-蓋
-蓑
-蓬
-蓮
-蓼
-蔀
-蔑
-蔓
-蔚
-蔡
-蔦
-蔬
-蔭
-蔵
-蔽
-蕃
-蕉
-蕊
-蕎
-蕨
-蕩
-蕪
-蕭
-蕾
-薄
-薇
-薊
-薔
-薗
-薙
-薛
-薦
-薨
-薩
-薪
-薫
-薬
-薭
-薮
-藁
-藉
-藍
-藏
-藐
-藝
-藤
-藩
-藪
-藷
-藹
-藺
-藻
-蘂
-蘆
-蘇
-蘊
-蘭
-虎
-虐
-虔
-虚
-虜
-虞
-號
-虫
-虹
-虻
-蚊
-蚕
-蛇
-蛉
-蛍
-蛎
-蛙
-蛛
-蛟
-蛤
-蛭
-蛮
-蛸
-蛹
-蛾
-蜀
-蜂
-蜃
-蜆
-蜊
-蜘
-蜜
-蜷
-蜻
-蝉
-蝋
-蝕
-蝙
-蝠
-蝦
-蝶
-蝿
-螂
-融
-螣
-螺
-蟄
-蟇
-蟠
-蟷
-蟹
-蟻
-蠢
-蠣
-血
-衆
-行
-衍
-衒
-術
-街
-衙
-衛
-衝
-衞
-衡
-衢
-衣
-表
-衫
-衰
-衵
-衷
-衽
-衾
-衿
-袁
-袈
-袋
-袍
-袒
-袖
-袙
-袞
-袢
-被
-袰
-袱
-袴
-袷
-袿
-裁
-裂
-裃
-装
-裏
-裔
-裕
-裘
-裙
-補
-裟
-裡
-裲
-裳
-裴
-裸
-裹
-製
-裾
-褂
-褄
-複
-褌
-褐
-褒
-褥
-褪
-褶
-褻
-襄
-襖
-襞
-襟
-襠
-襦
-襪
-襲
-襴
-襷
-西
-要
-覆
-覇
-覈
-見
-規
-視
-覗
-覚
-覧
-親
-覲
-観
-覺
-觀
-角
-解
-触
-言
-訂
-計
-討
-訓
-託
-記
-訛
-訟
-訢
-訥
-訪
-設
-許
-訳
-訴
-訶
-診
-註
-証
-詐
-詔
-評
-詛
-詞
-詠
-詢
-詣
-試
-詩
-詫
-詮
-詰
-話
-該
-詳
-誄
-誅
-誇
-誉
-誌
-認
-誓
-誕
-誘
-語
-誠
-誡
-誣
-誤
-誥
-誦
-説
-読
-誰
-課
-誼
-誾
-調
-談
-請
-諌
-諍
-諏
-諒
-論
-諚
-諜
-諟
-諡
-諦
-諧
-諫
-諭
-諮
-諱
-諶
-諷
-諸
-諺
-諾
-謀
-謄
-謌
-謎
-謗
-謙
-謚
-講
-謝
-謡
-謫
-謬
-謹
-證
-識
-譚
-譛
-譜
-警
-譬
-譯
-議
-譲
-譴
-護
-讀
-讃
-讐
-讒
-谷
-谿
-豅
-豆
-豊
-豎
-豐
-豚
-象
-豪
-豫
-豹
-貌
-貝
-貞
-負
-財
-貢
-貧
-貨
-販
-貪
-貫
-責
-貯
-貰
-貴
-買
-貸
-費
-貼
-貿
-賀
-賁
-賂
-賃
-賄
-資
-賈
-賊
-賎
-賑
-賓
-賛
-賜
-賞
-賠
-賢
-賣
-賤
-賦
-質
-賭
-購
-賽
-贄
-贅
-贈
-贋
-贔
-贖
-赤
-赦
-走
-赴
-起
-超
-越
-趙
-趣
-足
-趺
-趾
-跋
-跏
-距
-跡
-跨
-跪
-路
-跳
-践
-踊
-踏
-踐
-踞
-踪
-踵
-蹄
-蹉
-蹊
-蹟
-蹲
-蹴
-躅
-躇
-躊
-躍
-躑
-躙
-躪
-身
-躬
-躯
-躰
-車
-軋
-軌
-軍
-軒
-軟
-転
-軸
-軻
-軽
-軾
-較
-載
-輌
-輔
-輜
-輝
-輦
-輩
-輪
-輯
-輸
-輿
-轄
-轍
-轟
-轢
-辛
-辞
-辟
-辥
-辦
-辨
-辰
-辱
-農
-辺
-辻
-込
-迂
-迅
-迎
-近
-返
-迢
-迦
-迪
-迫
-迭
-述
-迷
-迹
-追
-退
-送
-逃
-逅
-逆
-逍
-透
-逐
-逓
-途
-逕
-逗
-這
-通
-逝
-逞
-速
-造
-逢
-連
-逮
-週
-進
-逸
-逼
-遁
-遂
-遅
-遇
-遊
-運
-遍
-過
-遐
-道
-達
-違
-遙
-遜
-遠
-遡
-遣
-遥
-適
-遭
-遮
-遯
-遵
-遷
-選
-遺
-遼
-避
-邀
-邁
-邂
-邃
-還
-邇
-邉
-邊
-邑
-那
-邦
-邨
-邪
-邯
-邵
-邸
-郁
-郊
-郎
-郡
-郢
-部
-郭
-郴
-郵
-郷
-都
-鄂
-鄙
-鄭
-鄰
-鄲
-酉
-酋
-酌
-配
-酎
-酒
-酔
-酢
-酥
-酪
-酬
-酵
-酷
-酸
-醍
-醐
-醒
-醗
-醜
-醤
-醪
-醵
-醸
-采
-釈
-釉
-釋
-里
-重
-野
-量
-釐
-金
-釘
-釜
-針
-釣
-釧
-釿
-鈍
-鈎
-鈐
-鈔
-鈞
-鈦
-鈴
-鈷
-鈸
-鈿
-鉄
-鉇
-鉉
-鉋
-鉛
-鉢
-鉤
-鉦
-鉱
-鉾
-銀
-銃
-銅
-銈
-銑
-銕
-銘
-銚
-銜
-銭
-鋏
-鋒
-鋤
-鋭
-鋲
-鋳
-鋸
-鋺
-鋼
-錆
-錍
-錐
-錘
-錠
-錣
-錦
-錫
-錬
-錯
-録
-錵
-鍋
-鍍
-鍑
-鍔
-鍛
-鍬
-鍮
-鍵
-鍼
-鍾
-鎌
-鎖
-鎗
-鎚
-鎧
-鎬
-鎮
-鎰
-鎹
-鏃
-鏑
-鏡
-鐃
-鐇
-鐐
-鐔
-鐘
-鐙
-鐚
-鐡
-鐵
-鐸
-鑁
-鑊
-鑑
-鑒
-鑚
-鑠
-鑢
-鑰
-鑵
-鑷
-鑼
-鑽
-鑿
-長
-門
-閃
-閇
-閉
-開
-閏
-閑
-間
-閔
-閘
-関
-閣
-閤
-閥
-閦
-閨
-閬
-閲
-閻
-閼
-閾
-闇
-闍
-闔
-闕
-闘
-關
-闡
-闢
-闥
-阜
-阪
-阮
-阯
-防
-阻
-阿
-陀
-陂
-附
-陌
-降
-限
-陛
-陞
-院
-陣
-除
-陥
-陪
-陬
-陰
-陳
-陵
-陶
-陸
-険
-陽
-隅
-隆
-隈
-隊
-隋
-階
-随
-隔
-際
-障
-隠
-隣
-隧
-隷
-隻
-隼
-雀
-雁
-雄
-雅
-集
-雇
-雉
-雊
-雋
-雌
-雍
-雑
-雖
-雙
-雛
-離
-難
-雨
-雪
-雫
-雰
-雲
-零
-雷
-雹
-電
-需
-震
-霊
-霍
-霖
-霜
-霞
-霧
-霰
-露
-靈
-青
-靖
-静
-靜
-非
-面
-革
-靫
-靭
-靱
-靴
-靺
-鞁
-鞄
-鞆
-鞋
-鞍
-鞏
-鞘
-鞠
-鞨
-鞭
-韋
-韓
-韜
-韮
-音
-韶
-韻
-響
-頁
-頂
-頃
-項
-順
-須
-頌
-預
-頑
-頒
-頓
-領
-頚
-頬
-頭
-頴
-頸
-頻
-頼
-顆
-題
-額
-顎
-顔
-顕
-顗
-願
-顛
-類
-顧
-顯
-風
-飛
-食
-飢
-飩
-飫
-飯
-飲
-飴
-飼
-飽
-飾
-餃
-餅
-餉
-養
-餌
-餐
-餓
-餘
-餝
-餡
-館
-饂
-饅
-饉
-饋
-饌
-饒
-饗
-首
-馗
-香
-馨
-馬
-馳
-馴
-駄
-駅
-駆
-駈
-駐
-駒
-駕
-駝
-駿
-騁
-騎
-騏
-騒
-験
-騙
-騨
-騰
-驕
-驚
-驛
-驢
-骨
-骸
-髄
-體
-高
-髙
-髢
-髪
-髭
-髮
-髷
-髻
-鬘
-鬚
-鬢
-鬨
-鬯
-鬱
-鬼
-魁
-魂
-魄
-魅
-魏
-魔
-魚
-魯
-鮎
-鮑
-鮒
-鮪
-鮫
-鮭
-鮮
-鯉
-鯔
-鯖
-鯛
-鯨
-鯰
-鯱
-鰐
-鰒
-鰭
-鰯
-鰰
-鰹
-鰻
-鱈
-鱒
-鱗
-鱧
-鳥
-鳩
-鳰
-鳳
-鳴
-鳶
-鴈
-鴉
-鴎
-鴛
-鴟
-鴦
-鴨
-鴫
-鴻
-鵄
-鵜
-鵞
-鵡
-鵬
-鵲
-鵺
-鶉
-鶏
-鶯
-鶴
-鷄
-鷙
-鷲
-鷹
-鷺
-鸚
-鸞
-鹸
-鹽
-鹿
-麁
-麒
-麓
-麗
-麝
-麞
-麟
-麦
-麩
-麹
-麺
-麻
-麾
-麿
-黄
-黌
-黍
-黒
-黙
-黛
-黠
-鼈
-鼉
-鼎
-鼓
-鼠
-鼻
-齊
-齋
-齟
-齢
-齬
-龍
-龕
-龗
-!
-#
-%
-&
-(
-)
-+
-,
--
-.
-/
-0
-1
-2
-3
-4
-5
-6
-7
-8
-9
-:
-;
-=
-?
-@
-A
-B
-C
-D
-E
-F
-G
-H
-I
-J
-K
-L
-M
-N
-O
-P
-R
-S
-T
-U
-V
-W
-X
-Z
-a
-c
-d
-e
-f
-h
-i
-j
-k
-l
-m
-n
-o
-p
-r
-s
-t
-u
-y
-z
-~
-・
-
diff --git a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/character.py b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/character.py
deleted file mode 100644
index 21dbbd9dc790e3d009f45c1ef1b68c001e9f0e0b..0000000000000000000000000000000000000000
--- a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/character.py
+++ /dev/null
@@ -1,213 +0,0 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import numpy as np
-import string
-
-class CharacterOps(object):
- """ Convert between text-label and text-index """
-
- def __init__(self, config):
- self.character_type = config['character_type']
- self.loss_type = config['loss_type']
- self.max_text_len = config['max_text_length']
- if self.character_type == "en":
- self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
- dict_character = list(self.character_str)
- elif self.character_type in [
- "ch", 'japan', 'korean', 'french', 'german'
- ]:
- character_dict_path = config['character_dict_path']
- add_space = False
- if 'use_space_char' in config:
- add_space = config['use_space_char']
- self.character_str = ""
- with open(character_dict_path, "rb") as fin:
- lines = fin.readlines()
- for line in lines:
- line = line.decode('utf-8').strip("\n").strip("\r\n")
- self.character_str += line
- if add_space:
- self.character_str += " "
- dict_character = list(self.character_str)
- elif self.character_type == "en_sensitive":
- # same with ASTER setting (use 94 char).
- self.character_str = string.printable[:-6]
- dict_character = list(self.character_str)
- else:
- self.character_str = None
- assert self.character_str is not None, \
- "Nonsupport type of the character: {}".format(self.character_str)
- self.beg_str = "sos"
- self.end_str = "eos"
- if self.loss_type == "attention":
- dict_character = [self.beg_str, self.end_str] + dict_character
- elif self.loss_type == "srn":
- dict_character = dict_character + [self.beg_str, self.end_str]
- self.dict = {}
- for i, char in enumerate(dict_character):
- self.dict[char] = i
- self.character = dict_character
-
- def encode(self, text):
- """convert text-label into text-index.
- input:
- text: text labels of each image. [batch_size]
-
- output:
- text: concatenated text index for CTCLoss.
- [sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)]
- length: length of each text. [batch_size]
- """
- if self.character_type == "en":
- text = text.lower()
-
- text_list = []
- for char in text:
- if char not in self.dict:
- continue
- text_list.append(self.dict[char])
- text = np.array(text_list)
- return text
-
- def decode(self, text_index, is_remove_duplicate=False):
- """ convert text-index into text-label. """
- char_list = []
- char_num = self.get_char_num()
-
- if self.loss_type == "attention":
- beg_idx = self.get_beg_end_flag_idx("beg")
- end_idx = self.get_beg_end_flag_idx("end")
- ignored_tokens = [beg_idx, end_idx]
- else:
- ignored_tokens = [char_num]
-
- for idx in range(len(text_index)):
- if text_index[idx] in ignored_tokens:
- continue
- if is_remove_duplicate:
- if idx > 0 and text_index[idx - 1] == text_index[idx]:
- continue
- char_list.append(self.character[int(text_index[idx])])
- text = ''.join(char_list)
- return text
-
- def get_char_num(self):
- return len(self.character)
-
- def get_beg_end_flag_idx(self, beg_or_end):
- if self.loss_type == "attention":
- if beg_or_end == "beg":
- idx = np.array(self.dict[self.beg_str])
- elif beg_or_end == "end":
- idx = np.array(self.dict[self.end_str])
- else:
- assert False, "Unsupport type %s in get_beg_end_flag_idx"\
- % beg_or_end
- return idx
- else:
- err = "error in get_beg_end_flag_idx when using the loss %s"\
- % (self.loss_type)
- assert False, err
-
-
-def cal_predicts_accuracy(char_ops,
- preds,
- preds_lod,
- labels,
- labels_lod,
- is_remove_duplicate=False):
- acc_num = 0
- img_num = 0
- for ino in range(len(labels_lod) - 1):
- beg_no = preds_lod[ino]
- end_no = preds_lod[ino + 1]
- preds_text = preds[beg_no:end_no].reshape(-1)
- preds_text = char_ops.decode(preds_text, is_remove_duplicate)
-
- beg_no = labels_lod[ino]
- end_no = labels_lod[ino + 1]
- labels_text = labels[beg_no:end_no].reshape(-1)
- labels_text = char_ops.decode(labels_text, is_remove_duplicate)
- img_num += 1
-
- if preds_text == labels_text:
- acc_num += 1
- acc = acc_num * 1.0 / img_num
- return acc, acc_num, img_num
-
-
-def cal_predicts_accuracy_srn(char_ops,
- preds,
- labels,
- max_text_len,
- is_debug=False):
- acc_num = 0
- img_num = 0
-
- char_num = char_ops.get_char_num()
-
- total_len = preds.shape[0]
- img_num = int(total_len / max_text_len)
- for i in range(img_num):
- cur_label = []
- cur_pred = []
- for j in range(max_text_len):
- if labels[j + i * max_text_len] != int(char_num - 1): #0
- cur_label.append(labels[j + i * max_text_len][0])
- else:
- break
-
- for j in range(max_text_len + 1):
- if j < len(cur_label) and preds[j + i * max_text_len][
- 0] != cur_label[j]:
- break
- elif j == len(cur_label) and j == max_text_len:
- acc_num += 1
- break
- elif j == len(cur_label) and preds[j + i * max_text_len][0] == int(
- char_num - 1):
- acc_num += 1
- break
- acc = acc_num * 1.0 / img_num
- return acc, acc_num, img_num
-
-
-def convert_rec_attention_infer_res(preds):
- img_num = preds.shape[0]
- target_lod = [0]
- convert_ids = []
- for ino in range(img_num):
- end_pos = np.where(preds[ino, :] == 1)[0]
- if len(end_pos) <= 1:
- text_list = preds[ino, 1:]
- else:
- text_list = preds[ino, 1:end_pos[1]]
- target_lod.append(target_lod[ino] + len(text_list))
- convert_ids = convert_ids + list(text_list)
- convert_ids = np.array(convert_ids)
- convert_ids = convert_ids.reshape((-1, 1))
- return convert_ids, target_lod
-
-
-def convert_rec_label_to_lod(ori_labels):
- img_num = len(ori_labels)
- target_lod = [0]
- convert_ids = []
- for ino in range(img_num):
- target_lod.append(target_lod[ino] + len(ori_labels[ino]))
- convert_ids = convert_ids + list(ori_labels[ino])
- convert_ids = np.array(convert_ids)
- convert_ids = convert_ids.reshape((-1, 1))
- return convert_ids, target_lod
diff --git a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/module.py
index cd04f063496af4a93459ec19a7a46b93f2dab51b..890d9d56be4edd7f2cba2bdd45daa7248067044b 100644
--- a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/module.py
+++ b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/module.py
@@ -1,304 +1,61 @@
-# -*- coding:utf-8 -*-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import argparse
-import ast
-import copy
-import math
-import os
-import time
-
-from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor
-from paddlehub.common.logger import logger
-from paddlehub.module.module import moduleinfo, runnable, serving
-from PIL import Image
-import cv2
-import numpy as np
-import paddle.fluid as fluid
import paddlehub as hub
-
-from japan_ocr_db_crnn_mobile.character import CharacterOps
-from japan_ocr_db_crnn_mobile.utils import base64_to_cv2, draw_ocr, get_image_ext, sorted_boxes
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="japan_ocr_db_crnn_mobile",
- version="1.0.0",
- summary=
- "The module can recognize the japan texts in an image. Firstly, it will detect the text box positions based on the differentiable_binarization module. Then it recognizes the german texts. ",
- author="paddle-dev",
- author_email="paddle-dev@baidu.com",
+ version="1.1.0",
+ summary="ocr service",
+ author="PaddlePaddle",
type="cv/text_recognition")
-class JapanOCRDBCRNNMobile(hub.Module):
- def _initialize(self, text_detector_module=None, enable_mkldnn=False, use_angle_classification=False):
+class JapanOCRDBCRNNMobile:
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9):
"""
initialize with the necessary elements
+ Args:
+ det(bool): Whether to use text detector.
+ rec(bool): Whether to use text recognizer.
+ use_angle_cls(bool): Whether to use text orientation classifier.
+ enable_mkldnn(bool): Whether to enable mkldnn.
+ use_gpu (bool): Whether to use gpu.
+ box_thresh(float): the threshold of the detected text box's confidence
+ angle_classification_thresh(float): the threshold of the angle classification confidence
"""
- self.character_dict_path = os.path.join(self.directory, 'assets',
- 'japan_dict.txt')
- char_ops_params = {
- 'character_type': 'japan',
- 'character_dict_path': self.character_dict_path,
- 'loss_type': 'ctc',
- 'max_text_length': 25,
- 'use_space_char': True
- }
- self.char_ops = CharacterOps(char_ops_params)
- self.rec_image_shape = [3, 32, 320]
- self._text_detector_module = text_detector_module
- self.font_file = os.path.join(self.directory, 'assets', 'japan.ttc')
- self.enable_mkldnn = enable_mkldnn
- self.use_angle_classification = use_angle_classification
-
- self.rec_pretrained_model_path = os.path.join(
- self.directory, 'inference_model', 'character_rec')
- self.rec_predictor, self.rec_input_tensor, self.rec_output_tensors = self._set_config(
- self.rec_pretrained_model_path)
-
- if self.use_angle_classification:
- self.cls_pretrained_model_path = os.path.join(
- self.directory, 'inference_model', 'angle_cls')
-
- self.cls_predictor, self.cls_input_tensor, self.cls_output_tensors = self._set_config(
- self.cls_pretrained_model_path)
-
- def _set_config(self, pretrained_model_path):
- """
- predictor config path
- """
- model_file_path = os.path.join(pretrained_model_path, 'model')
- params_file_path = os.path.join(pretrained_model_path, 'params')
-
- config = AnalysisConfig(model_file_path, params_file_path)
- try:
- _places = os.environ["CUDA_VISIBLE_DEVICES"]
- int(_places[0])
- use_gpu = True
- except:
- use_gpu = False
-
- if use_gpu:
- config.enable_use_gpu(8000, 0)
- else:
- config.disable_gpu()
- if self.enable_mkldnn:
- # cache 10 different shapes for mkldnn to avoid memory leak
- config.set_mkldnn_cache_capacity(10)
- config.enable_mkldnn()
-
- config.disable_glog_info()
- config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass")
- config.switch_use_feed_fetch_ops(False)
-
- predictor = create_paddle_predictor(config)
-
- input_names = predictor.get_input_names()
- input_tensor = predictor.get_input_tensor(input_names[0])
- output_names = predictor.get_output_names()
- output_tensors = []
- for output_name in output_names:
- output_tensor = predictor.get_output_tensor(output_name)
- output_tensors.append(output_tensor)
-
- return predictor, input_tensor, output_tensors
-
- @property
- def text_detector_module(self):
- """
- text detect module
- """
- if not self._text_detector_module:
- self._text_detector_module = hub.Module(
- name='chinese_text_detection_db_mobile',
- enable_mkldnn=self.enable_mkldnn,
- version='1.0.4')
- return self._text_detector_module
-
- def read_images(self, paths=[]):
- images = []
- for img_path in paths:
- assert os.path.isfile(
- img_path), "The {} isn't a valid file.".format(img_path)
- img = cv2.imread(img_path)
- if img is None:
- logger.info("error in loading image:{}".format(img_path))
- continue
- images.append(img)
- return images
-
- def get_rotate_crop_image(self, img, points):
- '''
- img_height, img_width = img.shape[0:2]
- left = int(np.min(points[:, 0]))
- right = int(np.max(points[:, 0]))
- top = int(np.min(points[:, 1]))
- bottom = int(np.max(points[:, 1]))
- img_crop = img[top:bottom, left:right, :].copy()
- points[:, 0] = points[:, 0] - left
- points[:, 1] = points[:, 1] - top
- '''
- img_crop_width = int(
- max(
- np.linalg.norm(points[0] - points[1]),
- np.linalg.norm(points[2] - points[3])))
- img_crop_height = int(
- max(
- np.linalg.norm(points[0] - points[3]),
- np.linalg.norm(points[1] - points[2])))
- pts_std = np.float32([[0, 0], [img_crop_width, 0],
- [img_crop_width, img_crop_height],
- [0, img_crop_height]])
- M = cv2.getPerspectiveTransform(points, pts_std)
- dst_img = cv2.warpPerspective(
- img,
- M, (img_crop_width, img_crop_height),
- borderMode=cv2.BORDER_REPLICATE,
- flags=cv2.INTER_CUBIC)
- dst_img_height, dst_img_width = dst_img.shape[0:2]
- if dst_img_height * 1.0 / dst_img_width >= 1.5:
- dst_img = np.rot90(dst_img)
- return dst_img
-
- def resize_norm_img_rec(self, img, max_wh_ratio):
- imgC, imgH, imgW = self.rec_image_shape
- assert imgC == img.shape[2]
- h, w = img.shape[:2]
- ratio = w / float(h)
- if math.ceil(imgH * ratio) > imgW:
- resized_w = imgW
- else:
- resized_w = int(math.ceil(imgH * ratio))
- resized_image = cv2.resize(img, (resized_w, imgH))
- resized_image = resized_image.astype('float32')
- resized_image = resized_image.transpose((2, 0, 1)) / 255
- resized_image -= 0.5
- resized_image /= 0.5
- padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
- padding_im[:, :, 0:resized_w] = resized_image
- return padding_im
-
- def resize_norm_img_cls(self, img):
- cls_image_shape = [3, 48, 192]
- imgC, imgH, imgW = cls_image_shape
- h = img.shape[0]
- w = img.shape[1]
- ratio = w / float(h)
- if math.ceil(imgH * ratio) > imgW:
- resized_w = imgW
- else:
- resized_w = int(math.ceil(imgH * ratio))
- resized_image = cv2.resize(img, (resized_w, imgH))
- resized_image = resized_image.astype('float32')
- if cls_image_shape[0] == 1:
- resized_image = resized_image / 255
- resized_image = resized_image[np.newaxis, :]
- else:
- resized_image = resized_image.transpose((2, 0, 1)) / 255
- resized_image -= 0.5
- resized_image /= 0.5
- padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
- padding_im[:, :, 0:resized_w] = resized_image
- return padding_im
-
- def recognize_text(self,
- images=[],
- paths=[],
- use_gpu=False,
- output_dir='ocr_result',
- visualization=False,
- box_thresh=0.5,
- text_thresh=0.5,
- angle_classification_thresh=0.9):
- """
- Get the chinese texts in the predicted images.
+ self.logger = get_logger()
+ self.model = hub.Module(
+ name="multi_languages_ocr_db_crnn",
+ lang="japan",
+ det=det,
+ rec=rec,
+ use_angle_cls=use_angle_cls,
+ enable_mkldnn=enable_mkldnn,
+ use_gpu=use_gpu,
+ box_thresh=box_thresh,
+ angle_classification_thresh=angle_classification_thresh)
+ self.model.name = self.name
+
+ def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+ """
+ Get the text in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths
paths (list[str]): The paths of images. If paths not images
- use_gpu (bool): Whether to use gpu.
- batch_size(int): the program deals once with one
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
- box_thresh(float): the threshold of the detected text box's confidence
- text_thresh(float): the threshold of the chinese text recognition confidence
- angle_classification_thresh(float): the threshold of the angle classification confidence
-
Returns:
- res (list): The result of chinese texts and save path of images.
+            res (list): The recognized texts, their detection boxes, and the save path of the visualized images.
"""
- if use_gpu:
- try:
- _places = os.environ["CUDA_VISIBLE_DEVICES"]
- int(_places[0])
- except:
- raise RuntimeError(
- "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
- )
-
- self.use_gpu = use_gpu
-
- if images != [] and isinstance(images, list) and paths == []:
- predicted_data = images
- elif images == [] and isinstance(paths, list) and paths != []:
- predicted_data = self.read_images(paths)
- else:
- raise TypeError("The input data is inconsistent with expectations.")
-
- assert predicted_data != [], "There is not any image to be predicted. Please check the input data."
-
- detection_results = self.text_detector_module.detect_text(
- images=predicted_data, use_gpu=self.use_gpu, box_thresh=box_thresh)
- print('*'*10)
- print(detection_results)
-
- boxes = [
- np.array(item['data']).astype(np.float32)
- for item in detection_results
- ]
- all_results = []
- for index, img_boxes in enumerate(boxes):
- original_image = predicted_data[index].copy()
- result = {'save_path': ''}
- if img_boxes.size == 0:
- result['data'] = []
- else:
- img_crop_list = []
- boxes = sorted_boxes(img_boxes)
- for num_box in range(len(boxes)):
- tmp_box = copy.deepcopy(boxes[num_box])
- img_crop = self.get_rotate_crop_image(
- original_image, tmp_box)
- img_crop_list.append(img_crop)
-
- if self.use_angle_classification:
- img_crop_list, angle_list = self._classify_text(
- img_crop_list,
- angle_classification_thresh=angle_classification_thresh)
-
- rec_results = self._recognize_text(img_crop_list)
-
- # if the recognized text confidence score is lower than text_thresh, then drop it
- rec_res_final = []
- for index, res in enumerate(rec_results):
- text, score = res
- if score >= text_thresh:
- rec_res_final.append({
- 'text':
- text,
- 'confidence':
- float(score),
- 'text_box_position':
- boxes[index].astype(np.int).tolist()
- })
- result['data'] = rec_res_final
-
- if visualization and result['data']:
- result['save_path'] = self.save_result_image(
- original_image, boxes, rec_results, output_dir,
- text_thresh)
- all_results.append(result)
-
+ all_results = self.model.recognize_text(
+ images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
@@ -310,282 +67,21 @@ class JapanOCRDBCRNNMobile(hub.Module):
results = self.recognize_text(images_decode, **kwargs)
return results
- def save_result_image(
- self,
- original_image,
- detection_boxes,
- rec_results,
- output_dir='ocr_result',
- text_thresh=0.5,
- ):
- image = Image.fromarray(cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB))
- txts = [item[0] for item in rec_results]
- scores = [item[1] for item in rec_results]
- draw_img = draw_ocr(
- image,
- detection_boxes,
- txts,
- scores,
- font_file=self.font_file,
- draw_txt=True,
- drop_score=text_thresh)
-
- if not os.path.exists(output_dir):
- os.makedirs(output_dir)
- ext = get_image_ext(original_image)
- saved_name = 'ndarray_{}{}'.format(time.time(), ext)
- save_file_path = os.path.join(output_dir, saved_name)
- cv2.imwrite(save_file_path, draw_img[:, :, ::-1])
- return save_file_path
-
- def _classify_text(self, image_list, angle_classification_thresh=0.9):
- img_list = copy.deepcopy(image_list)
- img_num = len(img_list)
- # Calculate the aspect ratio of all text bars
- width_list = []
- for img in img_list:
- width_list.append(img.shape[1] / float(img.shape[0]))
- # Sorting can speed up the cls process
- indices = np.argsort(np.array(width_list))
-
- cls_res = [['', 0.0]] * img_num
- batch_num = 30
- for beg_img_no in range(0, img_num, batch_num):
- end_img_no = min(img_num, beg_img_no + batch_num)
- norm_img_batch = []
- max_wh_ratio = 0
- for ino in range(beg_img_no, end_img_no):
- h, w = img_list[indices[ino]].shape[0:2]
- wh_ratio = w * 1.0 / h
- max_wh_ratio = max(max_wh_ratio, wh_ratio)
- for ino in range(beg_img_no, end_img_no):
- norm_img = self.resize_norm_img_cls(img_list[indices[ino]])
- norm_img = norm_img[np.newaxis, :]
- norm_img_batch.append(norm_img)
- norm_img_batch = np.concatenate(norm_img_batch)
- norm_img_batch = norm_img_batch.copy()
-
- self.cls_input_tensor.copy_from_cpu(norm_img_batch)
- self.cls_predictor.zero_copy_run()
-
- prob_out = self.cls_output_tensors[0].copy_to_cpu()
- label_out = self.cls_output_tensors[1].copy_to_cpu()
- if len(label_out.shape) != 1:
- prob_out, label_out = label_out, prob_out
- label_list = ['0', '180']
- for rno in range(len(label_out)):
- label_idx = label_out[rno]
- score = prob_out[rno][label_idx]
- label = label_list[label_idx]
- cls_res[indices[beg_img_no + rno]] = [label, score]
- if '180' in label and score > angle_classification_thresh:
- img_list[indices[beg_img_no + rno]] = cv2.rotate(
- img_list[indices[beg_img_no + rno]], 1)
- return img_list, cls_res
-
- def _recognize_text(self, img_list):
- img_num = len(img_list)
- # Calculate the aspect ratio of all text bars
- width_list = []
- for img in img_list:
- width_list.append(img.shape[1] / float(img.shape[0]))
- # Sorting can speed up the recognition process
- indices = np.argsort(np.array(width_list))
-
- rec_res = [['', 0.0]] * img_num
- batch_num = 30
- for beg_img_no in range(0, img_num, batch_num):
- end_img_no = min(img_num, beg_img_no + batch_num)
- norm_img_batch = []
- max_wh_ratio = 0
- for ino in range(beg_img_no, end_img_no):
- h, w = img_list[indices[ino]].shape[0:2]
- wh_ratio = w * 1.0 / h
- max_wh_ratio = max(max_wh_ratio, wh_ratio)
- for ino in range(beg_img_no, end_img_no):
- norm_img = self.resize_norm_img_rec(img_list[indices[ino]],
- max_wh_ratio)
- norm_img = norm_img[np.newaxis, :]
- norm_img_batch.append(norm_img)
-
- norm_img_batch = np.concatenate(norm_img_batch, axis=0)
- norm_img_batch = norm_img_batch.copy()
-
- self.rec_input_tensor.copy_from_cpu(norm_img_batch)
- self.rec_predictor.zero_copy_run()
-
- rec_idx_batch = self.rec_output_tensors[0].copy_to_cpu()
- rec_idx_lod = self.rec_output_tensors[0].lod()[0]
- predict_batch = self.rec_output_tensors[1].copy_to_cpu()
- predict_lod = self.rec_output_tensors[1].lod()[0]
- for rno in range(len(rec_idx_lod) - 1):
- beg = rec_idx_lod[rno]
- end = rec_idx_lod[rno + 1]
- rec_idx_tmp = rec_idx_batch[beg:end, 0]
- preds_text = self.char_ops.decode(rec_idx_tmp)
- beg = predict_lod[rno]
- end = predict_lod[rno + 1]
- probs = predict_batch[beg:end, :]
- ind = np.argmax(probs, axis=1)
- blank = probs.shape[1]
- valid_ind = np.where(ind != (blank - 1))[0]
- if len(valid_ind) == 0:
- continue
- score = np.mean(probs[valid_ind, ind[valid_ind]])
- # rec_res.append([preds_text, score])
- rec_res[indices[beg_img_no + rno]] = [preds_text, score]
-
- return rec_res
-
- def save_inference_model(self,
- dirname,
- model_filename=None,
- params_filename=None,
- combined=True):
- detector_dir = os.path.join(dirname, 'text_detector')
- classifier_dir = os.path.join(dirname, 'angle_classifier')
- recognizer_dir = os.path.join(dirname, 'text_recognizer')
- self._save_detector_model(detector_dir, model_filename, params_filename,
- combined)
- if self.use_angle_classification:
- self._save_classifier_model(classifier_dir, model_filename,
- params_filename, combined)
-
- self._save_recognizer_model(recognizer_dir, model_filename,
- params_filename, combined)
- logger.info("The inference model has been saved in the path {}".format(
- os.path.realpath(dirname)))
-
- def _save_detector_model(self,
- dirname,
- model_filename=None,
- params_filename=None,
- combined=True):
- self.text_detector_module.save_inference_model(
- dirname, model_filename, params_filename, combined)
-
- def _save_recognizer_model(self,
- dirname,
- model_filename=None,
- params_filename=None,
- combined=True):
- if combined:
- model_filename = "__model__" if not model_filename else model_filename
- params_filename = "__params__" if not params_filename else params_filename
- place = fluid.CPUPlace()
- exe = fluid.Executor(place)
-
- model_file_path = os.path.join(self.rec_pretrained_model_path, 'model')
- params_file_path = os.path.join(self.rec_pretrained_model_path,
- 'params')
- program, feeded_var_names, target_vars = fluid.io.load_inference_model(
- dirname=self.rec_pretrained_model_path,
- model_filename=model_file_path,
- params_filename=params_file_path,
- executor=exe)
-
- fluid.io.save_inference_model(
- dirname=dirname,
- main_program=program,
- executor=exe,
- feeded_var_names=feeded_var_names,
- target_vars=target_vars,
- model_filename=model_filename,
- params_filename=params_filename)
-
- def _save_classifier_model(self,
- dirname,
- model_filename=None,
- params_filename=None,
- combined=True):
- if combined:
- model_filename = "__model__" if not model_filename else model_filename
- params_filename = "__params__" if not params_filename else params_filename
- place = fluid.CPUPlace()
- exe = fluid.Executor(place)
-
- model_file_path = os.path.join(self.cls_pretrained_model_path, 'model')
- params_file_path = os.path.join(self.cls_pretrained_model_path,
- 'params')
- program, feeded_var_names, target_vars = fluid.io.load_inference_model(
- dirname=self.cls_pretrained_model_path,
- model_filename=model_file_path,
- params_filename=params_file_path,
- executor=exe)
-
- fluid.io.save_inference_model(
- dirname=dirname,
- main_program=program,
- executor=exe,
- feeded_var_names=feeded_var_names,
- target_vars=target_vars,
- model_filename=model_filename,
- params_filename=params_filename)
-
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
- self.parser = argparse.ArgumentParser(
- description="Run the %s module." % self.name,
- prog='hub run %s' % self.name,
- usage='%(prog)s',
- add_help=True)
-
- self.arg_input_group = self.parser.add_argument_group(
- title="Input options", description="Input data. Required")
- self.arg_config_group = self.parser.add_argument_group(
- title="Config options",
- description=
- "Run configuration for controlling module behavior, not required.")
-
- self.add_module_config_arg()
- self.add_module_input_arg()
-
- args = self.parser.parse_args(argvs)
- results = self.recognize_text(
- paths=[args.input_path],
- use_gpu=args.use_gpu,
- output_dir=args.output_dir,
- visualization=args.visualization)
+ results = self.model.run_cmd(argvs)
return results
- def add_module_config_arg(self):
- """
- Add the command config options
- """
- self.arg_config_group.add_argument(
- '--use_gpu',
- type=ast.literal_eval,
- default=False,
- help="whether use GPU or not")
- self.arg_config_group.add_argument(
- '--output_dir',
- type=str,
- default='ocr_result',
- help="The directory to save output images.")
- self.arg_config_group.add_argument(
- '--visualization',
- type=ast.literal_eval,
- default=False,
- help="whether to save output as images.")
-
- def add_module_input_arg(self):
- """
- Add the command input options
- """
- self.arg_input_group.add_argument(
- '--input_path', type=str, default=None, help="diretory to image")
-
+ def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+ '''
+ Export the model to ONNX format.
-if __name__ == '__main__':
- ocr = JapanOCRDBCRNNMobile(enable_mkldnn=False, use_angle_classification=True)
- image_path = [
- '/mnt/zhangxuefei/PaddleOCR/doc/imgs/ger_1.jpg',
- '/mnt/zhangxuefei/PaddleOCR/doc/imgs/12.jpg',
- '/mnt/zhangxuefei/PaddleOCR/doc/imgs/test_image.jpg'
- ]
- res = ocr.recognize_text(paths=image_path, visualization=True)
- ocr.save_inference_model('save')
- print(res)
+ Args:
+ dirname(str): The directory to save the onnx model.
+        input_shape_dict(dict): mapping from input name to input shape, e.g. ``{'x': [-1, 3, -1, -1]}``
+        opset_version(int): operator set version of the exported ONNX model
+ '''
+ self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
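
A short sketch of the new `export_onnx_model` entry point added above. The output directory and the input shape are illustrative assumptions; the shape example comes from the method's own docstring, and the conversion presumably relies on paddle2onnx, which the new requirements.txt pins to >=0.9.0.

```python
# Hedged sketch: exporting the refactored module to ONNX.
# Assumes `hub install japan_ocr_db_crnn_mobile` has been run; './onnx_export' and
# the input shape below are illustrative values only.
import paddlehub as hub

ocr = hub.Module(name="japan_ocr_db_crnn_mobile")

# Delegates to the underlying multi_languages_ocr_db_crnn model, which writes the
# converted ONNX file(s) into the given directory.
ocr.export_onnx_model(dirname='./onnx_export',
                      input_shape_dict={'x': [-1, 3, -1, -1]},
                      opset_version=10)
```
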
diff --git a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
diff --git a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/utils.py b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/utils.py
deleted file mode 100644
index 8c41af300cc91de369a473cb7327b794b6cf5715..0000000000000000000000000000000000000000
--- a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/utils.py
+++ /dev/null
@@ -1,190 +0,0 @@
-# -*- coding:utf-8 -*-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import math
-
-from PIL import Image, ImageDraw, ImageFont
-import base64
-import cv2
-import numpy as np
-
-
-def draw_ocr(image,
- boxes,
- txts,
- scores,
- font_file,
- draw_txt=True,
- drop_score=0.5):
- """
- Visualize the results of OCR detection and recognition
- args:
- image(Image|array): RGB image
- boxes(list): boxes with shape(N, 4, 2)
- txts(list): the texts
- scores(list): txxs corresponding scores
- draw_txt(bool): whether draw text or not
- drop_score(float): only scores greater than drop_threshold will be visualized
- return(array):
- the visualized img
- """
- if scores is None:
- scores = [1] * len(boxes)
- for (box, score) in zip(boxes, scores):
- if score < drop_score or math.isnan(score):
- continue
- box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64)
- image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
-
- if draw_txt:
- img = np.array(resize_img(image, input_size=600))
- txt_img = text_visual(
- txts,
- scores,
- font_file,
- img_h=img.shape[0],
- img_w=600,
- threshold=drop_score)
- img = np.concatenate([np.array(img), np.array(txt_img)], axis=1)
- return img
- return image
-
-
-def text_visual(texts, scores, font_file, img_h=400, img_w=600, threshold=0.):
- """
- create new blank img and draw txt on it
- args:
- texts(list): the text will be draw
- scores(list|None): corresponding score of each txt
- img_h(int): the height of blank img
- img_w(int): the width of blank img
- return(array):
- """
- if scores is not None:
- assert len(texts) == len(
- scores), "The number of txts and corresponding scores must match"
-
- def create_blank_img():
- blank_img = np.ones(shape=[img_h, img_w], dtype=np.int8) * 255
- blank_img[:, img_w - 1:] = 0
- blank_img = Image.fromarray(blank_img).convert("RGB")
- draw_txt = ImageDraw.Draw(blank_img)
- return blank_img, draw_txt
-
- blank_img, draw_txt = create_blank_img()
-
- font_size = 20
- txt_color = (0, 0, 0)
- font = ImageFont.truetype(font_file, font_size, encoding="utf-8")
-
- gap = font_size + 5
- txt_img_list = []
- count, index = 1, 0
- for idx, txt in enumerate(texts):
- index += 1
- if scores[idx] < threshold or math.isnan(scores[idx]):
- index -= 1
- continue
- first_line = True
- while str_count(txt) >= img_w // font_size - 4:
- tmp = txt
- txt = tmp[:img_w // font_size - 4]
- if first_line:
- new_txt = str(index) + ': ' + txt
- first_line = False
- else:
- new_txt = ' ' + txt
- draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
- txt = tmp[img_w // font_size - 4:]
- if count >= img_h // gap - 1:
- txt_img_list.append(np.array(blank_img))
- blank_img, draw_txt = create_blank_img()
- count = 0
- count += 1
- if first_line:
- new_txt = str(index) + ': ' + txt + ' ' + '%.3f' % (scores[idx])
- else:
- new_txt = " " + txt + " " + '%.3f' % (scores[idx])
- draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
- # whether add new blank img or not
- if count >= img_h // gap - 1 and idx + 1 < len(texts):
- txt_img_list.append(np.array(blank_img))
- blank_img, draw_txt = create_blank_img()
- count = 0
- count += 1
- txt_img_list.append(np.array(blank_img))
- if len(txt_img_list) == 1:
- blank_img = np.array(txt_img_list[0])
- else:
- blank_img = np.concatenate(txt_img_list, axis=1)
- return np.array(blank_img)
-
-
-def str_count(s):
- """
- Count the number of Chinese characters,
- a single English character and a single number
- equal to half the length of Chinese characters.
- args:
- s(string): the input of string
- return(int):
- the number of Chinese characters
- """
- import string
- count_zh = count_pu = 0
- s_len = len(s)
- en_dg_count = 0
- for c in s:
- if c in string.ascii_letters or c.isdigit() or c.isspace():
- en_dg_count += 1
- elif c.isalpha():
- count_zh += 1
- else:
- count_pu += 1
- return s_len - math.ceil(en_dg_count / 2)
-
-
-def resize_img(img, input_size=600):
- img = np.array(img)
- im_shape = img.shape
- im_size_min = np.min(im_shape[0:2])
- im_size_max = np.max(im_shape[0:2])
- im_scale = float(input_size) / float(im_size_max)
- im = cv2.resize(img, None, None, fx=im_scale, fy=im_scale)
- return im
-
-
-def get_image_ext(image):
- if image.shape[2] == 4:
- return ".png"
- return ".jpg"
-
-
-def sorted_boxes(dt_boxes):
- """
- Sort text boxes in order from top to bottom, left to right
- args:
- dt_boxes(array):detected text boxes with shape [4, 2]
- return:
- sorted boxes(array) with shape [4, 2]
- """
- num_boxes = dt_boxes.shape[0]
- sorted_boxes = sorted(dt_boxes, key=lambda x: (x[0][1], x[0][0]))
- _boxes = list(sorted_boxes)
-
- for i in range(num_boxes - 1):
- if abs(_boxes[i + 1][0][1] - _boxes[i][0][1]) < 10 and \
- (_boxes[i + 1][0][0] < _boxes[i][0][0]):
- tmp = _boxes[i]
- _boxes[i] = _boxes[i + 1]
- _boxes[i + 1] = tmp
- return _boxes
-
-
-def base64_to_cv2(b64str):
- data = base64.b64decode(b64str.encode('utf8'))
- data = np.fromstring(data, np.uint8)
- data = cv2.imdecode(data, cv2.IMREAD_COLOR)
- return data
diff --git a/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..19f2b1852e4e343241a40bd21b26820928f7506d
--- /dev/null
+++ b/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/README.md
@@ -0,0 +1,165 @@
+# kannada_ocr_db_crnn_mobile
+
+|模型名称|kannada_ocr_db_crnn_mobile|
+| :--- | :---: |
+|类别|图像-文字识别|
+|网络|Differentiable Binarization+CRNN|
+|数据集|icdar2015数据集|
+|是否支持Fine-tuning|否|
+|最新更新日期|2021-12-2|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - kannada_ocr_db_crnn_mobile Module用于识别图片当中的卡纳达文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的卡纳达文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别卡纳达文的轻量级OCR模型,支持直接预测。
+
+ - 更多详情参考:
+ - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
+ - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.2
+
+ - paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install kannada_ocr_db_crnn_mobile
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run kannada_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
+ $ hub run kannada_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
+ ```
+ - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ ocr = hub.Module(name="kannada_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效
+ result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+ # or
+ # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3、API
+
+ - ```python
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9)
+ ```
+
+ - 构造KannadaOCRDBCRNNMobile对象
+
+ - **参数**
+ - det(bool): 是否开启文字检测。默认为True。
+ - rec(bool): 是否开启文字识别。默认为True。
+ - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
+ - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
+ - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
+ - box\_thresh (float): 检测文本框置信度的阈值;
+ - angle_classification_thresh(float): 文本方向分类置信度的阈值
+
+
+ - ```python
+ def recognize_text(images=[],
+ paths=[],
+ output_dir='ocr_result',
+ visualization=False)
+ ```
+
+ - 预测API,检测输入图片中的所有文本的位置和识别文本结果。
+
+ - **参数**
+
+ - paths (list\[str\]): 图片的路径;
+ - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
+ - output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
+ - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
+
+ - **返回**
+
+ - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
+ - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
+ - text(str): 识别得到的文本
+ - confidence(float): 识别文本结果置信度
+ - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
+ - orientation(str): 分类的方向,仅在只有方向分类开启时输出
+ - score(float): 分类的得分,仅在只有方向分类开启时输出
+ - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
+
+
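+  - 以下为解析返回结果的参考示例(仅供参考,字段含义见上文返回说明,图片路径为占位符):
+
+  - ```python
+    import paddlehub as hub
+
+    ocr = hub.Module(name="kannada_ocr_db_crnn_mobile")
+    results = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+    for res in results:
+        print('save_path:', res['save_path'])
+        for item in res['data']:
+            print(item['text'], item['confidence'])
+    ```
+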
+## 四、服务部署
+
+- PaddleHub Serving 可以部署一个文字识别的在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m kannada_ocr_db_crnn_mobile
+ ```
+
+  - 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;否则无需设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/kannada_ocr_db_crnn_mobile"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(r.json()["results"])
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+  初始发布
+
+  - ```shell
+ $ hub install kannada_ocr_db_crnn_mobile==1.0.0
+ ```
diff --git a/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..a3825167a9de0d76eef57769ed8ee4606a8fa08a
--- /dev/null
+++ b/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/module.py
@@ -0,0 +1,87 @@
+import paddlehub as hub
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+
+@moduleinfo(
+ name="kannada_ocr_db_crnn_mobile",
+ version="1.0.0",
+ summary="ocr service",
+ author="PaddlePaddle",
+ type="cv/text_recognition")
+class KannadaOCRDBCRNNMobile:
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9):
+ """
+ initialize with the necessary elements
+ Args:
+ det(bool): Whether to use text detector.
+ rec(bool): Whether to use text recognizer.
+ use_angle_cls(bool): Whether to use text orientation classifier.
+ enable_mkldnn(bool): Whether to enable mkldnn.
+ use_gpu (bool): Whether to use gpu.
+ box_thresh(float): the threshold of the detected text box's confidence
+ angle_classification_thresh(float): the threshold of the angle classification confidence
+ """
+ self.logger = get_logger()
+ self.model = hub.Module(
+ name="multi_languages_ocr_db_crnn",
+ lang="ka",
+ det=det,
+ rec=rec,
+ use_angle_cls=use_angle_cls,
+ enable_mkldnn=enable_mkldnn,
+ use_gpu=use_gpu,
+ box_thresh=box_thresh,
+ angle_classification_thresh=angle_classification_thresh)
+ self.model.name = self.name
+
+ def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+ """
+ Get the text in the predicted images.
+ Args:
+            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; used when paths is not provided
+            paths (list[str]): The paths of images; used when images is not provided
+ output_dir (str): The directory to store output images.
+ visualization (bool): Whether to save image or not.
+ Returns:
+ res (list): The result of text detection box and save path of images.
+ """
+ all_results = self.model.recognize_text(
+ images=images, paths=paths, output_dir=output_dir, visualization=visualization)
+ return all_results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.recognize_text(images_decode, **kwargs)
+ return results
+
+ @runnable
+ def run_cmd(self, argvs):
+ """
+ Run as a command
+ """
+ results = self.model.run_cmd(argvs)
+ return results
+
+ def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+ '''
+ Export the model to ONNX format.
+
+ Args:
+ dirname(str): The directory to save the onnx model.
+            input_shape_dict(dict): A dict mapping input names to input shapes, e.g. ``{'x': [-1, 3, -1, -1]}``
+            opset_version(int): ONNX operator set version
+ '''
+ self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
diff --git a/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
diff --git a/modules/image/text_recognition/korean_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/korean_ocr_db_crnn_mobile/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..b11f41b59692912660d613c5a72795993357eda3
--- /dev/null
+++ b/modules/image/text_recognition/korean_ocr_db_crnn_mobile/README.md
@@ -0,0 +1,169 @@
+# korean_ocr_db_crnn_mobile
+
+|模型名称|korean_ocr_db_crnn_mobile|
+| :--- | :---: |
+|类别|图像-文字识别|
+|网络|Differentiable Binarization+CRNN|
+|数据集|icdar2015数据集|
+|是否支持Fine-tuning|否|
+|最新更新日期|2021-12-2|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - korean_ocr_db_crnn_mobile Module用于识别图片当中的韩文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的韩文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别韩文的轻量级OCR模型,支持直接预测。
+
+ - 更多详情参考:
+ - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
+ - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.2
+
+ - paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+    $ hub install korean_ocr_db_crnn_mobile
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run korean_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
+ $ hub run korean_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
+ ```
+ - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ ocr = hub.Module(name="korean_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效
+ result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+ # or
+ # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3、API
+
+ - ```python
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9)
+ ```
+
+ - 构造KoreanOCRDBCRNNMobile对象
+
+ - **参数**
+ - det(bool): 是否开启文字检测。默认为True。
+ - rec(bool): 是否开启文字识别。默认为True。
+ - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
+ - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
+ - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
+ - box\_thresh (float): 检测文本框置信度的阈值;
+ - angle_classification_thresh(float): 文本方向分类置信度的阈值
+
+
+ - ```python
+ def recognize_text(images=[],
+ paths=[],
+ output_dir='ocr_result',
+ visualization=False)
+ ```
+
+ - 预测API,检测输入图片中的所有文本的位置和识别文本结果。
+
+ - **参数**
+
+ - paths (list\[str\]): 图片的路径;
+ - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
+ - output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
+ - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
+
+ - **返回**
+
+ - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
+ - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
+ - text(str): 识别得到的文本
+ - confidence(float): 识别文本结果置信度
+ - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
+ - orientation(str): 分类的方向,仅在只有方向分类开启时输出
+ - score(float): 分类的得分,仅在只有方向分类开启时输出
+ - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
+
+
+## 四、服务部署
+
+- PaddleHub Serving 可以部署一个文字识别的在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m korean_ocr_db_crnn_mobile
+ ```
+
+  - 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;否则无需设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/korean_ocr_db_crnn_mobile"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(r.json()["results"])
+ ```
+
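+  - 返回的JSON结构与`recognize_text`接口一致,以下为解析上文响应`r`的参考示例(仅供参考):
+
+  - ```python
+    for res in r.json()["results"]:
+        for item in res['data']:
+            print(item['text'], item['confidence'])
+    ```
+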
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+* 1.1.0
+
+  优化模型
+
+  - ```shell
+ $ hub install korean_ocr_db_crnn_mobile==1.1.0
+ ```
diff --git a/modules/image/text_recognition/korean_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/korean_ocr_db_crnn_mobile/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/korean_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/korean_ocr_db_crnn_mobile/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..916906af68160ccb46c513076ca25ef8853c81c6
--- /dev/null
+++ b/modules/image/text_recognition/korean_ocr_db_crnn_mobile/module.py
@@ -0,0 +1,87 @@
+import paddlehub as hub
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+
+@moduleinfo(
+ name="korean_ocr_db_crnn_mobile",
+ version="1.1.0",
+ summary="ocr service",
+ author="PaddlePaddle",
+ type="cv/text_recognition")
+class KoreanOCRDBCRNNMobile:
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9):
+ """
+ initialize with the necessary elements
+ Args:
+ det(bool): Whether to use text detector.
+ rec(bool): Whether to use text recognizer.
+ use_angle_cls(bool): Whether to use text orientation classifier.
+ enable_mkldnn(bool): Whether to enable mkldnn.
+ use_gpu (bool): Whether to use gpu.
+ box_thresh(float): the threshold of the detected text box's confidence
+ angle_classification_thresh(float): the threshold of the angle classification confidence
+ """
+ self.logger = get_logger()
+ self.model = hub.Module(
+ name="multi_languages_ocr_db_crnn",
+ lang="korean",
+ det=det,
+ rec=rec,
+ use_angle_cls=use_angle_cls,
+ enable_mkldnn=enable_mkldnn,
+ use_gpu=use_gpu,
+ box_thresh=box_thresh,
+ angle_classification_thresh=angle_classification_thresh)
+ self.model.name = self.name
+
+ def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+ """
+ Get the text in the predicted images.
+ Args:
+            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; used when paths is not provided
+            paths (list[str]): The paths of images; used when images is not provided
+ output_dir (str): The directory to store output images.
+ visualization (bool): Whether to save image or not.
+ Returns:
+ res (list): The result of text detection box and save path of images.
+ """
+ all_results = self.model.recognize_text(
+ images=images, paths=paths, output_dir=output_dir, visualization=visualization)
+ return all_results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.recognize_text(images_decode, **kwargs)
+ return results
+
+ @runnable
+ def run_cmd(self, argvs):
+ """
+ Run as a command
+ """
+ results = self.model.run_cmd(argvs)
+ return results
+
+ def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+ '''
+ Export the model to ONNX format.
+
+ Args:
+ dirname(str): The directory to save the onnx model.
+            input_shape_dict(dict): A dict mapping input names to input shapes, e.g. ``{'x': [-1, 3, -1, -1]}``
+            opset_version(int): ONNX operator set version
+ '''
+ self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
diff --git a/modules/image/text_recognition/korean_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/korean_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/korean_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
diff --git a/modules/image/text_recognition/latin_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/latin_ocr_db_crnn_mobile/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..c81d839c61daaf6a6ec1a7649b62d7f6698a452e
--- /dev/null
+++ b/modules/image/text_recognition/latin_ocr_db_crnn_mobile/README.md
@@ -0,0 +1,166 @@
+# latin_ocr_db_crnn_mobile
+
+
+|模型名称|latin_ocr_db_crnn_mobile|
+| :--- | :---: |
+|类别|图像-文字识别|
+|网络|Differentiable Binarization+CRNN|
+|数据集|icdar2015数据集|
+|是否支持Fine-tuning|否|
+|最新更新日期|2021-12-2|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - latin_ocr_db_crnn_mobile Module用于识别图片当中的拉丁文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的拉丁文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别拉丁文的轻量级OCR模型,支持直接预测。
+
+ - 更多详情参考:
+ - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
+ - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.2
+
+ - paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install latin_ocr_db_crnn_mobile
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run latin_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
+ $ hub run latin_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
+ ```
+ - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ ocr = hub.Module(name="latin_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效
+ result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+ # or
+ # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3、API
+
+ - ```python
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9)
+ ```
+
+ - 构造LatinOCRDBCRNNMobile对象
+
+ - **参数**
+ - det(bool): 是否开启文字检测。默认为True。
+ - rec(bool): 是否开启文字识别。默认为True。
+ - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
+ - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
+ - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
+ - box\_thresh (float): 检测文本框置信度的阈值;
+ - angle_classification_thresh(float): 文本方向分类置信度的阈值
+
+
+ - ```python
+ def recognize_text(images=[],
+ paths=[],
+ output_dir='ocr_result',
+ visualization=False)
+ ```
+
+ - 预测API,检测输入图片中的所有文本的位置和识别文本结果。
+
+ - **参数**
+
+ - paths (list\[str\]): 图片的路径;
+ - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
+ - output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
+ - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
+
+ - **返回**
+
+ - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
+ - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
+ - text(str): 识别得到的文本
+ - confidence(float): 识别文本结果置信度
+ - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
+ - orientation(str): 分类的方向,仅在只有方向分类开启时输出
+ - score(float): 分类的得分,仅在只有方向分类开启时输出
+ - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
+
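+  - 若需使用GPU进行预测,需先设置CUDA\_VISIBLE\_DEVICES环境变量,以下为参考写法(设备号仅为示例,且假设已安装GPU版PaddlePaddle):
+
+  - ```python
+    import os
+    os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+    import paddlehub as hub
+    import cv2
+
+    ocr = hub.Module(name="latin_ocr_db_crnn_mobile", use_gpu=True)
+    result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+    ```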
+
+## 四、服务部署
+
+- PaddleHub Serving 可以部署一个文字识别的在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m latin_ocr_db_crnn_mobile
+ ```
+
+  - 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;否则无需设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/latin_ocr_db_crnn_mobile"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(r.json()["results"])
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+  初始发布
+
+  - ```shell
+ $ hub install latin_ocr_db_crnn_mobile==1.0.0
+ ```
diff --git a/modules/image/text_recognition/latin_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/latin_ocr_db_crnn_mobile/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/latin_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/latin_ocr_db_crnn_mobile/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..40ca5bee4acfd5059cee6c8163e90aee6cbc19ee
--- /dev/null
+++ b/modules/image/text_recognition/latin_ocr_db_crnn_mobile/module.py
@@ -0,0 +1,87 @@
+import paddlehub as hub
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+
+@moduleinfo(
+ name="latin_ocr_db_crnn_mobile",
+ version="1.0.0",
+ summary="ocr service",
+ author="PaddlePaddle",
+ type="cv/text_recognition")
+class LatinOCRDBCRNNMobile:
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9):
+ """
+ initialize with the necessary elements
+ Args:
+ det(bool): Whether to use text detector.
+ rec(bool): Whether to use text recognizer.
+ use_angle_cls(bool): Whether to use text orientation classifier.
+ enable_mkldnn(bool): Whether to enable mkldnn.
+ use_gpu (bool): Whether to use gpu.
+ box_thresh(float): the threshold of the detected text box's confidence
+ angle_classification_thresh(float): the threshold of the angle classification confidence
+ """
+ self.logger = get_logger()
+ self.model = hub.Module(
+ name="multi_languages_ocr_db_crnn",
+ lang="latin",
+ det=det,
+ rec=rec,
+ use_angle_cls=use_angle_cls,
+ enable_mkldnn=enable_mkldnn,
+ use_gpu=use_gpu,
+ box_thresh=box_thresh,
+ angle_classification_thresh=angle_classification_thresh)
+ self.model.name = self.name
+
+ def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+ """
+ Get the text in the predicted images.
+ Args:
+            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; used when paths is not provided
+            paths (list[str]): The paths of images; used when images is not provided
+ output_dir (str): The directory to store output images.
+ visualization (bool): Whether to save image or not.
+ Returns:
+ res (list): The result of text detection box and save path of images.
+ """
+ all_results = self.model.recognize_text(
+ images=images, paths=paths, output_dir=output_dir, visualization=visualization)
+ return all_results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.recognize_text(images_decode, **kwargs)
+ return results
+
+ @runnable
+ def run_cmd(self, argvs):
+ """
+ Run as a command
+ """
+ results = self.model.run_cmd(argvs)
+ return results
+
+ def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+ '''
+ Export the model to ONNX format.
+
+ Args:
+ dirname(str): The directory to save the onnx model.
+            input_shape_dict(dict): A dict mapping input names to input shapes, e.g. ``{'x': [-1, 3, -1, -1]}``
+            opset_version(int): ONNX operator set version
+ '''
+ self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
diff --git a/modules/image/text_recognition/latin_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/latin_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/latin_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/README.md b/modules/image/text_recognition/multi_languages_ocr_db_crnn/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..b503731a85f602f6eb4bd6eed9ecf6fa7aa8a0ac
--- /dev/null
+++ b/modules/image/text_recognition/multi_languages_ocr_db_crnn/README.md
@@ -0,0 +1,223 @@
+# multi_languages_ocr_db_crnn
+
+|模型名称|multi_languages_ocr_db_crnn|
+| :--- | :---: |
+|类别|图像-文字识别|
+|网络|Differentiable Binarization+CRNN|
+|数据集|icdar2015数据集|
+|是否支持Fine-tuning|否|
+|最新更新日期|2021-11-24|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+ - 样例结果示例:
+
+
+
+
+- ### 模型介绍
+
+ - multi_languages_ocr_db_crnn Module用于识别图片当中的文字。其基于PaddleOCR模块,检测得到文本框,识别文本框中的文字,再对检测文本框进行角度分类。最终检测算法采用DB(Differentiable Binarization),而识别文字算法则采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。
+    该Module不仅提供了通用场景下的中英文模型,也提供了[80种语言](#语种缩写)的小语种模型。
+
+
+
+
+
+
+ - 更多详情参考:
+ - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
+ - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.2
+
+ - paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install multi_languages_ocr_db_crnn
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run multi_languages_ocr_db_crnn --input_path "/PATH/TO/IMAGE"
+ $ hub run multi_languages_ocr_db_crnn --input_path "/PATH/TO/IMAGE" --lang "ch" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
+ ```
+ - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ ocr = hub.Module(name="multi_languages_ocr_db_crnn", lang='en', enable_mkldnn=True) # mkldnn加速仅在CPU下有效
+ result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+ # or
+ # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+ ```
+ - multi_languages_ocr_db_crnn目前支持80个语种,可以通过修改lang参数进行切换,对于英文模型,指定lang=en,具体支持的[语种](#语种缩写)可查看表格。
+
+- ### 3、API
+
+ - ```python
+ def __init__(self,
+ lang="ch",
+ det=True, rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9)
+ ```
+
+ - 构造MultiLangOCR对象
+
+ - **参数**
+ - lang(str): 多语言模型选择。默认为中文模型,即lang="ch"。
+ - det(bool): 是否开启文字检测。默认为True。
+ - rec(bool): 是否开启文字识别。默认为True。
+ - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
+ - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
+ - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
+ - box\_thresh (float): 检测文本框置信度的阈值;
+ - angle_classification_thresh(float): 文本方向分类置信度的阈值
+
+
+ - ```python
+ def recognize_text(images=[],
+ paths=[],
+ output_dir='ocr_result',
+ visualization=False)
+ ```
+
+ - 预测API,检测输入图片中的所有文本的位置和识别文本结果。
+
+ - **参数**
+
+ - paths (list\[str\]): 图片的路径;
+ - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
+ - output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
+ - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
+
+ - **返回**
+
+ - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
+ - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
+ - text(str): 识别得到的文本
+ - confidence(float): 识别文本结果置信度
+ - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
+ - orientation(str): 分类的方向,仅在只有方向分类开启时输出
+ - score(float): 分类的得分,仅在只有方向分类开启时输出
+ - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
+
+
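+  - 此外,该Module还提供`export_onnx_model`接口,可将检测/识别/方向分类三个推理模型导出为ONNX格式(具体实现见module.py),以下为参考用法(目录为示例路径,仅供参考):
+
+  - ```python
+    import paddlehub as hub
+
+    ocr = hub.Module(name="multi_languages_ocr_db_crnn", lang="en")
+    # 在dirname目录下分别生成det/rec/cls三个onnx模型文件
+    ocr.export_onnx_model(dirname='./onnx_models')
+    ```
+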
+## 四、服务部署
+
+- PaddleHub Serving 可以部署一个文字识别的在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m multi_languages_ocr_db_crnn
+ ```
+
+  - 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;否则无需设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/multi_languages_ocr_db_crnn"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(r.json()["results"])
+ ```
+
+
+## 五、支持语种及缩写
+
+| 语种 | 描述 | 缩写 | | 语种 | 描述 | 缩写 |
+| --- | --- | --- | ---|--- | --- | --- |
+|中文|chinese and english|ch| |保加利亚文|Bulgarian |bg|
+|英文|english|en| |乌克兰文|Ukranian|uk|
+|法文|french|fr| |白俄罗斯文|Belarusian|be|
+|德文|german|german| |泰卢固文|Telugu |te|
+|日文|japan|japan| | 阿巴扎文 | Abaza | abq |
+|韩文|korean|korean| |泰米尔文|Tamil |ta|
+|中文繁体|chinese traditional |chinese_cht| |南非荷兰文 |Afrikaans |af|
+|意大利文| Italian |it| |阿塞拜疆文 |Azerbaijani |az|
+|西班牙文|Spanish |es| |波斯尼亚文|Bosnian|bs|
+|葡萄牙文| Portuguese|pt| |捷克文|Czech|cs|
+|俄罗斯文|Russia|ru| |威尔士文 |Welsh |cy|
+|阿拉伯文|Arabic|ar| |丹麦文 |Danish|da|
+|印地文|Hindi|hi| |爱沙尼亚文 |Estonian |et|
+|维吾尔|Uyghur|ug| |爱尔兰文 |Irish |ga|
+|波斯文|Persian|fa| |克罗地亚文|Croatian |hr|
+|乌尔都文|Urdu|ur| |匈牙利文|Hungarian |hu|
+|塞尔维亚文(latin)| Serbian(latin) |rs_latin| |印尼文|Indonesian|id|
+|欧西坦文|Occitan |oc| |冰岛文 |Icelandic|is|
+|马拉地文|Marathi|mr| |库尔德文 |Kurdish|ku|
+|尼泊尔文|Nepali|ne| |立陶宛文|Lithuanian |lt|
+|塞尔维亚文(cyrillic)|Serbian(cyrillic)|rs_cyrillic| |拉脱维亚文 |Latvian |lv|
+|毛利文|Maori|mi| | 达尔瓦文|Dargwa |dar|
+|马来文 |Malay|ms| | 因古什文|Ingush |inh|
+|马耳他文 |Maltese |mt| | 拉克文|Lak |lbe|
+|荷兰文 |Dutch |nl| | 莱兹甘文|Lezghian |lez|
+|挪威文 |Norwegian |no| |塔巴萨兰文 |Tabassaran |tab|
+|波兰文|Polish |pl| | 比尔哈文|Bihari |bh|
+| 罗马尼亚文|Romanian |ro| | 迈蒂利文|Maithili |mai|
+| 斯洛伐克文|Slovak |sk| | 昂加文|Angika |ang|
+| 斯洛文尼亚文|Slovenian |sl| | 博杰普尔文|Bhojpuri |bho|
+| 阿尔巴尼亚文|Albanian |sq| | 摩揭陀文 |Magahi |mah|
+| 瑞典文|Swedish |sv| | 那格浦尔文|Nagpur |sck|
+| 西瓦希里文|Swahili |sw| | 尼瓦尔文|Newari |new|
+| 塔加洛文|Tagalog |tl| | 果阿孔卡尼文 |Goan Konkani|gom|
+| 土耳其文|Turkish |tr| | 沙特阿拉伯文|Saudi Arabia|sa|
+| 乌兹别克文|Uzbek |uz| | 阿瓦尔文|Avar |ava|
+| 越南文|Vietnamese |vi| | 阿瓦尔文|Avar |ava|
+| 蒙古文|Mongolian |mn| | 阿迪赫文|Adyghe |ady|
+
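+- 按照上表中的缩写即可切换识别语种,下面以泰卢固文(te)为例(示例仅供参考,图片路径为占位符):
+
+  - ```python
+    import paddlehub as hub
+
+    ocr = hub.Module(name="multi_languages_ocr_db_crnn", lang="te")
+    result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+    ```
+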
+## 六、更新历史
+
+* 1.0.0
+
+  初始发布
+
+  - ```shell
+ $ hub install multi_languages_ocr_db_crnn==1.0.0
+ ```
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/__init__.py b/modules/image/text_recognition/multi_languages_ocr_db_crnn/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/arabic.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/arabic.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..064b6041ee32814d852e084f639dae75d044d357
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/arabic.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/cyrillic.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/cyrillic.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..be4bf6605808d15ab25c9cbbe1fda2a1d190ac8b
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/cyrillic.ttf differ
diff --git a/modules/image/text_recognition/german_ocr_db_crnn_mobile/assets/german.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/french.ttf
similarity index 100%
rename from modules/image/text_recognition/german_ocr_db_crnn_mobile/assets/german.ttf
rename to modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/french.ttf
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/german.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/german.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..ab68fb197d4479b3b6dec6e85bd5cbaf433a87c5
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/german.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/hindi.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/hindi.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..8b0c36f5868b935464f30883094b9556c3e41009
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/hindi.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/kannada.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/kannada.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..43b60d423ad5ea5f5528c9c9e5d6f013f87fa1d7
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/kannada.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/korean.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/korean.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..e638ce37f67ff1cd9babf73387786eaeb5c52968
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/korean.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/latin.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/latin.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..e392413ac2f82905b3c07073669c3e2058d20235
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/latin.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/marathi.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/marathi.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..a796d3edc6a4cc140a9360d0fc502a9d99352db0
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/marathi.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/nepali.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/nepali.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..8b0c36f5868b935464f30883094b9556c3e41009
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/nepali.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/persian.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/persian.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..bdb1c8d7402148127b7633c6b4cd1586e23745ab
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/persian.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/simfang.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/simfang.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..2b59eae4195d1cdbea375503c0cc34d5631cb0f9
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/simfang.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/spanish.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/spanish.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..532353d2778cd2bb37a5baf06f5daeea32729168
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/spanish.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/tamil.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/tamil.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..2e9998e8d8218f1e868f06ba0db3e13b4620eed1
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/tamil.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/telugu.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/telugu.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..12c91e41973a4704f52984e2089fdb2eaf1ed4a5
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/telugu.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/urdu.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/urdu.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..625feee2e9616809c13e17eeb7da1aec58988b65
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/urdu.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/uyghur.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/uyghur.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..625feee2e9616809c13e17eeb7da1aec58988b65
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/uyghur.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/module.py b/modules/image/text_recognition/multi_languages_ocr_db_crnn/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..2e598f80634e282e66a374f035df24a7f8201769
--- /dev/null
+++ b/modules/image/text_recognition/multi_languages_ocr_db_crnn/module.py
@@ -0,0 +1,220 @@
+import argparse
+import sys
+import os
+import ast
+
+import paddle
+import paddle2onnx
+import paddle2onnx as p2o
+import paddle.fluid as fluid
+from paddleocr import PaddleOCR
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+from .utils import read_images, save_result_image, mkdir
+
+
+@moduleinfo(
+ name="multi_languages_ocr_db_crnn",
+ version="1.0.0",
+ summary="ocr service",
+ author="PaddlePaddle",
+ type="cv/text_recognition")
+class MultiLangOCR:
+ def __init__(self,
+ lang="ch",
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9):
+ """
+ initialize with the necessary elements
+ Args:
+ lang(str): the selection of languages
+ det(bool): Whether to use text detector.
+ rec(bool): Whether to use text recognizer.
+ use_angle_cls(bool): Whether to use text orientation classifier.
+ enable_mkldnn(bool): Whether to enable mkldnn.
+ use_gpu (bool): Whether to use gpu.
+ box_thresh(float): the threshold of the detected text box's confidence
+ angle_classification_thresh(float): the threshold of the angle classification confidence
+ """
+ self.lang = lang
+ self.logger = get_logger()
+ argc = len(sys.argv)
+ if argc == 1 or argc > 1 and sys.argv[1] == 'serving':
+ self.det = det
+ self.rec = rec
+ self.use_angle_cls = use_angle_cls
+ self.engine = PaddleOCR(
+ lang=lang,
+ det=det,
+ rec=rec,
+ use_angle_cls=use_angle_cls,
+ enable_mkldnn=enable_mkldnn,
+ use_gpu=use_gpu,
+ det_db_box_thresh=box_thresh,
+ cls_thresh=angle_classification_thresh)
+ self.det_model_dir = self.engine.text_detector.args.det_model_dir
+ self.rec_model_dir = self.engine.text_detector.args.rec_model_dir
+ self.cls_model_dir = self.engine.text_detector.args.cls_model_dir
+
+ def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+ """
+ Get the text in the predicted images.
+ Args:
+            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; used when paths is not provided
+            paths (list[str]): The paths of images; used when images is not provided
+ output_dir (str): The directory to store output images.
+ visualization (bool): Whether to save image or not.
+ Returns:
+ res (list): The result of text detection box and save path of images.
+ """
+
+ if images != [] and isinstance(images, list) and paths == []:
+ predicted_data = images
+ elif images == [] and isinstance(paths, list) and paths != []:
+ predicted_data = read_images(paths)
+ else:
+ raise TypeError("The input data is inconsistent with expectations.")
+
+ assert predicted_data != [], "There is not any image to be predicted. Please check the input data."
+ all_results = []
+ for img in predicted_data:
+ result = {'save_path': ''}
+ if img is None:
+ result['data'] = []
+ all_results.append(result)
+ continue
+ original_image = img.copy()
+ rec_results = self.engine.ocr(img, det=self.det, rec=self.rec, cls=self.use_angle_cls)
+ rec_res_final = []
+ for line in rec_results:
+ if self.det and self.rec:
+ boxes = line[0]
+ text, score = line[1]
+ rec_res_final.append({'text': text, 'confidence': float(score), 'text_box_position': boxes})
+ elif self.det and not self.rec:
+ boxes = line
+ rec_res_final.append({'text_box_position': boxes})
+ else:
+ if self.use_angle_cls and not self.rec:
+ orientation, score = line
+ rec_res_final.append({'orientation': orientation, 'score': float(score)})
+ else:
+ text, score = line
+ rec_res_final.append({'text': text, 'confidence': float(score)})
+
+ result['data'] = rec_res_final
+ if visualization and result['data']:
+ result['save_path'] = save_result_image(original_image, rec_results, output_dir, self.directory,
+ self.lang, self.det, self.rec, self.logger)
+
+ all_results.append(result)
+ return all_results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.recognize_text(images_decode, **kwargs)
+ return results
+
+ @runnable
+ def run_cmd(self, argvs):
+ """
+ Run as a command
+ """
+ parser = self.arg_parser()
+ args = parser.parse_args(argvs)
+ if args.lang is not None:
+ self.lang = args.lang
+ self.det = args.det
+ self.rec = args.rec
+ self.use_angle_cls = args.use_angle_cls
+ self.engine = PaddleOCR(
+ lang=self.lang,
+ det=args.det,
+ rec=args.rec,
+ use_angle_cls=args.use_angle_cls,
+ enable_mkldnn=args.enable_mkldnn,
+ use_gpu=args.use_gpu,
+ det_db_box_thresh=args.box_thresh,
+ cls_thresh=args.angle_classification_thresh)
+ results = self.recognize_text(
+ paths=[args.input_path], output_dir=args.output_dir, visualization=args.visualization)
+ return results
+
+ def arg_parser(self):
+ parser = argparse.ArgumentParser(
+ description="Run the %s module." % self.name,
+ prog='hub run %s' % self.name,
+ usage='%(prog)s',
+ add_help=True)
+
+        parser.add_argument('--input_path', type=str, default=None, help="path to the input image. Required.", required=True)
+ parser.add_argument('--use_gpu', type=ast.literal_eval, default=False, help="whether use GPU or not")
+ parser.add_argument('--output_dir', type=str, default='ocr_result', help="The directory to save output images.")
+ parser.add_argument(
+ '--visualization', type=ast.literal_eval, default=False, help="whether to save output as images.")
+ parser.add_argument('--lang', type=str, default=None, help="the selection of languages")
+ parser.add_argument('--det', type=ast.literal_eval, default=True, help="whether use text detector or not")
+ parser.add_argument('--rec', type=ast.literal_eval, default=True, help="whether use text recognizer or not")
+ parser.add_argument(
+ '--use_angle_cls', type=ast.literal_eval, default=False, help="whether text orientation classifier or not")
+ parser.add_argument('--enable_mkldnn', type=ast.literal_eval, default=False, help="whether use mkldnn or not")
+ parser.add_argument(
+ "--box_thresh", type=float, default=0.6, help="set the threshold of the detected text box's confidence")
+ parser.add_argument(
+ "--angle_classification_thresh",
+ type=float,
+ default=0.9,
+ help="set the threshold of the angle classification confidence")
+
+ return parser
+
+ def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+ '''
+ Export the model to ONNX format.
+
+ Args:
+ dirname(str): The directory to save the onnx model.
+            input_shape_dict(dict): A dict mapping input names to input shapes, e.g. ``{'x': [-1, 3, -1, -1]}``
+            opset_version(int): ONNX operator set version
+ '''
+        v0, v1, v2 = paddle2onnx.__version__.split('.')
+        if (int(v0), int(v1)) < (0, 9):
+            raise ImportError("paddle2onnx>=0.9.0 is required")
+
+        if input_shape_dict is not None and not isinstance(input_shape_dict, dict):
+            raise Exception("input_shape_dict should be dict, e.g. {'x': [-1, 3, -1, -1]}.")
+
+        if opset_version <= 9:
+            raise Exception("opset_version <= 9 is not supported, please try a higher opset_version (>=10).")
+
+ path_dict = {"det": self.det_model_dir, "rec": self.rec_model_dir, "cls": self.cls_model_dir}
+ for (key, path) in path_dict.items():
+ model_filename = 'inference.pdmodel'
+ params_filename = 'inference.pdiparams'
+ save_file = os.path.join(dirname, '{}_{}.onnx'.format(self.name, key))
+
+            # Convert a model saved with 'paddle.fluid.io.save_inference_model'
+ if hasattr(paddle, 'enable_static'):
+ paddle.enable_static()
+ exe = fluid.Executor(fluid.CPUPlace())
+ if model_filename is None and params_filename is None:
+ [program, feed_var_names, fetch_vars] = fluid.io.load_inference_model(path, exe)
+ else:
+ [program, feed_var_names, fetch_vars] = fluid.io.load_inference_model(
+ path, exe, model_filename=model_filename, params_filename=params_filename)
+
+ onnx_proto = p2o.run_convert(program, input_shape_dict=input_shape_dict, opset_version=opset_version)
+ mkdir(save_file)
+ with open(save_file, "wb") as f:
+ f.write(onnx_proto.SerializeToString())
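+
+
+if __name__ == '__main__':
+    # Illustrative sanity check only; it is not part of the module API.
+    # It assumes the module is installed (`hub install multi_languages_ocr_db_crnn`)
+    # and that onnxruntime is available (onnxruntime is NOT listed in requirements.txt).
+    import paddlehub as hub
+    ocr = hub.Module(name="multi_languages_ocr_db_crnn", lang="en")
+    ocr.export_onnx_model(dirname='./onnx_models')
+    import onnxruntime as ort
+    for key in ('det', 'rec', 'cls'):
+        sess = ort.InferenceSession('./onnx_models/multi_languages_ocr_db_crnn_{}.onnx'.format(key))
+        print(key, [inp.name for inp in sess.get_inputs()])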
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/requirements.txt b/modules/image/text_recognition/multi_languages_ocr_db_crnn/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/multi_languages_ocr_db_crnn/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/utils.py b/modules/image/text_recognition/multi_languages_ocr_db_crnn/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..e64e791e5e4e62bc90f73ad0698403028bd9bf9b
--- /dev/null
+++ b/modules/image/text_recognition/multi_languages_ocr_db_crnn/utils.py
@@ -0,0 +1,100 @@
+import os
+import time
+
+import cv2
+import numpy as np
+from PIL import Image, ImageDraw
+
+from paddleocr import draw_ocr
+
+
+def save_result_image(original_image,
+ rec_results,
+ output_dir='ocr_result',
+ directory=None,
+ lang='ch',
+ det=True,
+ rec=True,
+ logger=None):
+ image = Image.fromarray(cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB))
+ if det and rec:
+ boxes = [line[0] for line in rec_results]
+ txts = [line[1][0] for line in rec_results]
+ scores = [line[1][1] for line in rec_results]
+ fonts_lang = 'fonts/simfang.ttf'
+ lang_fonts = {
+ 'korean': 'korean',
+ 'fr': 'french',
+ 'german': 'german',
+ 'hi': 'hindi',
+ 'ne': 'nepali',
+ 'fa': 'persian',
+ 'es': 'spanish',
+ 'ta': 'tamil',
+ 'te': 'telugu',
+ 'ur': 'urdu',
+ 'ug': 'uyghur',
+ }
+ if lang in lang_fonts.keys():
+ fonts_lang = 'fonts/' + lang_fonts[lang] + '.ttf'
+ font_file = os.path.join(directory, 'assets', fonts_lang)
+ im_show = draw_ocr(image, boxes, txts, scores, font_path=font_file)
+ elif det and not rec:
+ boxes = rec_results
+ im_show = draw_boxes(image, boxes)
+ im_show = np.array(im_show)
+ else:
+ logger.warning("only cls or rec not supported visualization.")
+ return ""
+
+ if not os.path.exists(output_dir):
+ os.makedirs(output_dir)
+
+ ext = get_image_ext(original_image)
+ saved_name = 'ndarray_{}{}'.format(time.time(), ext)
+ save_file_path = os.path.join(output_dir, saved_name)
+ im_show = Image.fromarray(im_show)
+ im_show.save(save_file_path)
+ return save_file_path
+
+
+def read_images(paths=[]):
+ images = []
+ for img_path in paths:
+ assert os.path.isfile(img_path), "The {} isn't a valid file.".format(img_path)
+ img = cv2.imread(img_path)
+ if img is None:
+ continue
+ images.append(img)
+ return images
+
+
+def draw_boxes(image, boxes, scores=None, drop_score=0.5):
+ img = image.copy()
+ draw = ImageDraw.Draw(img)
+ if scores is None:
+ scores = [1] * len(boxes)
+ for (box, score) in zip(boxes, scores):
+ if score < drop_score:
+ continue
+ draw.line([(box[0][0], box[0][1]), (box[1][0], box[1][1])], fill='red')
+ draw.line([(box[1][0], box[1][1]), (box[2][0], box[2][1])], fill='red')
+ draw.line([(box[2][0], box[2][1]), (box[3][0], box[3][1])], fill='red')
+ draw.line([(box[3][0], box[3][1]), (box[0][0], box[0][1])], fill='red')
+ draw.line([(box[0][0] - 1, box[0][1] + 1), (box[1][0] - 1, box[1][1] + 1)], fill='red')
+ draw.line([(box[1][0] - 1, box[1][1] + 1), (box[2][0] - 1, box[2][1] + 1)], fill='red')
+ draw.line([(box[2][0] - 1, box[2][1] + 1), (box[3][0] - 1, box[3][1] + 1)], fill='red')
+ draw.line([(box[3][0] - 1, box[3][1] + 1), (box[0][0] - 1, box[0][1] + 1)], fill='red')
+ return img
+
+
+def get_image_ext(image):
+ if image.shape[2] == 4:
+ return ".png"
+ return ".jpg"
+
+
+def mkdir(path):
+ sub_dir = os.path.dirname(path)
+ if not os.path.exists(sub_dir):
+ os.makedirs(sub_dir)
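+
+
+if __name__ == '__main__':
+    # Minimal illustrative check of draw_boxes; not used by the module itself.
+    blank = Image.new('RGB', (200, 100), (255, 255, 255))
+    demo_boxes = [[[10, 10], [150, 10], [150, 60], [10, 60]]]
+    draw_boxes(blank, demo_boxes).save('draw_boxes_demo.jpg')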
diff --git a/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..218bfaadff6fd5b43de3a9a79d8bab8b407a6237
--- /dev/null
+++ b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/README.md
@@ -0,0 +1,165 @@
+# tamil_ocr_db_crnn_mobile
+
+|模型名称|tamil_ocr_db_crnn_mobile|
+| :--- | :---: |
+|类别|图像-文字识别|
+|网络|Differentiable Binarization+CRNN|
+|数据集|icdar2015数据集|
+|是否支持Fine-tuning|否|
+|最新更新日期|2021-12-2|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - tamil_ocr_db_crnn_mobile Module用于识别图片当中的泰米尔文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的泰米尔文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别泰米尔文的轻量级OCR模型,支持直接预测。
+
+ - 更多详情参考:
+ - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
+ - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.2
+
+ - paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install tamil_ocr_db_crnn_mobile
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run tamil_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
+ $ hub run tamil_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
+ ```
+ - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ ocr = hub.Module(name="tamil_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效
+ result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+ # or
+ # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3、API
+
+ - ```python
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9)
+ ```
+
+ - 构造TamilOCRDBCRNNMobile对象
+
+ - **参数**
+ - det(bool): 是否开启文字检测。默认为True。
+ - rec(bool): 是否开启文字识别。默认为True。
+ - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
+ - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
+ - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
+ - box\_thresh (float): 检测文本框置信度的阈值;
+ - angle_classification_thresh(float): 文本方向分类置信度的阈值
+
+
+ - ```python
+ def recognize_text(images=[],
+ paths=[],
+ output_dir='ocr_result',
+ visualization=False)
+ ```
+
+ - 预测API,检测输入图片中的所有文本的位置和识别文本结果。
+
+ - **参数**
+
+ - paths (list\[str\]): 图片的路径;
+ - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
+ - output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
+ - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
+
+ - **返回**
+
+ - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
+ - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
+ - text(str): 识别得到的文本
+ - confidence(float): 识别文本结果置信度
+ - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
+ - orientation(str): 分类的方向,仅在只有方向分类开启时输出
+ - score(float): 分类的得分,仅在只有方向分类开启时输出
+ - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
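+
+    - A minimal sketch of consuming the return structure described above (reusing `ocr` from the example in section 2; the image path is a placeholder):
+
+    - ```python
+      results = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+      for res in results:
+          print(res['save_path'])
+          for item in res['data']:
+              print(item['text'], item['confidence'], item['text_box_position'])
+      ```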
+
+
+## 四、服务部署
+
+- PaddleHub Serving 可以部署一个文字识别的在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m tamil_ocr_db_crnn_mobile
+ ```
+
+  - 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/tamil_ocr_db_crnn_mobile"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(r.json()["results"])
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+ - ```shell
+ $ hub install tamil_ocr_db_crnn_mobile==1.0.0
+ ```
diff --git a/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..22321babd3812e3f39f9670b6aa6ce2a180a5a3f
--- /dev/null
+++ b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/module.py
@@ -0,0 +1,87 @@
+import paddlehub as hub
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+
+@moduleinfo(
+ name="tamil_ocr_db_crnn_mobile",
+ version="1.0.0",
+ summary="ocr service",
+ author="PaddlePaddle",
+ type="cv/text_recognition")
+class TamilOCRDBCRNNMobile:
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9):
+ """
+ initialize with the necessary elements
+ Args:
+ det(bool): Whether to use text detector.
+ rec(bool): Whether to use text recognizer.
+ use_angle_cls(bool): Whether to use text orientation classifier.
+ enable_mkldnn(bool): Whether to enable mkldnn.
+ use_gpu (bool): Whether to use gpu.
+ box_thresh(float): the threshold of the detected text box's confidence
+ angle_classification_thresh(float): the threshold of the angle classification confidence
+ """
+ self.logger = get_logger()
+ self.model = hub.Module(
+ name="multi_languages_ocr_db_crnn",
+ lang="ta",
+ det=det,
+ rec=rec,
+ use_angle_cls=use_angle_cls,
+ enable_mkldnn=enable_mkldnn,
+ use_gpu=use_gpu,
+ box_thresh=box_thresh,
+ angle_classification_thresh=angle_classification_thresh)
+ self.model.name = self.name
+
+ def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+ """
+ Get the text in the predicted images.
+ Args:
+ images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths
+ paths (list[str]): The paths of images. If paths not images
+ output_dir (str): The directory to store output images.
+ visualization (bool): Whether to save image or not.
+ Returns:
+ res (list): The result of text detection box and save path of images.
+ """
+ all_results = self.model.recognize_text(
+ images=images, paths=paths, output_dir=output_dir, visualization=visualization)
+ return all_results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.recognize_text(images_decode, **kwargs)
+ return results
+
+ @runnable
+ def run_cmd(self, argvs):
+ """
+ Run as a command
+ """
+ results = self.model.run_cmd(argvs)
+ return results
+
+ def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+ '''
+ Export the model to ONNX format.
+
+ Args:
+ dirname(str): The directory to save the onnx model.
+ input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
+ opset_version(int): operator set
+ '''
+ self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
diff --git a/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
diff --git a/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..bcf56dfb90cabf06060bf972ccefabd062552973
--- /dev/null
+++ b/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/README.md
@@ -0,0 +1,165 @@
+# telugu_ocr_db_crnn_mobile
+
+|模型名称|telugu_ocr_db_crnn_mobile|
+| :--- | :---: |
+|类别|图像-文字识别|
+|网络|Differentiable Binarization+CRNN|
+|数据集|icdar2015数据集|
+|是否支持Fine-tuning|否|
+|最新更新日期|2021-12-2|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - telugu_ocr_db_crnn_mobile Module用于识别图片当中的泰卢固文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的泰卢固文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别泰卢固文的轻量级OCR模型,支持直接预测。
+
+ - 更多详情参考:
+ - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
+ - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.2
+
+ - paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install telugu_ocr_db_crnn_mobile
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run telugu_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
+ $ hub run telugu_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
+ ```
+ - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ ocr = hub.Module(name="telugu_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效
+ result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+ # or
+ # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3、API
+
+ - ```python
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9)
+ ```
+
+ - 构造TeluguOCRDBCRNNMobile对象
+
+ - **参数**
+ - det(bool): 是否开启文字检测。默认为True。
+ - rec(bool): 是否开启文字识别。默认为True。
+ - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
+ - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
+ - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
+ - box\_thresh (float): 检测文本框置信度的阈值;
+ - angle_classification_thresh(float): 文本方向分类置信度的阈值
+
+
+ - ```python
+ def recognize_text(images=[],
+ paths=[],
+ output_dir='ocr_result',
+ visualization=False)
+ ```
+
+ - 预测API,检测输入图片中的所有文本的位置和识别文本结果。
+
+ - **参数**
+
+ - paths (list\[str\]): 图片的路径;
+ - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
+ - output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
+ - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
+
+ - **返回**
+
+ - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
+ - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
+ - text(str): 识别得到的文本
+ - confidence(float): 识别文本结果置信度
+ - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
+ - orientation(str): 分类的方向,仅在只有方向分类开启时输出
+ - score(float): 分类的得分,仅在只有方向分类开启时输出
+ - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
+
+
+## 四、服务部署
+
+- PaddleHub Serving 可以部署一个文字识别的在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m telugu_ocr_db_crnn_mobile
+ ```
+
+  - 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tostring()).decode('utf8')
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/telugu_ocr_db_crnn_mobile"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(r.json()["results"])
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+ - ```shell
+ $ hub install telugu_ocr_db_crnn_mobile==1.0.0
+ ```
diff --git a/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..7cfd283a93c300daa080077cb8369323364ee20a
--- /dev/null
+++ b/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/module.py
@@ -0,0 +1,87 @@
+import paddlehub as hub
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+
+@moduleinfo(
+ name="telugu_ocr_db_crnn_mobile",
+ version="1.0.0",
+ summary="ocr service",
+ author="PaddlePaddle",
+ type="cv/text_recognition")
+class TeluguOCRDBCRNNMobile:
+ def __init__(self,
+ det=True,
+ rec=True,
+ use_angle_cls=False,
+ enable_mkldnn=False,
+ use_gpu=False,
+ box_thresh=0.6,
+ angle_classification_thresh=0.9):
+ """
+ initialize with the necessary elements
+ Args:
+ det(bool): Whether to use text detector.
+ rec(bool): Whether to use text recognizer.
+ use_angle_cls(bool): Whether to use text orientation classifier.
+ enable_mkldnn(bool): Whether to enable mkldnn.
+ use_gpu (bool): Whether to use gpu.
+ box_thresh(float): the threshold of the detected text box's confidence
+ angle_classification_thresh(float): the threshold of the angle classification confidence
+ """
+ self.logger = get_logger()
+ self.model = hub.Module(
+ name="multi_languages_ocr_db_crnn",
+ lang="te",
+ det=det,
+ rec=rec,
+ use_angle_cls=use_angle_cls,
+ enable_mkldnn=enable_mkldnn,
+ use_gpu=use_gpu,
+ box_thresh=box_thresh,
+ angle_classification_thresh=angle_classification_thresh)
+ self.model.name = self.name
+
+ def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+ """
+ Get the text in the predicted images.
+ Args:
+ images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths
+ paths (list[str]): The paths of images. If paths not images
+ output_dir (str): The directory to store output images.
+ visualization (bool): Whether to save image or not.
+ Returns:
+ res (list): The result of text detection box and save path of images.
+ """
+ all_results = self.model.recognize_text(
+ images=images, paths=paths, output_dir=output_dir, visualization=visualization)
+ return all_results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.recognize_text(images_decode, **kwargs)
+ return results
+
+ @runnable
+ def run_cmd(self, argvs):
+ """
+ Run as a command
+ """
+ results = self.model.run_cmd(argvs)
+ return results
+
+ def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+ '''
+ Export the model to ONNX format.
+
+ Args:
+ dirname(str): The directory to save the onnx model.
+ input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
+ opset_version(int): operator set
+ '''
+ self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
diff --git a/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
diff --git a/modules/text/embedding/fasttext_crawl_target_word-word_dim300_en/README_en.md b/modules/text/embedding/fasttext_crawl_target_word-word_dim300_en/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..d199dcb21f62a053eb1c60a3e40b36b67faf466b
--- /dev/null
+++ b/modules/text/embedding/fasttext_crawl_target_word-word_dim300_en/README_en.md
@@ -0,0 +1,178 @@
+# fasttext_crawl_target_word-word_dim300_en
+|Module Name|fasttext_crawl_target_word-word_dim300_en|
+| :--- | :---: |
+|Category|Word Embedding|
+|Network|fasttext|
+|Dataset|crawl|
+|Fine-tuning supported|No|
+|Module Size|1.19GB|
+|Vocab Size|2,000,002|
+|Last update date|26 Feb, 2021|
+|Data Indicators|-|
+
+## I. Basic Information
+
+- ### Module Introduction
+
+  - PaddleHub provides several open source pretrained word embedding models. These embedding models are distinguished by the corpus, training methods and word embedding dimensions. For more information, please refer to: [Summary of embedding models](https://github.com/PaddlePaddle/models/blob/release/2.0-beta/PaddleNLP/docs/embeddings.md)
+
+## II. Installation
+
+- ### 1. Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0 | [PaddleHub Installation Guide](../../../../docs/docs_ch/get_start/installation_en.rst)
+
+- ### 2. Installation
+
+ - ```shell
+ $ hub install fasttext_crawl_target_word-word_dim300_en
+ ```
+
+ - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart_en.md) | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart_en.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart_en.md)
+
+## III. Module API Prediction
+
+- ### 1. Prediction Code Example
+
+  - ```python
+    import paddlehub as hub
+    embedding = hub.Module(name='fasttext_crawl_target_word-word_dim300_en')
+
+    # Get the embedding of a word
+    embedding.search("apple")
+    # Calculate the cosine similarity of two word vectors
+    embedding.cosine_sim("apple", "orange")
+    # Calculate the inner product of two word vectors
+    embedding.dot("apple", "orange")
+    ```
+
+- ### 2. API
+
+ - ```python
+ def __init__(
+ *args,
+ **kwargs
+ )
+ ```
+
+ - Construct an embedding module object without parameters by default.
+
+ - **Parameters**
+ - `*args`: Arguments specified by the user.
+      - `**kwargs`: Keyword arguments specified by the user.
+
+    - For more information, please refer to [paddlenlp.embeddings](https://github.com/PaddlePaddle/models/tree/release/2.0-beta/PaddleNLP/paddlenlp/embeddings)
+
+
+ - ```python
+ def search(
+ words: Union[List[str], str, int],
+ )
+ ```
+
+    - Return the embedding of one or multiple words. The input can be a `str` (a single word), a `List[str]` (multiple words) or an `int` (a word id); word ids index into the model vocab, which can be obtained through the `vocab` attribute.
+
+    - **Parameters**
+      - `words`: the input word(s) or word id.
+
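+    - A minimal usage sketch of the input forms described above (the example words are illustrative):
+
+    - ```python
+      import paddlehub as hub
+
+      embedding = hub.Module(name='fasttext_crawl_target_word-word_dim300_en')
+      vec = embedding.search("apple")               # a single word
+      vecs = embedding.search(["apple", "orange"])  # multiple words
+      vec_by_id = embedding.search(100)             # a word id from the model vocab
+      print(len(embedding.vocab))                   # the vocab attribute mentioned above
+      ```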
+
+ - ```python
+ def cosine_sim(
+ word_a: str,
+ word_b: str,
+ )
+ ```
+
+    - Cosine similarity calculation. `word_a` and `word_b` should be in the vocab, or they will be replaced by `unknown_token`.
+
+    - **Parameters**
+ - `word_a`: input word a.
+ - `word_b`: input word b.
+
+
+ - ```python
+ def dot(
+ word_a: str,
+ word_b: str,
+ )
+ ```
+
+    - Inner product calculation. `word_a` and `word_b` should be in the vocab, or they will be replaced by `unknown_token`.
+
+    - **Parameters**
+ - `word_a`: input word a.
+ - `word_b`: input word b.
+
+
+ - ```python
+ def get_vocab_path()
+ ```
+
+ - Get the path of the local vocab file.
+
+
+ - ```python
+ def get_tokenizer(*args, **kwargs)
+ ```
+
+    - Get the tokenizer of the current model. It returns an instance of JiebaTokenizer; currently only Chinese embedding models are supported.
+
+    - **Parameters**
+      - `*args`: Arguments specified by the user.
+      - `**kwargs`: Keyword arguments specified by the user.
+
+    - For more information about the arguments, please refer to [paddlenlp.data.tokenizer.JiebaTokenizer](https://github.com/PaddlePaddle/models/blob/release/2.0-beta/PaddleNLP/paddlenlp/data/tokenizer.py)
+
+    - For more information about the usage, please refer to [paddlenlp.embeddings](https://github.com/PaddlePaddle/models/tree/release/2.0-beta/PaddleNLP/paddlenlp/embeddings)
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of cosine similarity calculation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m fasttext_crawl_target_word-word_dim300_en
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the `CUDA_VISIBLE_DEVICES` environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+
+    # Specify the word pairs used to calculate the cosine similarity: [[word_a, word_b], [word_a, word_b], ...]
+    word_pairs = [["apple", "orange"], ["today", "tomorrow"]]
+ data = {"data": word_pairs}
+ # Send an HTTP request
+ url = "http://127.0.0.1:8866/predict/fasttext_crawl_target_word-word_dim300_en"
+ headers = {"Content-Type": "application/json"}
+
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ print(r.json())
+ ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+ First release
+
+* 1.0.1
+
+ Model optimization
+ - ```shell
+ $ hub install fasttext_crawl_target_word-word_dim300_en==1.0.1
+ ```
\ No newline at end of file
diff --git a/modules/text/language_model/albert-base-v1/README.md b/modules/text/language_model/albert-base-v1/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..abef64ad567a5f1446e1a7286298d18d8049045b
--- /dev/null
+++ b/modules/text/language_model/albert-base-v1/README.md
@@ -0,0 +1,173 @@
+# albert-base-v1
+|模型名称|albert-base-v1|
+| :--- | :---: |
+|类别|文本-语义模型|
+|网络|albert-base-v1|
+|数据集|-|
+|是否支持Fine-tuning|是|
+|模型大小|90MB|
+|最新更新日期|2022-02-08|
+|数据指标|-|
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - ALBERT针对当前预训练模型参数量过大的问题,提出了以下改进方案:
+
+ - 嵌入向量参数化的因式分解。ALBERT对词嵌入参数进行了因式分解,先将单词映射到一个低维的词嵌入空间E,然后再将其映射到高维的隐藏空间H。
+
+ - 跨层参数共享。ALBERT共享了层之间的全部参数。
+
+更多详情请参考[ALBERT论文](https://arxiv.org/abs/1909.11942)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install albert-base-v1
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## 三、模型API预测
+
+- ### 1、预测代码示例
+
+```python
+import paddlehub as hub
+
+data = [
+ ['这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般'],
+ ['怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片'],
+ ['作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。'],
+]
+label_map = {0: 'negative', 1: 'positive'}
+
+model = hub.Module(
+ name='albert-base-v1',
+ version='1.0.0',
+ task='seq-cls',
+ load_checkpoint='/path/to/parameters',
+ label_map=label_map)
+results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
+for idx, text in enumerate(data):
+ print('Data: {} \t Label: {}'.format(text, results[idx]))
+```
+
+详情可参考PaddleHub示例:
+- [文本分类](../../../../demo/text_classification)
+- [序列标注](../../../../demo/sequence_labeling)
+
+- ### 2、API
+
+ - ```python
+ def __init__(
+ task=None,
+ load_checkpoint=None,
+ label_map=None,
+ num_classes=2,
+ suffix=False,
+ **kwargs,
+ )
+ ```
+
+ - 创建Module对象(动态图组网版本)
+
+ - **参数**
+
+ - `task`: 任务名称,可为`seq-cls`(文本分类任务)或`token-cls`(序列标注任务)。
+ - `load_checkpoint`:使用PaddleHub Fine-tune api训练保存的模型参数文件路径。
+ - `label_map`:预测时的类别映射表。
+ - `num_classes`:分类任务的类别数,如果指定了`label_map`,此参数可不传,默认2分类。
+ - `suffix`: 序列标注任务的标签格式,如果设定为`True`,标签以'-B', '-I', '-E' 或者 '-S'为结尾,此参数默认为`False`。
+ - `**kwargs`:用户额外指定的关键字字典类型的参数。
+
+ - ```python
+ def predict(
+ data,
+ max_seq_len=128,
+ batch_size=1,
+ use_gpu=False
+ )
+ ```
+
+ - **参数**
+
+ - `data`: 待预测数据,格式为\[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\],其中每个元素都是一个样例,每个样例可以包含text\_a与text\_b。每个样例文本数量(1个或者2个)需和训练时保持一致。
+ - `max_seq_len`:模型处理文本的最大长度
+ - `batch_size`:模型批处理大小
+ - `use_gpu`:是否使用gpu,默认为False。对于GPU用户,建议开启use_gpu。
+
+ - **返回**
+
+ - `results`:list类型,不同任务类型的返回结果如下
+ - 文本分类:列表里包含每个句子的预测标签,格式为\[label\_1, label\_2, …,\]
+ - 序列标注:列表里包含每个句子每个token的预测标签,格式为\[\[token\_1, token\_2, …,\], \[token\_1, token\_2, …,\], …,\]
+
+ - ```python
+ def get_embedding(
+ data,
+ use_gpu=False
+ )
+ ```
+
+ - 用于获取输入文本的句子粒度特征与字粒度特征
+
+ - **参数**
+
+ - `data`:输入文本列表,格式为\[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\],其中每个元素都是一个样例,每个样例可以包含text\_a与text\_b。
+ - `use_gpu`:是否使用gpu,默认为False。对于GPU用户,建议开启use_gpu。
+
+ - **返回**
+
+ - `results`:list类型,格式为\[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\],其中每个元素都是对应样例的特征输出,每个样例都有句子粒度特征pooled\_feature与字粒度特征seq\_feature。
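+
+  - A minimal usage sketch of `get_embedding` based on the fields described above (the input texts and printed values are illustrative; `task=None` is assumed here so that the module returns raw features):
+
+  - ```python
+    import paddlehub as hub
+
+    model = hub.Module(name='albert-base-v1', task=None)
+    results = model.get_embedding([['今天天气真好'], ['这部电影很精彩']], use_gpu=False)
+    for pooled_feature, seq_feature in results:
+        print(len(pooled_feature), len(seq_feature))
+    ```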
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线获取文本向量表示的服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - ```shell
+ $ hub serving start -m albert-base-v1
+ ```
+
+  - 这样就完成了一个获取文本向量表示服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+
+    # 指定用于获取embedding的文本 [[text_1], [text_2], ... ]
+ text = [["今天是个好日子"], ["天气预报说今天要下雨"]]
+ # 以key的方式指定text传入预测方法的时的参数,此例中为"data"
+ # 对应本地部署,则为module.get_embedding(data=text)
+ data = {"data": text}
+ # 发送post请求,content-type类型应指定json方式,url中的ip地址需改为对应机器的ip
+ url = "http://127.0.0.1:8866/predict/albert-base-v1"
+ # 指定post请求的headers为application/json方式
+ headers = {"Content-Type": "application/json"}
+
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ print(r.json())
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
diff --git a/modules/text/language_model/albert-base-v1/__init__.py b/modules/text/language_model/albert-base-v1/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/text/language_model/albert-base-v1/module.py b/modules/text/language_model/albert-base-v1/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..b04b2a023566676420a6346d289440360a454766
--- /dev/null
+++ b/modules/text/language_model/albert-base-v1/module.py
@@ -0,0 +1,177 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import math
+import os
+from typing import Dict
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlenlp.metrics import ChunkEvaluator
+from paddlenlp.transformers.albert.modeling import AlbertForSequenceClassification
+from paddlenlp.transformers.albert.modeling import AlbertForTokenClassification
+from paddlenlp.transformers.albert.modeling import AlbertModel
+from paddlenlp.transformers.albert.tokenizer import AlbertTokenizer
+
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.nlp_module import TransformerModule
+from paddlehub.utils.log import logger
+
+
+@moduleinfo(name="albert-base-v1",
+ version="1.0.0",
+ summary="",
+ author="Baidu",
+ author_email="",
+ type="nlp/semantic_model",
+ meta=TransformerModule)
+class Albert(nn.Layer):
+ """
+ ALBERT model
+ """
+
+ def __init__(
+ self,
+ task: str = None,
+ load_checkpoint: str = None,
+ label_map: Dict = None,
+ num_classes: int = 2,
+ suffix: bool = False,
+ **kwargs,
+ ):
+ super(Albert, self).__init__()
+ if label_map:
+ self.label_map = label_map
+ self.num_classes = len(label_map)
+ else:
+ self.num_classes = num_classes
+
+ if task == 'sequence_classification':
+ task = 'seq-cls'
+ logger.warning(
+ "current task name 'sequence_classification' was renamed to 'seq-cls', "
+ "'sequence_classification' has been deprecated and will be removed in the future.", )
+ if task == 'seq-cls':
+ self.model = AlbertForSequenceClassification.from_pretrained(pretrained_model_name_or_path='albert-base-v1',
+ num_classes=self.num_classes,
+ **kwargs)
+ self.criterion = paddle.nn.loss.CrossEntropyLoss()
+ self.metric = paddle.metric.Accuracy()
+ elif task == 'token-cls':
+ self.model = AlbertForTokenClassification.from_pretrained(pretrained_model_name_or_path='albert-base-v1',
+ num_classes=self.num_classes,
+ **kwargs)
+ self.criterion = paddle.nn.loss.CrossEntropyLoss()
+ self.metric = ChunkEvaluator(label_list=[self.label_map[i] for i in sorted(self.label_map.keys())],
+ suffix=suffix)
+ elif task == 'text-matching':
+ self.model = AlbertModel.from_pretrained(pretrained_model_name_or_path='albert-base-v1', **kwargs)
+ self.dropout = paddle.nn.Dropout(0.1)
+ self.classifier = paddle.nn.Linear(self.model.config['hidden_size'] * 3, 2)
+ self.criterion = paddle.nn.loss.CrossEntropyLoss()
+ self.metric = paddle.metric.Accuracy()
+ elif task is None:
+ self.model = AlbertModel.from_pretrained(pretrained_model_name_or_path='albert-base-v1', **kwargs)
+ else:
+            raise RuntimeError("Unknown task {}, task should be one in ['seq-cls', 'token-cls', 'text-matching']".format(task))
+
+ self.task = task
+
+ if load_checkpoint is not None and os.path.isfile(load_checkpoint):
+ state_dict = paddle.load(load_checkpoint)
+ self.set_state_dict(state_dict)
+ logger.info('Loaded parameters from %s' % os.path.abspath(load_checkpoint))
+
+ def forward(self,
+ input_ids=None,
+ token_type_ids=None,
+ position_ids=None,
+ attention_mask=None,
+ query_input_ids=None,
+ query_token_type_ids=None,
+ query_position_ids=None,
+ query_attention_mask=None,
+ title_input_ids=None,
+ title_token_type_ids=None,
+ title_position_ids=None,
+ title_attention_mask=None,
+ seq_lengths=None,
+ labels=None):
+
+ if self.task != 'text-matching':
+ result = self.model(input_ids, token_type_ids, position_ids, attention_mask)
+ else:
+ query_result = self.model(query_input_ids, query_token_type_ids, query_position_ids, query_attention_mask)
+ title_result = self.model(title_input_ids, title_token_type_ids, title_position_ids, title_attention_mask)
+
+ if self.task == 'seq-cls':
+ logits = result
+ probs = F.softmax(logits, axis=1)
+ if labels is not None:
+ loss = self.criterion(logits, labels)
+ correct = self.metric.compute(probs, labels)
+ acc = self.metric.update(correct)
+ return probs, loss, {'acc': acc}
+ return probs
+ elif self.task == 'token-cls':
+ logits = result
+ token_level_probs = F.softmax(logits, axis=-1)
+ preds = token_level_probs.argmax(axis=-1)
+ if labels is not None:
+ loss = self.criterion(logits, labels.unsqueeze(-1))
+ num_infer_chunks, num_label_chunks, num_correct_chunks = \
+ self.metric.compute(None, seq_lengths, preds, labels)
+ self.metric.update(num_infer_chunks.numpy(), num_label_chunks.numpy(), num_correct_chunks.numpy())
+ _, _, f1_score = map(float, self.metric.accumulate())
+ return token_level_probs, loss, {'f1_score': f1_score}
+ return token_level_probs
+ elif self.task == 'text-matching':
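+            # Mean-pool the token embeddings of query and title (masking padding positions),
+            # then classify on the concatenation [query_mean, title_mean, |query_mean - title_mean|].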
+ query_token_embedding, _ = query_result
+ query_token_embedding = self.dropout(query_token_embedding)
+ query_attention_mask = paddle.unsqueeze(
+ (query_input_ids != self.model.pad_token_id).astype(self.model.pooler.dense.weight.dtype), axis=2)
+ query_token_embedding = query_token_embedding * query_attention_mask
+ query_sum_embedding = paddle.sum(query_token_embedding, axis=1)
+ query_sum_mask = paddle.sum(query_attention_mask, axis=1)
+ query_mean = query_sum_embedding / query_sum_mask
+
+ title_token_embedding, _ = title_result
+ title_token_embedding = self.dropout(title_token_embedding)
+ title_attention_mask = paddle.unsqueeze(
+ (title_input_ids != self.model.pad_token_id).astype(self.model.pooler.dense.weight.dtype), axis=2)
+ title_token_embedding = title_token_embedding * title_attention_mask
+ title_sum_embedding = paddle.sum(title_token_embedding, axis=1)
+ title_sum_mask = paddle.sum(title_attention_mask, axis=1)
+ title_mean = title_sum_embedding / title_sum_mask
+
+ sub = paddle.abs(paddle.subtract(query_mean, title_mean))
+ projection = paddle.concat([query_mean, title_mean, sub], axis=-1)
+ logits = self.classifier(projection)
+ probs = F.softmax(logits)
+ if labels is not None:
+ loss = self.criterion(logits, labels)
+ correct = self.metric.compute(probs, labels)
+ acc = self.metric.update(correct)
+ return probs, loss, {'acc': acc}
+ return probs
+ else:
+ sequence_output, pooled_output = result
+ return sequence_output, pooled_output
+
+ @staticmethod
+ def get_tokenizer(*args, **kwargs):
+ """
+ Gets the tokenizer that is customized for this module.
+ """
+ return AlbertTokenizer.from_pretrained(pretrained_model_name_or_path='albert-base-v1', *args, **kwargs)
diff --git a/modules/text/punctuation_restoration/auto_punc/README.md b/modules/text/punctuation_restoration/auto_punc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..b107574afe79ee4ad14da33a4a0af331cc9531b0
--- /dev/null
+++ b/modules/text/punctuation_restoration/auto_punc/README.md
@@ -0,0 +1,150 @@
+# auto_punc
+
+|模型名称|auto_punc|
+| :--- | :---: |
+|类别|文本-标点恢复|
+|网络|Ernie-1.0|
+|数据集|WuDaoCorpora 2.0|
+|是否支持Fine-tuning|否|
+|模型大小|568MB|
+|最新更新日期|2021-12-24|
+|数据指标|-|
+
+## 一、模型基本信息
+
+### 模型介绍
+
+Ernie是百度提出的基于知识增强的持续学习语义理解模型,该模型将大数据预训练与多源丰富知识相结合,通过持续学习技术,不断吸收海量文本数据中词汇、结构、语义等方面的知识,实现模型效果不断进化。
+
+["悟道"文本数据集](https://ks3-cn-beijing.ksyun.com/resources/WuDaoCorpora/WuDaoCorpora__A_Super_Large_scale_Chinese_Corporafor_Pre_training_Language_Models.pdf)
+采用20多种规则从100TB原始网页数据中清洗得出最终数据集,注重隐私数据信息的去除,源头上避免GPT-3存在的隐私泄露风险;包含教育、科技等50+个行业数据标签,可以支持多领域预训练模型的训练。
+- 数据总量:3TB
+- 数据格式:json
+- 开源数量:200GB
+- 数据集下载:https://resource.wudaoai.cn/
+- 日期:2021年12月23日
+
+auto_punc采用了Ernie1.0预训练模型,在[WuDaoCorpora 2.0](https://resource.wudaoai.cn/home)的200G开源文本数据集上进行了标点恢复任务的训练,模型可直接用于预测,对输入的中文文本自动添加7种标点符号:逗号(,)、句号(。)、感叹号(!)、问号(?)、顿号(、)、冒号(:)和分号(;)。
+
+
+
+
+
+
+
+
+
+
+更多详情请参考
+- [WuDaoCorpora: A Super Large-scale Chinese Corpora for Pre-training Language Models](https://ks3-cn-beijing.ksyun.com/resources/WuDaoCorpora/WuDaoCorpora__A_Super_Large_scale_Chinese_Corporafor_Pre_training_Language_Models.pdf)
+- [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223)
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.1.0
+
+ - paddlehub >= 2.1.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install auto_punc
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1、预测代码示例
+
+ ```python
+ import paddlehub as hub
+
+ model = hub.Module(
+ name='auto_punc',
+ version='1.0.0')
+
+ texts = [
+ '今天的天气真好啊你下午有空吗我想约你一起去逛街',
+ '我最喜欢的诗句是先天下之忧而忧后天下之乐而乐',
+ ]
+ punc_texts = model.add_puncs(texts)
+ print(punc_texts)
+    # ['今天的天气真好啊!你下午有空吗?我想约你一起去逛街。', '我最喜欢的诗句是:先天下之忧而忧,后天下之乐而乐。']
+ ```
+
+- ### 2、API
+ - ```python
+ def add_puncs(
+ texts: Union[str, List[str]],
+ max_length=256,
+ device='cpu'
+ )
+ ```
+ - 对输入的中文文本自动添加标点符号。
+
+ - **参数**
+
+ - `texts`:输入的中文文本,可为str或List[str]类型,预测时,中英文和数字以外的字符将会被删除。
+ - `max_length`:模型预测时输入的最大长度,超过时文本会被截断,默认为256。
+ - `device`:预测时使用的设备,默认为`cpu`,如需使用gpu预测,请设置为`gpu`。
+
+ - **返回**
+
+ - `punc_texts`:List[str]类型,返回添加标点后的文本列表。
+
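+  - A minimal sketch of the optional arguments described above (`max_length` and `device`); the input text is illustrative:
+
+  - ```python
+    import paddlehub as hub
+
+    model = hub.Module(name='auto_punc')
+    # inputs longer than max_length are truncated; set device='gpu' to run on GPU
+    punc_texts = model.add_puncs('今天的天气真好啊你下午有空吗', max_length=128, device='cpu')
+    print(punc_texts)
+    ```
+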
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线的文本标点添加的服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - ```shell
+ $ hub serving start -m auto_punc
+ ```
+
+ - 这样就完成了一个文本标点添加服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+
+ # 输入的中文文本,中英文和数字之外的字符在模型预测前会被删除
+ texts = [
+ '今天的天气真好啊你下午有空吗我想约你一起去逛街',
+ '我最喜欢的诗句是先天下之忧而忧后天下之乐而乐',
+ ]
+
+ # 以key的方式指定text传入预测方法的时的参数,此例中为"texts"
+ data = {"texts": texts}
+
+ # 发送post请求,content-type类型应指定json方式,url中的ip地址需改为对应机器的ip
+ url = "http://127.0.0.1:8866/predict/auto_punc"
+
+ # 指定post请求的headers为application/json方式
+ headers = {"Content-Type": "application/json"}
+
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ print(r.json())
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ ```shell
+ $ hub install auto_punc
+ ```
diff --git a/modules/text/punctuation_restoration/auto_punc/__init__.py b/modules/text/punctuation_restoration/auto_punc/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/text/punctuation_restoration/auto_punc/module.py b/modules/text/punctuation_restoration/auto_punc/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..8a07812771735c4fa46bcc770f172d9eb0304078
--- /dev/null
+++ b/modules/text/punctuation_restoration/auto_punc/module.py
@@ -0,0 +1,127 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import re
+from typing import List, Union
+
+import numpy as np
+import paddle
+from paddlehub.env import MODULE_HOME
+from paddlehub.module.module import moduleinfo, serving
+from paddlehub.utils.log import logger
+from paddlenlp.transformers import ErnieTokenizer, ErnieForTokenClassification
+from paddlenlp.data import Pad
+
+
+@moduleinfo(
+ name="auto_punc",
+ version="1.0.0",
+ summary="",
+ author="KPatrick",
+ author_email="",
+ type="text/punctuation_restoration")
+class Ernie(paddle.nn.Layer):
+ def __init__(self):
+ super(Ernie, self).__init__()
+ res_dir = os.path.join(MODULE_HOME, 'auto_punc')
+ punc_vocab_file = os.path.join(res_dir, 'assets', 'punc_vocab.txt')
+ ckpt_dir = os.path.join(res_dir, 'assets', 'ckpt')
+
+ self.punc_vocab = self._load_dict(punc_vocab_file)
+ self.punc_list = list(self.punc_vocab.keys())
+ self.model = ErnieForTokenClassification.from_pretrained(ckpt_dir)
+ self.model.eval()
+ self.tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
+
+ @staticmethod
+ def _load_dict(dict_path):
+ vocab = {}
+ i = 0
+ with open(dict_path, 'r', encoding='utf-8') as fin:
+ for line in fin:
+ key = line.strip('\n')
+ vocab[key] = i
+ i += 1
+ return vocab
+
+ @staticmethod
+ def _clean_text(text, punc_list):
+ text = text.lower()
+ text = re.sub('[^A-Za-z0-9\u4e00-\u9fa5]', '', text)
+ text = re.sub(f'[{"".join([p for p in punc_list][1:])}]', '', text)
+ return text
+
+    def forward(self, input_ids: paddle.Tensor, token_type_ids: paddle.Tensor = None):
+        # Delegate to the underlying ERNIE token-classification model, which predicts
+        # a punctuation label for every input token.
+        return self.model(input_ids, token_type_ids)
+
+ @serving
+ def add_puncs(self, texts: Union[str, List[str]], max_length=256, device='cpu'):
+ assert isinstance(texts, str) or (isinstance(texts, list) and isinstance(texts[0], str)), \
+ 'Input data should be str or List[str], but got {}'.format(type(texts))
+
+ if isinstance(texts, str):
+ texts = [texts]
+
+ input_ids = []
+ seg_ids = []
+ seq_len = []
+ for i in range(len(texts)):
+ clean_text = self._clean_text(texts[i], self.punc_list)
+ assert len(clean_text) > 0, f'Invalid input string: {texts[i]}'
+
+ tokenized_input = self.tokenizer(
+ list(clean_text), return_length=True, is_split_into_words=True, max_seq_len=max_length)
+
+ input_ids.append(tokenized_input['input_ids'])
+ seg_ids.append(tokenized_input['token_type_ids'])
+ seq_len.append(tokenized_input['seq_len'])
+
+ paddle.set_device(device)
+ with paddle.no_grad():
+ pad_func_for_input_ids = Pad(axis=0, pad_val=self.tokenizer.pad_token_id, dtype='int64')
+ pad_func_for_seg_ids = Pad(axis=0, pad_val=self.tokenizer.pad_token_type_id, dtype='int64')
+ input_ids = paddle.to_tensor(pad_func_for_input_ids(input_ids))
+ seg_ids = paddle.to_tensor(pad_func_for_seg_ids(seg_ids))
+ logits = self.model(input_ids, seg_ids)
+ preds = paddle.argmax(logits, axis=-1)
+
+ tokens = []
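+        # Strip the special [CLS]/[SEP] positions and map the remaining token-level
+        # predictions back to input characters before attaching punctuation.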
+ labels = []
+ for i in range(len(input_ids)):
+ tokens.append(self.tokenizer.convert_ids_to_tokens(input_ids[i, 1:seq_len[i] - 1].tolist()))
+ labels.append(preds[i, 1:seq_len[i] - 1].tolist()) # Remove predictions of special tokens.
+
+ punc_texts = []
+ for token, label in zip(tokens, labels):
+ assert len(token) == len(label)
+ text = ''
+ for t, l in zip(token, label):
+ text += t
+                if l != 0:  # label 0 means "no punctuation"
+ text += self.punc_list[l]
+ punc_texts.append(text)
+
+ return punc_texts
diff --git a/modules/text/sentiment_analysis/ernie_skep_sentiment_analysis/README.md b/modules/text/sentiment_analysis/ernie_skep_sentiment_analysis/README.md
index 92361ed695c49d5108b9ea9690eb690eca3446e5..dc23a5dfbf0aa4d564f4e658fa53171b8e27f1de 100644
--- a/modules/text/sentiment_analysis/ernie_skep_sentiment_analysis/README.md
+++ b/modules/text/sentiment_analysis/ernie_skep_sentiment_analysis/README.md
@@ -65,7 +65,6 @@
for result in results:
print(result['text'])
print(result['sentiment_label'])
- print(result['sentiment_key'])
print(result['positive_probs'])
print(result['negative_probs'])
diff --git a/modules/text/sentiment_analysis/ernie_skep_sentiment_analysis/module.py b/modules/text/sentiment_analysis/ernie_skep_sentiment_analysis/module.py
index e7021284cd76b2d4639b4ef8481ab32e16ea91df..e30d80fc2984e6592b662353629c3a68f8767380 100644
--- a/modules/text/sentiment_analysis/ernie_skep_sentiment_analysis/module.py
+++ b/modules/text/sentiment_analysis/ernie_skep_sentiment_analysis/module.py
@@ -139,14 +139,29 @@ class ErnieSkepSentimentAnalysis(TransformerModule):
)
results = []
+ feature_list = []
for text in texts:
+ # feature.shape: [1, 512, 1]
+ # batch on the first dimension
feature = self._convert_text_to_feature(text)
- inputs = [self.array2tensor(ndarray) for ndarray in feature]
- output = self.predictor.run(inputs)
- probilities = np.array(output[0].data.float_data())
+ feature_list.append(feature)
+
+ feature_batch = [
+ np.concatenate([feature[0] for feature in feature_list], axis=0),
+ np.concatenate([feature[1] for feature in feature_list], axis=0),
+ np.concatenate([feature[2] for feature in feature_list], axis=0),
+ np.concatenate([feature[3] for feature in feature_list], axis=0),
+ np.concatenate([feature[4] for feature in feature_list], axis=0),
+ ]
+
+ inputs = [self.array2tensor(ndarray) for ndarray in feature_batch]
+ output = self.predictor.run(inputs)
+ probilities_list = np.array(output[0].data.float_data())
+ probilities_list = probilities_list.reshape((-1, 2))
+ for i, probilities in enumerate(probilities_list):
label = self.label_map[np.argmax(probilities)]
result = {
- 'text': text,
+ 'text': texts[i],
'sentiment_label': label,
'positive_probs': probilities[1],
'negative_probs': probilities[0]
diff --git a/modules/text/sentiment_analysis/senta_bilstm/README_en.md b/modules/text/sentiment_analysis/senta_bilstm/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..ae7ca125aedb351dc9a01051cbe693015cc3641b
--- /dev/null
+++ b/modules/text/sentiment_analysis/senta_bilstm/README_en.md
@@ -0,0 +1,190 @@
+# senta_bilstm
+
+| Module Name | senta_bilstm |
+| :------------------ | :------------: |
+| Category | text-sentiment_analysis |
+| Network | BiLSTM |
+| Dataset | Dataset built by Baidu |
+| Fine-tuning supported or not | No |
+| Module Size | 690M |
+| Latest update date | 2021-02-26 |
+| Data indicators | - |
+
+
+## I. Basic Information of Module
+
+- ### Module Introduction
+
+  - Sentiment Classification (Senta for short) automatically determines the emotional polarity of subjective Chinese text and gives a corresponding confidence score. It can help enterprises understand users' consumption habits, analyze hot topics, monitor public-opinion crises, and provide decision support. This module is based on a bidirectional LSTM structure and classifies text as positive or negative.
+
+
+
+## II. Installation
+
+- ### 1. Environmental Dependence
+
+ - paddlepaddle >= 1.8.0
+
+ - paddlehub >= 1.8.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install senta_bilstm
+ ```
+ - If you have problems during installation, please refer to:[windows_quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [linux_quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [mac_quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## III. Module API and Prediction
+
+- ### 1. Command line Prediction
+
+ - ```shell
+ $ hub run senta_bilstm --input_text "这家餐厅很好吃"
+ ```
+ or
+ - ```shell
+ $ hub run senta_bilstm --input_file test.txt
+ ```
+ - test.txt stores the text to be predicted, for example:
+
+ > 这家餐厅很好吃
+
+ > 这部电影真的很差劲
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command line instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+
+- ### 2. Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+
+ senta = hub.Module(name="senta_bilstm")
+ test_text = ["这家餐厅很好吃", "这部电影真的很差劲"]
+ results = senta.sentiment_classify(texts=test_text,
+ use_gpu=False,
+ batch_size=1)
+
+ for result in results:
+ print(result['text'])
+ print(result['sentiment_label'])
+ print(result['sentiment_key'])
+ print(result['positive_probs'])
+ print(result['negative_probs'])
+
+ # 这家餐厅很好吃 1 positive 0.9407 0.0593
+ # 这部电影真的很差劲 0 negative 0.02 0.98
+ ```
+
+- ### 3. API
+
+ - ```python
+ def sentiment_classify(texts=[], data={}, use_gpu=False, batch_size=1)
+ ```
+
+    - Prediction API of senta_bilstm, which classifies the sentiment of input sentences (binary classification: positive/negative).
+
+ - **Parameter**
+
+      - texts(list): data to be predicted. If the texts parameter is used, the data parameter does not need to be passed in; either of the two parameters can be used.
+      - data(dict): data to be predicted; the key must be text and the value is the data to be predicted. If the data parameter is used, the texts parameter does not need to be passed in. The texts parameter is recommended; the data parameter will be deprecated later.
+      - use_gpu(bool): whether to use GPU. If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before prediction; otherwise, it does not need to be set.
+ - batch_size(int): batch size
+
+ - **Return**
+
+ - results(list): result of sentiment classification
+
+
+ - ```python
+ def get_labels()
+ ```
+    - Get the categories (labels) used by senta_bilstm.
+
+ - **Return**
+
+     - labels(dict): the categories of senta_bilstm (binary classification: positive/negative)
+
+ - ```python
+ def get_vocab_path()
+ ```
+    - Get the path of the vocabulary file used in pre-training.
+
+ - **Return**
+
+ - vocab_path(str): Vocabulary path
+
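+  - A minimal sketch combining the two helper APIs above (the printed values are illustrative):
+
+  - ```python
+    import paddlehub as hub
+
+    senta = hub.Module(name="senta_bilstm")
+    print(senta.get_labels())      # the label mapping (positive / negative)
+    print(senta.get_vocab_path())  # local path of the vocabulary file
+    ```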
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online sentiment analysis detection service and you can use this interface for online Web applications.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+ - ```shell
+ $ hub serving start -m senta_bilstm
+ ```
+
+ - The model loading process is displayed on startup. After the startup is successful, the following information is displayed:
+ - ```shell
+ Loading senta_bilstm successful.
+ ```
+
+ - The servitization API is now deployed and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before prediction; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - After configuring the server, the following lines of code can be used to send the prediction request and obtain the prediction result
+
+ - ```python
+ import requests
+ import json
+
+ # data to be predicted
+ text = ["这家餐厅很好吃", "这部电影真的很差劲"]
+
+ # Set the running configuration
+ # Corresponding to local prediction senta_bilstm.sentiment_classify(texts=text, batch_size=1, use_gpu=True)
+ data = {"texts": text, "batch_size": 1, "use_gpu":True}
+
+ # set the prediction method to senta_bilstm and send a POST request, content-type should be set to json
+ # HOST_IP is the IP address of the server
+ url = "http://HOST_IP:8866/predict/senta_bilstm"
+ headers = {"Content-Type": "application/json"}
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # print prediction result
+ print(json.dumps(r.json(), indent=4, ensure_ascii=False))
+ ```
+
+ - For more information about PaddleHub Serving, please refer to:[Serving Deployment](../../../../docs/docs_ch/tutorial/serving.md)
+
+
+
+## V. Release Note
+
+* 1.0.0
+
+ First release
+
+* 1.0.1
+
+ Vocabulary upgrade
+
+* 1.1.0
+
+ Significantly improve predictive performance
+
+* 1.2.0
+
+ Model upgrade, support transfer learning for text classification, text matching and other tasks
+ - ```shell
+ $ hub install senta_bilstm==1.2.0
+ ```
diff --git a/modules/text/text_correction/ernie-csc/README.md b/modules/text/text_correction/ernie-csc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a62e46376b658e2e10e745cdd9d657ee4b9259e3
--- /dev/null
+++ b/modules/text/text_correction/ernie-csc/README.md
@@ -0,0 +1,165 @@
+# ERNIE-CSC
+
+|模型名称|ERNIE-CSC|
+| :--- | :---: |
+|类别|文本-文本纠错|
+|网络|ERNIE-CSC|
+|数据集|SIGHAN|
+|是否支持Fine-tuning|否|
+|模型大小|436MB|
+|最新更新日期|2021-12-10|
+|数据指标|-|
+
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - 中文文本纠错任务是一项NLP基础任务,其输入是一个可能含有语法错误的中文句子,输出是一个正确的中文句子。语法错误类型很多,有多字、少字、错别字等,目前最常见的错误类型是错别字。大部分研究工作围绕错别字这一类型进行研究。本文实现了百度在ACL 2021上提出结合拼音特征的Softmask策略的中文错别字纠错的下游任务网络,并提供预训练模型,模型结构如下:
+
+
+
+
+
+ - 更多详情请[参考论文](https://aclanthology.org/2021.findings-acl.198.pdf)
+
+ - 注:论文中暂未开源融合字音特征的预训练模型参数(即MLM-phonetics),所以本文提供的纠错模型是在ERNIE-1.0的参数上进行Finetune,纠错模型结构与论文保持一致。
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.1.0
+
+ - paddlenlp >= 2.2.0
+
+ - paddlehub >= 2.1.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install ernie-csc
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run ernie-csc --input_text="遇到逆竟时,我们必须勇于面对,而且要愈挫愈勇,这样我们才能朝著成功之路前进。"
+ ```
+ - 通过命令行方式实现文本纠错ernie-csc模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+
+ # Load ernie-csc
+ module = hub.Module(name="ernie-csc")
+
+ # String input
+ results = module.predict("遇到逆竟时,我们必须勇于面对,而且要愈挫愈勇,这样我们才能朝著成功之路前进。")
+ print(results)
+ # [{'source': '遇到逆竟时,我们必须勇于面对,而且要愈挫愈勇,这样我们才能朝著成功之路前进。', 'target': '遇到逆境时,我们必须勇于面对,而且要愈挫愈勇,这样我们才能朝著成功之路前进。', 'errors': [{'position': 3, 'correction': {'竟': '境'}}]}]
+
+ # List input
+ results = module.predict(['遇到逆竟时,我们必须勇于面对,而且要愈挫愈勇,这样我们才能朝著成功之路前进。', '人生就是如此,经过磨练才能让自己更加拙壮,才能使自己更加乐观。'])
+ print(results)
+ # [{'source': '遇到逆竟时,我们必须勇于面对,而且要愈挫愈勇,这样我们才能朝著成功之路前进。', 'target': '遇到逆境时,我们必须勇于面对,而且要愈挫愈勇,这样我们才能朝著成功之路前进。', 'errors': [{'position': 3, 'correction': {'竟': '境'}}]}, {'source': '人生就是如此,经过磨练才能让自己更加拙壮,才能使自己更加乐观。', 'target': '人生就是如此,经过磨练才能让自己更加茁壮,才能使自己更加乐观。', 'errors': [{'position': 18, 'correction': {'拙': '茁'}}]}]
+ ```
+
+- ### 3、API
+
+ - ```python
+ def __init__(batch_size=32)
+ ```
+
+ - **参数**
+
+ - batch_size(int): 每个预测批次的样本数目,默认为32。
+
+ - ```python
+ def predict(texts)
+ ```
+ - 预测接口,输入文本,输出文本纠错结果。
+
+ - **参数**
+
+ - texts(str or list\[str\]): 待预测数据。
+
+ - **返回**
+
+ - results(list\[dict\]): 输出结果。每个元素都是dict类型,包含以下信息:
+
+ {
+ 'source': str, 输入文本。
+ 'target': str, 模型预测结果。
+ 'errors': list[dict], 错误字符的详细信息,包含如下信息:
+ {
+ 'position': int, 错误字符的位置。
+ 'correction': dict, 错误字符及其对应的校正结果。
+ }
+ }
+
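+  - A minimal sketch of consuming the result structure described above (reusing `module` from the example in section 2):
+
+  - ```python
+    results = module.predict("遇到逆竟时,我们必须勇于面对,而且要愈挫愈勇,这样我们才能朝著成功之路前进。")
+    for res in results:
+        print(res['target'])
+        for err in res['errors']:
+            print(err['position'], err['correction'])
+    ```
+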
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线文本纠错服务,可以将此接口用于在线web应用。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ ```shell
+ $ hub serving start -m ernie-csc
+ ```
+
+ - 这样就完成了服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+
+ # 待预测数据(input string)
+ text = ["遇到逆竟时,我们必须勇于面对,而且要愈挫愈勇,这样我们才能朝著成功之路前进。"]
+
+ # 设置运行配置
+ data = {"texts": text}
+
+ # 指定预测方法为ernie-csc并发送post请求,content-type类型应指定json方式
+ url = "http://127.0.0.1:8866/predict/ernie-csc"
+ headers = {"Content-Type": "application/json"}
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ print(r.json())
+
+ # 待预测数据(input list)
+ text = ['遇到逆竟时,我们必须勇于面对,而且要愈挫愈勇,这样我们才能朝著成功之路前进。', '人生就是如此,经过磨练才能让自己更加拙壮,才能使自己更加乐观。']
+
+ # 设置运行配置
+ data = {"texts": text}
+
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ print(r.json())
+ ```
+
+ - 关于PaddleHub Serving更多信息参考:[服务部署](../../../../docs/docs_ch/tutorial/serving.md)
+
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ - ```shell
+ $ hub install ernie-csc==1.0.0
+ ```
diff --git a/modules/text/text_correction/ernie-csc/__init__.py b/modules/text/text_correction/ernie-csc/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/text/text_correction/ernie-csc/module.py b/modules/text/text_correction/ernie-csc/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..5d6454f4510833fcc97cdcad3983b71e28111f3f
--- /dev/null
+++ b/modules/text/text_correction/ernie-csc/module.py
@@ -0,0 +1,67 @@
+# -*- coding:utf-8 -*-
+import os
+import argparse
+
+import paddle
+import paddlehub as hub
+from paddlehub.module.module import serving, moduleinfo, runnable
+from paddlenlp import Taskflow
+
+
+@moduleinfo(
+ name="ernie-csc",
+ version="1.0.0",
+ summary="",
+ author="Baidu",
+ author_email="",
+ type="nlp/text_correction",
+ meta=hub.NLPPredictionModule)
+class Ernie_CSC(paddle.nn.Layer):
+    def __init__(self,
+                 batch_size=32):
+        super(Ernie_CSC, self).__init__()
+        self.corrector = Taskflow("text_correction", batch_size=batch_size)
+
+ @serving
+ def predict(self, texts):
+ """
+ The prediction interface for ernie-csc.
+
+ Args:
+            texts(str or list[str]): the input texts to be predicted.
+
+ Returns:
+ results(list[dict]): inference results. The element is a dictionary consists of:
+ {
+ 'source': str, the input texts.
+ 'target': str, the predicted correct texts.
+ 'errors': list[dict], detail information of errors, the element is a dictionary consists of:
+ {
+                        'position': int, index of the wrong character.
+                        'correction': dict, the original character and the predicted correct character.
+ }
+ }
+ """
+ return self.corrector(texts)
+
+ @runnable
+ def run_cmd(self, argvs):
+ """
+ Run as a command
+ """
+ self.parser = argparse.ArgumentParser(
+ description='Run the %s module.' % self.name,
+ prog='hub run %s' % self.name,
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+
+ self.add_module_input_arg()
+
+ args = self.parser.parse_args(argvs)
+
+ input_data = self.check_input_data(args)
+
+ results = self.predict(texts=input_data)
+
+ return results
diff --git a/modules/text/text_correction/ernie-csc/requirements.txt b/modules/text/text_correction/ernie-csc/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..7a8a87b33bb554a24de6344006c9e6c5f4f1c066
--- /dev/null
+++ b/modules/text/text_correction/ernie-csc/requirements.txt
@@ -0,0 +1 @@
+paddlenlp>=2.2.0
diff --git a/modules/text/text_generation/ernie_gen/README_en.md b/modules/text/text_generation/ernie_gen/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..6e8264a44ebac72db09c1c94581f477f0fa9e7ac
--- /dev/null
+++ b/modules/text/text_generation/ernie_gen/README_en.md
@@ -0,0 +1,230 @@
+# ernie_gen
+
+| Module Name | ernie_gen |
+| :------------------ | :-----------: |
+| Category | text-text_generation |
+| Network | ERNIE-GEN |
+| Dataset | - |
+| Fine-tuning supported or not | Yes |
+| Module Size | 85K |
+| Latest update date | 2021-07-20 |
+| Data indicators | - |
+
+
+## I. Basic Information of Module
+
+- ### Module Introduction
+  - ERNIE-GEN is a pre-training/fine-tuning framework for generation tasks. It is the first to introduce a span-by-span generation task in the pre-training stage, so that the model can generate a semantically complete span at each step. An infilling generation mechanism and a noise-aware mechanism are used in both pre-training and fine-tuning to mitigate the exposure bias problem. In addition, ERNIE-GEN adopts a multi-fragment, multi-granularity target text sampling strategy to strengthen the correlation between source and target texts and enhance the interaction between the encoder and the decoder.
+  - The ernie_gen module supports fine-tuning, so modules for specific scenarios can be built quickly on top of it.
+
+
+
+
+
+- For more information, please refer to: [ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation](https://arxiv.org/abs/2001.11314)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+  - paddlenlp >= 2.0.0
+
+- ### 2、Installation
+
+  - ```shell
+    $ hub install ernie_gen
+    ```
+  - If you have problems during installation, please refer to: [windows_quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [linux_quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [mac_quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API and Prediction
+
+- ernie_gen can be used **only after it has been fine-tuned on a task-specific dataset**
+  - There are many types of text generation tasks. ernie_gen only provides the basic generation capability, so it must be fine-tuned on a dataset for a specific task before it can be used.
+  - PaddleHub provides a simple fine-tuning dataset: [train.txt](./test_data/train.txt), [dev.txt](./test_data/dev.txt)
+  - PaddleHub also offers several well-performing fine-tuned modules: [Couplet generation](../ernie_gen_couplet/), [Love words generation](../ernie_gen_lover_words/), [Poetry generation](../ernie_gen_poetry/), etc.
+
+### 1、Fine-tune and encapsulation
+
+- #### Fine-tune Code Example
+
+ - ```python
+ import paddlehub as hub
+
+ module = hub.Module(name="ernie_gen")
+
+ result = module.finetune(
+ train_path='train.txt',
+ dev_path='dev.txt',
+ max_steps=300,
+ batch_size=2
+ )
+
+ module.export(params_path=result['last_save_path'], module_name="ernie_gen_test", author="test")
+ ```
+
+- #### API Instruction
+
+ - ```python
+ def finetune(train_path,
+ dev_path=None,
+ save_dir="ernie_gen_result",
+ init_ckpt_path=None,
+ use_gpu=True,
+ max_steps=500,
+ batch_size=8,
+ max_encode_len=15,
+ max_decode_len=15,
+ learning_rate=5e-5,
+ warmup_proportion=0.1,
+ weight_decay=0.1,
+ noise_prob=0,
+ label_smooth=0,
+ beam_width=5,
+ length_penalty=1.0,
+ log_interval=100,
+ save_interval=200):
+ ```
+
+ - Fine tuning model parameters API
+ - **Parameter**
+      - train_path(str): Training set path. Each line of the training set should be in the format "serial number\tinput text\tlabel", such as "1\t床前明月光\t疑是地上霜"; note that \t cannot be replaced by spaces (a data-preparation sketch is given at the end of this section).
+      - dev_path(str): Validation set path. Each line of the validation set should be in the format "serial number\tinput text\tlabel", such as "1\t举头望明月\t低头思故乡"; note that \t cannot be replaced by spaces.
+ - save_dir(str): Model saving and validation sets predict output paths.
+ - init_ckpt_path(str): The model initializes the loading path to realize incremental training.
+ - use_gpu(bool): use gpu or not
+ - max_steps(int): Maximum training steps.
+ - batch_size(int): Batch size during training.
+ - max_encode_len(int): Maximum encoding length.
+ - max_decode_len(int): Maximum decoding length.
+ - learning_rate(float): Learning rate size.
+ - warmup_proportion(float): Warmup rate.
+ - weight_decay(float): Weight decay size.
+ - noise_prob(float): Noise probability, refer to the Ernie Gen's paper.
+ - label_smooth(float): Label smoothing weight.
+ - beam_width(int): Beam size of validation set at the time of prediction.
+ - length_penalty(float): Length penalty weight for validation set prediction.
+ - log_interval(int): Number of steps at a training log printing interval.
+      - save_interval(int): Interval (in steps) for saving the model during training. Predictions on the validation set are made after each save.
+ - **Return**
+ - result(dict): Run result. Contains 2 keys:
+ - last_save_path(str): Save path of model at the end of training.
+        - last_ppl(float): Perplexity of the model at the end of training.
+
+ - ```python
+ def export(
+ params_path,
+ module_name,
+ author,
+ version="1.0.0",
+ summary="",
+ author_email="",
+ export_path="."):
+ ```
+
+ - Module exports an API through which training parameters can be packaged into a Hub Module with one click.
+ - **Parameter**
+ - params_path(str): Module parameter path.
+      - module_name(str): module name, such as "ernie_gen_couplet".
+ - author(str): Author name
+ - max_encode_len(int): Maximum encoding length.
+ - max_decode_len(int): Maximum decoding length.
+ - version(str): The version number.
+ - summary(str): English introduction to Module.
+ - author_email(str): Email address of the author.
+ - export_path(str): Module export path.
+
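+- #### Data preparation sketch
+
+  - A minimal, illustrative sketch of building the tab-separated `train.txt`/`dev.txt` files in the format described above and launching a short fine-tuning run (file contents and step counts are placeholders):
+
+  - ```python
+    import paddlehub as hub
+
+    # each line: "serial number\tinput text\tlabel", joined with real tab characters
+    samples = ["1\t床前明月光\t疑是地上霜", "2\t举头望明月\t低头思故乡"]
+    for path in ("train.txt", "dev.txt"):
+        with open(path, "w", encoding="utf-8") as f:
+            f.write("\n".join(samples) + "\n")
+
+    module = hub.Module(name="ernie_gen")
+    result = module.finetune(train_path="train.txt", dev_path="dev.txt", max_steps=30, batch_size=2)
+    print(result["last_save_path"], result["last_ppl"])
+    ```
+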
+### 2、Model Prediction
+
+- **Define `$module_name` as the module_name specified in export**
+
+- After the module is exported, install it with `hub install $module_name`; the custom module can then be called in the following two ways:
+
+- #### Method 1: Command line prediction
+
+  - ```shell
+    $ hub run $module_name --input_text="input text" --use_gpu True --beam_width 5
+    ```
+
+  - This runs the hub module from the command line. For more information, please refer to: [PaddleHub Command line instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- #### Method 2: API prediction
+
+  - ```python
+    import paddlehub as hub
+
+    module = hub.Module(name="$module_name")
+
+    test_texts = ["input text 1", "input text 2"]
+    # generate takes 3 arguments: texts is the list of input texts, use_gpu specifies whether to use the GPU, and beam_width sets the beam search width.
+    results = module.generate(texts=test_texts, use_gpu=True, beam_width=5)
+    for result in results:
+        print(result)
+    ```
+
+- You can also package the `$module_name` folder into a tar.gz archive and contact the PaddleHub team to upload it to the PaddleHub module repository, so that more users can use your module through one-click installation. PaddleHub warmly welcomes your contribution to the open-source community.
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online text generation service.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+  - ```shell
+    $ hub serving start -m $module_name -p 8866
+    ```
+
+  - The text generation API service is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise this setting is not required.
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, the client can send the prediction request and obtain the result with the following lines of code
+
+  - ```python
+    import requests
+    import json
+
+    # send the HTTP request
+
+    data = {'texts':["input text 1", "input text 2"],
+            'use_gpu':True, 'beam_width':5}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/$module_name"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # print the results
+    results = r.json()["results"]
+    for result in results:
+        print(result)
+    ```
+
+- **NOTE:** `$module_name` above is the module_name specified in export
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+* 1.0.1
+
+  Fix the bug in module export
+
+* 1.0.2
+
+  Fix the bug when running on Windows
+
+* 1.1.0
+
+  Integrate PaddleNLP
+
+ - ```shell
+ $ hub install ernie_gen==1.1.0
+ ```
diff --git a/modules/text/text_generation/reading_pictures_writing_poems/readme.md b/modules/text/text_generation/reading_pictures_writing_poems/readme.md
index 5468e04ba180e4b73b2f5899b75b29db097a2a59..7a6351c346551b09b94f729beb6aaa4934a47af3 100644
--- a/modules/text/text_generation/reading_pictures_writing_poems/readme.md
+++ b/modules/text/text_generation/reading_pictures_writing_poems/readme.md
@@ -63,13 +63,13 @@
- ### 2、预测代码示例
- ```python
- import paddlehub as hub
-
- readingPicturesWritingPoems = hub.Module(name="reading_pictures_writing_poems")
- results = readingPicturesWritingPoems.WritingPoem(image = "scenery.jpg", use_gpu=False)
-
- for result in results:
- print(result)
+ import paddlehub as hub
+
+ readingPicturesWritingPoems = hub.Module(name="reading_pictures_writing_poems")
+ results = readingPicturesWritingPoems.WritingPoem(image = "scenery.jpg", use_gpu=False)
+
+ for result in results:
+ print(result)
```
- ### 3、API
diff --git a/modules/text/text_review/porn_detection_cnn/README.md b/modules/text/text_review/porn_detection_cnn/README.md
index e72a71a633cf9aea44fbc7f1c2ef84a2fe31711e..588ce206b11445a6754b8891918d92f6b9cd396f 100644
--- a/modules/text/text_review/porn_detection_cnn/README.md
+++ b/modules/text/text_review/porn_detection_cnn/README.md
@@ -1,93 +1,184 @@
-# PornDetectionCNN API说明
+# porn_detection_cnn
-## detection(texts=[], data={}, use_gpu=False, batch_size=1)
+| 模型名称 | porn_detection_cnn |
+| :------------------ | :------------: |
+| 类别 | 文本-文本审核 |
+| 网络 | CNN |
+| 数据集 | 百度自建数据集 |
+| 是否支持Fine-tuning | 否 |
+| 模型大小 | 20M |
+| 最新更新日期 | 2021-02-26 |
+| 数据指标 | - |
-porn_detection_cnn预测接口,鉴定输入句子是否包含色情文案
+## 一、模型基本信息
-**参数**
+- ### 模型介绍
+ - 色情检测模型可自动判别文本是否涉黄并给出相应的置信度,对文本中的色情描述、低俗交友、污秽文案进行识别。
+ - porn_detection_cnn采用CNN网络结构并按字粒度进行切词,具有较高的预测速度。该模型最大句子长度为256字,仅支持预测。
-* texts(list): 待预测数据,如果使用texts参数,则不用传入data参数,二选一即可
-* data(dict): 预测数据,key必须为text,value是带预测数据。如果使用data参数,则不用传入texts参数,二选一即可。建议使用texts参数,data参数后续会废弃。
-* use_gpu(bool): 是否使用GPU预测,如果使用GPU预测,则在预测之前,请设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置
-* batch_size(int): 批处理大小
-**返回**
+## 二、安装
-* results(list): 鉴定结果
+- ### 1、环境依赖
-## context(trainable=False)
+ - paddlepaddle >= 1.6.2
+
+ - paddlehub >= 1.6.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
-获取porn_detection_cnn的预训练program以及program的输入输出变量
+- ### 2、安装
-**参数**
+ - ```shell
+ $ hub install porn_detection_cnn
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
-* trainable(bool): trainable=True表示program中的参数在Fine-tune时需要微调,否则保持不变
-**返回**
+## 三、模型API预测
-* inputs(dict): program的输入变量
-* outputs(dict): program的输出变量
-* main_program(Program): 带有预训练参数的program
+- ### 1、命令行预测
-## get_labels()
+ - ```shell
+ $ hub run porn_detection_cnn --input_text "黄片下载"
+ ```
+
+ - 或者
-获取porn_detection_cnn的类别
+ - ```shell
+ $ hub run porn_detection_cnn --input_file test.txt
+ ```
+
+ - 其中test.txt存放待审查文本,每行仅放置一段待审核文本
+
+ - 通过命令行方式实现hub模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
-**返回**
+- ### 2、预测代码示例
-* labels(dict): porn_detection_cnn的类别(二分类,是/不是)
+ - ```python
+ import paddlehub as hub
+
+ porn_detection_cnn = hub.Module(name="porn_detection_cnn")
+
+ test_text = ["黄片下载", "打击黄牛党"]
+
+ results = porn_detection_cnn.detection(texts=test_text, use_gpu=True, batch_size=1)
+
+ for index, text in enumerate(test_text):
+ results[index]["text"] = text
+ for index, result in enumerate(results):
+ print(results[index])
+
+ # 输出结果如下:
+ # {'text': '黄片下载', 'porn_detection_label': 1, 'porn_detection_key': 'porn', 'porn_probs': 0.9324, 'not_porn_probs': 0.0676}
+ # {'text': '打击黄牛党', 'porn_detection_label': 0, 'porn_detection_key': 'not_porn', 'porn_probs': 0.0004, 'not_porn_probs': 0.9996}
+ ```
-## get_vocab_path()
+
+- ### 3、API
-获取预训练时使用的词汇表
+ - ```python
+ def detection(texts=[], data={}, use_gpu=False, batch_size=1)
+ ```
+
+ - porn_detection_cnn预测接口,鉴定输入句子是否包含色情文案
-**返回**
+ - **参数**
-* vocab_path(str): 词汇表路径
+ - texts(list): 待预测数据,如果使用texts参数,则不用传入data参数,二选一即可
-# PornDetectionCNN 服务部署
+     - data(dict): 预测数据,key必须为text,value是待预测数据。如果使用data参数,则不用传入texts参数,二选一即可。建议使用texts参数,data参数后续会废弃。
-PaddleHub Serving可以部署一个在线色情文案检测服务,可以将此接口用于在线web应用。
+ - use_gpu(bool): 是否使用GPU预测,如果使用GPU预测,则在预测之前,请设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置
-## 第一步:启动PaddleHub Serving
+ - batch_size(int): 批处理大小
-运行启动命令:
-```shell
-$ hub serving start -m porn_detection_cnn
-```
+ - **返回**
-启动时会显示加载模型过程,启动成功后显示
-```shell
-Loading porn_detection_cnn successful.
-```
+ - results(list): 鉴定结果
-这样就完成了服务化API的部署,默认端口号为8866。
-**NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
+ - ```python
+ def get_labels()
+ ```
+ - 获取porn_detection_cnn的类别
-## 第二步:发送预测请求
+ - **返回**
-配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+ - labels(dict): porn_detection_cnn的类别(二分类,是/不是)
-```python
-import requests
-import json
+ - ```python
+ def get_vocab_path()
+ ```
-# 待预测数据
-text = ["黄片下载", "打击黄牛党"]
+ - 获取预训练时使用的词汇表
-# 设置运行配置
-# 对应本地预测porn_detection_cnn.detection(texts=text, batch_size=1, use_gpu=True)
-data = {"texts": text, "batch_size": 1, "use_gpu":True}
+ - **返回**
-# 指定预测方法为porn_detection_cnn并发送post请求,content-type类型应指定json方式
-# HOST_IP为服务器IP
-url = "http://HOST_IP:8866/predict/porn_detection_cnn"
-headers = {"Content-Type": "application/json"}
-r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ - vocab_path(str): 词汇表路径
-# 打印预测结果
-print(json.dumps(r.json(), indent=4, ensure_ascii=False))
-```
-关于PaddleHub Serving更多信息参考[服务部署](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.6/docs/tutorial/serving.md)
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线色情文案检测服务,可以将此接口用于在线web应用。
+
+- ## 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m porn_detection_cnn
+ ```
+
+ - 启动时会显示加载模型过程,启动成功后显示
+ - ```shell
+ Loading porn_detection_cnn successful.
+ ```
+
+ - 这样就完成了服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则无需设置。
+
+- ## 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+
+ # 待预测数据
+ text = ["黄片下载", "打击黄牛党"]
+
+ # 设置运行配置
+ # 对应本地预测porn_detection_cnn.detection(texts=text, batch_size=1, use_gpu=True)
+ data = {"texts": text, "batch_size": 1, "use_gpu":True}
+
+ # 指定预测方法为porn_detection_cnn并发送post请求,content-type类型应指定json方式
+ # HOST_IP为服务器IP
+ url = "http://HOST_IP:8866/predict/porn_detection_cnn"
+ headers = {"Content-Type": "application/json"}
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(json.dumps(r.json(), indent=4, ensure_ascii=False))
+ ```
+
+ - 关于PaddleHub Serving更多信息参考[服务部署](../../../../docs/docs_ch/tutorial/serving.md)
+
+
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+* 1.1.0
+
+ 大幅提升预测性能,同时简化接口使用
+
+ - ```shell
+ $ hub install porn_detection_cnn==1.1.0
+ ```
+
+
diff --git a/modules/text/text_review/porn_detection_gru/README.md b/modules/text/text_review/porn_detection_gru/README.md
index add8f9f971a6ea692d2091a571678b7dd1e0b042..46ba978316b494116319d82f004b4b4259327b5b 100644
--- a/modules/text/text_review/porn_detection_gru/README.md
+++ b/modules/text/text_review/porn_detection_gru/README.md
@@ -1,93 +1,185 @@
-# PornDetectionGRU API说明
+# porn_detection_gru
-## detection(texts=[], data={}, use_gpu=False, batch_size=1)
+| 模型名称 | porn_detection_gru |
+| :------------------ | :------------: |
+| 类别 | 文本-文本审核 |
+| 网络 | GRU |
+| 数据集 | 百度自建数据集 |
+| 是否支持Fine-tuning | 否 |
+| 模型大小 | 20M |
+| 最新更新日期 | 2021-02-26 |
+| 数据指标 | - |
-porn_detection_gru预测接口,鉴定输入句子是否包含色情文案
+## 一、模型基本信息
-**参数**
+- ### 模型介绍
+ - 色情检测模型可自动判别文本是否涉黄并给出相应的置信度,对文本中的色情描述、低俗交友、污秽文案进行识别。
+ - porn_detection_gru采用GRU网络结构并按字粒度进行切词,具有较高的预测速度。该模型最大句子长度为256字,仅支持预测。
-* texts(list): 待预测数据,如果使用texts参数,则不用传入data参数,二选一即可
-* data(dict): 预测数据,key必须为text,value是带预测数据。如果使用data参数,则不用传入texts参数,二选一即可。建议使用texts参数,data参数后续会废弃。
-* use_gpu(bool): 是否使用GPU预测,如果使用GPU预测,则在预测之前,请设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置
-* batch_size(int): 批处理大小
-**返回**
+## 二、安装
-* results(list): 鉴定结果
+- ### 1、环境依赖
-## context(trainable=False)
+ - paddlepaddle >= 1.6.2
+
+ - paddlehub >= 1.6.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
-获取porn_detection_gru的预训练program以及program的输入输出变量
+- ### 2、安装
-**参数**
+ - ```shell
+ $ hub install porn_detection_gru
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
-* trainable(bool): trainable=True表示program中的参数在Fine-tune时需要微调,否则保持不变
-**返回**
-* inputs(dict): program的输入变量
-* outputs(dict): program的输出变量
-* main_program(Program): 带有预训练参数的program
+## 三、模型API预测
-## get_labels()
+- ### 1、命令行预测
-获取porn_detection_gru的类别
+ - ```shell
+ $ hub run porn_detection_gru --input_text "黄片下载"
+ ```
+
+ - 或者
-**返回**
+ - ```shell
+ $ hub run porn_detection_gru --input_file test.txt
+ ```
+
+ - 其中test.txt存放待审查文本,每行仅放置一段待审核文本
+
+ - 通过命令行方式实现hub模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
-* labels(dict): porn_detection_gru的类别
+- ### 2、预测代码示例
-## get_vocab_path()
+ - ```python
+ import paddlehub as hub
+
+ porn_detection_gru = hub.Module(name="porn_detection_gru")
+
+ test_text = ["黄片下载", "打击黄牛党"]
+
+ results = porn_detection_gru.detection(texts=test_text, use_gpu=True, batch_size=1) # 如不使用GPU,请修改为use_gpu=False
+
+ for index, text in enumerate(test_text):
+ results[index]["text"] = text
+ for index, result in enumerate(results):
+ print(results[index])
+
+ # 输出结果如下:
+ # {'text': '黄片下载', 'porn_detection_label': 1, 'porn_detection_key': 'porn', 'porn_probs': 0.9324, 'not_porn_probs': 0.0676}
+ # {'text': '打击黄牛党', 'porn_detection_label': 0, 'porn_detection_key': 'not_porn', 'porn_probs': 0.0004, 'not_porn_probs': 0.9996}
+ ```
-获取预训练时使用的词汇表
+
+- ### 3、API
-**返回**
+ - ```python
+ def detection(texts=[], data={}, use_gpu=False, batch_size=1)
+ ```
+
+ - porn_detection_gru预测接口,鉴定输入句子是否包含色情文案
-* vocab_path(str): 词汇表路径
+ - **参数**
-# PornDetectionGRU 服务部署
+ - texts(list): 待预测数据,如果使用texts参数,则不用传入data参数,二选一即可
-PaddleHub Serving可以部署一个在线色情文案检测服务,可以将此接口用于在线web应用。
+     - data(dict): 预测数据,key必须为text,value是待预测数据。如果使用data参数,则不用传入texts参数,二选一即可。建议使用texts参数,data参数后续会废弃。
-## 第一步:启动PaddleHub Serving
+ - use_gpu(bool): 是否使用GPU预测,如果使用GPU预测,则在预测之前,请设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置
-运行启动命令:
-```shell
-$ hub serving start -m porn_detection_gru
-```
+ - batch_size(int): 批处理大小
-启动时会显示加载模型过程,启动成功后显示
-```shell
-Loading porn_detection_gru successful.
-```
+ - **返回**
-这样就完成了服务化API的部署,默认端口号为8866。
+ - results(list): 鉴定结果
-**NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
-## 第二步:发送预测请求
+ - ```python
+ def get_labels()
+ ```
+ - 获取porn_detection_gru的类别
-配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+ - **返回**
-```python
-import requests
-import json
+ - labels(dict): porn_detection_gru的类别(二分类,是/不是)
-# 待预测数据
-text = ["黄片下载", "打击黄牛党"]
+ - ```python
+ def get_vocab_path()
+ ```
-# 设置运行配置
-# 对应本地预测porn_detection_gru.detection(texts=text, batch_size=1, use_gpu=True)
-data = {"texts": text, "batch_size": 1, "use_gpu":True}
+ - 获取预训练时使用的词汇表
-# 指定预测方法为porn_detection_gru并发送post请求,content-type类型应指定json方式
-# HOST_IP为服务器IP
-url = "http://HOST_IP:8866/predict/porn_detection_gru"
-headers = {"Content-Type": "application/json"}
-r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ - **返回**
-# 打印预测结果
-print(json.dumps(r.json(), indent=4, ensure_ascii=False))
-```
+ - vocab_path(str): 词汇表路径
-关于PaddleHub Serving更多信息参考[服务部署](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.6/docs/tutorial/serving.md)
+
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线色情文案检测服务,可以将此接口用于在线web应用。
+
+- ## 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ - ```shell
+ $ hub serving start -m porn_detection_gru
+ ```
+
+ - 启动时会显示加载模型过程,启动成功后显示
+ - ```shell
+      Loading porn_detection_gru successful.
+ ```
+
+ - 这样就完成了服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则无需设置。
+
+- ## 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ - ```python
+ import requests
+ import json
+
+ # 待预测数据
+ text = ["黄片下载", "打击黄牛党"]
+
+ # 设置运行配置
+ # 对应本地预测porn_detection_gru.detection(texts=text, batch_size=1, use_gpu=True)
+ data = {"texts": text, "batch_size": 1, "use_gpu":True}
+
+ # 指定预测方法为porn_detection_gru并发送post请求,content-type类型应指定json方式
+ # HOST_IP为服务器IP
+ url = "http://HOST_IP:8866/predict/porn_detection_gru"
+ headers = {"Content-Type": "application/json"}
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # 打印预测结果
+ print(json.dumps(r.json(), indent=4, ensure_ascii=False))
+ ```
+
+ - 关于PaddleHub Serving更多信息参考[服务部署](../../../../docs/docs_ch/tutorial/serving.md)
+
+
+
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+* 1.1.0
+
+ 大幅提升预测性能,同时简化接口使用
+
+ - ```shell
+ $ hub install porn_detection_gru==1.1.0
+ ```
+
diff --git a/modules/text/text_review/porn_detection_gru/README_en.md b/modules/text/text_review/porn_detection_gru/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..3a8446fad72920a5318888ddba9aea19cd6493bf
--- /dev/null
+++ b/modules/text/text_review/porn_detection_gru/README_en.md
@@ -0,0 +1,183 @@
+# porn_detection_gru
+
+| Module Name | porn_detection_gru |
+| :------------------ | :------------: |
+| Category | text-text_review |
+| Network | GRU |
+| Dataset | Dataset built by Baidu |
+| Fine-tuning supported or not | No |
+| Module Size | 20M |
+| Latest update date | 2021-02-26 |
+| Data indicators | - |
+
+## I. Basic Information of Module
+
+- ### Module Introduction
+  - The pornography detection model can automatically determine whether a text is pornographic and give the corresponding confidence. It identifies pornographic descriptions, vulgar dating content and obscene text.
+  - porn_detection_gru adopts a GRU network structure and tokenizes at character granularity, which gives it a high prediction speed. The maximum sentence length of this model is 256 characters, and only prediction is supported.
+
+
+## II. Installation
+
+- ### 1、Environmental dependence
+
+ - paddlepaddle >= 1.6.2
+
+ - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install porn_detection_gru
+ ```
+ - If you have problems during installation, please refer to:[windows_quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [linux_quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [mac_quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## III. Module API and Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run porn_detection_gru --input_text "黄片下载"
+ ```
+
+ - or
+
+ - ```shell
+ $ hub run porn_detection_gru --input_file test.txt
+ ```
+
+ - test.txt stores the text to be reviewed. Each line contains only one text
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command line instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+
+ porn_detection_gru = hub.Module(name="porn_detection_gru")
+
+ test_text = ["黄片下载", "打击黄牛党"]
+
+ results = porn_detection_gru.detection(texts=test_text, use_gpu=True, batch_size=1) # If you do not use GPU, please set use_gpu=False
+
+ for index, text in enumerate(test_text):
+ results[index]["text"] = text
+ for index, result in enumerate(results):
+ print(results[index])
+
+ # The output:
+ # {'text': '黄片下载', 'porn_detection_label': 1, 'porn_detection_key': 'porn', 'porn_probs': 0.9324, 'not_porn_probs': 0.0676}
+ # {'text': '打击黄牛党', 'porn_detection_label': 0, 'porn_detection_key': 'not_porn', 'porn_probs': 0.0004, 'not_porn_probs': 0.9996}
+ ```
+
+
+- ### 3、API
+
+ - ```python
+ def detection(texts=[], data={}, use_gpu=False, batch_size=1)
+ ```
+
+    - Prediction API of porn_detection_gru, used to identify whether input sentences contain pornography
+
+ - **Parameter**
+
+      - texts(list): Data to be predicted. If the texts parameter is used, there is no need to pass in the data parameter; use either one of the two.
+
+      - data(dict): Data to be predicted; the key must be "text" and the value is the data to be predicted. If the data parameter is used, there is no need to pass in the texts parameter; use either one of the two. The texts parameter is recommended, and the data parameter will be deprecated later.
+
+      - use_gpu(bool): Whether to use GPU for prediction. If GPU is used, set the CUDA_VISIBLE_DEVICES environment variable before prediction; otherwise this setting is not required.
+
+      - batch_size(int): Batch size.
+
+ - **Return**
+
+ - results(list): prediction result
+
+
+ - ```python
+ def get_labels()
+ ```
+ - get the category of porn_detection_gru
+
+ - **Return**
+
+      - labels(dict): the category of porn_detection_gru (binary classification, yes/no)
+
+ - ```python
+ def get_vocab_path()
+ ```
+
+ - get a vocabulary for pre-training
+
+ - **Return**
+
+ - vocab_path(str): Vocabulary path
+
+
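+  - A minimal sketch of calling the auxiliary APIs described above (illustrative only):
+
+  - ```python
+    import paddlehub as hub
+
+    module = hub.Module(name="porn_detection_gru")
+    print(module.get_labels())      # the binary label mapping
+    print(module.get_vocab_path())  # path of the vocabulary used in pre-training
+    ```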
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online pornography detection service and you can use this interface for online Web applications.
+
+- ## Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+ - ```shell
+ $ hub serving start -m porn_detection_gru
+ ```
+
+ - The model loading process is displayed on startup. After the startup is successful, the following information is displayed:
+ - ```shell
+      Loading porn_detection_gru successful.
+ ```
+
+  - The service API is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise this setting is not required.
+
+
+- ## Step 2: Send a predictive request
+
+ - After configuring the server, the following lines of code can be used to send the prediction request and obtain the prediction result
+ - ```python
+ import requests
+ import json
+
+ # data to be predicted
+ text = ["黄片下载", "打击黄牛党"]
+
+ # Set the running configuration
+ # Corresponding local forecast porn_detection_gru.detection(texts=text, batch_size=1, use_gpu=True)
+ data = {"texts": text, "batch_size": 1, "use_gpu":True}
+
+ # set the prediction method to porn_detection_gru and send a POST request, content-type should be set to json
+ # HOST_IP is the IP address of the server
+ url = "http://HOST_IP:8866/predict/porn_detection_gru"
+ headers = {"Content-Type": "application/json"}
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # print prediction result
+ print(json.dumps(r.json(), indent=4, ensure_ascii=False))
+ ```
+
+ - For more information about PaddleHub Serving, please refer to:[Serving Deployment](../../../../docs/docs_ch/tutorial/serving.md)
+
+
+
+
+## V. Release Note
+
+* 1.0.0
+
+ First release
+
+* 1.1.0
+
+ Improves prediction performance and simplifies interface usage
+
+ - ```shell
+ $ hub install porn_detection_gru==1.1.0
+ ```
+
diff --git a/modules/text/text_to_knowledge/nptag/README.md b/modules/text/text_to_knowledge/nptag/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..12711225c2b5534a6fdaedae1bb0cacbe9e314cd
--- /dev/null
+++ b/modules/text/text_to_knowledge/nptag/README.md
@@ -0,0 +1,168 @@
+# NPTag
+
+|模型名称|NPTag|
+| :--- | :---: |
+|类别|文本-文本知识关联|
+|网络|ERNIE-CTM|
+|数据集|百度自建数据集|
+|是否支持Fine-tuning|否|
+|模型大小|378MB|
+|最新更新日期|2021-12-10|
+|数据指标|-|
+
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+  - NPTag(名词短语标注工具)是首个能够覆盖所有中文名词性词汇及短语的细粒度知识标注工具,旨在解决NLP中名词性短语收录不足导致的OOV(out-of-vocabulary,超出收录词表)问题,可直接应用于构造知识特征,辅助NLP任务。
+
+  - NPTag特点
+
+    - 包含2000+细粒度类别,覆盖所有中文名词性短语的词类体系,提供更丰富的知识标注结果
+      - NPTag使用的词类体系覆盖所有中文名词性短语,并对所有类目做了更细类目的识别(如注射剂、鱼类、博物馆等),共包含2000+细粒度类别,且可以直接关联百科知识树。
+    - 可自由定制的分类框架
+      - NPTag开源版标注使用的词类体系是我们在实践中对**百科词条**分类应用较好的一个版本,用户可以自由定制自己的词类体系和训练样本,构建自己的NPTag,以获得更好的适配效果。例如,可按照自定义的类别构造训练样本,使用小学习率、短训练周期微调NPTag模型,即可获得自己定制的NPTag工具。
+
+ - 模型结构
+ - NPTag使用ERNIE-CTM+prompt训练而成,使用启发式搜索解码,保证分类结果都在标签体系之内。
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.1.0
+
+ - paddlenlp >= 2.2.0
+
+ - paddlehub >= 2.1.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install nptag
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run nptag --input_text="糖醋排骨"
+ ```
+ - 通过命令行方式实现NPTag模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+
+ # Load NPTag
+ module = hub.Module(name="nptag")
+
+ # String input
+ results = module.predict("糖醋排骨")
+ print(results)
+ # [{'text': '糖醋排骨', 'label': '菜品', 'category': '饮食类_菜品'}]
+
+ # List input
+ results = module.predict(["糖醋排骨", "红曲霉菌"])
+ print(results)
+ # [{'text': '糖醋排骨', 'label': '菜品', 'category': '饮食类_菜品'}, {'text': '红曲霉菌', 'label': '微生物', 'category': '生物类_微生物'}]
+ ```
+
+- ### 3、API
+
+ - ```python
+ def __init__(
+ batch_size=32,
+ max_seq_length=128,
+ linking=True,
+ )
+ ```
+
+ - **参数**
+
+ - batch_size(int): 每个预测批次的样本数目,默认为32。
+ - max_seq_length(int): 最大句子长度,默认为128。
+ - linking(bool): 实现与WordTag类别标签的linking,默认为True。
+
+ - ```python
+ def predict(texts)
+ ```
+ - 预测接口,输入文本,输出名词短语标注结果。
+
+ - **参数**
+
+ - texts(str or list\[str\]): 待预测数据。
+
+ - **返回**
+
+ - results(list\[dict\]): 输出结果。每个元素都是dict类型,包含以下信息:
+
+ {
+ 'text': str, 原始文本。
+ 'label': str,预测结果。
+ 'category':str,对应的WordTag类别标签。
+ }
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线中文名词短语标注服务,可以将此接口用于在线web应用。
+
+- ## 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+ ```shell
+ $ hub serving start -m nptag
+ ```
+
+ - 这样就完成了服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量,否则无需设置。
+
+- ## 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+
+ # 待预测数据(input string)
+ text = ["糖醋排骨"]
+
+ # 设置运行配置
+ data = {"texts": text}
+
+      # 指定预测方法为nptag并发送post请求,content-type类型应指定json方式
+ url = "http://127.0.0.1:8866/predict/nptag"
+ headers = {"Content-Type": "application/json"}
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ print(r.json())
+
+ # 待预测数据(input list)
+ text = ["糖醋排骨", "红曲霉菌"]
+
+ # 设置运行配置
+ data = {"texts": text}
+
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ print(r.json())
+ ```
+
+ - 关于PaddleHub Serving更多信息参考:[服务部署](../../../../docs/docs_ch/tutorial/serving.md)
+
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ - ```shell
+ $ hub install nptag==1.0.0
+ ```
diff --git a/modules/text/text_to_knowledge/nptag/__init__.py b/modules/text/text_to_knowledge/nptag/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/text/text_to_knowledge/nptag/module.py b/modules/text/text_to_knowledge/nptag/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..e7949e7fe99877edba5e11aaaf3f0b7445b4aaf2
--- /dev/null
+++ b/modules/text/text_to_knowledge/nptag/module.py
@@ -0,0 +1,72 @@
+# -*- coding:utf-8 -*-
+import os
+import argparse
+
+import paddle
+import paddlehub as hub
+from paddlehub.module.module import serving, moduleinfo, runnable
+from paddlenlp import Taskflow
+
+
+@moduleinfo(
+ name="nptag",
+ version="1.0.0",
+ summary="",
+ author="Baidu",
+ author_email="",
+ type="nlp/text_to_knowledge",
+ meta=hub.NLPPredictionModule)
+class NPTag(paddle.nn.Layer):
+ def __init__(self,
+ batch_size=32,
+ max_seq_length=128,
+ linking=True,
+ ):
+        super().__init__()
+        self.nptag = Taskflow("knowledge_mining", model="nptag", batch_size=batch_size, max_seq_length=max_seq_length, linking=linking)
+
+ @serving
+ def predict(self, texts):
+ """
+ The prediction interface for nptag.
+
+ Args:
+            texts(str or list[str]): the input texts to be predicted.
+
+ Returns:
+            results(list[dict]): inference results. Each element is a dictionary that consists of:
+                {
+                    'text': str, the input text.
+                    'label': str, the predicted noun-phrase label.
+                    'category': str, the corresponding WordTag category label.
+                }
+ """
+ return self.nptag(texts)
+
+ @runnable
+ def run_cmd(self, argvs):
+ """
+ Run as a command
+ """
+ self.parser = argparse.ArgumentParser(
+ description='Run the %s module.' % self.name,
+ prog='hub run %s' % self.name,
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+
+ self.add_module_input_arg()
+
+ args = self.parser.parse_args(argvs)
+
+ input_data = self.check_input_data(args)
+
+ results = self.predict(texts=input_data)
+
+ return results
diff --git a/modules/text/text_to_knowledge/nptag/requirements.txt b/modules/text/text_to_knowledge/nptag/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..7a8a87b33bb554a24de6344006c9e6c5f4f1c066
--- /dev/null
+++ b/modules/text/text_to_knowledge/nptag/requirements.txt
@@ -0,0 +1 @@
+paddlenlp>=2.2.0
diff --git a/modules/text/text_to_knowledge/wordtag/README.md b/modules/text/text_to_knowledge/wordtag/README.md
index 07410e3d5f000553f9ae7e956a67bd4524ea7ada..42c6ed697daee83a4e893fd899b09690e9c809bc 100644
--- a/modules/text/text_to_knowledge/wordtag/README.md
+++ b/modules/text/text_to_knowledge/wordtag/README.md
@@ -65,14 +65,14 @@
- ```shell
$ hub run wordtag --input_text="《孤女》是2010年九州出版社出版的小说,作者是余兼羽。"
```
- - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+ - 通过命令行方式实现WordTag模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、预测代码示例
- ```python
import paddlehub as hub
- # Load ddparser
+ # Load WordTag
module = hub.Module(name="wordtag")
# String input
diff --git a/modules/text/text_to_knowledge/wordtag/module.py b/modules/text/text_to_knowledge/wordtag/module.py
index b53da8ce76ec448b359071768ca936eb4abf745b..a0ecde839fe36e85e8382e35504bddbe282fb2cc 100644
--- a/modules/text/text_to_knowledge/wordtag/module.py
+++ b/modules/text/text_to_knowledge/wordtag/module.py
@@ -2,6 +2,7 @@
import os
import argparse
+import paddle
import paddlehub as hub
from paddlehub.module.module import serving, moduleinfo, runnable
from paddlenlp import Taskflow
@@ -13,8 +14,9 @@ from paddlenlp import Taskflow
summary="",
author="baidu-nlp",
author_email="",
- type="nlp/text_to_knowledge")
-class wordtag(hub.NLPPredictionModule):
+ type="nlp/text_to_knowledge",
+ meta=hub.NLPPredictionModule)
+class WordTag(paddle.nn.Layer):
def __init__(self,
batch_size=32,
max_seq_length=128,
diff --git a/modules/text/text_to_knowledge/wordtag/requirements.txt b/modules/text/text_to_knowledge/wordtag/requirements.txt
index b31e23a5991364192fadf368fe382953e56b31bb..7a8a87b33bb554a24de6344006c9e6c5f4f1c066 100644
--- a/modules/text/text_to_knowledge/wordtag/requirements.txt
+++ b/modules/text/text_to_knowledge/wordtag/requirements.txt
@@ -1 +1 @@
-paddlenlp>=2.1.1
+paddlenlp>=2.2.0
diff --git a/modules/video/Video_editing/SkyAR/README.md b/modules/video/Video_editing/SkyAR/README.md
index 7e6cb468f2220dcdc12d58cdf8be2986372d5f66..0b43e10fffa98d72a7b4c82406712250520da02a 100644
--- a/modules/video/Video_editing/SkyAR/README.md
+++ b/modules/video/Video_editing/SkyAR/README.md
@@ -2,9 +2,9 @@
|模型名称|SkyAR|
| :--- | :---: |
-|类别|图像-图像分割|
+|类别|视频-视频编辑|
|网络|UNet|
-|数据集|UNet|
+|数据集|-|
|是否支持Fine-tuning|否|
|模型大小|206MB|
|指标|-|
@@ -71,7 +71,7 @@
## 三、模型API预测
-- ### 1、代码示例
+- ### 1、预测代码示例
```python
import paddlehub as hub
@@ -79,8 +79,8 @@
model = hub.Module(name='SkyAR')
model.MagicSky(
- video_path=[path to input video path],
- save_path=[path to save video path]
+ video_path="/PATH/TO/VIDEO",
+ save_path="/PATH/TO/SAVE/RESULT"
)
```
- ### 2、API
diff --git a/modules/video/Video_editing/SkyAR/README_en.md b/modules/video/Video_editing/SkyAR/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..1b122baa1fcf903c8c47aa0303c35ef0f01bdfe8
--- /dev/null
+++ b/modules/video/Video_editing/SkyAR/README_en.md
@@ -0,0 +1,124 @@
+# SkyAR
+
+|Module Name|SkyAR|
+| :--- | :---: |
+|Category|Video editing|
+|Network|UNet|
+|Dataset|-|
+|Fine-tuning supported or not|No|
+|Module Size|206MB|
+|Data indicators|-|
+|Latest update date|2021-02-26|
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+ * Input video:
+
+ ![Input video](https://img-blog.csdnimg.cn/20210126142046572.gif)
+
+ * Jupiter:
+
+ ![Jupiter](https://img-blog.csdnimg.cn/20210125211435619.gif)
+ * Rainy day:
+
+ ![Rainy day](https://img-blog.csdnimg.cn/2021012521152492.gif)
+ * Galaxy:
+
+ ![Galaxy](https://img-blog.csdnimg.cn/20210125211523491.gif)
+ * Ninth area spacecraft:
+
+ ![Ninth area spacecraft](https://img-blog.csdnimg.cn/20210125211520955.gif)
+
+ * Input video:
+
+ ![Input video](https://img-blog.csdnimg.cn/20210126142038716.gif)
+ * Floating castle:
+
+ ![Floating castle](https://img-blog.csdnimg.cn/20210125211514997.gif)
+ * Thunder and lightning:
+
+ ![Thunder and lightning](https://img-blog.csdnimg.cn/20210125211433591.gif)
+
+ * Super moon:
+
+ ![Super moon](https://img-blog.csdnimg.cn/20210125211417524.gif)
+
+- ### Module Introduction
+
+ - SkyAR is based on [Castle in the Sky: Dynamic Sky Replacement and Harmonization in Videos](https://arxiv.org/abs/2010.11800). It mainly consists of three parts: sky matting network, motion estimation and image fusion.
+
+ - For more information, please refer to:[SkyAR](https://github.com/jiupinjia/SkyAR)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $hub install SkyAR
+ ```
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='SkyAR')
+
+ model.MagicSky(
+        video_path="/PATH/TO/VIDEO",
+        save_path="/PATH/TO/SAVE/RESULT"
+ )
+ ```
+- ### 2、API
+
+ ```python
+ def MagicSky(
+ video_path, save_path, config='jupiter',
+ is_rainy=False, preview_frames_num=0, is_video_sky=False, is_show=False,
+ skybox_img=None, skybox_video=None, rain_cap_path=None,
+ halo_effect=True, auto_light_matching=False,
+ relighting_factor=0.8, recoloring_factor=0.5, skybox_center_crop=0.5
+ )
+ ```
+
+ - **Parameter**
+
+  * video_path(str): input video path.
+  * save_path(str): path to save the output video.
+  * config(str): SkyBox configuration; all preset configurations are as follows: `['cloudy', 'district9ship', 'floatingcastle', 'galaxy', 'jupiter', 'rainy', 'sunny', 'sunset', 'supermoon', 'thunderstorm']`. If you use a custom SkyBox, please set it to None.
+
+ * skybox_img(str):custom SkyBox image path
+ * skybox_video(str):custom SkyBox video path
+ * is_video_sky(bool):customize whether SkyBox is a video
+ * rain_cap_path(str):custom video path with rain
+ * is_rainy(bool): whether the sky is raining
+ * halo_effect(bool):whether to open halo effect
+ * auto_light_matching(bool):whether to enable automatic brightness matching
+ * relighting_factor(float): relighting factor
+ * recoloring_factor(float): recoloring factor
+ * skybox_center_crop(float):skyBox center crop factor
+ * preview_frames_num(int):set the number of preview frames
+ * is_show(bool):whether to preview graphically
+
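+  A sketch of using a custom SkyBox image instead of one of the preset configs, based on the parameters above (the skybox path is a placeholder):
+
+  ```python
+  import paddlehub as hub
+
+  model = hub.Module(name='SkyAR')
+
+  # config must be set to None when a custom SkyBox image is supplied
+  model.MagicSky(
+      video_path="/PATH/TO/VIDEO",
+      save_path="/PATH/TO/SAVE/RESULT",
+      config=None,
+      skybox_img="/PATH/TO/SKYBOX.jpg",
+      is_video_sky=False
+  )
+  ```
+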
+
+## IV. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/video/classification/nonlocal_kinetics400/README.md b/modules/video/classification/nonlocal_kinetics400/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..0e88d19b45bfcc4b7427aed84823949984660496
--- /dev/null
+++ b/modules/video/classification/nonlocal_kinetics400/README.md
@@ -0,0 +1,109 @@
+# nonlocal_kinetics400
+
+|模型名称|nonlocal_kinetics400|
+| :--- | :---: |
+|类别|视频-视频分类|
+|网络|Non-local|
+|数据集|Kinetics-400|
+|是否支持Fine-tuning|否|
+|模型大小|129MB|
+|最新更新日期|2021-02-26|
+|数据指标|-|
+
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - Non-local Neural Networks是由Xiaolong Wang等研究者在2017年提出的模型,主要特点是通过引入Non-local操作来描述距离较远的像素点之间的关联关系。其借助于传统计算机视觉中的non-local mean的思想,并将该思想扩展到神经网络中,通过定义输出位置和所有输入位置之间的关联函数,建立全局关联特性。Non-local模型的训练数据采用由DeepMind公布的Kinetics-400动作识别数据集。该PaddleHub Module可支持预测。
+
+
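+  - Below is a conceptual NumPy sketch of the non-local idea (pairwise affinities between all positions followed by weighted aggregation). It is only an illustration of the operation, not the module's actual network code:
+
+  - ```python
+    import numpy as np
+
+    def nonlocal_block_1d(x, w_theta, w_phi, w_g):
+        """Simplified dot-product non-local operation over a (T, C) feature sequence."""
+        theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g      # query / key / value embeddings
+        scores = theta @ phi.T                               # (T, T) affinities between all positions
+        scores = np.exp(scores - scores.max(axis=1, keepdims=True))
+        attn = scores / scores.sum(axis=1, keepdims=True)    # normalize over all input positions
+        return attn @ g                                      # each output aggregates every input position
+
+    rng = np.random.default_rng(0)
+    x = rng.standard_normal((8, 16))                         # 8 "frames" with 16-dim features
+    w = [rng.standard_normal((16, 8)) for _ in range(3)]
+    print(nonlocal_block_1d(x, *w).shape)                    # (8, 8)
+    ```
+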
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 1.4.0
+
+ - paddlehub >= 1.0.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install nonlocal_kinetics400
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ hub run nonlocal_kinetics400 --input_path "/PATH/TO/VIDEO" --use_gpu True
+ ```
+
+ 或者
+
+ - ```shell
+ hub run nonlocal_kinetics400 --input_file test.txt --use_gpu True
+ ```
+
+ - test.txt 存放待分类视频的存放路径;
+ - Note: 该PaddleHub Module目前只支持在GPU环境下使用,在使用前,请使用下述命令指定GPU设备(设备ID请根据实际情况指定)
+
+ - ```shell
+ export CUDA_VISIBLE_DEVICES=0
+ ```
+
+  - 通过命令行方式实现视频分类模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+
+ import paddlehub as hub
+
+    nonlocal_model = hub.Module(name="nonlocal_kinetics400")  # "nonlocal" is a reserved keyword in Python, so use another variable name
+
+ test_video_path = "/PATH/TO/VIDEO"
+
+ # set input dict
+ input_dict = {"image": [test_video_path]}
+
+ # execute predict and print the result
+    results = nonlocal_model.video_classification(data=input_dict)
+ for result in results:
+ print(result)
+ ```
+
+- ### 3、API
+
+ - ```python
+ def video_classification(data)
+ ```
+
+ - 用于视频分类预测
+
+ - **参数**
+
+ - data(dict): dict类型,key为image,str类型;value为待分类的视频路径,list类型。
+
+
+ - **返回**
+
+ - result(list\[dict\]): list类型,每个元素为对应输入视频的预测结果。预测结果为dict类型,key为label,value为该label对应的概率值。
+
+
+## 四、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ - ```shell
+ $ hub install nonlocal_kinetics400==1.0.0
+ ```
diff --git a/modules/video/classification/stnet_kinetics400/README.md b/modules/video/classification/stnet_kinetics400/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..4cbed17470154ced6d747c1b01933f4d80109693
--- /dev/null
+++ b/modules/video/classification/stnet_kinetics400/README.md
@@ -0,0 +1,106 @@
+# stnet_kinetics400
+
+|模型名称|stnet_kinetics400|
+| :--- | :---: |
+|类别|视频-视频分类|
+|网络|StNet|
+|数据集|Kinetics-400|
+|是否支持Fine-tuning|否|
+|模型大小|129MB|
+|最新更新日期|2021-02-26|
+|数据指标|-|
+
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - StNet模型框架为ActivityNet Kinetics Challenge 2018中夺冠的基础网络框架,是基于ResNet50实现的。该模型提出super-image的概念,在super-image上进行2D卷积,建模视频中局部时空相关性。另外通过temporal modeling block建模视频的全局时空依赖,最后用一个temporal Xception block对抽取的特征序列进行长时序建模。StNet的训练数据采用由DeepMind公布的Kinetics-400动作识别数据集。该PaddleHub Module可支持预测。
+
+
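+  - A conceptual NumPy sketch of the "super-image" idea (stacking consecutive frames along the channel axis so that an ordinary 2D CNN can capture local spatio-temporal patterns). It is only an illustration, not the module's actual preprocessing code:
+
+  - ```python
+    import numpy as np
+
+    # a hypothetical clip: 5 super-images, each built from 3 consecutive RGB frames of 224x224
+    num_super, frames_per_super, c, h, w = 5, 3, 3, 224, 224
+    clip = np.random.rand(num_super, frames_per_super, c, h, w).astype("float32")
+
+    # stack the frames of each group along the channel axis -> one 9-channel "super-image" per group,
+    # which a 2D convolution can then process in a single pass
+    super_images = clip.reshape(num_super, frames_per_super * c, h, w)
+    print(super_images.shape)  # (5, 9, 224, 224)
+    ```
+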
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 1.4.0
+
+ - paddlehub >= 1.0.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install stnet_kinetics400
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ hub run stnet_kinetics400 --input_path "/PATH/TO/VIDEO"
+ ```
+
+ 或者
+
+ - ```shell
+ hub run stnet_kinetics400 --input_file test.txt
+ ```
+
+ - test.txt 存放待分类视频的存放路径
+
+
+  - 通过命令行方式实现视频分类模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+
+ import paddlehub as hub
+
+ stnet = hub.Module(name="stnet_kinetics400")
+
+ test_video_path = "/PATH/TO/VIDEO"
+
+ # set input dict
+ input_dict = {"image": [test_video_path]}
+
+ # execute predict and print the result
+ results = stnet.video_classification(data=input_dict)
+ for result in results:
+ print(result)
+ ```
+
+- ### 3、API
+
+ - ```python
+ def video_classification(data)
+ ```
+
+ - 用于视频分类预测
+
+ - **参数**
+
+ - data(dict): dict类型,key为image,str类型;value为待分类的视频路径,list类型。
+
+
+ - **返回**
+
+ - result(list\[dict\]): list类型,每个元素为对应输入视频的预测结果。预测结果为dict类型,key为label,value为该label对应的概率值。
+
+
+## 四、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ - ```shell
+ $ hub install stnet_kinetics400==1.0.0
+ ```
diff --git a/modules/video/classification/tsm_kinetics400/README.md b/modules/video/classification/tsm_kinetics400/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..5301071bcbd79b96dce5f2eafb9005d340ba0820
--- /dev/null
+++ b/modules/video/classification/tsm_kinetics400/README.md
@@ -0,0 +1,106 @@
+# tsm_kinetics400
+
+|模型名称|tsm_kinetics400|
+| :--- | :---: |
+|类别|视频-视频分类|
+|网络|TSM|
+|数据集|Kinetics-400|
+|是否支持Fine-tuning|否|
+|模型大小|95MB|
+|最新更新日期|2021-02-26|
+|数据指标|-|
+
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - TSM(Temporal Shift Module)是由MIT和IBM Watson AI Lab的JiLin,ChuangGan和SongHan等人提出的通过时间位移来提高网络视频理解能力的模块。TSM的训练数据采用由DeepMind公布的Kinetics-400动作识别数据集。该PaddleHub Module可支持预测。
+
+
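+  - A conceptual NumPy sketch of the temporal shift operation (moving a fraction of the channels one step forward or backward along the time axis at essentially zero extra computation). It is only an illustration, not the module's actual implementation:
+
+  - ```python
+    import numpy as np
+
+    def temporal_shift(x, shift_div=8):
+        """x has shape (N, T, C, H, W); shift 1/shift_div of the channels in each temporal direction."""
+        n, t, c, h, w = x.shape
+        fold = c // shift_div
+        out = np.zeros_like(x)
+        out[:, 1:, :fold] = x[:, :-1, :fold]                    # these channels come from the previous frame
+        out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]    # these channels come from the next frame
+        out[:, :, 2 * fold:] = x[:, :, 2 * fold:]               # remaining channels are left untouched
+        return out
+
+    clip = np.random.rand(1, 8, 64, 7, 7).astype("float32")
+    print(temporal_shift(clip).shape)  # (1, 8, 64, 7, 7)
+    ```
+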
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 1.4.0
+
+ - paddlehub >= 1.0.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install tsm_kinetics400
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ hub run tsm_kinetics400 --input_path "/PATH/TO/VIDEO"
+ ```
+
+ 或者
+
+ - ```shell
+ hub run tsm_kinetics400 --input_file test.txt
+ ```
+
+ - Note: test.txt 存放待分类视频的存放路径
+
+
+  - 通过命令行方式实现视频分类模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+
+ import paddlehub as hub
+
+ tsm = hub.Module(name="tsm_kinetics400")
+
+ test_video_path = "/PATH/TO/VIDEO"
+
+ # set input dict
+ input_dict = {"image": [test_video_path]}
+
+ # execute predict and print the result
+ results = tsm.video_classification(data=input_dict)
+ for result in results:
+ print(result)
+ ```
+
+- ### 3、API
+
+ - ```python
+ def video_classification(data)
+ ```
+
+ - 用于视频分类预测
+
+ - **参数**
+
+ - data(dict): dict类型,key为image,str类型;value为待分类的视频路径,list类型。
+
+
+ - **返回**
+
+ - result(list\[dict\]): list类型,每个元素为对应输入视频的预测结果。预测结果为dict类型,key为label,value为该label对应的概率值。
+
+
+## 四、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ - ```shell
+ $ hub install tsm_kinetics400==1.0.0
+ ```
diff --git a/modules/video/classification/tsn_kinetics400/README.md b/modules/video/classification/tsn_kinetics400/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..e2d2e87630277819c0cf5b269d56e2128a05c32d
--- /dev/null
+++ b/modules/video/classification/tsn_kinetics400/README.md
@@ -0,0 +1,108 @@
+# tsn_kinetics400
+
+|模型名称|tsn_kinetics400|
+| :--- | :---: |
+|类别|视频-视频分类|
+|网络|TSN|
+|数据集|Kinetics-400|
+|是否支持Fine-tuning|否|
+|模型大小|95MB|
+|最新更新日期|2021-02-26|
+|数据指标|-|
+
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+ - TSN(Temporal Segment Network)是视频分类领域经典的基于2D-CNN的解决方案。该方法主要解决视频的长时间行为判断问题,通过稀疏采样视频帧的方式代替稠密采样,既能捕获视频全局信息,也能去除冗余,降低计算量。最终将每帧特征平均融合后得到视频的整体特征,并用于分类。TSN的训练数据采用由DeepMind公布的Kinetics-400动作识别数据集。该PaddleHub Module可支持预测。
+
+ - 具体网络结构可参考论文:[TSN](https://arxiv.org/abs/1608.00859)。
+
+
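+  - A conceptual NumPy sketch of the sparse segment sampling and score averaging described above (split the video into equal segments, take one frame per segment, then average the per-frame predictions). It is only an illustration, not the module's actual implementation:
+
+  - ```python
+    import numpy as np
+
+    def sample_segment_indices(num_frames, num_segments=8, seed=0):
+        """Pick one frame index from each of num_segments equal segments."""
+        rng = np.random.default_rng(seed)
+        bounds = np.linspace(0, num_frames, num_segments + 1, dtype=int)
+        return np.array([rng.integers(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:])])
+
+    frame_scores = np.random.rand(300, 400)       # 300 frames, 400 class scores per frame
+    idx = sample_segment_indices(300)
+    video_score = frame_scores[idx].mean(axis=0)  # fuse per-frame predictions into a video-level score
+    print(idx.shape, video_score.shape)           # (8,) (400,)
+    ```
+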
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 1.4.0
+
+ - paddlehub >= 1.0.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install tsn_kinetics400
+ ```
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ hub run tsn_kinetics400 --input_path "/PATH/TO/VIDEO"
+ ```
+
+ 或者
+
+ - ```shell
+ hub run tsn_kinetics400 --input_file test.txt
+ ```
+
+ - Note: test.txt 存放待分类视频的存放路径
+
+
+  - 通过命令行方式实现视频分类模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+
+ import paddlehub as hub
+
+ tsn = hub.Module(name="tsn_kinetics400")
+
+ test_video_path = "/PATH/TO/VIDEO"
+
+ # set input dict
+ input_dict = {"image": [test_video_path]}
+
+ # execute predict and print the result
+ results = tsn.video_classification(data=input_dict)
+ for result in results:
+ print(result)
+ ```
+
+- ### 3、API
+
+ - ```python
+ def video_classification(data)
+ ```
+
+ - 用于视频分类预测
+
+ - **参数**
+
+ - data(dict): dict类型,key为image,str类型;value为待分类的视频路径,list类型。
+
+
+ - **返回**
+
+ - result(list\[dict\]): list类型,每个元素为对应输入视频的预测结果。预测结果为dict类型,key为label,value为该label对应的概率值。
+
+
+## 四、更新历史
+
+* 1.0.0
+
+ 初始发布
+
+ - ```shell
+ $ hub install tsn_kinetics400==1.0.0
+ ```
diff --git a/modules/video/multiple_object_tracking/fairmot_dla34/config/_base_/fairmot_dla34.yml b/modules/video/multiple_object_tracking/fairmot_dla34/config/_base_/fairmot_dla34.yml
index c5f07de702fbeb594c9eeda60d709c0c40af8b1b..e2ca32a2b6c31d66a1b8f5fa42d278d0609dbdca 100644
--- a/modules/video/multiple_object_tracking/fairmot_dla34/config/_base_/fairmot_dla34.yml
+++ b/modules/video/multiple_object_tracking/fairmot_dla34/config/_base_/fairmot_dla34.yml
@@ -5,7 +5,7 @@ FairMOT:
detector: CenterNet
reid: FairMOTEmbeddingHead
loss: FairMOTLoss
- tracker: JDETracker
+ tracker: FrozenJDETracker
CenterNet:
backbone: DLA
diff --git a/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/__init__.py b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..258e4c9010832936f098e6febe777ac556f0668f
--- /dev/null
+++ b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/__init__.py
@@ -0,0 +1,25 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import matching
+from . import tracker
+from . import motion
+from . import visualization
+from . import utils
+
+from .matching import *
+from .tracker import *
+from .motion import *
+from .visualization import *
+from .utils import *
diff --git a/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/matching/__init__.py b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/matching/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..54c6680f79f16247c562a9da1024dd3e1de4c57f
--- /dev/null
+++ b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/matching/__init__.py
@@ -0,0 +1,19 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import jde_matching
+from . import deepsort_matching
+
+from .jde_matching import *
+from .deepsort_matching import *
diff --git a/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/matching/deepsort_matching.py b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/matching/deepsort_matching.py
new file mode 100644
index 0000000000000000000000000000000000000000..c55aa8876cc128f512aa4e2e4e48a935a3f8dd77
--- /dev/null
+++ b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/matching/deepsort_matching.py
@@ -0,0 +1,368 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is borrowed from https://github.com/nwojke/deep_sort/tree/master/deep_sort
+"""
+
+import numpy as np
+from scipy.optimize import linear_sum_assignment
+from ..motion import kalman_filter
+
+INFTY_COST = 1e+5
+
+__all__ = [
+ 'iou_1toN',
+ 'iou_cost',
+ '_nn_euclidean_distance',
+ '_nn_cosine_distance',
+ 'NearestNeighborDistanceMetric',
+ 'min_cost_matching',
+ 'matching_cascade',
+ 'gate_cost_matrix',
+]
+
+
+def iou_1toN(bbox, candidates):
+ """
+    Compute intersection over union (IoU) between one box and N candidates.
+
+ Args:
+ bbox (ndarray): A bounding box in format `(top left x, top left y, width, height)`.
+ candidates (ndarray): A matrix of candidate bounding boxes (one per row) in the
+ same format as `bbox`.
+
+ Returns:
+ ious (ndarray): The intersection over union in [0, 1] between the `bbox`
+ and each candidate. A higher score means a larger fraction of the
+ `bbox` is occluded by the candidate.
+ """
+ bbox_tl = bbox[:2]
+ bbox_br = bbox[:2] + bbox[2:]
+ candidates_tl = candidates[:, :2]
+ candidates_br = candidates[:, :2] + candidates[:, 2:]
+
+ tl = np.c_[np.maximum(bbox_tl[0], candidates_tl[:, 0])[:, np.newaxis],
+ np.maximum(bbox_tl[1], candidates_tl[:, 1])[:, np.newaxis]]
+ br = np.c_[np.minimum(bbox_br[0], candidates_br[:, 0])[:, np.newaxis],
+ np.minimum(bbox_br[1], candidates_br[:, 1])[:, np.newaxis]]
+ wh = np.maximum(0., br - tl)
+
+ area_intersection = wh.prod(axis=1)
+ area_bbox = bbox[2:].prod()
+ area_candidates = candidates[:, 2:].prod(axis=1)
+ ious = area_intersection / (area_bbox + area_candidates - area_intersection)
+ return ious
+
+
+def iou_cost(tracks, detections, track_indices=None, detection_indices=None):
+ """
+ IoU distance metric.
+
+ Args:
+ tracks (list[Track]): A list of tracks.
+ detections (list[Detection]): A list of detections.
+ track_indices (Optional[list[int]]): A list of indices to tracks that
+ should be matched. Defaults to all `tracks`.
+ detection_indices (Optional[list[int]]): A list of indices to detections
+ that should be matched. Defaults to all `detections`.
+
+ Returns:
+ cost_matrix (ndarray): A cost matrix of shape len(track_indices),
+ len(detection_indices) where entry (i, j) is
+ `1 - iou(tracks[track_indices[i]], detections[detection_indices[j]])`.
+ """
+ if track_indices is None:
+ track_indices = np.arange(len(tracks))
+ if detection_indices is None:
+ detection_indices = np.arange(len(detections))
+
+ cost_matrix = np.zeros((len(track_indices), len(detection_indices)))
+ for row, track_idx in enumerate(track_indices):
+ if tracks[track_idx].time_since_update > 1:
+ cost_matrix[row, :] = 1e+5
+ continue
+
+ bbox = tracks[track_idx].to_tlwh()
+ candidates = np.asarray([detections[i].tlwh for i in detection_indices])
+ cost_matrix[row, :] = 1. - iou_1toN(bbox, candidates)
+ return cost_matrix
+
+
+def _nn_euclidean_distance(s, q):
+ """
+ Compute pair-wise squared (Euclidean) distance between points in `s` and `q`.
+
+ Args:
+ s (ndarray): Sample points: an NxM matrix of N samples of dimensionality M.
+ q (ndarray): Query points: an LxM matrix of L samples of dimensionality M.
+
+ Returns:
+        distances (ndarray): A vector of length L that contains, for each entry in `q`, the
+            smallest squared Euclidean distance to a sample in `s`.
+ """
+ s, q = np.asarray(s), np.asarray(q)
+ if len(s) == 0 or len(q) == 0:
+ return np.zeros((len(s), len(q)))
+ s2, q2 = np.square(s).sum(axis=1), np.square(q).sum(axis=1)
+ distances = -2. * np.dot(s, q.T) + s2[:, None] + q2[None, :]
+ distances = np.clip(distances, 0., float(np.inf))
+
+ return np.maximum(0.0, distances.min(axis=0))
+
+
+def _nn_cosine_distance(s, q):
+ """
+ Compute pair-wise cosine distance between points in `s` and `q`.
+
+ Args:
+ s (ndarray): Sample points: an NxM matrix of N samples of dimensionality M.
+ q (ndarray): Query points: an LxM matrix of L samples of dimensionality M.
+
+ Returns:
+ distances (ndarray): A vector of length L that contains, for each entry
+ in `q`, the smallest cosine distance to a sample in `s`.
+ """
+ s = np.asarray(s) / np.linalg.norm(s, axis=1, keepdims=True)
+ q = np.asarray(q) / np.linalg.norm(q, axis=1, keepdims=True)
+ distances = 1. - np.dot(s, q.T)
+
+ return distances.min(axis=0)
+
+
+class NearestNeighborDistanceMetric(object):
+ """
+ A nearest neighbor distance metric that, for each target, returns
+ the closest distance to any sample that has been observed so far.
+
+ Args:
+ metric (str): Either "euclidean" or "cosine".
+ matching_threshold (float): The matching threshold. Samples with larger
+ distance are considered an invalid match.
+ budget (Optional[int]): If not None, fix samples per class to at most
+ this number. Removes the oldest samples when the budget is reached.
+
+ Attributes:
+ samples (Dict[int -> List[ndarray]]): A dictionary that maps from target
+ identities to the list of samples that have been observed so far.
+ """
+
+ def __init__(self, metric, matching_threshold, budget=None):
+ if metric == "euclidean":
+ self._metric = _nn_euclidean_distance
+ elif metric == "cosine":
+ self._metric = _nn_cosine_distance
+ else:
+ raise ValueError("Invalid metric; must be either 'euclidean' or 'cosine'")
+ self.matching_threshold = matching_threshold
+ self.budget = budget
+ self.samples = {}
+
+ def partial_fit(self, features, targets, active_targets):
+ """
+ Update the distance metric with new data.
+
+ Args:
+ features (ndarray): An NxM matrix of N features of dimensionality M.
+ targets (ndarray): An integer array of associated target identities.
+ active_targets (List[int]): A list of targets that are currently
+ present in the scene.
+ """
+ for feature, target in zip(features, targets):
+ self.samples.setdefault(target, []).append(feature)
+ if self.budget is not None:
+ self.samples[target] = self.samples[target][-self.budget:]
+ self.samples = {k: self.samples[k] for k in active_targets}
+
+ def distance(self, features, targets):
+ """
+ Compute distance between features and targets.
+
+ Args:
+ features (ndarray): An NxM matrix of N features of dimensionality M.
+ targets (list[int]): A list of targets to match the given `features` against.
+
+ Returns:
+ cost_matrix (ndarray): a cost matrix of shape len(targets), len(features),
+ where element (i, j) contains the closest squared distance between
+ `targets[i]` and `features[j]`.
+ """
+ cost_matrix = np.zeros((len(targets), len(features)))
+ for i, target in enumerate(targets):
+ cost_matrix[i, :] = self._metric(self.samples[target], features)
+ return cost_matrix
+
+
+def min_cost_matching(distance_metric, max_distance, tracks, detections, track_indices=None, detection_indices=None):
+ """
+ Solve linear assignment problem.
+
+ Args:
+ distance_metric :
+ Callable[[List[Track], List[Detection], List[int], List[int]], ndarray]
+ The distance metric is given a list of tracks and detections as
+ well as a list of N track indices and M detection indices. The
+ metric should return the NxM dimensional cost matrix, where element
+ (i, j) is the association cost between the i-th track in the given
+ track indices and the j-th detection in the given detection_indices.
+ max_distance (float): Gating threshold. Associations with cost larger
+ than this value are disregarded.
+ tracks (list[Track]): A list of predicted tracks at the current time
+ step.
+ detections (list[Detection]): A list of detections at the current time
+ step.
+ track_indices (list[int]): List of track indices that maps rows in
+ `cost_matrix` to tracks in `tracks`.
+ detection_indices (List[int]): List of detection indices that maps
+ columns in `cost_matrix` to detections in `detections`.
+
+ Returns:
+ A tuple (List[(int, int)], List[int], List[int]) with the following
+ three entries:
+ * A list of matched track and detection indices.
+ * A list of unmatched track indices.
+ * A list of unmatched detection indices.
+ """
+ if track_indices is None:
+ track_indices = np.arange(len(tracks))
+ if detection_indices is None:
+ detection_indices = np.arange(len(detections))
+
+ if len(detection_indices) == 0 or len(track_indices) == 0:
+ return [], track_indices, detection_indices # Nothing to match.
+
+ cost_matrix = distance_metric(tracks, detections, track_indices, detection_indices)
+
+ cost_matrix[cost_matrix > max_distance] = max_distance + 1e-5
+ indices = linear_sum_assignment(cost_matrix)
+
+ matches, unmatched_tracks, unmatched_detections = [], [], []
+ for col, detection_idx in enumerate(detection_indices):
+ if col not in indices[1]:
+ unmatched_detections.append(detection_idx)
+ for row, track_idx in enumerate(track_indices):
+ if row not in indices[0]:
+ unmatched_tracks.append(track_idx)
+ for row, col in zip(indices[0], indices[1]):
+ track_idx = track_indices[row]
+ detection_idx = detection_indices[col]
+ if cost_matrix[row, col] > max_distance:
+ unmatched_tracks.append(track_idx)
+ unmatched_detections.append(detection_idx)
+ else:
+ matches.append((track_idx, detection_idx))
+ return matches, unmatched_tracks, unmatched_detections
+
+
+def matching_cascade(distance_metric,
+ max_distance,
+ cascade_depth,
+ tracks,
+ detections,
+ track_indices=None,
+ detection_indices=None):
+ """
+ Run matching cascade.
+
+ Args:
+ distance_metric :
+ Callable[[List[Track], List[Detection], List[int], List[int]], ndarray]
+ The distance metric is given a list of tracks and detections as
+ well as a list of N track indices and M detection indices. The
+ metric should return the NxM dimensional cost matrix, where element
+ (i, j) is the association cost between the i-th track in the given
+ track indices and the j-th detection in the given detection_indices.
+ max_distance (float): Gating threshold. Associations with cost larger
+ than this value are disregarded.
+ cascade_depth (int): The cascade depth; should be set to the maximum
+ track age.
+ tracks (list[Track]): A list of predicted tracks at the current time
+ step.
+ detections (list[Detection]): A list of detections at the current time
+ step.
+ track_indices (list[int]): List of track indices that maps rows in
+ `cost_matrix` to tracks in `tracks`.
+ detection_indices (List[int]): List of detection indices that maps
+ columns in `cost_matrix` to detections in `detections`.
+
+ Returns:
+ A tuple (List[(int, int)], List[int], List[int]) with the following
+ three entries:
+ * A list of matched track and detection indices.
+ * A list of unmatched track indices.
+ * A list of unmatched detection indices.
+ """
+ if track_indices is None:
+ track_indices = list(range(len(tracks)))
+ if detection_indices is None:
+ detection_indices = list(range(len(detections)))
+
+ unmatched_detections = detection_indices
+ matches = []
+ for level in range(cascade_depth):
+ if len(unmatched_detections) == 0: # No detections left
+ break
+
+ track_indices_l = [k for k in track_indices if tracks[k].time_since_update == 1 + level]
+ if len(track_indices_l) == 0: # Nothing to match at this level
+ continue
+
+ matches_l, _, unmatched_detections = \
+ min_cost_matching(
+ distance_metric, max_distance, tracks, detections,
+ track_indices_l, unmatched_detections)
+ matches += matches_l
+ unmatched_tracks = list(set(track_indices) - set(k for k, _ in matches))
+ return matches, unmatched_tracks, unmatched_detections
+
+
+def gate_cost_matrix(kf,
+ cost_matrix,
+ tracks,
+ detections,
+ track_indices,
+ detection_indices,
+ gated_cost=INFTY_COST,
+ only_position=False):
+ """
+ Invalidate infeasible entries in cost matrix based on the state
+ distributions obtained by Kalman filtering.
+
+ Args:
+ kf (object): The Kalman filter.
+ cost_matrix (ndarray): The NxM dimensional cost matrix, where N is the
+ number of track indices and M is the number of detection indices,
+ such that entry (i, j) is the association cost between
+ `tracks[track_indices[i]]` and `detections[detection_indices[j]]`.
+ tracks (list[Track]): A list of predicted tracks at the current time
+ step.
+ detections (list[Detection]): A list of detections at the current time
+ step.
+ track_indices (List[int]): List of track indices that maps rows in
+ `cost_matrix` to tracks in `tracks`.
+ detection_indices (List[int]): List of detection indices that maps
+ columns in `cost_matrix` to detections in `detections`.
+ gated_cost (Optional[float]): Entries in the cost matrix corresponding
+ to infeasible associations are set to this value. Defaults to a very
+ large value.
+ only_position (Optional[bool]): If True, only the x, y position of the
+ state distribution is considered during gating. Default False.
+ """
+ gating_dim = 2 if only_position else 4
+ gating_threshold = kalman_filter.chi2inv95[gating_dim]
+ measurements = np.asarray([detections[i].to_xyah() for i in detection_indices])
+ for row, track_idx in enumerate(track_indices):
+ track = tracks[track_idx]
+ gating_distance = kf.gating_distance(track.mean, track.covariance, measurements, only_position)
+ cost_matrix[row, gating_distance > gating_threshold] = gated_cost
+ return cost_matrix
diff --git a/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/matching/jde_matching.py b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/matching/jde_matching.py
new file mode 100644
index 0000000000000000000000000000000000000000..bf2e891c391c98ed8944f88377f62c9722fa5155
--- /dev/null
+++ b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/matching/jde_matching.py
@@ -0,0 +1,123 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is borrowed from https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/tracker/matching.py
+"""
+
+import lap
+import scipy
+import numpy as np
+from scipy.spatial.distance import cdist
+from ..motion import kalman_filter
+
+from ppdet.utils.logger import setup_logger
+logger = setup_logger(__name__)
+
+__all__ = [
+ 'merge_matches',
+ 'linear_assignment',
+ 'cython_bbox_ious',
+ 'iou_distance',
+ 'embedding_distance',
+ 'fuse_motion',
+]
+
+
+def merge_matches(m1, m2, shape):
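+ # Compose two sets of index matches: m1 maps O->P and m2 maps P->Q; the result
+ # maps O->Q, plus the indices in O and Q that remain unmatched.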
+ O, P, Q = shape
+ m1 = np.asarray(m1)
+ m2 = np.asarray(m2)
+
+ M1 = scipy.sparse.coo_matrix((np.ones(len(m1)), (m1[:, 0], m1[:, 1])), shape=(O, P))
+ M2 = scipy.sparse.coo_matrix((np.ones(len(m2)), (m2[:, 0], m2[:, 1])), shape=(P, Q))
+
+ mask = M1 * M2
+ match = mask.nonzero()
+ match = list(zip(match[0], match[1]))
+ unmatched_O = tuple(set(range(O)) - set([i for i, j in match]))
+ unmatched_Q = tuple(set(range(Q)) - set([j for i, j in match]))
+
+ return match, unmatched_O, unmatched_Q
+
+
+def linear_assignment(cost_matrix, thresh):
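+ # Solve the linear assignment problem with lap.lapjv; costs above `thresh` are
+ # not assigned. Returns matched index pairs and the unmatched row/column indices.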
+ if cost_matrix.size == 0:
+ return np.empty((0, 2), dtype=int), tuple(range(cost_matrix.shape[0])), tuple(range(cost_matrix.shape[1]))
+ matches, unmatched_a, unmatched_b = [], [], []
+ cost, x, y = lap.lapjv(cost_matrix, extend_cost=True, cost_limit=thresh)
+ for ix, mx in enumerate(x):
+ if mx >= 0:
+ matches.append([ix, mx])
+ unmatched_a = np.where(x < 0)[0]
+ unmatched_b = np.where(y < 0)[0]
+ matches = np.asarray(matches)
+ return matches, unmatched_a, unmatched_b
+
+
+def cython_bbox_ious(atlbrs, btlbrs):
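+ # Pairwise IoU between two lists of tlbr boxes, computed with the optional
+ # cython_bbox package.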
+ ious = np.zeros((len(atlbrs), len(btlbrs)), dtype=np.float)
+ if ious.size == 0:
+ return ious
+ try:
+ import cython_bbox
+ except Exception as e:
+ logger.error('cython_bbox not found, please install cython_bbox, ' 'for example: `pip install cython_bbox`.')
+ raise e
+
+ ious = cython_bbox.bbox_overlaps(
+ np.ascontiguousarray(atlbrs, dtype=np.float), np.ascontiguousarray(btlbrs, dtype=np.float))
+ return ious
+
+
+def iou_distance(atracks, btracks):
+ """
+ Compute cost based on IoU between two list[STrack].
+ """
+ if (len(atracks) > 0 and isinstance(atracks[0], np.ndarray)) or (len(btracks) > 0
+ and isinstance(btracks[0], np.ndarray)):
+ atlbrs = atracks
+ btlbrs = btracks
+ else:
+ atlbrs = [track.tlbr for track in atracks]
+ btlbrs = [track.tlbr for track in btracks]
+ _ious = cython_bbox_ious(atlbrs, btlbrs)
+ cost_matrix = 1 - _ious
+
+ return cost_matrix
+
+
+def embedding_distance(tracks, detections, metric='euclidean'):
+ """
+ Compute cost based on features between two list[STrack].
+ """
+ cost_matrix = np.zeros((len(tracks), len(detections)), dtype=np.float)
+ if cost_matrix.size == 0:
+ return cost_matrix
+ det_features = np.asarray([track.curr_feat for track in detections], dtype=np.float)
+ track_features = np.asarray([track.smooth_feat for track in tracks], dtype=np.float)
+ cost_matrix = np.maximum(0.0, cdist(track_features, det_features, metric)) # Normalized features
+ return cost_matrix
+
+
+def fuse_motion(kf, cost_matrix, tracks, detections, only_position=False, lambda_=0.98):
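+ # Gate the appearance cost matrix using the Kalman filter's Mahalanobis gating
+ # distance, then fuse appearance and motion costs with weight `lambda_`.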
+ if cost_matrix.size == 0:
+ return cost_matrix
+ gating_dim = 2 if only_position else 4
+ gating_threshold = kalman_filter.chi2inv95[gating_dim]
+ measurements = np.asarray([det.to_xyah() for det in detections])
+ for row, track in enumerate(tracks):
+ gating_distance = kf.gating_distance(track.mean, track.covariance, measurements, only_position, metric='maha')
+ cost_matrix[row, gating_distance > gating_threshold] = np.inf
+ cost_matrix[row] = lambda_ * cost_matrix[row] + (1 - lambda_) * gating_distance
+ return cost_matrix
diff --git a/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/motion/__init__.py b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/motion/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e42dd0b019d66d6ea07bec1ad90cf9a8d53d8172
--- /dev/null
+++ b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/motion/__init__.py
@@ -0,0 +1,17 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import kalman_filter
+
+from .kalman_filter import *
diff --git a/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/motion/kalman_filter.py b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/motion/kalman_filter.py
new file mode 100644
index 0000000000000000000000000000000000000000..7cc182e4c5e76e0688688c883b2a24fa30df9c74
--- /dev/null
+++ b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/motion/kalman_filter.py
@@ -0,0 +1,237 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is borrowed from https://github.com/nwojke/deep_sort/blob/master/deep_sort/kalman_filter.py
+"""
+
+import numpy as np
+import scipy.linalg
+
+__all__ = ['KalmanFilter']
+"""
+Table for the 0.95 quantile of the chi-square distribution with N degrees of
+freedom (contains values for N=1, ..., 9). Taken from MATLAB/Octave's chi2inv
+function and used as Mahalanobis gating threshold.
+"""
+
+chi2inv95 = {1: 3.8415, 2: 5.9915, 3: 7.8147, 4: 9.4877, 5: 11.070, 6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919}
+
+
+class KalmanFilter(object):
+ """
+ A simple Kalman filter for tracking bounding boxes in image space.
+
+ The 8-dimensional state space
+
+ x, y, a, h, vx, vy, va, vh
+
+ contains the bounding box center position (x, y), aspect ratio a, height h,
+ and their respective velocities.
+
+ Object motion follows a constant velocity model. The bounding box location
+ (x, y, a, h) is taken as direct observation of the state space (linear
+ observation model).
+
+ """
+
+ def __init__(self):
+ ndim, dt = 4, 1.
+
+ # Create Kalman filter model matrices.
+ self._motion_mat = np.eye(2 * ndim, 2 * ndim)
+ for i in range(ndim):
+ self._motion_mat[i, ndim + i] = dt
+ self._update_mat = np.eye(ndim, 2 * ndim)
+
+ # Motion and observation uncertainty are chosen relative to the current
+ # state estimate. These weights control the amount of uncertainty in
+ # the model. This is a bit hacky.
+ self._std_weight_position = 1. / 20
+ self._std_weight_velocity = 1. / 160
+
+ def initiate(self, measurement):
+ """
+ Create track from unassociated measurement.
+
+ Args:
+ measurement (ndarray): Bounding box coordinates (x, y, a, h) with
+ center position (x, y), aspect ratio a, and height h.
+
+ Returns:
+ The mean vector (8 dimensional) and covariance matrix (8x8
+ dimensional) of the new track. Unobserved velocities are
+ initialized to 0 mean.
+ """
+ mean_pos = measurement
+ mean_vel = np.zeros_like(mean_pos)
+ mean = np.r_[mean_pos, mean_vel]
+
+ std = [
+ 2 * self._std_weight_position * measurement[3], 2 * self._std_weight_position * measurement[3], 1e-2,
+ 2 * self._std_weight_position * measurement[3], 10 * self._std_weight_velocity * measurement[3],
+ 10 * self._std_weight_velocity * measurement[3], 1e-5, 10 * self._std_weight_velocity * measurement[3]
+ ]
+ covariance = np.diag(np.square(std))
+ return mean, covariance
+
+ def predict(self, mean, covariance):
+ """
+ Run Kalman filter prediction step.
+
+ Args:
+ mean (ndarray): The 8 dimensional mean vector of the object state
+ at the previous time step.
+ covariance (ndarray): The 8x8 dimensional covariance matrix of the
+ object state at the previous time step.
+
+ Returns:
+ The mean vector and covariance matrix of the predicted state.
+ Unobserved velocities are initialized to 0 mean.
+ """
+ std_pos = [
+ self._std_weight_position * mean[3], self._std_weight_position * mean[3], 1e-2,
+ self._std_weight_position * mean[3]
+ ]
+ std_vel = [
+ self._std_weight_velocity * mean[3], self._std_weight_velocity * mean[3], 1e-5,
+ self._std_weight_velocity * mean[3]
+ ]
+ motion_cov = np.diag(np.square(np.r_[std_pos, std_vel]))
+
+ #mean = np.dot(self._motion_mat, mean)
+ mean = np.dot(mean, self._motion_mat.T)
+ covariance = np.linalg.multi_dot((self._motion_mat, covariance, self._motion_mat.T)) + motion_cov
+
+ return mean, covariance
+
+ def project(self, mean, covariance):
+ """
+ Project state distribution to measurement space.
+
+ Args:
+ mean (ndarray): The state's mean vector (8 dimensional array).
+ covariance (ndarray): The state's covariance matrix (8x8 dimensional).
+
+ Returns:
+ The projected mean and covariance matrix of the given state estimate.
+ """
+ std = [
+ self._std_weight_position * mean[3], self._std_weight_position * mean[3], 1e-1,
+ self._std_weight_position * mean[3]
+ ]
+ innovation_cov = np.diag(np.square(std))
+
+ mean = np.dot(self._update_mat, mean)
+ covariance = np.linalg.multi_dot((self._update_mat, covariance, self._update_mat.T))
+ return mean, covariance + innovation_cov
+
+ def multi_predict(self, mean, covariance):
+ """
+ Run Kalman filter prediction step (Vectorized version).
+
+ Args:
+ mean (ndarray): The Nx8 dimensional mean matrix of the object states
+ at the previous time step.
+ covariance (ndarray): The Nx8x8 dimensional covariance matrices of the
+ object states at the previous time step.
+
+ Returns:
+ The mean vector and covariance matrix of the predicted state.
+ Unobserved velocities are initialized to 0 mean.
+ """
+ std_pos = [
+ self._std_weight_position * mean[:, 3], self._std_weight_position * mean[:, 3],
+ 1e-2 * np.ones_like(mean[:, 3]), self._std_weight_position * mean[:, 3]
+ ]
+ std_vel = [
+ self._std_weight_velocity * mean[:, 3], self._std_weight_velocity * mean[:, 3],
+ 1e-5 * np.ones_like(mean[:, 3]), self._std_weight_velocity * mean[:, 3]
+ ]
+ sqr = np.square(np.r_[std_pos, std_vel]).T
+
+ motion_cov = []
+ for i in range(len(mean)):
+ motion_cov.append(np.diag(sqr[i]))
+ motion_cov = np.asarray(motion_cov)
+
+ mean = np.dot(mean, self._motion_mat.T)
+ left = np.dot(self._motion_mat, covariance).transpose((1, 0, 2))
+ covariance = np.dot(left, self._motion_mat.T) + motion_cov
+
+ return mean, covariance
+
+ def update(self, mean, covariance, measurement):
+ """
+ Run Kalman filter correction step.
+
+ Args:
+ mean (ndarray): The predicted state's mean vector (8 dimensional).
+ covariance (ndarray): The state's covariance matrix (8x8 dimensional).
+ measurement (ndarray): The 4 dimensional measurement vector
+ (x, y, a, h), where (x, y) is the center position, a the aspect
+ ratio, and h the height of the bounding box.
+
+ Returns:
+ The measurement-corrected state distribution.
+ """
+ projected_mean, projected_cov = self.project(mean, covariance)
+
+ chol_factor, lower = scipy.linalg.cho_factor(projected_cov, lower=True, check_finite=False)
+ kalman_gain = scipy.linalg.cho_solve((chol_factor, lower),
+ np.dot(covariance, self._update_mat.T).T,
+ check_finite=False).T
+ innovation = measurement - projected_mean
+
+ new_mean = mean + np.dot(innovation, kalman_gain.T)
+ new_covariance = covariance - np.linalg.multi_dot((kalman_gain, projected_cov, kalman_gain.T))
+ return new_mean, new_covariance
+
+ def gating_distance(self, mean, covariance, measurements, only_position=False, metric='maha'):
+ """
+ Compute gating distance between state distribution and measurements.
+ A suitable distance threshold can be obtained from `chi2inv95`. If
+ `only_position` is False, the chi-square distribution has 4 degrees of
+ freedom, otherwise 2.
+
+ Args:
+ mean (ndarray): Mean vector over the state distribution (8
+ dimensional).
+ covariance (ndarray): Covariance of the state distribution (8x8
+ dimensional).
+ measurements (ndarray): An Nx4 dimensional matrix of N measurements,
+ each in format (x, y, a, h) where (x, y) is the bounding box center
+ position, a the aspect ratio, and h the height.
+ only_position (Optional[bool]): If True, distance computation is
+ done with respect to the bounding box center position only.
+ metric (str): Metric type, 'gaussian' or 'maha'.
+
+ Returns:
+ An array of length N, where the i-th element contains the squared
+ Mahalanobis distance between (mean, covariance) and `measurements[i]`.
+ """
+ mean, covariance = self.project(mean, covariance)
+ if only_position:
+ mean, covariance = mean[:2], covariance[:2, :2]
+ measurements = measurements[:, :2]
+
+ d = measurements - mean
+ if metric == 'gaussian':
+ return np.sum(d * d, axis=1)
+ elif metric == 'maha':
+ cholesky_factor = np.linalg.cholesky(covariance)
+ z = scipy.linalg.solve_triangular(cholesky_factor, d.T, lower=True, check_finite=False, overwrite_b=True)
+ squared_maha = np.sum(z * z, axis=0)
+ return squared_maha
+ else:
+ raise ValueError('invalid distance metric')
diff --git a/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/tracker/__init__.py b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/tracker/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..904822119661be61141715c638388db9d045fee1
--- /dev/null
+++ b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/tracker/__init__.py
@@ -0,0 +1,21 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import base_jde_tracker
+from . import base_sde_tracker
+from . import jde_tracker
+
+from .base_jde_tracker import *
+from .base_sde_tracker import *
+from .jde_tracker import *
diff --git a/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/tracker/base_jde_tracker.py b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/tracker/base_jde_tracker.py
new file mode 100644
index 0000000000000000000000000000000000000000..9505a709ee573acecf4b5dd7e02a06cee9d44284
--- /dev/null
+++ b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/tracker/base_jde_tracker.py
@@ -0,0 +1,257 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is borrowed from https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/tracker/multitracker.py
+"""
+
+import numpy as np
+from collections import deque, OrderedDict
+from ..matching import jde_matching as matching
+from ppdet.core.workspace import register, serializable
+
+__all__ = [
+ 'TrackState',
+ 'BaseTrack',
+ 'STrack',
+ 'joint_stracks',
+ 'sub_stracks',
+ 'remove_duplicate_stracks',
+]
+
+
+class TrackState(object):
+ New = 0
+ Tracked = 1
+ Lost = 2
+ Removed = 3
+
+
+class BaseTrack(object):
+ _count = 0
+
+ track_id = 0
+ is_activated = False
+ state = TrackState.New
+
+ history = OrderedDict()
+ features = []
+ curr_feature = None
+ score = 0
+ start_frame = 0
+ frame_id = 0
+ time_since_update = 0
+
+ # multi-camera
+ location = (np.inf, np.inf)
+
+ @property
+ def end_frame(self):
+ return self.frame_id
+
+ @staticmethod
+ def next_id():
+ BaseTrack._count += 1
+ return BaseTrack._count
+
+ def activate(self, *args):
+ raise NotImplementedError
+
+ def predict(self):
+ raise NotImplementedError
+
+ def update(self, *args, **kwargs):
+ raise NotImplementedError
+
+ def mark_lost(self):
+ self.state = TrackState.Lost
+
+ def mark_removed(self):
+ self.state = TrackState.Removed
+
+
+class STrack(BaseTrack):
+ def __init__(self, tlwh, score, temp_feat, buffer_size=30):
+ # wait activate
+ self._tlwh = np.asarray(tlwh, dtype=np.float)
+ self.kalman_filter = None
+ self.mean, self.covariance = None, None
+ self.is_activated = False
+
+ self.score = score
+ self.tracklet_len = 0
+
+ self.smooth_feat = None
+ self.update_features(temp_feat)
+ self.features = deque([], maxlen=buffer_size)
+ self.alpha = 0.9
+
+ def update_features(self, feat):
+ feat /= np.linalg.norm(feat)
+ self.curr_feat = feat
+ if self.smooth_feat is None:
+ self.smooth_feat = feat
+ else:
+ self.smooth_feat = self.alpha * self.smooth_feat + (1 - self.alpha) * feat
+ self.features.append(feat)
+ self.smooth_feat /= np.linalg.norm(self.smooth_feat)
+
+ def predict(self):
+ mean_state = self.mean.copy()
+ if self.state != TrackState.Tracked:
+ mean_state[7] = 0
+ self.mean, self.covariance = self.kalman_filter.predict(mean_state, self.covariance)
+
+ @staticmethod
+ def multi_predict(stracks, kalman_filter):
+ if len(stracks) > 0:
+ multi_mean = np.asarray([st.mean.copy() for st in stracks])
+ multi_covariance = np.asarray([st.covariance for st in stracks])
+ for i, st in enumerate(stracks):
+ if st.state != TrackState.Tracked:
+ multi_mean[i][7] = 0
+ multi_mean, multi_covariance = kalman_filter.multi_predict(multi_mean, multi_covariance)
+ for i, (mean, cov) in enumerate(zip(multi_mean, multi_covariance)):
+ stracks[i].mean = mean
+ stracks[i].covariance = cov
+
+ def activate(self, kalman_filter, frame_id):
+ """Start a new tracklet"""
+ self.kalman_filter = kalman_filter
+ self.track_id = self.next_id()
+ self.mean, self.covariance = self.kalman_filter.initiate(self.tlwh_to_xyah(self._tlwh))
+
+ self.tracklet_len = 0
+ self.state = TrackState.Tracked
+ if frame_id == 1:
+ self.is_activated = True
+ self.frame_id = frame_id
+ self.start_frame = frame_id
+
+ def re_activate(self, new_track, frame_id, new_id=False):
+ self.mean, self.covariance = self.kalman_filter.update(self.mean, self.covariance,
+ self.tlwh_to_xyah(new_track.tlwh))
+
+ self.update_features(new_track.curr_feat)
+ self.tracklet_len = 0
+ self.state = TrackState.Tracked
+ self.is_activated = True
+ self.frame_id = frame_id
+ if new_id:
+ self.track_id = self.next_id()
+
+ def update(self, new_track, frame_id, update_feature=True):
+ self.frame_id = frame_id
+ self.tracklet_len += 1
+
+ new_tlwh = new_track.tlwh
+ self.mean, self.covariance = self.kalman_filter.update(self.mean, self.covariance, self.tlwh_to_xyah(new_tlwh))
+ self.state = TrackState.Tracked
+ self.is_activated = True
+
+ self.score = new_track.score
+ if update_feature:
+ self.update_features(new_track.curr_feat)
+
+ @property
+ def tlwh(self):
+ """
+ Get current position in bounding box format `(top left x, top left y,
+ width, height)`.
+ """
+ if self.mean is None:
+ return self._tlwh.copy()
+ ret = self.mean[:4].copy()
+ ret[2] *= ret[3]
+ ret[:2] -= ret[2:] / 2
+ return ret
+
+ @property
+ def tlbr(self):
+ """
+ Convert bounding box to format `(min x, min y, max x, max y)`, i.e.,
+ `(top left, bottom right)`.
+ """
+ ret = self.tlwh.copy()
+ ret[2:] += ret[:2]
+ return ret
+
+ @staticmethod
+ def tlwh_to_xyah(tlwh):
+ """
+ Convert bounding box to format `(center x, center y, aspect ratio,
+ height)`, where the aspect ratio is `width / height`.
+ """
+ ret = np.asarray(tlwh).copy()
+ ret[:2] += ret[2:] / 2
+ ret[2] /= ret[3]
+ return ret
+
+ def to_xyah(self):
+ return self.tlwh_to_xyah(self.tlwh)
+
+ @staticmethod
+ def tlbr_to_tlwh(tlbr):
+ ret = np.asarray(tlbr).copy()
+ ret[2:] -= ret[:2]
+ return ret
+
+ @staticmethod
+ def tlwh_to_tlbr(tlwh):
+ ret = np.asarray(tlwh).copy()
+ ret[2:] += ret[:2]
+ return ret
+
+ def __repr__(self):
+ return 'OT_{}_({}-{})'.format(self.track_id, self.start_frame, self.end_frame)
+
+
+def joint_stracks(tlista, tlistb):
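+ # Union of two track lists, keeping at most one track per track_id.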
+ exists = {}
+ res = []
+ for t in tlista:
+ exists[t.track_id] = 1
+ res.append(t)
+ for t in tlistb:
+ tid = t.track_id
+ if not exists.get(tid, 0):
+ exists[tid] = 1
+ res.append(t)
+ return res
+
+
+def sub_stracks(tlista, tlistb):
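+ # Remove from tlista every track whose track_id also appears in tlistb.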
+ stracks = {}
+ for t in tlista:
+ stracks[t.track_id] = t
+ for t in tlistb:
+ tid = t.track_id
+ if stracks.get(tid, 0):
+ del stracks[tid]
+ return list(stracks.values())
+
+
+def remove_duplicate_stracks(stracksa, stracksb):
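+ # Treat track pairs with IoU distance < 0.15 as duplicates and, for each pair,
+ # keep the track that has existed longer.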
+ pdist = matching.iou_distance(stracksa, stracksb)
+ pairs = np.where(pdist < 0.15)
+ dupa, dupb = list(), list()
+ for p, q in zip(*pairs):
+ timep = stracksa[p].frame_id - stracksa[p].start_frame
+ timeq = stracksb[q].frame_id - stracksb[q].start_frame
+ if timep > timeq:
+ dupb.append(q)
+ else:
+ dupa.append(p)
+ resa = [t for i, t in enumerate(stracksa) if not i in dupa]
+ resb = [t for i, t in enumerate(stracksb) if not i in dupb]
+ return resa, resb
diff --git a/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/tracker/base_sde_tracker.py b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/tracker/base_sde_tracker.py
new file mode 100644
index 0000000000000000000000000000000000000000..2e811e536a42ff781f60872b448b251de0301f61
--- /dev/null
+++ b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/tracker/base_sde_tracker.py
@@ -0,0 +1,133 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is borrowed from https://github.com/nwojke/deep_sort/blob/master/deep_sort/track.py
+"""
+
+from ppdet.core.workspace import register, serializable
+
+__all__ = ['TrackState', 'Track']
+
+
+class TrackState(object):
+ """
+ Enumeration type for the single target track state. Newly created tracks are
+ classified as `tentative` until enough evidence has been collected. Then,
+ the track state is changed to `confirmed`. Tracks that are no longer alive
+ are classified as `deleted` to mark them for removal from the set of active
+ tracks.
+ """
+ Tentative = 1
+ Confirmed = 2
+ Deleted = 3
+
+
+class Track(object):
+ """
+ A single target track with state space `(x, y, a, h)` and associated
+ velocities, where `(x, y)` is the center of the bounding box, `a` is the
+ aspect ratio and `h` is the height.
+
+ Args:
+ mean (ndarray): Mean vector of the initial state distribution.
+ covariance (ndarray): Covariance matrix of the initial state distribution.
+ track_id (int): A unique track identifier.
+ n_init (int): Number of consecutive detections before the track is confirmed.
+ The track state is set to `Deleted` if a miss occurs within the first
+ `n_init` frames.
+ max_age (int): The maximum number of consecutive misses before the track
+ state is set to `Deleted`.
+ feature (Optional[ndarray]): Feature vector of the detection this track
+ originates from. If not None, this feature is added to the `features` cache.
+
+ Attributes:
+ hits (int): Total number of measurement updates.
+ age (int): Total number of frames since first occurrence.
+ time_since_update (int): Total number of frames since last measurement
+ update.
+ state (TrackState): The current track state.
+ features (List[ndarray]): A cache of features. On each measurement update,
+ the associated feature vector is added to this list.
+ """
+
+ def __init__(self, mean, covariance, track_id, n_init, max_age, feature=None):
+ self.mean = mean
+ self.covariance = covariance
+ self.track_id = track_id
+ self.hits = 1
+ self.age = 1
+ self.time_since_update = 0
+
+ self.state = TrackState.Tentative
+ self.features = []
+ if feature is not None:
+ self.features.append(feature)
+
+ self._n_init = n_init
+ self._max_age = max_age
+
+ def to_tlwh(self):
+ """Get position in format `(top left x, top left y, width, height)`."""
+ ret = self.mean[:4].copy()
+ ret[2] *= ret[3]
+ ret[:2] -= ret[2:] / 2
+ return ret
+
+ def to_tlbr(self):
+ """Get position in bounding box format `(min x, miny, max x, max y)`."""
+ ret = self.to_tlwh()
+ ret[2:] = ret[:2] + ret[2:]
+ return ret
+
+ def predict(self, kalman_filter):
+ """
+ Propagate the state distribution to the current time step using a Kalman
+ filter prediction step.
+ """
+ self.mean, self.covariance = kalman_filter.predict(self.mean, self.covariance)
+ self.age += 1
+ self.time_since_update += 1
+
+ def update(self, kalman_filter, detection):
+ """
+ Perform Kalman filter measurement update step and update the associated
+ detection feature cache.
+ """
+ self.mean, self.covariance = kalman_filter.update(self.mean, self.covariance, detection.to_xyah())
+ self.features.append(detection.feature)
+
+ self.hits += 1
+ self.time_since_update = 0
+ if self.state == TrackState.Tentative and self.hits >= self._n_init:
+ self.state = TrackState.Confirmed
+
+ def mark_missed(self):
+ """Mark this track as missed (no association at the current time step).
+ """
+ if self.state == TrackState.Tentative:
+ self.state = TrackState.Deleted
+ elif self.time_since_update > self._max_age:
+ self.state = TrackState.Deleted
+
+ def is_tentative(self):
+ """Returns True if this track is tentative (unconfirmed)."""
+ return self.state == TrackState.Tentative
+
+ def is_confirmed(self):
+ """Returns True if this track is confirmed."""
+ return self.state == TrackState.Confirmed
+
+ def is_deleted(self):
+ """Returns True if this track is dead and should be deleted."""
+ return self.state == TrackState.Deleted
diff --git a/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/tracker/jde_tracker.py b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/tracker/jde_tracker.py
new file mode 100644
index 0000000000000000000000000000000000000000..2e1cafb345b7687e563fc6d9c2c1769cb39d690c
--- /dev/null
+++ b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/tracker/jde_tracker.py
@@ -0,0 +1,248 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is borrowed from https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/tracker/multitracker.py
+"""
+
+import paddle
+
+from ..matching import jde_matching as matching
+from .base_jde_tracker import TrackState, BaseTrack, STrack
+from .base_jde_tracker import joint_stracks, sub_stracks, remove_duplicate_stracks
+
+from ppdet.core.workspace import register, serializable
+from ppdet.utils.logger import setup_logger
+logger = setup_logger(__name__)
+
+__all__ = ['FrozenJDETracker']
+
+
+@register
+@serializable
+class FrozenJDETracker(object):
+ __inject__ = ['motion']
+ """
+ JDE tracker
+
+ Args:
+ det_thresh (float): threshold of detection score
+ track_buffer (int): buffer for tracker
+ min_box_area (int): min box area to filter out low quality boxes
+ vertical_ratio (float): w/h, the vertical ratio of the bbox used to filter
+ out bad results; set to 1.6 by default for pedestrian tracking. If set
+ to -1, no bboxes are filtered.
+ tracked_thresh (float): linear assignment threshold of tracked
+ stracks and detections
+ r_tracked_thresh (float): linear assignment threshold of
+ tracked stracks and unmatched detections
+ unconfirmed_thresh (float): linear assignment threshold of
+ unconfirmed stracks and unmatched detections
+ motion (object): KalmanFilter instance
+ conf_thres (float): confidence threshold for tracking
+ metric_type (str): either "euclidean" or "cosine", the distance metric
+ used for measurement to track association.
+ """
+
+ def __init__(self,
+ det_thresh=0.3,
+ track_buffer=30,
+ min_box_area=200,
+ vertical_ratio=1.6,
+ tracked_thresh=0.7,
+ r_tracked_thresh=0.5,
+ unconfirmed_thresh=0.7,
+ motion='KalmanFilter',
+ conf_thres=0,
+ metric_type='euclidean'):
+ self.det_thresh = det_thresh
+ self.track_buffer = track_buffer
+ self.min_box_area = min_box_area
+ self.vertical_ratio = vertical_ratio
+
+ self.tracked_thresh = tracked_thresh
+ self.r_tracked_thresh = r_tracked_thresh
+ self.unconfirmed_thresh = unconfirmed_thresh
+ self.motion = motion
+ self.conf_thres = conf_thres
+ self.metric_type = metric_type
+
+ self.frame_id = 0
+ self.tracked_stracks = []
+ self.lost_stracks = []
+ self.removed_stracks = []
+
+ self.max_time_lost = 0
+ # max_time_lost will be calculated: int(frame_rate / 30.0 * track_buffer)
+
+ def update(self, pred_dets, pred_embs):
+ """
+ Processes the image frame and finds bounding boxes (detections).
+ Associates the detection with corresponding tracklets and also handles
+ lost, removed, refound and active tracklets.
+
+ Args:
+ pred_dets (Tensor): Detection results of the image, shape is [N, 5].
+ pred_embs (Tensor): Embedding results of the image, shape is [N, 512].
+
+ Returns:
+ output_stracks (list): The list contains information regarding the
+ online tracklets for the received image tensor.
+ """
+ self.frame_id += 1
+ activated_starcks = []
+ # for storing active tracks, for the current frame
+ refind_stracks = []
+ # Lost Tracks whose detections are obtained in the current frame
+ lost_stracks = []
+ # The tracks which are not obtained in the current frame but are not
+ # removed. (Lost for less time than the removal threshold)
+ removed_stracks = []
+
+ remain_inds = paddle.nonzero(pred_dets[:, 4] > self.conf_thres)
+ if remain_inds.shape[0] == 0:
+ pred_dets = paddle.zeros([0, 1])
+ pred_embs = paddle.zeros([0, 1])
+ else:
+ pred_dets = paddle.gather(pred_dets, remain_inds)
+ pred_embs = paddle.gather(pred_embs, remain_inds)
+
+ # Filter out the image with box_num = 0. pred_dets = [[0.0, 0.0, 0.0 ,0.0]]
+ empty_pred = True if len(pred_dets) == 1 and paddle.sum(pred_dets) == 0.0 else False
+ """ Step 1: Network forward, get detections & embeddings"""
+ if len(pred_dets) > 0 and not empty_pred:
+ pred_dets = pred_dets.numpy()
+ pred_embs = pred_embs.numpy()
+ detections = [
+ STrack(STrack.tlbr_to_tlwh(tlbrs[:4]), tlbrs[4], f, 30) for (tlbrs, f) in zip(pred_dets, pred_embs)
+ ]
+ else:
+ detections = []
+ ''' Add newly detected tracklets to tracked_stracks'''
+ unconfirmed = []
+ tracked_stracks = [] # type: list[STrack]
+ for track in self.tracked_stracks:
+ if not track.is_activated:
+ # previous tracks which are not active in the current frame are added to the unconfirmed list
+ unconfirmed.append(track)
+ else:
+ # Active tracks are added to the local list 'tracked_stracks'
+ tracked_stracks.append(track)
+ """ Step 2: First association, with embedding"""
+ # Combining currently tracked_stracks and lost_stracks
+ strack_pool = joint_stracks(tracked_stracks, self.lost_stracks)
+ # Predict the current location with KF
+ STrack.multi_predict(strack_pool, self.motion)
+
+ dists = matching.embedding_distance(strack_pool, detections, metric=self.metric_type)
+ dists = matching.fuse_motion(self.motion, dists, strack_pool, detections)
+ # The dists is the list of distances of the detection with the tracks in strack_pool
+ matches, u_track, u_detection = matching.linear_assignment(dists, thresh=self.tracked_thresh)
+ # The matches is the array for corresponding matches of the detection with the corresponding strack_pool
+
+ for itracked, idet in matches:
+ # itracked is the id of the track and idet is the detection
+ track = strack_pool[itracked]
+ det = detections[idet]
+ if track.state == TrackState.Tracked:
+ # If the track is active, add the detection to the track
+ track.update(detections[idet], self.frame_id)
+ activated_starcks.append(track)
+ else:
+ # We have obtained a detection from a track which is not active,
+ # hence put the track in refind_stracks list
+ track.re_activate(det, self.frame_id, new_id=False)
+ refind_stracks.append(track)
+
+ # None of the steps below happen if there are no undetected tracks.
+ """ Step 3: Second association, with IOU"""
+ detections = [detections[i] for i in u_detection]
+ # detections is now a list of the unmatched detections
+ r_tracked_stracks = []
+ # This is a container for stracks which were tracked until the previous
+ # frame but for which no detection was found in the current frame.
+
+ for i in u_track:
+ if strack_pool[i].state == TrackState.Tracked:
+ r_tracked_stracks.append(strack_pool[i])
+ dists = matching.iou_distance(r_tracked_stracks, detections)
+ matches, u_track, u_detection = matching.linear_assignment(dists, thresh=self.r_tracked_thresh)
+ # matches is the list of detections which matched with corresponding
+ # tracks by IOU distance method.
+
+ for itracked, idet in matches:
+ track = r_tracked_stracks[itracked]
+ det = detections[idet]
+ if track.state == TrackState.Tracked:
+ track.update(det, self.frame_id)
+ activated_starcks.append(track)
+ else:
+ track.re_activate(det, self.frame_id, new_id=False)
+ refind_stracks.append(track)
+ # The same process is applied to the unmatched detections, but now using IoU distance as the measure
+
+ for it in u_track:
+ track = r_tracked_stracks[it]
+ if not track.state == TrackState.Lost:
+ track.mark_lost()
+ lost_stracks.append(track)
+ # If no detections are obtained for tracks (u_track), the tracks are added to lost_tracks list and are marked lost
+ '''Deal with unconfirmed tracks, usually tracks with only one beginning frame'''
+ detections = [detections[i] for i in u_detection]
+ dists = matching.iou_distance(unconfirmed, detections)
+ matches, u_unconfirmed, u_detection = matching.linear_assignment(dists, thresh=self.unconfirmed_thresh)
+ for itracked, idet in matches:
+ unconfirmed[itracked].update(detections[idet], self.frame_id)
+ activated_starcks.append(unconfirmed[itracked])
+
+ # The tracks which are yet not matched
+ for it in u_unconfirmed:
+ track = unconfirmed[it]
+ track.mark_removed()
+ removed_stracks.append(track)
+
+ # after all these confirmation steps, if a new detection is found, it is initialized for a new track
+ """ Step 4: Init new stracks"""
+ for inew in u_detection:
+ track = detections[inew]
+ if track.score < self.det_thresh:
+ continue
+ track.activate(self.motion, self.frame_id)
+ activated_starcks.append(track)
+ """ Step 5: Update state"""
+ # If the tracks are lost for more frames than the threshold number, the tracks are removed.
+ for track in self.lost_stracks:
+ if self.frame_id - track.end_frame > self.max_time_lost:
+ track.mark_removed()
+ removed_stracks.append(track)
+
+ # Update the self.tracked_stracks and self.lost_stracks using the updates in this step.
+ self.tracked_stracks = [t for t in self.tracked_stracks if t.state == TrackState.Tracked]
+ self.tracked_stracks = joint_stracks(self.tracked_stracks, activated_starcks)
+ self.tracked_stracks = joint_stracks(self.tracked_stracks, refind_stracks)
+
+ self.lost_stracks = sub_stracks(self.lost_stracks, self.tracked_stracks)
+ self.lost_stracks.extend(lost_stracks)
+ self.lost_stracks = sub_stracks(self.lost_stracks, self.removed_stracks)
+ self.removed_stracks.extend(removed_stracks)
+ self.tracked_stracks, self.lost_stracks = remove_duplicate_stracks(self.tracked_stracks, self.lost_stracks)
+ # get scores of lost tracks
+ output_stracks = [track for track in self.tracked_stracks if track.is_activated]
+
+ logger.debug('===========Frame {}=========='.format(self.frame_id))
+ logger.debug('Activated: {}'.format([track.track_id for track in activated_starcks]))
+ logger.debug('Refind: {}'.format([track.track_id for track in refind_stracks]))
+ logger.debug('Lost: {}'.format([track.track_id for track in lost_stracks]))
+ logger.debug('Removed: {}'.format([track.track_id for track in removed_stracks]))
+
+ return output_stracks
diff --git a/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/utils.py b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..12c61686a1715a965407822dcf19fd1081f292d7
--- /dev/null
+++ b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/utils.py
@@ -0,0 +1,176 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import cv2
+import time
+import paddle
+import numpy as np
+
+__all__ = [
+ 'Timer',
+ 'Detection',
+ 'load_det_results',
+ 'preprocess_reid',
+ 'get_crops',
+ 'clip_box',
+ 'scale_coords',
+]
+
+
+class Timer(object):
+ """
+ This class is used to compute and print the current FPS during evaluation.
+ """
+
+ def __init__(self):
+ self.total_time = 0.
+ self.calls = 0
+ self.start_time = 0.
+ self.diff = 0.
+ self.average_time = 0.
+ self.duration = 0.
+
+ def tic(self):
+ # using time.time instead of time.clock because time.clock
+ # does not normalize for multithreading
+ self.start_time = time.time()
+
+ def toc(self, average=True):
+ self.diff = time.time() - self.start_time
+ self.total_time += self.diff
+ self.calls += 1
+ self.average_time = self.total_time / self.calls
+ if average:
+ self.duration = self.average_time
+ else:
+ self.duration = self.diff
+ return self.duration
+
+ def clear(self):
+ self.total_time = 0.
+ self.calls = 0
+ self.start_time = 0.
+ self.diff = 0.
+ self.average_time = 0.
+ self.duration = 0.
+
+
+class Detection(object):
+ """
+ This class represents a bounding box detection in a single image.
+
+ Args:
+ tlwh (ndarray): Bounding box in format `(top left x, top left y,
+ width, height)`.
+ confidence (ndarray): Detector confidence score.
+ feature (Tensor): A feature vector that describes the object
+ contained in this image.
+ """
+
+ def __init__(self, tlwh, confidence, feature):
+ self.tlwh = np.asarray(tlwh, dtype=np.float32)
+ self.confidence = np.asarray(confidence, dtype=np.float32)
+ self.feature = feature
+
+ def to_tlbr(self):
+ """
+ Convert bounding box to format `(min x, min y, max x, max y)`, i.e.,
+ `(top left, bottom right)`.
+ """
+ ret = self.tlwh.copy()
+ ret[2:] += ret[:2]
+ return ret
+
+ def to_xyah(self):
+ """
+ Convert bounding box to format `(center x, center y, aspect ratio,
+ height)`, where the aspect ratio is `width / height`.
+ """
+ ret = self.tlwh.copy()
+ ret[:2] += ret[2:] / 2
+ ret[2] /= ret[3]
+ return ret
+
+
+def load_det_results(det_file, num_frames):
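+ # Load per-frame detection results from a comma-separated txt file and group
+ # the bbox and score columns by frame index.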
+ assert os.path.exists(det_file) and os.path.isfile(det_file), \
+ 'Error: det_file: {} not exist or not a file.'.format(det_file)
+ labels = np.loadtxt(det_file, dtype='float32', delimiter=',')
+ results_list = []
+ for frame_i in range(0, num_frames):
+ results = {'bbox': [], 'score': []}
+ labels_with_frame = labels[labels[:, 0] == frame_i + 1]
+ for l in labels_with_frame:
+ results['bbox'].append(l[1:5])
+ results['score'].append(l[5])
+ results_list.append(results)
+ return results_list
+
+
+def scale_coords(coords, input_shape, im_shape, scale_factor):
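+ # Map box coordinates from the letterboxed network input back to the original
+ # image scale by removing the padding and dividing by the resize ratio.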
+ im_shape = im_shape.numpy()[0]
+ ratio = scale_factor[0][0]
+ pad_w = (input_shape[1] - int(im_shape[1])) / 2
+ pad_h = (input_shape[0] - int(im_shape[0])) / 2
+ coords = paddle.cast(coords, 'float32')
+ coords[:, 0::2] -= pad_w
+ coords[:, 1::2] -= pad_h
+ coords[:, 0:4] /= ratio
+ coords[:, :4] = paddle.clip(coords[:, :4], min=0, max=coords[:, :4].max())
+ return coords.round()
+
+
+def clip_box(xyxy, input_shape, im_shape, scale_factor):
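+ # Clip xyxy boxes to the bounds of the original (pre-resize) image.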
+ im_shape = im_shape.numpy()[0]
+ ratio = scale_factor.numpy()[0][0]
+ img0_shape = [int(im_shape[0] / ratio), int(im_shape[1] / ratio)]
+
+ xyxy[:, 0::2] = paddle.clip(xyxy[:, 0::2], min=0, max=img0_shape[1])
+ xyxy[:, 1::2] = paddle.clip(xyxy[:, 1::2], min=0, max=img0_shape[0])
+ return xyxy
+
+
+def get_crops(xyxy, ori_img, pred_scores, w, h):
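+ # Crop valid detection boxes from the original image, keep their scores, and
+ # preprocess the crops into a batch for the ReID feature extractor.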
+ crops = []
+ keep_scores = []
+ xyxy = xyxy.numpy().astype(np.int64)
+ ori_img = ori_img.numpy()
+ ori_img = np.squeeze(ori_img, axis=0).transpose(1, 0, 2)
+ pred_scores = pred_scores.numpy()
+ for i, bbox in enumerate(xyxy):
+ if bbox[2] <= bbox[0] or bbox[3] <= bbox[1]:
+ continue
+ crop = ori_img[bbox[0]:bbox[2], bbox[1]:bbox[3], :]
+ crops.append(crop)
+ keep_scores.append(pred_scores[i])
+ if len(crops) == 0:
+ return [], []
+ crops = preprocess_reid(crops, w, h)
+ return crops, keep_scores
+
+
+def preprocess_reid(imgs, w=64, h=192, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
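+ # Resize each crop to (w, h), flip the channel order, scale to [0, 1],
+ # normalize with ImageNet mean/std, and stack into an NCHW batch.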
+ im_batch = []
+ for img in imgs:
+ img = cv2.resize(img, (w, h))
+ img = img[:, :, ::-1].astype('float32').transpose((2, 0, 1)) / 255
+ img_mean = np.array(mean).reshape((3, 1, 1))
+ img_std = np.array(std).reshape((3, 1, 1))
+ img -= img_mean
+ img /= img_std
+ img = np.expand_dims(img, axis=0)
+ im_batch.append(img)
+ im_batch = np.concatenate(im_batch, 0)
+ return im_batch
diff --git a/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/visualization.py b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/visualization.py
new file mode 100644
index 0000000000000000000000000000000000000000..cd9c5b15e15f677b7955dd4eba40798e985315a1
--- /dev/null
+++ b/modules/video/multiple_object_tracking/fairmot_dla34/modeling/mot/visualization.py
@@ -0,0 +1,117 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import cv2
+import numpy as np
+
+
+def tlwhs_to_tlbrs(tlwhs):
+ tlbrs = np.copy(tlwhs)
+ if len(tlbrs) == 0:
+ return tlbrs
+ tlbrs[:, 2] += tlwhs[:, 0]
+ tlbrs[:, 3] += tlwhs[:, 1]
+ return tlbrs
+
+
+def get_color(idx):
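+ # Deterministic pseudo-random color derived from the track id, so each id keeps
+ # a stable color across frames.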
+ idx = idx * 3
+ color = ((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255)
+ return color
+
+
+def resize_image(image, max_size=800):
+ if max(image.shape[:2]) > max_size:
+ scale = float(max_size) / max(image.shape[:2])
+ image = cv2.resize(image, None, fx=scale, fy=scale)
+ return image
+
+
+def plot_tracking(image, tlwhs, obj_ids, scores=None, frame_id=0, fps=0., ids2=None):
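+ # Draw tracked boxes with their track ids (and optional scores) on the frame,
+ # plus a header with frame index, FPS and object count.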
+ im = np.ascontiguousarray(np.copy(image))
+ im_h, im_w = im.shape[:2]
+
+ top_view = np.zeros([im_w, im_w, 3], dtype=np.uint8) + 255
+
+ text_scale = max(1, image.shape[1] / 1600.)
+ text_thickness = 2
+ line_thickness = max(1, int(image.shape[1] / 500.))
+
+ radius = max(5, int(im_w / 140.))
+ cv2.putText(
+ im,
+ 'frame: %d fps: %.2f num: %d' % (frame_id, fps, len(tlwhs)), (0, int(15 * text_scale)),
+ cv2.FONT_HERSHEY_PLAIN,
+ text_scale, (0, 0, 255),
+ thickness=2)
+
+ for i, tlwh in enumerate(tlwhs):
+ x1, y1, w, h = tlwh
+ intbox = tuple(map(int, (x1, y1, x1 + w, y1 + h)))
+ obj_id = int(obj_ids[i])
+ id_text = '{}'.format(int(obj_id))
+ if ids2 is not None:
+ id_text = id_text + ', {}'.format(int(ids2[i]))
+ _line_thickness = 1 if obj_id <= 0 else line_thickness
+ color = get_color(abs(obj_id))
+ cv2.rectangle(im, intbox[0:2], intbox[2:4], color=color, thickness=line_thickness)
+ cv2.putText(
+ im,
+ id_text, (intbox[0], intbox[1] + 10),
+ cv2.FONT_HERSHEY_PLAIN,
+ text_scale, (0, 0, 255),
+ thickness=text_thickness)
+
+ if scores is not None:
+ text = '{:.2f}'.format(float(scores[i]))
+ cv2.putText(
+ im,
+ text, (intbox[0], intbox[1] - 10),
+ cv2.FONT_HERSHEY_PLAIN,
+ text_scale, (0, 255, 255),
+ thickness=text_thickness)
+ return im
+
+
+def plot_trajectory(image, tlwhs, track_ids):
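+ # Draw each track's trajectory as colored dots at the bottom center of its boxes.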
+ image = image.copy()
+ for one_tlwhs, track_id in zip(tlwhs, track_ids):
+ color = get_color(int(track_id))
+ for tlwh in one_tlwhs:
+ x1, y1, w, h = tuple(map(int, tlwh))
+ cv2.circle(image, (int(x1 + 0.5 * w), int(y1 + h)), 2, color, thickness=2)
+ return image
+
+
+def plot_detections(image, tlbrs, scores=None, color=(255, 0, 0), ids=None):
+ im = np.copy(image)
+ text_scale = max(1, image.shape[1] / 800.)
+ thickness = 2 if text_scale > 1.3 else 1
+ for i, det in enumerate(tlbrs):
+ x1, y1, x2, y2 = np.asarray(det[:4], dtype=np.int)
+ if len(det) >= 7:
+ label = 'det' if det[5] > 0 else 'trk'
+ if ids is not None:
+ text = '{}# {:.2f}: {:d}'.format(label, det[6], ids[i])
+ cv2.putText(
+ im, text, (x1, y1 + 30), cv2.FONT_HERSHEY_PLAIN, text_scale, (0, 255, 255), thickness=thickness)
+ else:
+ text = '{}# {:.2f}'.format(label, det[6])
+
+ if scores is not None:
+ text = '{:.2f}'.format(scores[i])
+ cv2.putText(im, text, (x1, y1 + 30), cv2.FONT_HERSHEY_PLAIN, text_scale, (0, 255, 255), thickness=thickness)
+
+ cv2.rectangle(im, (x1, y1), (x2, y2), color, 2)
+ return im
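+
+
+if __name__ == '__main__':
+    # Minimal usage sketch with made-up boxes and ids: draw two hypothetical
+    # tracks on a blank frame and save the result. Boxes are given in
+    # (top-left x, top-left y, width, height) format, as plot_tracking expects.
+    frame = np.full((480, 640, 3), 255, dtype=np.uint8)
+    tlwhs = [(50., 60., 80., 160.), (300., 100., 70., 150.)]
+    obj_ids = [1, 2]
+    vis = plot_tracking(frame, tlwhs, obj_ids, scores=[0.9, 0.8], frame_id=0, fps=25.)
+    cv2.imwrite('tracking_vis_demo.jpg', vis)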
diff --git a/modules/video/multiple_object_tracking/fairmot_dla34/tracker.py b/modules/video/multiple_object_tracking/fairmot_dla34/tracker.py
index f641527ce94c8014db1afc0c5418bf6a278c352e..016f1e5878b12418ebb29344287bcfc6af830a8e 100644
--- a/modules/video/multiple_object_tracking/fairmot_dla34/tracker.py
+++ b/modules/video/multiple_object_tracking/fairmot_dla34/tracker.py
@@ -16,18 +16,19 @@ import cv2
import glob
import paddle
import numpy as np
+import collections
from ppdet.core.workspace import create
from ppdet.utils.checkpoint import load_weight, load_pretrain_weight
-from ppdet.modeling.mot.utils import Detection, get_crops, scale_coords, clip_box
-from ppdet.modeling.mot.utils import Timer, load_det_results
-from ppdet.modeling.mot import visualization as mot_vis
from ppdet.metrics import Metric, MOTMetric, KITTIMOTMetric
import ppdet.utils.stats as stats
from ppdet.engine.callbacks import Callback, ComposeCallback
from ppdet.utils.logger import setup_logger
from .dataset import MOTVideoStream, MOTImageStream
+from .utils import Timer
+from .modeling.mot.utils import Detection, get_crops, scale_coords, clip_box
+from .modeling.mot import visualization as mot_vis
logger = setup_logger(__name__)
@@ -71,7 +72,6 @@ class StreamTracker(object):
timer.tic()
pred_dets, pred_embs = self.model(data)
online_targets = self.model.tracker.update(pred_dets, pred_embs)
-
online_tlwhs, online_ids = [], []
online_scores = []
for t in online_targets:
@@ -109,7 +109,6 @@ class StreamTracker(object):
timer.tic()
pred_dets, pred_embs = self.model(data)
online_targets = self.model.tracker.update(pred_dets, pred_embs)
-
online_tlwhs, online_ids = [], []
online_scores = []
for t in online_targets:
diff --git a/modules/video/multiple_object_tracking/fairmot_dla34/utils.py b/modules/video/multiple_object_tracking/fairmot_dla34/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..4426f217f9f5fb5c7afa6593c2b83ce4b67236f9
--- /dev/null
+++ b/modules/video/multiple_object_tracking/fairmot_dla34/utils.py
@@ -0,0 +1,39 @@
+import time
+
+
+class Timer(object):
+ """
+ This class is used to compute and report the current FPS during evaluation.
+ """
+
+ def __init__(self):
+ self.total_time = 0.
+ self.calls = 0
+ self.start_time = 0.
+ self.diff = 0.
+ self.average_time = 0.
+ self.duration = 0.
+
+ def tic(self):
+ # using time.time instead of time.clock because time.clock
+ # does not normalize for multithreading
+ self.start_time = time.time()
+
+ def toc(self, average=True):
+ self.diff = time.time() - self.start_time
+ self.total_time += self.diff
+ self.calls += 1
+ self.average_time = self.total_time / self.calls
+ if average:
+ self.duration = self.average_time
+ else:
+ self.duration = self.diff
+ return self.duration
+
+ def clear(self):
+ self.total_time = 0.
+ self.calls = 0
+ self.start_time = 0.
+ self.diff = 0.
+ self.average_time = 0.
+ self.duration = 0.
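+
+
+if __name__ == '__main__':
+    # Minimal usage sketch: time a dummy per-frame workload and report the
+    # average latency and the FPS it implies. The sleep below is just a
+    # stand-in for one frame of tracking work.
+    timer = Timer()
+    for _ in range(5):
+        timer.tic()
+        time.sleep(0.01)
+        timer.toc()
+    print('avg time per frame: %.4fs, fps: %.2f' % (timer.average_time, 1. / timer.average_time))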
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/config/_base_/jde_darknet53.yml b/modules/video/multiple_object_tracking/jde_darknet53/config/_base_/jde_darknet53.yml
index 73faa52f662e7db24ef40c25c029561225d1a3b8..dcc67ac4276c3e8a3abd81950d970f3643d05551 100644
--- a/modules/video/multiple_object_tracking/jde_darknet53/config/_base_/jde_darknet53.yml
+++ b/modules/video/multiple_object_tracking/jde_darknet53/config/_base_/jde_darknet53.yml
@@ -5,7 +5,7 @@ find_unused_parameters: True
JDE:
detector: YOLOv3
reid: JDEEmbeddingHead
- tracker: JDETracker
+ tracker: FrozenJDETracker
YOLOv3:
backbone: DarkNet
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/config/jde_darknet53_30e_1088x608.yml b/modules/video/multiple_object_tracking/jde_darknet53/config/jde_darknet53_30e_1088x608.yml
index d2ac3aee460aaa378dcef11c3a3fce9aa4c29f05..33fa547afe9f95f5dfe7ea321c3e9be1c3634e1d 100644
--- a/modules/video/multiple_object_tracking/jde_darknet53/config/jde_darknet53_30e_1088x608.yml
+++ b/modules/video/multiple_object_tracking/jde_darknet53/config/jde_darknet53_30e_1088x608.yml
@@ -9,7 +9,7 @@ _BASE_: [
JDE:
detector: YOLOv3
reid: JDEEmbeddingHead
- tracker: JDETracker
+ tracker: FrozenJDETracker
YOLOv3:
backbone: DarkNet
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/__init__.py b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..258e4c9010832936f098e6febe777ac556f0668f
--- /dev/null
+++ b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/__init__.py
@@ -0,0 +1,25 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import matching
+from . import tracker
+from . import motion
+from . import visualization
+from . import utils
+
+from .matching import *
+from .tracker import *
+from .motion import *
+from .visualization import *
+from .utils import *
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/matching/__init__.py b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/matching/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..54c6680f79f16247c562a9da1024dd3e1de4c57f
--- /dev/null
+++ b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/matching/__init__.py
@@ -0,0 +1,19 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import jde_matching
+from . import deepsort_matching
+
+from .jde_matching import *
+from .deepsort_matching import *
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/matching/deepsort_matching.py b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/matching/deepsort_matching.py
new file mode 100644
index 0000000000000000000000000000000000000000..c55aa8876cc128f512aa4e2e4e48a935a3f8dd77
--- /dev/null
+++ b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/matching/deepsort_matching.py
@@ -0,0 +1,368 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is borrowed from https://github.com/nwojke/deep_sort/tree/master/deep_sort
+"""
+
+import numpy as np
+from scipy.optimize import linear_sum_assignment
+from ..motion import kalman_filter
+
+INFTY_COST = 1e+5
+
+__all__ = [
+ 'iou_1toN',
+ 'iou_cost',
+ '_nn_euclidean_distance',
+ '_nn_cosine_distance',
+ 'NearestNeighborDistanceMetric',
+ 'min_cost_matching',
+ 'matching_cascade',
+ 'gate_cost_matrix',
+]
+
+
+def iou_1toN(bbox, candidates):
+ """
+ Compute intersection over union (IoU) between one box and N candidates.
+
+ Args:
+ bbox (ndarray): A bounding box in format `(top left x, top left y, width, height)`.
+ candidates (ndarray): A matrix of candidate bounding boxes (one per row) in the
+ same format as `bbox`.
+
+ Returns:
+ ious (ndarray): The intersection over union in [0, 1] between the `bbox`
+ and each candidate. A higher score means a larger fraction of the
+ `bbox` is occluded by the candidate.
+ """
+ bbox_tl = bbox[:2]
+ bbox_br = bbox[:2] + bbox[2:]
+ candidates_tl = candidates[:, :2]
+ candidates_br = candidates[:, :2] + candidates[:, 2:]
+
+ tl = np.c_[np.maximum(bbox_tl[0], candidates_tl[:, 0])[:, np.newaxis],
+ np.maximum(bbox_tl[1], candidates_tl[:, 1])[:, np.newaxis]]
+ br = np.c_[np.minimum(bbox_br[0], candidates_br[:, 0])[:, np.newaxis],
+ np.minimum(bbox_br[1], candidates_br[:, 1])[:, np.newaxis]]
+ wh = np.maximum(0., br - tl)
+
+ area_intersection = wh.prod(axis=1)
+ area_bbox = bbox[2:].prod()
+ area_candidates = candidates[:, 2:].prod(axis=1)
+ ious = area_intersection / (area_bbox + area_candidates - area_intersection)
+ return ious
+
+
+def iou_cost(tracks, detections, track_indices=None, detection_indices=None):
+ """
+ IoU distance metric.
+
+ Args:
+ tracks (list[Track]): A list of tracks.
+ detections (list[Detection]): A list of detections.
+ track_indices (Optional[list[int]]): A list of indices to tracks that
+ should be matched. Defaults to all `tracks`.
+ detection_indices (Optional[list[int]]): A list of indices to detections
+ that should be matched. Defaults to all `detections`.
+
+ Returns:
+ cost_matrix (ndarray): A cost matrix of shape len(track_indices),
+ len(detection_indices) where entry (i, j) is
+ `1 - iou(tracks[track_indices[i]], detections[detection_indices[j]])`.
+ """
+ if track_indices is None:
+ track_indices = np.arange(len(tracks))
+ if detection_indices is None:
+ detection_indices = np.arange(len(detections))
+
+ cost_matrix = np.zeros((len(track_indices), len(detection_indices)))
+ for row, track_idx in enumerate(track_indices):
+ if tracks[track_idx].time_since_update > 1:
+ cost_matrix[row, :] = 1e+5
+ continue
+
+ bbox = tracks[track_idx].to_tlwh()
+ candidates = np.asarray([detections[i].tlwh for i in detection_indices])
+ cost_matrix[row, :] = 1. - iou_1toN(bbox, candidates)
+ return cost_matrix
+
+
+def _nn_euclidean_distance(s, q):
+ """
+ Compute pair-wise squared (Euclidean) distance between points in `s` and `q`.
+
+ Args:
+ s (ndarray): Sample points: an NxM matrix of N samples of dimensionality M.
+ q (ndarray): Query points: an LxM matrix of L samples of dimensionality M.
+
+ Returns:
+ distances (ndarray): A vector of length L that contains, for each entry in `q`, the
+ smallest Euclidean distance to a sample in `s`.
+ """
+ s, q = np.asarray(s), np.asarray(q)
+ if len(s) == 0 or len(q) == 0:
+ return np.zeros((len(s), len(q)))
+ s2, q2 = np.square(s).sum(axis=1), np.square(q).sum(axis=1)
+ distances = -2. * np.dot(s, q.T) + s2[:, None] + q2[None, :]
+ distances = np.clip(distances, 0., float(np.inf))
+
+ return np.maximum(0.0, distances.min(axis=0))
+
+
+def _nn_cosine_distance(s, q):
+ """
+ Compute pair-wise cosine distance between points in `s` and `q`.
+
+ Args:
+ s (ndarray): Sample points: an NxM matrix of N samples of dimensionality M.
+ q (ndarray): Query points: an LxM matrix of L samples of dimensionality M.
+
+ Returns:
+ distances (ndarray): A vector of length L that contains, for each entry in `q`, the
+ smallest cosine distance to a sample in `s`.
+ """
+ s = np.asarray(s) / np.linalg.norm(s, axis=1, keepdims=True)
+ q = np.asarray(q) / np.linalg.norm(q, axis=1, keepdims=True)
+ distances = 1. - np.dot(s, q.T)
+
+ return distances.min(axis=0)
+
+
+class NearestNeighborDistanceMetric(object):
+ """
+ A nearest neighbor distance metric that, for each target, returns
+ the closest distance to any sample that has been observed so far.
+
+ Args:
+ metric (str): Either "euclidean" or "cosine".
+ matching_threshold (float): The matching threshold. Samples with larger
+ distance are considered an invalid match.
+ budget (Optional[int]): If not None, fix samples per class to at most
+ this number. Removes the oldest samples when the budget is reached.
+
+ Attributes:
+ samples (Dict[int -> List[ndarray]]): A dictionary that maps from target
+ identities to the list of samples that have been observed so far.
+ """
+
+ def __init__(self, metric, matching_threshold, budget=None):
+ if metric == "euclidean":
+ self._metric = _nn_euclidean_distance
+ elif metric == "cosine":
+ self._metric = _nn_cosine_distance
+ else:
+ raise ValueError("Invalid metric; must be either 'euclidean' or 'cosine'")
+ self.matching_threshold = matching_threshold
+ self.budget = budget
+ self.samples = {}
+
+ def partial_fit(self, features, targets, active_targets):
+ """
+ Update the distance metric with new data.
+
+ Args:
+ features (ndarray): An NxM matrix of N features of dimensionality M.
+ targets (ndarray): An integer array of associated target identities.
+ active_targets (List[int]): A list of targets that are currently
+ present in the scene.
+ """
+ for feature, target in zip(features, targets):
+ self.samples.setdefault(target, []).append(feature)
+ if self.budget is not None:
+ self.samples[target] = self.samples[target][-self.budget:]
+ self.samples = {k: self.samples[k] for k in active_targets}
+
+ def distance(self, features, targets):
+ """
+ Compute distance between features and targets.
+
+ Args:
+ features (ndarray): An NxM matrix of N features of dimensionality M.
+ targets (list[int]): A list of targets to match the given `features` against.
+
+ Returns:
+ cost_matrix (ndarray): a cost matrix of shape len(targets), len(features),
+ where element (i, j) contains the closest squared distance between
+ `targets[i]` and `features[j]`.
+ """
+ cost_matrix = np.zeros((len(targets), len(features)))
+ for i, target in enumerate(targets):
+ cost_matrix[i, :] = self._metric(self.samples[target], features)
+ return cost_matrix
+
+
+def min_cost_matching(distance_metric, max_distance, tracks, detections, track_indices=None, detection_indices=None):
+ """
+ Solve linear assignment problem.
+
+ Args:
+ distance_metric :
+ Callable[[List[Track], List[Detection], List[int], List[int]], ndarray]
+ The distance metric is given a list of tracks and detections as
+ well as a list of N track indices and M detection indices. The
+ metric should return the NxM dimensional cost matrix, where element
+ (i, j) is the association cost between the i-th track in the given
+ track indices and the j-th detection in the given detection_indices.
+ max_distance (float): Gating threshold. Associations with cost larger
+ than this value are disregarded.
+ tracks (list[Track]): A list of predicted tracks at the current time
+ step.
+ detections (list[Detection]): A list of detections at the current time
+ step.
+ track_indices (list[int]): List of track indices that maps rows in
+ `cost_matrix` to tracks in `tracks`.
+ detection_indices (List[int]): List of detection indices that maps
+ columns in `cost_matrix` to detections in `detections`.
+
+ Returns:
+ A tuple (List[(int, int)], List[int], List[int]) with the following
+ three entries:
+ * A list of matched track and detection indices.
+ * A list of unmatched track indices.
+ * A list of unmatched detection indices.
+ """
+ if track_indices is None:
+ track_indices = np.arange(len(tracks))
+ if detection_indices is None:
+ detection_indices = np.arange(len(detections))
+
+ if len(detection_indices) == 0 or len(track_indices) == 0:
+ return [], track_indices, detection_indices # Nothing to match.
+
+ cost_matrix = distance_metric(tracks, detections, track_indices, detection_indices)
+
+ cost_matrix[cost_matrix > max_distance] = max_distance + 1e-5
+ indices = linear_sum_assignment(cost_matrix)
+
+ matches, unmatched_tracks, unmatched_detections = [], [], []
+ for col, detection_idx in enumerate(detection_indices):
+ if col not in indices[1]:
+ unmatched_detections.append(detection_idx)
+ for row, track_idx in enumerate(track_indices):
+ if row not in indices[0]:
+ unmatched_tracks.append(track_idx)
+ for row, col in zip(indices[0], indices[1]):
+ track_idx = track_indices[row]
+ detection_idx = detection_indices[col]
+ if cost_matrix[row, col] > max_distance:
+ unmatched_tracks.append(track_idx)
+ unmatched_detections.append(detection_idx)
+ else:
+ matches.append((track_idx, detection_idx))
+ return matches, unmatched_tracks, unmatched_detections
+
+
+def matching_cascade(distance_metric,
+ max_distance,
+ cascade_depth,
+ tracks,
+ detections,
+ track_indices=None,
+ detection_indices=None):
+ """
+ Run matching cascade.
+
+ Args:
+ distance_metric :
+ Callable[[List[Track], List[Detection], List[int], List[int]], ndarray]
+ The distance metric is given a list of tracks and detections as
+ well as a list of N track indices and M detection indices. The
+ metric should return the NxM dimensional cost matrix, where element
+ (i, j) is the association cost between the i-th track in the given
+ track indices and the j-th detection in the given detection_indices.
+ max_distance (float): Gating threshold. Associations with cost larger
+ than this value are disregarded.
+ cascade_depth (int): The cascade depth, should be set to the maximum
+ track age.
+ tracks (list[Track]): A list of predicted tracks at the current time
+ step.
+ detections (list[Detection]): A list of detections at the current time
+ step.
+ track_indices (list[int]): List of track indices that maps rows in
+ `cost_matrix` to tracks in `tracks`.
+ detection_indices (List[int]): List of detection indices that maps
+ columns in `cost_matrix` to detections in `detections`.
+
+ Returns:
+ A tuple (List[(int, int)], List[int], List[int]) with the following
+ three entries:
+ * A list of matched track and detection indices.
+ * A list of unmatched track indices.
+ * A list of unmatched detection indices.
+ """
+ if track_indices is None:
+ track_indices = list(range(len(tracks)))
+ if detection_indices is None:
+ detection_indices = list(range(len(detections)))
+
+ unmatched_detections = detection_indices
+ matches = []
+ for level in range(cascade_depth):
+ if len(unmatched_detections) == 0: # No detections left
+ break
+
+ track_indices_l = [k for k in track_indices if tracks[k].time_since_update == 1 + level]
+ if len(track_indices_l) == 0: # Nothing to match at this level
+ continue
+
+ matches_l, _, unmatched_detections = \
+ min_cost_matching(
+ distance_metric, max_distance, tracks, detections,
+ track_indices_l, unmatched_detections)
+ matches += matches_l
+ unmatched_tracks = list(set(track_indices) - set(k for k, _ in matches))
+ return matches, unmatched_tracks, unmatched_detections
+
+
+def gate_cost_matrix(kf,
+ cost_matrix,
+ tracks,
+ detections,
+ track_indices,
+ detection_indices,
+ gated_cost=INFTY_COST,
+ only_position=False):
+ """
+ Invalidate infeasible entries in cost matrix based on the state
+ distributions obtained by Kalman filtering.
+
+ Args:
+ kf (object): The Kalman filter.
+ cost_matrix (ndarray): The NxM dimensional cost matrix, where N is the
+ number of track indices and M is the number of detection indices,
+ such that entry (i, j) is the association cost between
+ `tracks[track_indices[i]]` and `detections[detection_indices[j]]`.
+ tracks (list[Track]): A list of predicted tracks at the current time
+ step.
+ detections (list[Detection]): A list of detections at the current time
+ step.
+ track_indices (List[int]): List of track indices that maps rows in
+ `cost_matrix` to tracks in `tracks`.
+ detection_indices (List[int]): List of detection indices that maps
+ columns in `cost_matrix` to detections in `detections`.
+ gated_cost (Optional[float]): Entries in the cost matrix corresponding
+ to infeasible associations are set to this value. Defaults to a very
+ large value.
+ only_position (Optional[bool]): If True, only the x, y position of the
+ state distribution is considered during gating. Default False.
+ """
+ gating_dim = 2 if only_position else 4
+ gating_threshold = kalman_filter.chi2inv95[gating_dim]
+ measurements = np.asarray([detections[i].to_xyah() for i in detection_indices])
+ for row, track_idx in enumerate(track_indices):
+ track = tracks[track_idx]
+ gating_distance = kf.gating_distance(track.mean, track.covariance, measurements, only_position)
+ cost_matrix[row, gating_distance > gating_threshold] = gated_cost
+ return cost_matrix
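+
+
+def _usage_sketch():
+    """
+    Minimal illustrative sketch (made-up numbers, kept as a reference and not
+    invoked by the tracker): IoU of one box against two candidates, and a
+    nearest-neighbor cosine cost matrix between stored target features and
+    new query features. Random vectors stand in for ReID embeddings.
+    """
+    bbox = np.array([10., 10., 50., 100.])
+    candidates = np.array([[12., 8., 50., 100.], [200., 200., 40., 80.]])
+    ious = iou_1toN(bbox, candidates)  # first candidate overlaps heavily, second not at all
+
+    metric = NearestNeighborDistanceMetric('cosine', matching_threshold=0.2, budget=30)
+    feats = np.random.rand(4, 128)
+    metric.partial_fit(feats, targets=[1, 1, 2, 2], active_targets=[1, 2])
+    cost = metric.distance(np.random.rand(2, 128), targets=[1, 2])  # shape (2, 2)
+    return ious, cost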
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/matching/jde_matching.py b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/matching/jde_matching.py
new file mode 100644
index 0000000000000000000000000000000000000000..bf2e891c391c98ed8944f88377f62c9722fa5155
--- /dev/null
+++ b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/matching/jde_matching.py
@@ -0,0 +1,123 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is borrowed from https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/tracker/matching.py
+"""
+
+import lap
+import scipy
+import numpy as np
+from scipy.spatial.distance import cdist
+from ..motion import kalman_filter
+
+from ppdet.utils.logger import setup_logger
+logger = setup_logger(__name__)
+
+__all__ = [
+ 'merge_matches',
+ 'linear_assignment',
+ 'cython_bbox_ious',
+ 'iou_distance',
+ 'embedding_distance',
+ 'fuse_motion',
+]
+
+
+def merge_matches(m1, m2, shape):
+ O, P, Q = shape
+ m1 = np.asarray(m1)
+ m2 = np.asarray(m2)
+
+ M1 = scipy.sparse.coo_matrix((np.ones(len(m1)), (m1[:, 0], m1[:, 1])), shape=(O, P))
+ M2 = scipy.sparse.coo_matrix((np.ones(len(m2)), (m2[:, 0], m2[:, 1])), shape=(P, Q))
+
+ mask = M1 * M2
+ match = mask.nonzero()
+ match = list(zip(match[0], match[1]))
+ unmatched_O = tuple(set(range(O)) - set([i for i, j in match]))
+ unmatched_Q = tuple(set(range(Q)) - set([j for i, j in match]))
+
+ return match, unmatched_O, unmatched_Q
+
+
+def linear_assignment(cost_matrix, thresh):
+ if cost_matrix.size == 0:
+ return np.empty((0, 2), dtype=int), tuple(range(cost_matrix.shape[0])), tuple(range(cost_matrix.shape[1]))
+ matches, unmatched_a, unmatched_b = [], [], []
+ cost, x, y = lap.lapjv(cost_matrix, extend_cost=True, cost_limit=thresh)
+ for ix, mx in enumerate(x):
+ if mx >= 0:
+ matches.append([ix, mx])
+ unmatched_a = np.where(x < 0)[0]
+ unmatched_b = np.where(y < 0)[0]
+ matches = np.asarray(matches)
+ return matches, unmatched_a, unmatched_b
+
+
+def cython_bbox_ious(atlbrs, btlbrs):
+ ious = np.zeros((len(atlbrs), len(btlbrs)), dtype=np.float64)
+ if ious.size == 0:
+ return ious
+ try:
+ import cython_bbox
+ except Exception as e:
+ logger.error('cython_bbox not found, please install cython_bbox, ' 'for example: `pip install cython_bbox`.')
+ raise e
+
+ ious = cython_bbox.bbox_overlaps(
+ np.ascontiguousarray(atlbrs, dtype=np.float64), np.ascontiguousarray(btlbrs, dtype=np.float64))
+ return ious
+
+
+def iou_distance(atracks, btracks):
+ """
+ Compute cost based on IoU between two list[STrack].
+ """
+ if (len(atracks) > 0 and isinstance(atracks[0], np.ndarray)) or (len(btracks) > 0
+ and isinstance(btracks[0], np.ndarray)):
+ atlbrs = atracks
+ btlbrs = btracks
+ else:
+ atlbrs = [track.tlbr for track in atracks]
+ btlbrs = [track.tlbr for track in btracks]
+ _ious = cython_bbox_ious(atlbrs, btlbrs)
+ cost_matrix = 1 - _ious
+
+ return cost_matrix
+
+
+def embedding_distance(tracks, detections, metric='euclidean'):
+ """
+ Compute cost based on features between two list[STrack].
+ """
+ cost_matrix = np.zeros((len(tracks), len(detections)), dtype=np.float64)
+ if cost_matrix.size == 0:
+ return cost_matrix
+ det_features = np.asarray([track.curr_feat for track in detections], dtype=np.float64)
+ track_features = np.asarray([track.smooth_feat for track in tracks], dtype=np.float64)
+ cost_matrix = np.maximum(0.0, cdist(track_features, det_features, metric)) # Normalized features
+ return cost_matrix
+
+
+def fuse_motion(kf, cost_matrix, tracks, detections, only_position=False, lambda_=0.98):
+ if cost_matrix.size == 0:
+ return cost_matrix
+ gating_dim = 2 if only_position else 4
+ gating_threshold = kalman_filter.chi2inv95[gating_dim]
+ measurements = np.asarray([det.to_xyah() for det in detections])
+ for row, track in enumerate(tracks):
+ gating_distance = kf.gating_distance(track.mean, track.covariance, measurements, only_position, metric='maha')
+ cost_matrix[row, gating_distance > gating_threshold] = np.inf
+ cost_matrix[row] = lambda_ * cost_matrix[row] + (1 - lambda_) * gating_distance
+ return cost_matrix
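+
+
+def _usage_sketch():
+    """
+    Minimal illustrative sketch (made-up cost matrix, not invoked by the
+    tracker): associate 2 tracks with 3 detections. Pairs whose cost exceeds
+    `thresh` are never matched, so detection 2 stays unassigned here.
+    """
+    cost = np.array([[0.1, 0.9, 0.8],
+                     [0.7, 0.2, 0.95]])
+    matches, u_track, u_det = linear_assignment(cost, thresh=0.5)
+    # matches -> [[0, 0], [1, 1]], u_track -> [], u_det -> [2]
+    return matches, u_track, u_det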
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/motion/__init__.py b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/motion/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e42dd0b019d66d6ea07bec1ad90cf9a8d53d8172
--- /dev/null
+++ b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/motion/__init__.py
@@ -0,0 +1,17 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import kalman_filter
+
+from .kalman_filter import *
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/motion/kalman_filter.py b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/motion/kalman_filter.py
new file mode 100644
index 0000000000000000000000000000000000000000..7cc182e4c5e76e0688688c883b2a24fa30df9c74
--- /dev/null
+++ b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/motion/kalman_filter.py
@@ -0,0 +1,237 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is borrowed from https://github.com/nwojke/deep_sort/blob/master/deep_sort/kalman_filter.py
+"""
+
+import numpy as np
+import scipy.linalg
+
+__all__ = ['KalmanFilter']
+"""
+Table for the 0.95 quantile of the chi-square distribution with N degrees of
+freedom (contains values for N=1, ..., 9). Taken from MATLAB/Octave's chi2inv
+function and used as Mahalanobis gating threshold.
+"""
+
+chi2inv95 = {1: 3.8415, 2: 5.9915, 3: 7.8147, 4: 9.4877, 5: 11.070, 6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919}
+
+
+class KalmanFilter(object):
+ """
+ A simple Kalman filter for tracking bounding boxes in image space.
+
+ The 8-dimensional state space
+
+ x, y, a, h, vx, vy, va, vh
+
+ contains the bounding box center position (x, y), aspect ratio a, height h,
+ and their respective velocities.
+
+ Object motion follows a constant velocity model. The bounding box location
+ (x, y, a, h) is taken as direct observation of the state space (linear
+ observation model).
+
+ """
+
+ def __init__(self):
+ ndim, dt = 4, 1.
+
+ # Create Kalman filter model matrices.
+ self._motion_mat = np.eye(2 * ndim, 2 * ndim)
+ for i in range(ndim):
+ self._motion_mat[i, ndim + i] = dt
+ self._update_mat = np.eye(ndim, 2 * ndim)
+
+ # Motion and observation uncertainty are chosen relative to the current
+ # state estimate. These weights control the amount of uncertainty in
+ # the model. This is a bit hacky.
+ self._std_weight_position = 1. / 20
+ self._std_weight_velocity = 1. / 160
+
+ def initiate(self, measurement):
+ """
+ Create track from unassociated measurement.
+
+ Args:
+ measurement (ndarray): Bounding box coordinates (x, y, a, h) with
+ center position (x, y), aspect ratio a, and height h.
+
+ Returns:
+ The mean vector (8 dimensional) and covariance matrix (8x8
+ dimensional) of the new track. Unobserved velocities are
+ initialized to 0 mean.
+ """
+ mean_pos = measurement
+ mean_vel = np.zeros_like(mean_pos)
+ mean = np.r_[mean_pos, mean_vel]
+
+ std = [
+ 2 * self._std_weight_position * measurement[3], 2 * self._std_weight_position * measurement[3], 1e-2,
+ 2 * self._std_weight_position * measurement[3], 10 * self._std_weight_velocity * measurement[3],
+ 10 * self._std_weight_velocity * measurement[3], 1e-5, 10 * self._std_weight_velocity * measurement[3]
+ ]
+ covariance = np.diag(np.square(std))
+ return mean, covariance
+
+ def predict(self, mean, covariance):
+ """
+ Run Kalman filter prediction step.
+
+ Args:
+ mean (ndarray): The 8 dimensional mean vector of the object state
+ at the previous time step.
+ covariance (ndarray): The 8x8 dimensional covariance matrix of the
+ object state at the previous time step.
+
+ Returns:
+ The mean vector and covariance matrix of the predicted state.
+ Unobserved velocities are initialized to 0 mean.
+ """
+ std_pos = [
+ self._std_weight_position * mean[3], self._std_weight_position * mean[3], 1e-2,
+ self._std_weight_position * mean[3]
+ ]
+ std_vel = [
+ self._std_weight_velocity * mean[3], self._std_weight_velocity * mean[3], 1e-5,
+ self._std_weight_velocity * mean[3]
+ ]
+ motion_cov = np.diag(np.square(np.r_[std_pos, std_vel]))
+
+ #mean = np.dot(self._motion_mat, mean)
+ mean = np.dot(mean, self._motion_mat.T)
+ covariance = np.linalg.multi_dot((self._motion_mat, covariance, self._motion_mat.T)) + motion_cov
+
+ return mean, covariance
+
+ def project(self, mean, covariance):
+ """
+ Project state distribution to measurement space.
+
+ Args:
+ mean (ndarray): The state's mean vector (8 dimensional array).
+ covariance (ndarray): The state's covariance matrix (8x8 dimensional).
+
+ Returns:
+ The projected mean and covariance matrix of the given state estimate.
+ """
+ std = [
+ self._std_weight_position * mean[3], self._std_weight_position * mean[3], 1e-1,
+ self._std_weight_position * mean[3]
+ ]
+ innovation_cov = np.diag(np.square(std))
+
+ mean = np.dot(self._update_mat, mean)
+ covariance = np.linalg.multi_dot((self._update_mat, covariance, self._update_mat.T))
+ return mean, covariance + innovation_cov
+
+ def multi_predict(self, mean, covariance):
+ """
+ Run Kalman filter prediction step (Vectorized version).
+
+ Args:
+ mean (ndarray): The Nx8 dimensional mean matrix of the object states
+ at the previous time step.
+ covariance (ndarray): The Nx8x8 dimensional covariance matrices of the
+ object states at the previous time step.
+
+ Returns:
+ The mean vector and covariance matrix of the predicted state.
+ Unobserved velocities are initialized to 0 mean.
+ """
+ std_pos = [
+ self._std_weight_position * mean[:, 3], self._std_weight_position * mean[:, 3],
+ 1e-2 * np.ones_like(mean[:, 3]), self._std_weight_position * mean[:, 3]
+ ]
+ std_vel = [
+ self._std_weight_velocity * mean[:, 3], self._std_weight_velocity * mean[:, 3],
+ 1e-5 * np.ones_like(mean[:, 3]), self._std_weight_velocity * mean[:, 3]
+ ]
+ sqr = np.square(np.r_[std_pos, std_vel]).T
+
+ motion_cov = []
+ for i in range(len(mean)):
+ motion_cov.append(np.diag(sqr[i]))
+ motion_cov = np.asarray(motion_cov)
+
+ mean = np.dot(mean, self._motion_mat.T)
+ left = np.dot(self._motion_mat, covariance).transpose((1, 0, 2))
+ covariance = np.dot(left, self._motion_mat.T) + motion_cov
+
+ return mean, covariance
+
+ def update(self, mean, covariance, measurement):
+ """
+ Run Kalman filter correction step.
+
+ Args:
+ mean (ndarray): The predicted state's mean vector (8 dimensional).
+ covariance (ndarray): The state's covariance matrix (8x8 dimensional).
+ measurement (ndarray): The 4 dimensional measurement vector
+ (x, y, a, h), where (x, y) is the center position, a the aspect
+ ratio, and h the height of the bounding box.
+
+ Returns:
+ The measurement-corrected state distribution.
+ """
+ projected_mean, projected_cov = self.project(mean, covariance)
+
+ chol_factor, lower = scipy.linalg.cho_factor(projected_cov, lower=True, check_finite=False)
+ kalman_gain = scipy.linalg.cho_solve((chol_factor, lower),
+ np.dot(covariance, self._update_mat.T).T,
+ check_finite=False).T
+ innovation = measurement - projected_mean
+
+ new_mean = mean + np.dot(innovation, kalman_gain.T)
+ new_covariance = covariance - np.linalg.multi_dot((kalman_gain, projected_cov, kalman_gain.T))
+ return new_mean, new_covariance
+
+ def gating_distance(self, mean, covariance, measurements, only_position=False, metric='maha'):
+ """
+ Compute gating distance between state distribution and measurements.
+ A suitable distance threshold can be obtained from `chi2inv95`. If
+ `only_position` is False, the chi-square distribution has 4 degrees of
+ freedom, otherwise 2.
+
+ Args:
+ mean (ndarray): Mean vector over the state distribution (8
+ dimensional).
+ covariance (ndarray): Covariance of the state distribution (8x8
+ dimensional).
+ measurements (ndarray): An Nx4 dimensional matrix of N measurements,
+ each in format (x, y, a, h) where (x, y) is the bounding box center
+ position, a the aspect ratio, and h the height.
+ only_position (Optional[bool]): If True, distance computation is
+ done with respect to the bounding box center position only.
+ metric (str): Metric type, 'gaussian' or 'maha'.
+
+ Returns:
+ An array of length N, where the i-th element contains the squared
+ Mahalanobis distance between (mean, covariance) and `measurements[i]`.
+ """
+ mean, covariance = self.project(mean, covariance)
+ if only_position:
+ mean, covariance = mean[:2], covariance[:2, :2]
+ measurements = measurements[:, :2]
+
+ d = measurements - mean
+ if metric == 'gaussian':
+ return np.sum(d * d, axis=1)
+ elif metric == 'maha':
+ cholesky_factor = np.linalg.cholesky(covariance)
+ z = scipy.linalg.solve_triangular(cholesky_factor, d.T, lower=True, check_finite=False, overwrite_b=True)
+ squared_maha = np.sum(z * z, axis=0)
+ return squared_maha
+ else:
+ raise ValueError('invalid distance metric')
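+
+
+if __name__ == '__main__':
+    # Minimal usage sketch with a made-up box: one predict/update cycle on a
+    # measurement in (center x, center y, aspect ratio, height) format,
+    # followed by the Mahalanobis gating distance to a nearby measurement.
+    kf = KalmanFilter()
+    mean, covariance = kf.initiate(np.array([320., 240., 0.5, 160.]))
+    mean, covariance = kf.predict(mean, covariance)
+    mean, covariance = kf.update(mean, covariance, np.array([324., 243., 0.5, 162.]))
+    d = kf.gating_distance(mean, covariance, np.array([[326., 244., 0.5, 161.]]))
+    print('gating distance:', d, 'threshold (4 dof):', chi2inv95[4])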
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/tracker/__init__.py b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/tracker/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..904822119661be61141715c638388db9d045fee1
--- /dev/null
+++ b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/tracker/__init__.py
@@ -0,0 +1,21 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import base_jde_tracker
+from . import base_sde_tracker
+from . import jde_tracker
+
+from .base_jde_tracker import *
+from .base_sde_tracker import *
+from .jde_tracker import *
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/tracker/base_jde_tracker.py b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/tracker/base_jde_tracker.py
new file mode 100644
index 0000000000000000000000000000000000000000..9505a709ee573acecf4b5dd7e02a06cee9d44284
--- /dev/null
+++ b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/tracker/base_jde_tracker.py
@@ -0,0 +1,257 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is borrowed from https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/tracker/multitracker.py
+"""
+
+import numpy as np
+from collections import deque, OrderedDict
+from ..matching import jde_matching as matching
+from ppdet.core.workspace import register, serializable
+
+__all__ = [
+ 'TrackState',
+ 'BaseTrack',
+ 'STrack',
+ 'joint_stracks',
+ 'sub_stracks',
+ 'remove_duplicate_stracks',
+]
+
+
+class TrackState(object):
+ New = 0
+ Tracked = 1
+ Lost = 2
+ Removed = 3
+
+
+class BaseTrack(object):
+ _count = 0
+
+ track_id = 0
+ is_activated = False
+ state = TrackState.New
+
+ history = OrderedDict()
+ features = []
+ curr_feature = None
+ score = 0
+ start_frame = 0
+ frame_id = 0
+ time_since_update = 0
+
+ # multi-camera
+ location = (np.inf, np.inf)
+
+ @property
+ def end_frame(self):
+ return self.frame_id
+
+ @staticmethod
+ def next_id():
+ BaseTrack._count += 1
+ return BaseTrack._count
+
+ def activate(self, *args):
+ raise NotImplementedError
+
+ def predict(self):
+ raise NotImplementedError
+
+ def update(self, *args, **kwargs):
+ raise NotImplementedError
+
+ def mark_lost(self):
+ self.state = TrackState.Lost
+
+ def mark_removed(self):
+ self.state = TrackState.Removed
+
+
+class STrack(BaseTrack):
+ def __init__(self, tlwh, score, temp_feat, buffer_size=30):
+ # wait activate
+ self._tlwh = np.asarray(tlwh, dtype=np.float64)
+ self.kalman_filter = None
+ self.mean, self.covariance = None, None
+ self.is_activated = False
+
+ self.score = score
+ self.tracklet_len = 0
+
+ self.smooth_feat = None
+ self.update_features(temp_feat)
+ self.features = deque([], maxlen=buffer_size)
+ self.alpha = 0.9
+
+ def update_features(self, feat):
+ feat /= np.linalg.norm(feat)
+ self.curr_feat = feat
+ if self.smooth_feat is None:
+ self.smooth_feat = feat
+ else:
+ self.smooth_feat = self.alpha * self.smooth_feat + (1 - self.alpha) * feat
+ self.features.append(feat)
+ self.smooth_feat /= np.linalg.norm(self.smooth_feat)
+
+ def predict(self):
+ mean_state = self.mean.copy()
+ if self.state != TrackState.Tracked:
+ mean_state[7] = 0
+ self.mean, self.covariance = self.kalman_filter.predict(mean_state, self.covariance)
+
+ @staticmethod
+ def multi_predict(stracks, kalman_filter):
+ if len(stracks) > 0:
+ multi_mean = np.asarray([st.mean.copy() for st in stracks])
+ multi_covariance = np.asarray([st.covariance for st in stracks])
+ for i, st in enumerate(stracks):
+ if st.state != TrackState.Tracked:
+ multi_mean[i][7] = 0
+ multi_mean, multi_covariance = kalman_filter.multi_predict(multi_mean, multi_covariance)
+ for i, (mean, cov) in enumerate(zip(multi_mean, multi_covariance)):
+ stracks[i].mean = mean
+ stracks[i].covariance = cov
+
+ def activate(self, kalman_filter, frame_id):
+ """Start a new tracklet"""
+ self.kalman_filter = kalman_filter
+ self.track_id = self.next_id()
+ self.mean, self.covariance = self.kalman_filter.initiate(self.tlwh_to_xyah(self._tlwh))
+
+ self.tracklet_len = 0
+ self.state = TrackState.Tracked
+ if frame_id == 1:
+ self.is_activated = True
+ self.frame_id = frame_id
+ self.start_frame = frame_id
+
+ def re_activate(self, new_track, frame_id, new_id=False):
+ self.mean, self.covariance = self.kalman_filter.update(self.mean, self.covariance,
+ self.tlwh_to_xyah(new_track.tlwh))
+
+ self.update_features(new_track.curr_feat)
+ self.tracklet_len = 0
+ self.state = TrackState.Tracked
+ self.is_activated = True
+ self.frame_id = frame_id
+ if new_id:
+ self.track_id = self.next_id()
+
+ def update(self, new_track, frame_id, update_feature=True):
+ self.frame_id = frame_id
+ self.tracklet_len += 1
+
+ new_tlwh = new_track.tlwh
+ self.mean, self.covariance = self.kalman_filter.update(self.mean, self.covariance, self.tlwh_to_xyah(new_tlwh))
+ self.state = TrackState.Tracked
+ self.is_activated = True
+
+ self.score = new_track.score
+ if update_feature:
+ self.update_features(new_track.curr_feat)
+
+ @property
+ def tlwh(self):
+ """
+ Get current position in bounding box format `(top left x, top left y,
+ width, height)`.
+ """
+ if self.mean is None:
+ return self._tlwh.copy()
+ ret = self.mean[:4].copy()
+ ret[2] *= ret[3]
+ ret[:2] -= ret[2:] / 2
+ return ret
+
+ @property
+ def tlbr(self):
+ """
+ Convert bounding box to format `(min x, min y, max x, max y)`, i.e.,
+ `(top left, bottom right)`.
+ """
+ ret = self.tlwh.copy()
+ ret[2:] += ret[:2]
+ return ret
+
+ @staticmethod
+ def tlwh_to_xyah(tlwh):
+ """
+ Convert bounding box to format `(center x, center y, aspect ratio,
+ height)`, where the aspect ratio is `width / height`.
+ """
+ ret = np.asarray(tlwh).copy()
+ ret[:2] += ret[2:] / 2
+ ret[2] /= ret[3]
+ return ret
+
+ def to_xyah(self):
+ return self.tlwh_to_xyah(self.tlwh)
+
+ @staticmethod
+ def tlbr_to_tlwh(tlbr):
+ ret = np.asarray(tlbr).copy()
+ ret[2:] -= ret[:2]
+ return ret
+
+ @staticmethod
+ def tlwh_to_tlbr(tlwh):
+ ret = np.asarray(tlwh).copy()
+ ret[2:] += ret[:2]
+ return ret
+
+ def __repr__(self):
+ return 'OT_{}_({}-{})'.format(self.track_id, self.start_frame, self.end_frame)
+
+
+def joint_stracks(tlista, tlistb):
+ exists = {}
+ res = []
+ for t in tlista:
+ exists[t.track_id] = 1
+ res.append(t)
+ for t in tlistb:
+ tid = t.track_id
+ if not exists.get(tid, 0):
+ exists[tid] = 1
+ res.append(t)
+ return res
+
+
+def sub_stracks(tlista, tlistb):
+ stracks = {}
+ for t in tlista:
+ stracks[t.track_id] = t
+ for t in tlistb:
+ tid = t.track_id
+ if stracks.get(tid, 0):
+ del stracks[tid]
+ return list(stracks.values())
+
+
+def remove_duplicate_stracks(stracksa, stracksb):
+ pdist = matching.iou_distance(stracksa, stracksb)
+ pairs = np.where(pdist < 0.15)
+ dupa, dupb = list(), list()
+ for p, q in zip(*pairs):
+ timep = stracksa[p].frame_id - stracksa[p].start_frame
+ timeq = stracksb[q].frame_id - stracksb[q].start_frame
+ if timep > timeq:
+ dupb.append(q)
+ else:
+ dupa.append(p)
+ resa = [t for i, t in enumerate(stracksa) if not i in dupa]
+ resb = [t for i, t in enumerate(stracksb) if not i in dupb]
+ return resa, resb
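+
+
+def _usage_sketch():
+    """
+    Minimal illustrative sketch (made-up box, not invoked by the tracker):
+    round-trip the box-format conversions STrack relies on.
+    """
+    tlwh = np.array([100., 50., 40., 80.])
+    tlbr = STrack.tlwh_to_tlbr(tlwh)   # [100., 50., 140., 130.]
+    xyah = STrack.tlwh_to_xyah(tlwh)   # [120., 90., 0.5, 80.]: center, aspect ratio, height
+    back = STrack.tlbr_to_tlwh(tlbr)   # recovers the original tlwh box
+    return tlbr, xyah, back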
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/tracker/base_sde_tracker.py b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/tracker/base_sde_tracker.py
new file mode 100644
index 0000000000000000000000000000000000000000..2e811e536a42ff781f60872b448b251de0301f61
--- /dev/null
+++ b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/tracker/base_sde_tracker.py
@@ -0,0 +1,133 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is borrowed from https://github.com/nwojke/deep_sort/blob/master/deep_sort/track.py
+"""
+
+from ppdet.core.workspace import register, serializable
+
+__all__ = ['TrackState', 'Track']
+
+
+class TrackState(object):
+ """
+ Enumeration type for the single target track state. Newly created tracks are
+ classified as `tentative` until enough evidence has been collected. Then,
+ the track state is changed to `confirmed`. Tracks that are no longer alive
+ are classified as `deleted` to mark them for removal from the set of active
+ tracks.
+ """
+ Tentative = 1
+ Confirmed = 2
+ Deleted = 3
+
+
+class Track(object):
+ """
+ A single target track with state space `(x, y, a, h)` and associated
+ velocities, where `(x, y)` is the center of the bounding box, `a` is the
+ aspect ratio and `h` is the height.
+
+ Args:
+ mean (ndarray): Mean vector of the initial state distribution.
+ covariance (ndarray): Covariance matrix of the initial state distribution.
+ track_id (int): A unique track identifier.
+ n_init (int): Number of consecutive detections before the track is confirmed.
+ The track state is set to `Deleted` if a miss occurs within the first
+ `n_init` frames.
+ max_age (int): The maximum number of consecutive misses before the track
+ state is set to `Deleted`.
+ feature (Optional[ndarray]): Feature vector of the detection this track
+ originates from. If not None, this feature is added to the `features` cache.
+
+ Attributes:
+ hits (int): Total number of measurement updates.
+ age (int): Total number of frames since first occurrence.
+ time_since_update (int): Total number of frames since last measurement
+ update.
+ state (TrackState): The current track state.
+ features (List[ndarray]): A cache of features. On each measurement update,
+ the associated feature vector is added to this list.
+ """
+
+ def __init__(self, mean, covariance, track_id, n_init, max_age, feature=None):
+ self.mean = mean
+ self.covariance = covariance
+ self.track_id = track_id
+ self.hits = 1
+ self.age = 1
+ self.time_since_update = 0
+
+ self.state = TrackState.Tentative
+ self.features = []
+ if feature is not None:
+ self.features.append(feature)
+
+ self._n_init = n_init
+ self._max_age = max_age
+
+ def to_tlwh(self):
+ """Get position in format `(top left x, top left y, width, height)`."""
+ ret = self.mean[:4].copy()
+ ret[2] *= ret[3]
+ ret[:2] -= ret[2:] / 2
+ return ret
+
+ def to_tlbr(self):
+ """Get position in bounding box format `(min x, miny, max x, max y)`."""
+ ret = self.to_tlwh()
+ ret[2:] = ret[:2] + ret[2:]
+ return ret
+
+ def predict(self, kalman_filter):
+ """
+ Propagate the state distribution to the current time step using a Kalman
+ filter prediction step.
+ """
+ self.mean, self.covariance = kalman_filter.predict(self.mean, self.covariance)
+ self.age += 1
+ self.time_since_update += 1
+
+ def update(self, kalman_filter, detection):
+ """
+ Perform Kalman filter measurement update step and update the associated
+ detection feature cache.
+ """
+ self.mean, self.covariance = kalman_filter.update(self.mean, self.covariance, detection.to_xyah())
+ self.features.append(detection.feature)
+
+ self.hits += 1
+ self.time_since_update = 0
+ if self.state == TrackState.Tentative and self.hits >= self._n_init:
+ self.state = TrackState.Confirmed
+
+ def mark_missed(self):
+ """Mark this track as missed (no association at the current time step).
+ """
+ if self.state == TrackState.Tentative:
+ self.state = TrackState.Deleted
+ elif self.time_since_update > self._max_age:
+ self.state = TrackState.Deleted
+
+ def is_tentative(self):
+ """Returns True if this track is tentative (unconfirmed)."""
+ return self.state == TrackState.Tentative
+
+ def is_confirmed(self):
+ """Returns True if this track is confirmed."""
+ return self.state == TrackState.Confirmed
+
+ def is_deleted(self):
+ """Returns True if this track is dead and should be deleted."""
+ return self.state == TrackState.Deleted
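+
+
+def _usage_sketch():
+    """
+    Minimal illustrative sketch (made-up objects, not invoked by the tracker):
+    walk a Track from Tentative to Confirmed. `_FakeKF` and `_FakeDet` below
+    are stand-ins for the Kalman filter and Detection used in the real pipeline.
+    """
+
+    class _FakeKF(object):
+        def predict(self, mean, covariance):
+            return mean, covariance
+
+        def update(self, mean, covariance, measurement):
+            return mean, covariance
+
+    class _FakeDet(object):
+        feature = None
+
+        def to_xyah(self):
+            return [320., 240., 0.5, 160.]
+
+    track = Track(mean=[320., 240., 0.5, 160.], covariance=None, track_id=1, n_init=3, max_age=30)
+    kf, det = _FakeKF(), _FakeDet()
+    for _ in range(3):  # with n_init=3, the third consecutive hit confirms the track
+        track.predict(kf)
+        track.update(kf, det)
+    return track.is_confirmed()  # True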
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/tracker/jde_tracker.py b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/tracker/jde_tracker.py
new file mode 100644
index 0000000000000000000000000000000000000000..2e1cafb345b7687e563fc6d9c2c1769cb39d690c
--- /dev/null
+++ b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/tracker/jde_tracker.py
@@ -0,0 +1,248 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is borrowed from https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/tracker/multitracker.py
+"""
+
+import paddle
+
+from ..matching import jde_matching as matching
+from .base_jde_tracker import TrackState, BaseTrack, STrack
+from .base_jde_tracker import joint_stracks, sub_stracks, remove_duplicate_stracks
+
+from ppdet.core.workspace import register, serializable
+from ppdet.utils.logger import setup_logger
+logger = setup_logger(__name__)
+
+__all__ = ['FrozenJDETracker']
+
+
+@register
+@serializable
+class FrozenJDETracker(object):
+ __inject__ = ['motion']
+ """
+ JDE tracker
+
+ Args:
+ det_thresh (float): threshold of detection score
+ track_buffer (int): buffer for tracker
+ min_box_area (int): min box area to filter out low quality boxes
+ vertical_ratio (float): w/h, the aspect-ratio threshold used to filter
+ out bad boxes; defaults to 1.6 for pedestrian tracking. Set to -1
+ to disable this filtering.
+ tracked_thresh (float): linear assignment threshold of tracked
+ stracks and detections
+ r_tracked_thresh (float): linear assignment threshold of
+ tracked stracks and unmatched detections
+ unconfirmed_thresh (float): linear assignment threshold of
+ unconfirmed stracks and unmatched detections
+ motion (object): KalmanFilter instance
+ conf_thres (float): confidence threshold for tracking
+ metric_type (str): either "euclidean" or "cosine", the distance metric
+ used for measurement to track association.
+ """
+
+ def __init__(self,
+ det_thresh=0.3,
+ track_buffer=30,
+ min_box_area=200,
+ vertical_ratio=1.6,
+ tracked_thresh=0.7,
+ r_tracked_thresh=0.5,
+ unconfirmed_thresh=0.7,
+ motion='KalmanFilter',
+ conf_thres=0,
+ metric_type='euclidean'):
+ self.det_thresh = det_thresh
+ self.track_buffer = track_buffer
+ self.min_box_area = min_box_area
+ self.vertical_ratio = vertical_ratio
+
+ self.tracked_thresh = tracked_thresh
+ self.r_tracked_thresh = r_tracked_thresh
+ self.unconfirmed_thresh = unconfirmed_thresh
+ self.motion = motion
+ self.conf_thres = conf_thres
+ self.metric_type = metric_type
+
+ self.frame_id = 0
+ self.tracked_stracks = []
+ self.lost_stracks = []
+ self.removed_stracks = []
+
+ self.max_time_lost = 0
+ # max_time_lost will be calculated: int(frame_rate / 30.0 * track_buffer)
+
+ def update(self, pred_dets, pred_embs):
+ """
+ Processes the image frame and finds bounding boxes (detections).
+ Associates the detection with corresponding tracklets and also handles
+ lost, removed, refound and active tracklets.
+
+ Args:
+ pred_dets (Tensor): Detection results of the image, shape is [N, 5].
+ pred_embs (Tensor): Embedding results of the image, shape is [N, 512].
+
+ Return:
+ output_stracks (list): The list contains information regarding the
+ online tracklets for the received image tensor.
+ """
+ self.frame_id += 1
+ activated_starcks = []
+ # for storing active tracks, for the current frame
+ refind_stracks = []
+ # Lost Tracks whose detections are obtained in the current frame
+ lost_stracks = []
+ # Tracks that were not matched in the current frame but are not yet
+ # removed (lost for fewer frames than the removal threshold)
+ removed_stracks = []
+
+ remain_inds = paddle.nonzero(pred_dets[:, 4] > self.conf_thres)
+ if remain_inds.shape[0] == 0:
+ pred_dets = paddle.zeros([0, 1])
+ pred_embs = paddle.zeros([0, 1])
+ else:
+ pred_dets = paddle.gather(pred_dets, remain_inds)
+ pred_embs = paddle.gather(pred_embs, remain_inds)
+
+ # Filter out frames with no boxes (box_num = 0), i.e. pred_dets = [[0.0, 0.0, 0.0, 0.0]]
+ empty_pred = True if len(pred_dets) == 1 and paddle.sum(pred_dets) == 0.0 else False
+ """ Step 1: Network forward, get detections & embeddings"""
+ if len(pred_dets) > 0 and not empty_pred:
+ pred_dets = pred_dets.numpy()
+ pred_embs = pred_embs.numpy()
+ detections = [
+ STrack(STrack.tlbr_to_tlwh(tlbrs[:4]), tlbrs[4], f, 30) for (tlbrs, f) in zip(pred_dets, pred_embs)
+ ]
+ else:
+ detections = []
+ ''' Add newly detected tracklets to tracked_stracks'''
+ unconfirmed = []
+ tracked_stracks = [] # type: list[STrack]
+ for track in self.tracked_stracks:
+ if not track.is_activated:
+ # previous tracks which are not active in the current frame are added in unconfirmed list
+ unconfirmed.append(track)
+ else:
+ # Active tracks are added to the local list 'tracked_stracks'
+ tracked_stracks.append(track)
+ """ Step 2: First association, with embedding"""
+ # Combining currently tracked_stracks and lost_stracks
+ strack_pool = joint_stracks(tracked_stracks, self.lost_stracks)
+ # Predict the current location with KF
+ STrack.multi_predict(strack_pool, self.motion)
+
+ dists = matching.embedding_distance(strack_pool, detections, metric=self.metric_type)
+ dists = matching.fuse_motion(self.motion, dists, strack_pool, detections)
+ # dists holds the distances between each detection and each track in strack_pool
+ matches, u_track, u_detection = matching.linear_assignment(dists, thresh=self.tracked_thresh)
+ # matches holds the (track, detection) index pairs produced by the linear assignment
+
+ for itracked, idet in matches:
+ # itracked is the index of the track and idet is the index of the detection
+ track = strack_pool[itracked]
+ det = detections[idet]
+ if track.state == TrackState.Tracked:
+ # If the track is active, add the detection to the track
+ track.update(detections[idet], self.frame_id)
+ activated_starcks.append(track)
+ else:
+ # We have obtained a detection from a track which is not active,
+ # hence put the track in refind_stracks list
+ track.re_activate(det, self.frame_id, new_id=False)
+ refind_stracks.append(track)
+
+ # None of the steps below happen if there are no undetected tracks.
+ """ Step 3: Second association, with IOU"""
+ detections = [detections[i] for i in u_detection]
+ # detections is now a list of the unmatched detections
+ r_tracked_stracks = []
+ # Container for stracks that were tracked up to the previous frame but
+ # have no matched detection in the current frame.
+
+ for i in u_track:
+ if strack_pool[i].state == TrackState.Tracked:
+ r_tracked_stracks.append(strack_pool[i])
+ dists = matching.iou_distance(r_tracked_stracks, detections)
+ matches, u_track, u_detection = matching.linear_assignment(dists, thresh=self.r_tracked_thresh)
+ # matches is the list of detections which matched with corresponding
+ # tracks by IOU distance method.
+
+ for itracked, idet in matches:
+ track = r_tracked_stracks[itracked]
+ det = detections[idet]
+ if track.state == TrackState.Tracked:
+ track.update(det, self.frame_id)
+ activated_starcks.append(track)
+ else:
+ track.re_activate(det, self.frame_id, new_id=False)
+ refind_stracks.append(track)
+ # The same association is repeated for the remaining unmatched detections, now using IoU distance as the measure
+
+ for it in u_track:
+ track = r_tracked_stracks[it]
+ if not track.state == TrackState.Lost:
+ track.mark_lost()
+ lost_stracks.append(track)
+ # Tracks (u_track) for which no detection was obtained are marked lost and added to the lost_stracks list
+ '''Deal with unconfirmed tracks, usually tracks with only one beginning frame'''
+ detections = [detections[i] for i in u_detection]
+ dists = matching.iou_distance(unconfirmed, detections)
+ matches, u_unconfirmed, u_detection = matching.linear_assignment(dists, thresh=self.unconfirmed_thresh)
+ for itracked, idet in matches:
+ unconfirmed[itracked].update(detections[idet], self.frame_id)
+ activated_starcks.append(unconfirmed[itracked])
+
+ # The unconfirmed tracks that are still not matched are removed
+ for it in u_unconfirmed:
+ track = unconfirmed[it]
+ track.mark_removed()
+ removed_stracks.append(track)
+
+ # After all the association steps above, any remaining unmatched detection initializes a new track
+ """ Step 4: Init new stracks"""
+ for inew in u_detection:
+ track = detections[inew]
+ if track.score < self.det_thresh:
+ continue
+ track.activate(self.motion, self.frame_id)
+ activated_starcks.append(track)
+ """ Step 5: Update state"""
+ # If a track has been lost for more frames than the threshold, it is removed.
+ for track in self.lost_stracks:
+ if self.frame_id - track.end_frame > self.max_time_lost:
+ track.mark_removed()
+ removed_stracks.append(track)
+
+ # Update the self.tracked_stracks and self.lost_stracks using the updates in this step.
+ self.tracked_stracks = [t for t in self.tracked_stracks if t.state == TrackState.Tracked]
+ self.tracked_stracks = joint_stracks(self.tracked_stracks, activated_starcks)
+ self.tracked_stracks = joint_stracks(self.tracked_stracks, refind_stracks)
+
+ self.lost_stracks = sub_stracks(self.lost_stracks, self.tracked_stracks)
+ self.lost_stracks.extend(lost_stracks)
+ self.lost_stracks = sub_stracks(self.lost_stracks, self.removed_stracks)
+ self.removed_stracks.extend(removed_stracks)
+ self.tracked_stracks, self.lost_stracks = remove_duplicate_stracks(self.tracked_stracks, self.lost_stracks)
+ # Output only the tracks that are currently activated
+ output_stracks = [track for track in self.tracked_stracks if track.is_activated]
+
+ logger.debug('===========Frame {}=========='.format(self.frame_id))
+ logger.debug('Activated: {}'.format([track.track_id for track in activated_starcks]))
+ logger.debug('Refind: {}'.format([track.track_id for track in refind_stracks]))
+ logger.debug('Lost: {}'.format([track.track_id for track in lost_stracks]))
+ logger.debug('Removed: {}'.format([track.track_id for track in removed_stracks]))
+
+ return output_stracks
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/utils.py b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..12c61686a1715a965407822dcf19fd1081f292d7
--- /dev/null
+++ b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/utils.py
@@ -0,0 +1,176 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import cv2
+import time
+import paddle
+import numpy as np
+
+__all__ = [
+ 'Timer',
+ 'Detection',
+ 'load_det_results',
+ 'preprocess_reid',
+ 'get_crops',
+ 'clip_box',
+ 'scale_coords',
+]
+
+
+class Timer(object):
+ """
+ This class is used to compute and print the current FPS during evaluation.
+ """
+
+ def __init__(self):
+ self.total_time = 0.
+ self.calls = 0
+ self.start_time = 0.
+ self.diff = 0.
+ self.average_time = 0.
+ self.duration = 0.
+
+ def tic(self):
+ # use time.time instead of time.clock because time.clock
+ # does not normalize for multithreading
+ self.start_time = time.time()
+
+ def toc(self, average=True):
+ self.diff = time.time() - self.start_time
+ self.total_time += self.diff
+ self.calls += 1
+ self.average_time = self.total_time / self.calls
+ if average:
+ self.duration = self.average_time
+ else:
+ self.duration = self.diff
+ return self.duration
+
+ def clear(self):
+ self.total_time = 0.
+ self.calls = 0
+ self.start_time = 0.
+ self.diff = 0.
+ self.average_time = 0.
+ self.duration = 0.
+
+
+class Detection(object):
+ """
+ This class represents a bounding box detection in a single image.
+
+ Args:
+ tlwh (ndarray): Bounding box in format `(top left x, top left y,
+ width, height)`.
+ confidence (ndarray): Detector confidence score.
+ feature (Tensor): A feature vector that describes the object
+ contained in this image.
+ """
+
+ def __init__(self, tlwh, confidence, feature):
+ self.tlwh = np.asarray(tlwh, dtype=np.float32)
+ self.confidence = np.asarray(confidence, dtype=np.float32)
+ self.feature = feature
+
+ def to_tlbr(self):
+ """
+ Convert bounding box to format `(min x, min y, max x, max y)`, i.e.,
+ `(top left, bottom right)`.
+ """
+ ret = self.tlwh.copy()
+ ret[2:] += ret[:2]
+ return ret
+
+ def to_xyah(self):
+ """
+ Convert bounding box to format `(center x, center y, aspect ratio,
+ height)`, where the aspect ratio is `width / height`.
+ """
+ ret = self.tlwh.copy()
+ ret[:2] += ret[2:] / 2
+ ret[2] /= ret[3]
+ return ret
+
+
+def load_det_results(det_file, num_frames):
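+ # Each row of det_file is expected to hold: frame id (1-based), four bbox values, score,
+ # separated by commas; rows are grouped per frame below.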
+ assert os.path.exists(det_file) and os.path.isfile(det_file), \
+ 'Error: det_file: {} not exist or not a file.'.format(det_file)
+ labels = np.loadtxt(det_file, dtype='float32', delimiter=',')
+ results_list = []
+ for frame_i in range(0, num_frames):
+ results = {'bbox': [], 'score': []}
+ labels_with_frame = labels[labels[:, 0] == frame_i + 1]
+ for l in labels_with_frame:
+ results['bbox'].append(l[1:5])
+ results['score'].append(l[5])
+ results_list.append(results)
+ return results_list
+
+
+def scale_coords(coords, input_shape, im_shape, scale_factor):
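+ # Map boxes from the letterboxed network input back to original-image coordinates:
+ # remove the symmetric padding, undo the resize ratio, then clip and round.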
+ im_shape = im_shape.numpy()[0]
+ ratio = scale_factor[0][0]
+ pad_w = (input_shape[1] - int(im_shape[1])) / 2
+ pad_h = (input_shape[0] - int(im_shape[0])) / 2
+ coords = paddle.cast(coords, 'float32')
+ coords[:, 0::2] -= pad_w
+ coords[:, 1::2] -= pad_h
+ coords[:, 0:4] /= ratio
+ coords[:, :4] = paddle.clip(coords[:, :4], min=0, max=coords[:, :4].max())
+ return coords.round()
+
+
+def clip_box(xyxy, input_shape, im_shape, scale_factor):
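+ # Clip boxes to the bounds of the original image size recovered from im_shape and scale_factor.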
+ im_shape = im_shape.numpy()[0]
+ ratio = scale_factor.numpy()[0][0]
+ img0_shape = [int(im_shape[0] / ratio), int(im_shape[1] / ratio)]
+
+ xyxy[:, 0::2] = paddle.clip(xyxy[:, 0::2], min=0, max=img0_shape[1])
+ xyxy[:, 1::2] = paddle.clip(xyxy[:, 1::2], min=0, max=img0_shape[0])
+ return xyxy
+
+
+def get_crops(xyxy, ori_img, pred_scores, w, h):
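+ # Crop every valid detection box from the original image and preprocess the crops
+ # into a ReID input batch; boxes with non-positive width or height are skipped.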
+ crops = []
+ keep_scores = []
+ xyxy = xyxy.numpy().astype(np.int64)
+ ori_img = ori_img.numpy()
+ ori_img = np.squeeze(ori_img, axis=0).transpose(1, 0, 2)
+ pred_scores = pred_scores.numpy()
+ for i, bbox in enumerate(xyxy):
+ if bbox[2] <= bbox[0] or bbox[3] <= bbox[1]:
+ continue
+ crop = ori_img[bbox[0]:bbox[2], bbox[1]:bbox[3], :]
+ crops.append(crop)
+ keep_scores.append(pred_scores[i])
+ if len(crops) == 0:
+ return [], []
+ crops = preprocess_reid(crops, w, h)
+ return crops, keep_scores
+
+
+def preprocess_reid(imgs, w=64, h=192, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
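+ # Resize each crop to (w, h), reverse the channel order, scale to [0, 1],
+ # normalize with the given mean/std, and stack the crops into an NCHW batch.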
+ im_batch = []
+ for img in imgs:
+ img = cv2.resize(img, (w, h))
+ img = img[:, :, ::-1].astype('float32').transpose((2, 0, 1)) / 255
+ img_mean = np.array(mean).reshape((3, 1, 1))
+ img_std = np.array(std).reshape((3, 1, 1))
+ img -= img_mean
+ img /= img_std
+ img = np.expand_dims(img, axis=0)
+ im_batch.append(img)
+ im_batch = np.concatenate(im_batch, 0)
+ return im_batch
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/visualization.py b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/visualization.py
new file mode 100644
index 0000000000000000000000000000000000000000..cd9c5b15e15f677b7955dd4eba40798e985315a1
--- /dev/null
+++ b/modules/video/multiple_object_tracking/jde_darknet53/modeling/mot/visualization.py
@@ -0,0 +1,117 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import cv2
+import numpy as np
+
+
+def tlwhs_to_tlbrs(tlwhs):
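+ # Convert boxes from (top-left x, top-left y, width, height) to (x1, y1, x2, y2).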
+ tlbrs = np.copy(tlwhs)
+ if len(tlbrs) == 0:
+ return tlbrs
+ tlbrs[:, 2] += tlwhs[:, 0]
+ tlbrs[:, 3] += tlwhs[:, 1]
+ return tlbrs
+
+
+def get_color(idx):
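+ # Derive a deterministic pseudo-random color from the track id so each id keeps a stable color across frames.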
+ idx = idx * 3
+ color = ((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255)
+ return color
+
+
+def resize_image(image, max_size=800):
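+ # Downscale the image so that its longer side is at most max_size, preserving the aspect ratio.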
+ if max(image.shape[:2]) > max_size:
+ scale = float(max_size) / max(image.shape[:2])
+ image = cv2.resize(image, None, fx=scale, fy=scale)
+ return image
+
+
+def plot_tracking(image, tlwhs, obj_ids, scores=None, frame_id=0, fps=0., ids2=None):
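+ # Draw each track's box, id (and optional score) on the frame, plus a header line
+ # showing the frame index, FPS and number of objects.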
+ im = np.ascontiguousarray(np.copy(image))
+ im_h, im_w = im.shape[:2]
+
+ top_view = np.zeros([im_w, im_w, 3], dtype=np.uint8) + 255
+
+ text_scale = max(1, image.shape[1] / 1600.)
+ text_thickness = 2
+ line_thickness = max(1, int(image.shape[1] / 500.))
+
+ radius = max(5, int(im_w / 140.))
+ cv2.putText(
+ im,
+ 'frame: %d fps: %.2f num: %d' % (frame_id, fps, len(tlwhs)), (0, int(15 * text_scale)),
+ cv2.FONT_HERSHEY_PLAIN,
+ text_scale, (0, 0, 255),
+ thickness=2)
+
+ for i, tlwh in enumerate(tlwhs):
+ x1, y1, w, h = tlwh
+ intbox = tuple(map(int, (x1, y1, x1 + w, y1 + h)))
+ obj_id = int(obj_ids[i])
+ id_text = '{}'.format(int(obj_id))
+ if ids2 is not None:
+ id_text = id_text + ', {}'.format(int(ids2[i]))
+ _line_thickness = 1 if obj_id <= 0 else line_thickness
+ color = get_color(abs(obj_id))
+ cv2.rectangle(im, intbox[0:2], intbox[2:4], color=color, thickness=line_thickness)
+ cv2.putText(
+ im,
+ id_text, (intbox[0], intbox[1] + 10),
+ cv2.FONT_HERSHEY_PLAIN,
+ text_scale, (0, 0, 255),
+ thickness=text_thickness)
+
+ if scores is not None:
+ text = '{:.2f}'.format(float(scores[i]))
+ cv2.putText(
+ im,
+ text, (intbox[0], intbox[1] - 10),
+ cv2.FONT_HERSHEY_PLAIN,
+ text_scale, (0, 255, 255),
+ thickness=text_thickness)
+ return im
+
+
+def plot_trajectory(image, tlwhs, track_ids):
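+ # Draw a small circle at the bottom-center of every historical box to trace each track's trajectory.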
+ image = image.copy()
+ for one_tlwhs, track_id in zip(tlwhs, track_ids):
+ color = get_color(int(track_id))
+ for tlwh in one_tlwhs:
+ x1, y1, w, h = tuple(map(int, tlwh))
+ cv2.circle(image, (int(x1 + 0.5 * w), int(y1 + h)), 2, color, thickness=2)
+ return image
+
+
+def plot_detections(image, tlbrs, scores=None, color=(255, 0, 0), ids=None):
+ im = np.copy(image)
+ text_scale = max(1, image.shape[1] / 800.)
+ thickness = 2 if text_scale > 1.3 else 1
+ for i, det in enumerate(tlbrs):
+ x1, y1, x2, y2 = np.asarray(det[:4], dtype=int)
+ if len(det) >= 7:
+ label = 'det' if det[5] > 0 else 'trk'
+ if ids is not None:
+ text = '{}# {:.2f}: {:d}'.format(label, det[6], ids[i])
+ cv2.putText(
+ im, text, (x1, y1 + 30), cv2.FONT_HERSHEY_PLAIN, text_scale, (0, 255, 255), thickness=thickness)
+ else:
+ text = '{}# {:.2f}'.format(label, det[6])
+
+ if scores is not None:
+ text = '{:.2f}'.format(scores[i])
+ cv2.putText(im, text, (x1, y1 + 30), cv2.FONT_HERSHEY_PLAIN, text_scale, (0, 255, 255), thickness=thickness)
+
+ cv2.rectangle(im, (x1, y1), (x2, y2), color, 2)
+ return im
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/tracker.py b/modules/video/multiple_object_tracking/jde_darknet53/tracker.py
index a4488125b11e09d7fb6e4328252ad61e8e844aac..1e4ab7d0b3a996775407eb1334c6183db26129d7 100644
--- a/modules/video/multiple_object_tracking/jde_darknet53/tracker.py
+++ b/modules/video/multiple_object_tracking/jde_darknet53/tracker.py
@@ -16,18 +16,19 @@ import cv2
import glob
import paddle
import numpy as np
+import collections
-from ppdet.core.workspace import create
from ppdet.utils.checkpoint import load_weight, load_pretrain_weight
-from ppdet.modeling.mot.utils import Detection, get_crops, scale_coords, clip_box
-from ppdet.modeling.mot.utils import Timer, load_det_results
-from ppdet.modeling.mot import visualization as mot_vis
from ppdet.metrics import Metric, MOTMetric, KITTIMOTMetric
import ppdet.utils.stats as stats
from ppdet.engine.callbacks import Callback, ComposeCallback
+from ppdet.core.workspace import create
from ppdet.utils.logger import setup_logger
from .dataset import MOTVideoStream, MOTImageStream
+from .modeling.mot.utils import Detection, get_crops, scale_coords, clip_box
+from .modeling.mot import visualization as mot_vis
+from .utils import Timer
logger = setup_logger(__name__)
@@ -70,7 +71,6 @@ class StreamTracker(object):
timer.tic()
pred_dets, pred_embs = self.model(data)
online_targets = self.model.tracker.update(pred_dets, pred_embs)
-
online_tlwhs, online_ids = [], []
online_scores = []
for t in online_targets:
@@ -109,7 +109,6 @@ class StreamTracker(object):
with paddle.no_grad():
pred_dets, pred_embs = self.model(data)
online_targets = self.model.tracker.update(pred_dets, pred_embs)
-
online_tlwhs, online_ids = [], []
online_scores = []
for t in online_targets:
diff --git a/modules/video/multiple_object_tracking/jde_darknet53/utils.py b/modules/video/multiple_object_tracking/jde_darknet53/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..4426f217f9f5fb5c7afa6593c2b83ce4b67236f9
--- /dev/null
+++ b/modules/video/multiple_object_tracking/jde_darknet53/utils.py
@@ -0,0 +1,39 @@
+import time
+
+
+class Timer(object):
+ """
+ This class is used to compute and print the current FPS during evaluation.
+ """
+
+ def __init__(self):
+ self.total_time = 0.
+ self.calls = 0
+ self.start_time = 0.
+ self.diff = 0.
+ self.average_time = 0.
+ self.duration = 0.
+
+ def tic(self):
+ # use time.time instead of time.clock because time.clock
+ # does not normalize for multithreading
+ self.start_time = time.time()
+
+ def toc(self, average=True):
+ self.diff = time.time() - self.start_time
+ self.total_time += self.diff
+ self.calls += 1
+ self.average_time = self.total_time / self.calls
+ if average:
+ self.duration = self.average_time
+ else:
+ self.duration = self.diff
+ return self.duration
+
+ def clear(self):
+ self.total_time = 0.
+ self.calls = 0
+ self.start_time = 0.
+ self.diff = 0.
+ self.average_time = 0.
+ self.duration = 0.
diff --git a/paddlehub/__init__.py b/paddlehub/__init__.py
index be47df84575f252fcc85dde16ba40479d2835310..f0dd1dbefe99ff11afe64c97e9f277a6c1bdbbbc 100644
--- a/paddlehub/__init__.py
+++ b/paddlehub/__init__.py
@@ -13,7 +13,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
-__version__ = '2.1.0'
+__version__ = 'develop'
import paddle
from packaging.version import Version
diff --git a/paddlehub/server/server.py b/paddlehub/server/server.py
index 64d8f61ca7177a810336f4b14cac67b662e15611..840549c24c62e3438b1fc37ffa4fe1c198f7a4dc 100644
--- a/paddlehub/server/server.py
+++ b/paddlehub/server/server.py
@@ -159,7 +159,7 @@ class CacheUpdater(threading.Thread):
if version:
payload['version'] = version
api_url = uri_path(hubconf.server, 'search')
- cache_path = os.path.join("~")
+ cache_path = os.path.join("~")
hub_name = cache_config.hub_name
if os.path.exists(cache_path):
extra = {"command": command, "mtime": os.stat(cache_path).st_mtime, "hub_name": hub_name}
diff --git a/paddlehub/utils/pypi.py b/paddlehub/utils/pypi.py
index 8f0f3c68c4a254138b0d647b81d484803735ede6..6a6d76535f25b13e46448e72eef7852c9d9f9654 100644
--- a/paddlehub/utils/pypi.py
+++ b/paddlehub/utils/pypi.py
@@ -15,10 +15,11 @@
import os
import subprocess
+import sys
from typing import IO
from paddlehub.utils.utils import Version
-from paddlehub.utils.io import discard_oe, typein
+from paddlehub.utils.io import discard_oe
def get_installed_packages() -> dict:
@@ -40,13 +41,14 @@ def check(package: str, version: str = '') -> bool:
return pdict[package].match(version)
-def install(package: str, version: str = '', upgrade: bool = False, ostream: IO = None, estream: IO = None) -> bool:
+def install(package: str, version: str = '', upgrade: bool = False, ostream: IO = sys.stdout,
+ estream: IO = sys.stderr) -> bool:
'''Install the python package.'''
package = package.replace(' ', '')
if version:
package = '{}=={}'.format(package, version)
- cmd = 'pip install "{}"'.format(package)
+ cmd = '{} -m pip install "{}"'.format(sys.executable, package)
if upgrade:
cmd += ' --upgrade'
@@ -59,9 +61,9 @@ def install(package: str, version: str = '', upgrade: bool = False, ostream: IO
return result == 0
-def install_from_file(file: str, ostream: IO = None, estream: IO = None) -> bool:
+def install_from_file(file: str, ostream: IO = sys.stdout, estream: IO = sys.stderr) -> bool:
'''Install the python package.'''
- cmd = 'pip install -r {}'.format(file)
+ cmd = '{} -m pip install -r {}'.format(sys.executable, file)
result, content = subprocess.getstatusoutput(cmd)
if result:
@@ -71,14 +73,13 @@ def install_from_file(file: str, ostream: IO = None, estream: IO = None) -> bool
return result == 0
-def uninstall(package: str, ostream: IO = None, estream: IO = None) -> bool:
+def uninstall(package: str, ostream: IO = sys.stdout, estream: IO = sys.stderr) -> bool:
'''Uninstall the python package.'''
- with typein('y'):
- # type in 'y' to confirm the uninstall operation
- cmd = 'pip uninstall {}'.format(package)
- result, content = subprocess.getstatusoutput(cmd)
- if result:
- estream.write(content)
- else:
- ostream.write(content)
+ # pass '-y' so pip confirms the uninstall operation non-interactively
+ cmd = '{} -m pip uninstall {} -y'.format(sys.executable, package)
+ result, content = subprocess.getstatusoutput(cmd)
+ if result:
+ estream.write(content)
+ else:
+ ostream.write(content)
return result == 0
diff --git a/requirements.txt b/requirements.txt
index 3990fb4bd0547cd4001784e0d74bbe98995421ac..f95cfe689940dbd35f750b4267f2804d8a6fe53b 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -8,7 +8,7 @@ matplotlib
opencv-python
packaging
paddle2onnx >= 0.5.1
-paddlenlp >= 2.0.0rc5
+paddlenlp >= 2.0.0
Pillow
pyyaml
pyzmq