Commit cbe2fa8a authored by AK391

Merge branch 'release/v2.2' of https://github.com/PaddlePaddle/PaddleHub into PaddlePaddle-release/v2.2
@@ -30,3 +30,9 @@
- --show-source
- --statistics
files: \.py$
- repo: https://github.com/asottile/reorder_python_imports
rev: v2.4.0
hooks:
- id: reorder-python-imports
exclude: (?=third_party).*(\.py)$
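For context, a sketch of what this hook does under its default settings: it splits multi-module imports onto separate lines and groups them stdlib first, then third-party, then local. The import names below are illustrative only.

```python
# Before reorder-python-imports:
import sys, os
import paddle
from paddlehub.utils.log import logger

# After: one import per line, grouped and sorted.
import os
import sys

import paddle
from paddlehub.utils.log import logger
```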
@@ -4,7 +4,7 @@ English | [简体中文](README_ch.md)
<img src="./docs/imgs/paddlehub_logo.jpg" align="middle"> <img src="./docs/imgs/paddlehub_logo.jpg" align="middle">
<p align="center"> <p align="center">
<div align="center"> <div align="center">
<h3> <a href=#QuickStart> QuickStart </a> | <a href="https://paddlehub.readthedocs.io/en/release-v2.1"> Tutorial </a> | <a href="https://www.paddlepaddle.org.cn/hublist"> Models List </a> | <a href="https://www.paddlepaddle.org.cn/hub"> Demos </a> </h3> <h3> <a href=#QuickStart> QuickStart </a> | <a href="https://paddlehub.readthedocs.io/en/release-v2.1"> Tutorial </a> | <a href="./modules"> Models List </a> | <a href="https://www.paddlepaddle.org.cn/hub"> Demos </a> </h3>
</div> </div>
------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------
@@ -29,7 +29,7 @@ English | [简体中文](README_ch.md)
## Introduction and Features
- **PaddleHub** aims to provide developers with rich, high-quality, and directly usable pre-trained models.
- **Abundant Pre-trained Models**: 360+ pre-trained models covering the 5 major categories of Image, Text, Audio, Video, and Industrial application. All of them are free for download and offline usage.
- **No Need for Deep Learning Background**: you can use AI models quickly and enjoy the dividends of the artificial intelligence era.
- **Quick Model Prediction**: model prediction can be realized with a few lines of script to quickly experience the model effect.
- **Model As Service**: deploy a deep learning model as an API service with a one-line command.
@@ -38,6 +38,7 @@ English | [简体中文](README_ch.md)
### Recent updates
- **2022.02.18:** Added a Hugging Face org and added Spaces and models to it: [PaddlePaddle Huggingface](https://huggingface.co/PaddlePaddle)
- **2021.12.22:** The v2.2.0 version is released. [1] More than 100 new models released, including dialog, speech, segmentation, OCR, text processing, GANs, and many other categories. The total number of pre-trained models reaches [**【360】**](https://www.paddlepaddle.org.cn/hublist). [2] Added an [indexed file](./modules/README.md) with useful information on the pre-trained models supported by PaddleHub. [3] Refactored the READMEs of the pre-trained models.
- **2021.05.12:** Added an open-domain dialogue system, i.e., [plato-mini](https://www.paddlepaddle.org.cn/hubdetail?name=plato-mini&en_category=TextGeneration), to make it easy to build a chatbot in WeChat with the help of wechaty. [See Demo](https://github.com/KPatr1ck/paddlehub-wechaty-demo)
- **2021.04.27:** The v2.1.0 version is released. [1] Added support for five new models, including two high-precision semantic segmentation models based on the VOC dataset and three voice classification models. [2] Enforced the transfer learning capabilities for image semantic segmentation, text semantic matching, and voice classification on related datasets. [3] Added export APIs for two model formats, i.e., ONNX and PaddleInference. [4] Added support for [BentoML](https://github.com/bentoml/BentoML/), a cloud-native framework for serving deployment. Users can easily serve pre-trained models from PaddleHub by following the [Tutorial notebooks](https://github.com/PaddlePaddle/PaddleHub/blob/release/v2.1/demo/serving/bentoml/cloud-native-model-serving-with-bentoml.ipynb). Also, see this announcement and the [Release note](https://github.com/bentoml/BentoML/releases/tag/v0.12.1) from BentoML. (Many thanks to @[parano](https://github.com/parano) @[cqvu](https://github.com/cqvu) @[deehrlic](https://github.com/deehrlic) for contributing this feature to PaddleHub.) [5] The total number of pre-trained models reaches **【300】**.
- **2021.02.18:** The v2.0.0 version is released, making model development and debugging easier, and the fine-tune task more flexible and easy to use. The transfer learning capabilities for visual tasks are fully upgraded, supporting various tasks such as image classification, image colorization, and style transfer; Transformer models such as BERT, ERNIE, and RoBERTa are upgraded to dynamic graphs, supporting Fine-Tune capabilities for text classification and sequence labeling; the Serving capability is optimized to support multi-card prediction and automatic load balancing, with greatly improved performance; the new automatic data augmentation capability Auto Augment can efficiently search for data augmentation strategy combinations suitable for a dataset. 61 new word vector models were added, including 51 Chinese models and 10 English models; 4 image segmentation models, 2 depth models, 7 image generation models, and 3 text generation models were added, bringing the total number of pre-trained models to **【274】**.
@@ -46,8 +47,8 @@ English | [简体中文](README_ch.md)
## Visualization Demo [[More]](./docs/docs_en/visualization.md) [[ModelList]](./modules)
### **[Computer Vision (212 models)](./modules#Image)**
<div align="center">
<img src="./docs/imgs/Readme_Related/Image_all.gif" width = "530" height = "400" />
</div>
@@ -55,7 +56,7 @@ English | [简体中文](README_ch.md)
- Many thanks to CopyRight@[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR), [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection), [PaddleGAN](https://github.com/PaddlePaddle/PaddleGAN), [AnimeGAN](https://github.com/TachibanaYoshino/AnimeGANv2), [openpose](https://github.com/CMU-Perceptual-Computing-Lab/openpose), [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg), [Zhengxia Zou](https://github.com/jiupinjia/SkyAR), and [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) for the pre-trained models; you can try to train your own models with them.
### **[Natural Language Processing (130 models)](./modules#Text)**
<div align="center">
<img src="./docs/imgs/Readme_Related/Text_all.gif" width = "640" height = "240" />
</div>
@@ -64,9 +65,37 @@ English | [简体中文](README_ch.md)
### [Speech (15 models)](./modules#Audio)
- ASR speech recognition, with multiple algorithms available.
- The speech recognition results are as follows:
<div align="center">
<table>
<thead>
<tr>
<th width=250> Input Audio </th>
<th width=550> Recognition Result </th>
</tr>
</thead>
<tbody>
<tr>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav" rel="nofollow">
<img align="center" src="./docs/imgs/Readme_Related/audio_icon.png" width=250 ></a><br>
</td>
<td >I knocked at the door on the ancient side of the building.</td>
</tr>
<tr>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav" rel="nofollow">
<img align="center" src="./docs/imgs/Readme_Related/audio_icon.png" width=250></a><br>
</td>
<td>我认为跑步最重要的就是给我带来了身体健康。</td>
</tr>
</tbody>
</table>
</div>
- TTS speech synthesis, with multiple algorithms available.
- Input: `Life was like a box of chocolates, you never know what you're gonna get.`
- The synthesis results are as follows:
<div align="center">
@@ -97,7 +126,9 @@ English | [简体中文](README_ch.md)
</table>
</div>

- Many thanks to CopyRight@[PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech) for the pre-trained models; you can try to train your own models with PaddleSpeech.
### [Video (8 models)](./modules#Video)
- Short-video classification trained on large-scale video datasets, supporting prediction of 3000+ tag types for short-form videos.
- Many thanks to CopyRight@[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo) for the pre-trained model; you can try to train your own models with PaddleVideo.
- `Example: Input a short video of swimming, and the algorithm outputs "swimming".`
......
@@ -4,7 +4,7 @@
<img src="./docs/imgs/paddlehub_logo.jpg" align="middle"> <img src="./docs/imgs/paddlehub_logo.jpg" align="middle">
<p align="center"> <p align="center">
<div align="center"> <div align="center">
<h3> <a href=#QuickStart> 快速开始 </a> | <a href="https://paddlehub.readthedocs.io/zh_CN/release-v2.1//"> 教程文档 </a> | <a href="https://www.paddlepaddle.org.cn/hublist"> 模型搜索 </a> | <a href="https://www.paddlepaddle.org.cn/hub"> 演示Demo </a> <h3> <a href=#QuickStart> 快速开始 </a> | <a href="https://paddlehub.readthedocs.io/zh_CN/release-v2.1//"> 教程文档 </a> | <a href="./modules/README_ch.md"> 模型库 </a> | <a href="https://www.paddlepaddle.org.cn/hub"> 演示Demo </a>
</h3> </h3>
</div> </div>
@@ -30,7 +30,7 @@
## Introduction and Features
- PaddleHub aims to provide developers with rich, high-quality, and directly usable pre-trained models.
- **[Abundant Model Types]**: **360+** pre-trained models covering the five major categories of CV, NLP, Audio, Video, and industrial applications, all open source for download and runnable offline.
- **[Very Low Barrier to Entry]**: no deep learning background, data, or training process required; AI models can be used quickly.
- **[One-Command Model Prediction]**: invoke a model through a one-line command or a minimal Python API to quickly experience the model effect.
- **[One-Command Model Serving]**: deploy a deep learning model as an API service with a single command.
@@ -38,6 +38,7 @@
- **[Cross-Platform Compatibility]**: runs on Linux, Windows, macOS, and other operating systems.
## Recent Updates
- **2021.12.22:** Released v2.2.0. [1] Added 100+ high-quality models covering dialogue, speech processing, semantic segmentation, OCR, text processing, image generation, and more, bringing the total number of pre-trained models to [**【360+】**](https://www.paddlepaddle.org.cn/hublist); [2] added a model [index](./modules/README_ch.md) with model name, network, dataset, usage scenarios, and other information to quickly locate the model a user needs; [3] improved the layout of model documentation to present more practical information such as datasets, metrics, and model size.
- **2021.05.12:** Added the lightweight Chinese dialogue model [plato-mini](https://www.paddlepaddle.org.cn/hubdetail?name=plato-mini&en_category=TextGeneration), which can be combined with wechaty to build a WeChat chatbot. [See demo](https://github.com/KPatr1ck/paddlehub-wechaty-demo)
- **2021.04.27:** Released v2.1.0. [1] Added two high-precision semantic segmentation models based on the VOC dataset and three voice classification models. [2] Added Fine-Tune capabilities and datasets for image semantic segmentation, text semantic matching, and voice classification. Improved deployment capabilities: [3] added export of ONNX and PaddleInference model formats; [4] added [BentoML](https://github.com/bentoml/BentoML) cloud-native serving deployment, supporting a unified multi-framework model management and deployment workflow, [detailed tutorial](https://github.com/PaddlePaddle/PaddleHub/blob/release/v2.1/demo/serving/bentoml/cloud-native-model-serving-with-bentoml.ipynb); see also the latest BentoML v0.12.1 [release note](https://github.com/bentoml/BentoML/releases/tag/v0.12.1). (Thanks to @[parano](https://github.com/parano) @[cqvu](https://github.com/cqvu) @[deehrlic](https://github.com/deehrlic) for the contribution and support.) [5] The total number of pre-trained models reaches [**【300】**](https://www.paddlepaddle.org.cn/hublist).
- **2021.02.18:** Released v2.0.0. [1] Model development and debugging are simpler, and the finetune API is more flexible and easy to use; transfer learning for visual tasks is fully upgraded, supporting [image classification](./demo/image_classification/README.md), [image colorization](./demo/colorization/README.md), [style transfer](./demo/style_transfer/README.md), and other tasks; Transformer models such as BERT, ERNIE, and RoBERTa are upgraded to dynamic graphs, supporting Fine-Tune for [text classification](./demo/text_classification/README.md) and [sequence labeling](./demo/sequence_labeling/README.md); [2] optimized Serving deployment, supporting multi-card prediction and automatic load balancing, with greatly improved performance; [3] added the automatic data augmentation capability [Auto Augment](./demo/autoaug/README.md), which efficiently searches for data augmentation strategy combinations suitable for a dataset; [4] added 61 [word embedding models](./modules/text/embedding), including 51 Chinese and 10 English models; added 4 [image segmentation](./modules/thirdparty/image/semantic_segmentation) models, 2 [depth estimation](./modules/thirdparty/image/depth_estimation) models, 7 [image generation](./modules/thirdparty/image/Image_gan/style_transfer) models, and 3 [text generation](./modules/thirdparty/text/text_generation) models; [5] the total number of pre-trained models reaches [**【274】**](https://www.paddlepaddle.org.cn/hublist).
@@ -47,9 +48,9 @@
## **Featured Model Demos [[More]](./docs/docs_ch/visualization.md) [[Models List]](./modules/README_ch.md)**
### **[Image (212 models)](./modules/README_ch.md#图像)**
- Including image classification, face detection, mask detection, vehicle detection, face/human/hand keypoint detection, portrait segmentation, text recognition in 80+ languages, image super-resolution/colorization/cartoonization, and more.
<div align="center">
<img src="./docs/imgs/Readme_Related/Image_all.gif" width = "530" height = "400" />
@@ -58,7 +59,7 @@
- Many thanks to CopyRight@[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR), [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection), [PaddleGAN](https://github.com/PaddlePaddle/PaddleGAN), [AnimeGAN](https://github.com/TachibanaYoshino/AnimeGANv2), [openpose](https://github.com/CMU-Perceptual-Computing-Lab/openpose), [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg), [Zhengxia Zou](https://github.com/jiupinjia/SkyAR), and [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) for the pre-trained models; training capability is open, and you are welcome to try them.
### **[Text (130 models)](./modules/README_ch.md#文本)**
- Including Chinese word segmentation, POS tagging and named entity recognition, dependency parsing, AI writing of poems/couplets/love notes/acrostic poems, Chinese sentiment analysis of comments, Chinese pornographic-text moderation, and more.
<div align="center">
<img src="./docs/imgs/Readme_Related/Text_all.gif" width = "640" height = "240" />
@@ -67,9 +68,37 @@
- Many thanks to CopyRight@[ERNIE](https://github.com/PaddlePaddle/ERNIE), [LAC](https://github.com/baidu/LAC), and [DDParser](https://github.com/baidu/DDParser) for the pre-trained models; training capability is open, and you are welcome to try them.
### **[Speech (15 models)](./modules/README_ch.md#语音)**
- ASR speech recognition, with multiple algorithms available.
- The speech recognition results are as follows:
<div align="center">
<table>
<thead>
<tr>
<th width=250> Input Audio </th>
<th width=550> Recognition Result </th>
</tr>
</thead>
<tbody>
<tr>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav" rel="nofollow">
<img align="center" src="./docs/imgs/Readme_Related/audio_icon.png" width=250 ></a><br>
</td>
<td >I knocked at the door on the ancient side of the building.</td>
</tr>
<tr>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav" rel="nofollow">
<img align="center" src="./docs/imgs/Readme_Related/audio_icon.png" width=250></a><br>
</td>
<td>我认为跑步最重要的就是给我带来了身体健康。</td>
</tr>
</tbody>
</table>
</div>
- TTS speech synthesis, with multiple algorithms available.
- Input: `Life was like a box of chocolates, you never know what you're gonna get.`
- The synthesis results are as follows:
<div align="center">
@@ -100,7 +129,9 @@
</table>
</div>

- Many thanks to CopyRight@[PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech) for the pre-trained models; training capability is open, and you are welcome to try them.
### **[Video (8 models)](./modules/README_ch.md#视频)**
- Including short-video classification supporting 3000+ tag types with TOP-K tag output, with multiple algorithms available.
- Many thanks to CopyRight@[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo) for the pre-trained models; training capability is open, and you are welcome to try them.
- `Example: Input a short video of swimming, and the algorithm outputs "swimming".`
......
@@ -8,6 +8,18 @@
$ hub run resnet50_vd_imagenet_ssld --input_path "/PATH/TO/IMAGE" --top_k 5
```
## Prediction via Script

```python
import paddle
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='resnet50_vd_imagenet_ssld')
    result = model.predict(['/PATH/TO/IMAGE'])
```
## How to Start Fine-tuning

After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning resnet50_vd_imagenet_ssld on datasets such as [Flowers](../../docs/reference/datasets.md#class-hubdatasetsflowers).
......
@@ -91,10 +91,12 @@ train_dataset = hub.datasets.MSRA_NER(
    tokenizer=model.get_tokenizer(), max_seq_len=128, mode='train')
dev_dataset = hub.datasets.MSRA_NER(
    tokenizer=model.get_tokenizer(), max_seq_len=128, mode='dev')
test_dataset = hub.datasets.MSRA_NER(
tokenizer=model.get_tokenizer(), max_seq_len=128, mode='test')
```
* `tokenizer`: the tokenizer required by this module; it tokenizes the input text and converts it into the model input format the module expects.
* `mode`: the dataset split to use; options are `train`, `test`, and `dev`, defaulting to `train`.
* `max_seq_len`: the maximum sequence length used by the ERNIE/BERT model; lower this parameter appropriately if you run out of GPU memory.

The pre-trained ERNIE model processes Chinese data at the character level, and the tokenizer converts raw input text into the input form the model accepts. The pre-trained models in PaddleHub 2.0 come with built-in tokenizers, available via the `model.get_tokenizer` method.
@@ -106,7 +108,7 @@ dev_dataset = hub.datasets.MSRA_NER(
```python
optimizer = paddle.optimizer.AdamW(learning_rate=5e-5, parameters=model.parameters())
trainer = hub.Trainer(model, optimizer, checkpoint_dir='test_ernie_token_cls', use_gpu=True)
trainer.train(train_dataset, epochs=3, batch_size=32, eval_dataset=dev_dataset)
```
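With the `test_dataset` now constructed above, the fine-tuned model can also be scored on the test split; a minimal sketch, assuming the `Trainer.evaluate` API of PaddleHub 2.x:

```python
# Sketch (assuming hub.Trainer.evaluate from PaddleHub 2.x):
# score the fine-tuned token classifier on the held-out test split.
trainer.evaluate(test_dataset, batch_size=32)
```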
......
@@ -8,6 +8,17 @@
$ hub run msgnet --input_path "/PATH/TO/ORIGIN/IMAGE" --style_path "/PATH/TO/STYLE/IMAGE"
```
## Prediction via Script

```python
import paddle
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='msgnet')
    result = model.predict(origin=["venice-boat.jpg"], style="candy.jpg", visualization=True, save_path='style_transfer')
```
## How to Start Fine-tuning

After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the msgnet model on datasets such as [MiniCOCO](../../docs/reference/datasets.md#class-hubdatasetsMiniCOCO).
......
@@ -80,10 +80,12 @@ train_dataset = hub.datasets.ChnSentiCorp(
    tokenizer=model.get_tokenizer(), max_seq_len=128, mode='train')
dev_dataset = hub.datasets.ChnSentiCorp(
    tokenizer=model.get_tokenizer(), max_seq_len=128, mode='dev')
test_dataset = hub.datasets.ChnSentiCorp(
tokenizer=model.get_tokenizer(), max_seq_len=128, mode='test')
```
* `tokenizer`: the tokenizer required by this module; it tokenizes the input text and converts it into the model input format the module expects.
* `mode`: the dataset split to use; options are `train`, `test`, and `dev`, defaulting to `train`.
* `max_seq_len`: the maximum sequence length used by the ERNIE/BERT model; lower this parameter appropriately if you run out of GPU memory.

The pre-trained ERNIE model processes Chinese data at the character level, and the tokenizer converts raw input text into the input form the model accepts. The pre-trained models in PaddleHub 2.0 come with built-in tokenizers, available via the `model.get_tokenizer` method.
@@ -95,7 +97,7 @@ dev_dataset = hub.datasets.ChnSentiCorp(
```python
optimizer = paddle.optimizer.Adam(learning_rate=5e-5, parameters=model.parameters())
trainer = hub.Trainer(model, optimizer, checkpoint_dir='test_ernie_text_cls', use_gpu=True)
trainer.train(train_dataset, epochs=3, batch_size=32, eval_dataset=dev_dataset)
```
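Beyond training, the fine-tuned classifier can be called directly on raw text; a sketch assuming the `predict` method of PaddleHub 2.x Transformer modules, with the sample sentence as a hypothetical input:

```python
# Sketch (assuming model.predict from PaddleHub 2.x Transformer modules);
# the input sentence is just a hypothetical example.
data = [['这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般']]
results = model.predict(data, max_seq_len=128, batch_size=1, use_gpu=True)
print(results)
```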
......
This diff has been collapsed.
This diff has been collapsed.
# deepspeech2_aishell

|Model Name|deepspeech2_aishell|
| :--- | :---: |
|Category|Speech - Speech Recognition|
|Network|DeepSpeech2|
|Dataset|AISHELL-1|
|Fine-tuning Supported|No|
|Model Size|306MB|
|Latest Update|2021-10-20|
|Metric|Chinese CER 0.065|

## 1. Basic Model Information

### Model Introduction

DeepSpeech2 is an end-to-end speech recognition model for English and Mandarin proposed by Baidu in 2015. deepspeech2_aishell uses the offline DeepSpeech2 model structure, consisting mainly of 2 convolutional layers and 3 GRU layers. It is pre-trained on the open-source Mandarin speech dataset [AISHELL-1](http://www.aishelltech.com/kysjcp), on whose test set it reaches a CER of 0.065.
<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/DeepSpeech/Hub/docs/images/ds2offlineModel.png" hspace='10'/> <br />
</p>
For more details, see [Deep Speech 2: End-to-End Speech Recognition in English and Mandarin](https://arxiv.org/abs/1512.02595).
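For reference, the reported CER is the character error rate, i.e. the character-level edit distance between hypothesis and reference normalized by reference length:

$$\mathrm{CER} = \frac{S + D + I}{N}$$

where $S$, $D$, and $I$ count substituted, deleted, and inserted characters, and $N$ is the number of characters in the reference transcript.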
## 2. Installation

- ### 1. System Dependencies

  - libsndfile, swig >= 3.0
    - Linux
      ```shell
      $ sudo apt-get install libsndfile swig
      # or
      $ sudo yum install libsndfile swig
      ```
    - macOS
      ```
      $ brew install libsndfile swig
      ```

- ### 2. Environment Dependencies

  - swig_decoder:
    ```
    git clone https://github.com/PaddlePaddle/DeepSpeech.git && cd DeepSpeech && git reset --hard b53171694e7b87abe7ea96870b2f4d8e0e2b1485 && cd deepspeech/decoders/ctcdecoder/swig && sh setup.sh
    ```
  - paddlepaddle >= 2.1.0
  - paddlehub >= 2.1.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)

- ### 3. Installation

  - ```shell
    $ hub install deepspeech2_aishell
    ```
  - If you run into problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 3. Model API Prediction

- ### 1. Prediction Code Example

```python
import paddlehub as hub

# A Chinese speech audio file in wav format, sampled at 16 kHz
wav_file = '/PATH/TO/AUDIO'

model = hub.Module(
    name='deepspeech2_aishell',
    version='1.0.0')
text = model.speech_recognize(wav_file)

print(text)
```
- ### 2. API

- ```python
  def check_audio(audio_file)
  ```
  - Checks whether the format and sample rate of the input audio meet the required 16000 Hz.
  - **Parameters**
    - `audio_file`: path of a local audio file (*.wav), e.g. `/path/to/input.wav`

- ```python
  def speech_recognize(
      audio_file,
      device='cpu',
  )
  ```
  - Recognizes the input audio as text.
  - **Parameters**
    - `audio_file`: path of a local audio file (*.wav), e.g. `/path/to/input.wav`
    - `device`: device used for prediction, defaulting to `cpu`; set it to `gpu` to predict on GPU.
  - **Returns**
    - `text`: str, the recognized text of the input audio.
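For instance, GPU inference only changes the `device` argument; a minimal sketch, assuming a CUDA-enabled PaddlePaddle build and the `model` instance created in the example above:

```python
# Sketch: GPU inference; assumes a CUDA-enabled PaddlePaddle build.
text = model.speech_recognize('/PATH/TO/AUDIO', device='gpu')
print(text)
```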
## 4. Serving Deployment

- PaddleHub Serving can deploy an online speech recognition service.

- ### Step 1: Start PaddleHub Serving

- ```shell
  $ hub serving start -m deepspeech2_aishell
  ```
- This deploys a speech recognition API service, listening on port 8866 by default.
- **NOTE:** To predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a Prediction Request

- With the server configured, the few lines of code below send a prediction request and fetch the result

- ```python
  import requests
  import json

  # Path of the audio to recognize; make sure the machine serving the model can access it
  file = '/path/to/input.wav'

  # Pass arguments to the prediction method by key; here the key is "audio_file"
  data = {"audio_file": file}
  # Send a POST request; the content type should be JSON, and the IP in the url should be that of the serving machine
  url = "http://127.0.0.1:8866/predict/deepspeech2_aishell"
  # Set the POST request headers to application/json
  headers = {"Content-Type": "application/json"}

  r = requests.post(url=url, headers=headers, data=json.dumps(data))
  print(r.json())
  ```
## 5. Release Notes

* 1.0.0

  Initial release

  ```shell
  $ hub install deepspeech2_aishell
  ```
# https://yaml.org/type/float.html
data:
train_manifest: data/manifest.train
dev_manifest: data/manifest.dev
test_manifest: data/manifest.test
min_input_len: 0.0
max_input_len: 27.0 # second
min_output_len: 0.0
max_output_len: .inf
min_output_input_ratio: 0.00
max_output_input_ratio: .inf
collator:
batch_size: 64 # one gpu
mean_std_filepath: data/mean_std.json
unit_type: char
vocab_filepath: data/vocab.txt
augmentation_config: conf/augmentation.json
random_seed: 0
spm_model_prefix:
spectrum_type: linear
feat_dim:
delta_delta: False
stride_ms: 10.0
window_ms: 20.0
n_fft: None
max_freq: None
target_sample_rate: 16000
use_dB_normalization: True
target_dB: -20
dither: 1.0
keep_transcription_text: False
sortagrad: True
shuffle_method: batch_shuffle
num_workers: 2
model:
num_conv_layers: 2
num_rnn_layers: 3
rnn_layer_size: 1024
use_gru: True
share_rnn_weights: False
blank_id: 0
ctc_grad_norm_type: instance
training:
n_epoch: 80
accum_grad: 1
lr: 2e-3
lr_decay: 0.83
weight_decay: 1e-06
global_grad_clip: 3.0
log_interval: 100
checkpoint:
kbest_n: 50
latest_n: 5
decoding:
batch_size: 128
error_rate_type: cer
decoding_method: ctc_beam_search
lang_model_path: data/lm/zh_giga.no_cna_cmn.prune01244.klm
alpha: 1.9
beta: 5.0
beam_size: 300
cutoff_prob: 0.99
cutoff_top_n: 40
num_proc_bsearch: 10
{"mean_stat": [-13505966.65209869, -12778154.889588555, -13487728.30750011, -12897344.94123812, -12472281.490772562, -12631566.475106332, -13391790.349327326, -14045382.570026815, -14159320.465516506, -14273422.438486755, -14639805.161347123, -15145380.07768254, -15612893.133258691, -15938542.05012206, -16115293.502621327, -16188225.698757892, -16317206.280373082, -16500598.476283036, -16671564.297937019, -16804599.860397574, -16916423.142814968, -17011785.59439087, -17075067.62262626, -17154580.16740178, -17257812.961825978, -17355683.228599995, -17441455.258318607, -17473199.925130684, -17488835.5763828, -17491232.15414511, -17485000.29006962, -17499471.646940477, -17551398.97122984, -17641732.10682403, -17757209.077974595, -17843801.500521667, -17935647.58641936, -18020362.347413756, -18117633.806080323, -18232427.58935143, -18316024.35215119, -18378789.145393644, -18421147.25807373, -18445805.18294822, -18460946.27810118, -18467914.04034822, -18469404.319909714, -18469606.974339806, -18470754.294192698, -18458320.91921723, -18441354.111811973, -18428332.216321833, -18422281.413955193, -18433421.585668042, -18460521.025954794, -18494800.856363494, -18539532.288011573, -18583823.79899225, -18614474.56256926, -18646872.180154275, -18661137.85367877, -18673590.719379324, -18702967.62040798, -18736434.748098046, -18777912.13098326, -18794675.486509323, -18837225.856196072, -18874872.796128694, -18927340.44407057, -18994929.076545004, -19060701.164406348, -19118006.18996682, -19175792.05766062, -19230755.996405277, -19270174.594219487, -19334788.35904946, -19401456.988906194, -19484580.095938426, -19582040.4715673, -19696598.86662636, -19810401.513227757, -19931755.37941177, -20021867.47620737, -20082298.984455004, -20114708.336475413, -20143802.72793865, -20146821.988139726, -20165613.317683898, -20189938.602584295, -20220059.08673595, -20242848.528134122, -20250859.979931064, -20267382.93048284, -20267964.544716164, -20261372.89563879, -20252878.74023849, -20247550.771284755, -20231778.31093504, -20231376.103159923, -20236926.52293088, -20248068.41488535, -20255076.901920393, -20262924.167151034, -20263926.583205637, -20263790.273742784, -20268560.080967404, -20268997.150654405, -20269810.816284582, -20267771.864327505, -20256472.703380838, -20241790.559690386, -20241865.794732895, -20244924.716114976, -20249736.631184842, -20257257.816903576, -20268027.212145977, -20277399.95533857, -20281840.8112546, -20270512.52002465, -20255938.63066214, -20242421.685443826, -20241986.654626504, -20237836.034444932, -20231458.31132546, -20218092.819713395, -20204994.19634715, -20198880.142133974, -20197376.49014031, -20198117.60450857, -20197443.473929476, -20191142.03632657, -20174428.452719454, -20159204.32090646, -20137981.294740904, -20124944.79897834, -20112774.604521394, -20109389.248600915, -20115248.61302806, -20117743.853294585, -20123076.93515528, -20132224.95454374, -20147099.26793121, -20169581.367630124, -20190957.518733896, -20215197.057997894, -20242033.589256056, -20282032.217160087, -20316778.653784916, -20360354.215504933, -20425089.908502825, -20534553.0465662, -20737928.349233944, -21091705.14104186, -21646013.197923105, -22403182.076235127, -23313516.63322832, -24244679.879594248, -25027534.00417361, -25502455.708560493, -25665136.744125813, -26602318.88405537], "var_stat": [209924783.1093623, 185218712.4577822, 209991180.89829063, 196198511.40798286, 186098265.7827955, 191905798.58923203, 214281935.29191792, 235042114.51049897, 240179456.24597096, 244657890.3963041, 
256099586.32657292, 271849135.9872555, 287174069.13527167, 298171137.28863454, 304112589.91933817, 306553976.2206335, 310813670.30674237, 316958840.3099824, 322651440.3639528, 327213725.196089, 331252123.26114285, 334856188.3081607, 337217897.6545214, 340385427.82557064, 344400488.5633641, 348086880.08086526, 351349070.53148264, 352648076.18415344, 353409462.33704513, 353598061.4967693, 353405322.74993587, 353917215.6834277, 355784796.898883, 359222461.3224974, 363671441.7428676, 366908651.69908494, 370304677.0615045, 373477194.79721, 377174088.9808273, 381531608.6574547, 384703574.426059, 387104126.9474883, 388723211.11308575, 389687817.27351815, 390351031.4418706, 390659006.3690262, 390704649.89417714, 390702370.1919126, 390731862.59274197, 390216004.4126628, 389516083.054853, 389017745.636457, 388788872.1127645, 389269311.2239042, 390401819.5968815, 391842612.97859454, 393708801.05223197, 395569598.4694, 396868892.67152405, 398210915.02133286, 398743299.4753882, 399330344.88417244, 400565940.1325846, 401901693.4656316, 403513855.43933284, 404103248.96526104, 405986814.274556, 407507145.4104169, 409598353.6517908, 412453848.0248063, 415138273.0558441, 417479272.96907294, 419785633.3276395, 422003065.1681787, 423610264.8868346, 426260552.96545905, 428973536.3620236, 432368654.40899384, 436359561.5468266, 441119512.777527, 445884989.25794005, 451037422.65838546, 454872292.24179226, 457497136.8780015, 458904066.0675219, 460155836.4432799, 460272943.80738074, 461087498.6828549, 462144907.7850926, 463483598.81228757, 464530694.44478536, 464971538.85301507, 465771535.6019992, 465936698.93801653, 465741012.7287712, 465448625.0011534, 465296363.8603534, 464718299.2207512, 464720391.25778216, 465016640.5248736, 465564374.0248998, 465982788.8695927, 466425068.01245564, 466595649.90489674, 466707658.8296169, 467015570.78026086, 467099213.08769494, 467201640.15951264, 467163862.3709329, 466727597.56313753, 466174871.71213347, 466255498.45248336, 466439062.65458614, 466693130.99620277, 467068587.1422199, 467536070.1402474, 467955819.1549621, 468187227.1069643, 467742976.2778335, 467159585.250493, 466592359.52916145, 466583195.8099961, 466424348.9572719, 466155323.6074322, 465569620.1801811, 465021642.5158305, 464757658.6383867, 464713882.60103834, 464724239.2941314, 464679163.728191, 464407007.8705965, 463660736.0136739, 463001339.2385198, 462077058.47595775, 461505071.67199403, 460946277.95973784, 460816158.9197017, 461123589.268546, 461232998.1572812, 461445601.0442877, 461803238.28569543, 462436966.22005004, 463391404.7434971, 464299608.85523456, 465319405.3931429, 466432961.70208246, 468168080.3331244, 469640808.6809098, 471501539.22440934, 474301795.1694898, 479155711.93441755, 488314271.10405815, 504537056.23994666, 530509400.5201074, 566892036.4437443, 611792826.0442055, 658913502.9004005, 699716882.9169292, 725237302.8248898, 734259159.9571886, 789267050.8287783], "frame_num": 899422}
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Evaluation for DeepSpeech2 model."""
import os
import sys
from pathlib import Path
import paddle
from deepspeech.frontend.featurizer.text_featurizer import TextFeaturizer
from deepspeech.io.collator import SpeechCollator
from deepspeech.models.ds2 import DeepSpeech2Model
from deepspeech.utils import mp_tools
from deepspeech.utils.utility import UpdateConfig
class DeepSpeech2Tester:
def __init__(self, config):
self.config = config
self.collate_fn_test = SpeechCollator.from_config(config)
self._text_featurizer = TextFeaturizer(unit_type=config.collator.unit_type, vocab_filepath=None)
def compute_result_transcripts(self, audio, audio_len, vocab_list, cfg):
result_transcripts = self.model.decode(
audio,
audio_len,
vocab_list,
decoding_method=cfg.decoding_method,
lang_model_path=cfg.lang_model_path,
beam_alpha=cfg.alpha,
beam_beta=cfg.beta,
beam_size=cfg.beam_size,
cutoff_prob=cfg.cutoff_prob,
cutoff_top_n=cfg.cutoff_top_n,
num_processes=cfg.num_proc_bsearch)
# Replace the '<space>' token with ' '
result_transcripts = [self._text_featurizer.detokenize(sentence) for sentence in result_transcripts]
return result_transcripts
@mp_tools.rank_zero_only
@paddle.no_grad()
def test(self, audio_file):
self.model.eval()
cfg = self.config
collate_fn_test = self.collate_fn_test
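# Featurize the utterance with a dummy transcript, then add a batch dimension for decoding.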
audio, _ = collate_fn_test.process_utterance(audio_file=audio_file, transcript=" ")
audio_len = audio.shape[0]
audio = paddle.to_tensor(audio, dtype='float32')
audio_len = paddle.to_tensor(audio_len)
audio = paddle.unsqueeze(audio, axis=0)
vocab_list = collate_fn_test.vocab_list
result_transcripts = self.compute_result_transcripts(audio, audio_len, vocab_list, cfg.decoding)
return result_transcripts
def setup_model(self):
config = self.config.clone()
with UpdateConfig(config):
config.model.feat_size = self.collate_fn_test.feature_size
config.model.dict_size = self.collate_fn_test.vocab_size
model = DeepSpeech2Model.from_config(config.model)
self.model = model
def resume(self, checkpoint):
"""Resume from the checkpoint at checkpoints in the output
directory or load a specified checkpoint.
"""
model_dict = paddle.load(checkpoint)
self.model.set_state_dict(model_dict)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from pathlib import Path
import sys
import numpy as np
from paddlehub.env import MODULE_HOME
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddle.utils.download import get_path_from_url
try:
import swig_decoders
except ModuleNotFoundError as e:
logger.error(e)
logger.info('The module requires additional dependencies: swig_decoders. '
'please install via:\n\'git clone https://github.com/PaddlePaddle/DeepSpeech.git '
'&& cd DeepSpeech && git reset --hard b53171694e7b87abe7ea96870b2f4d8e0e2b1485 '
'&& cd deepspeech/decoders/ctcdecoder/swig && sh setup.sh\'')
sys.exit(1)
import paddle
import soundfile as sf
# TODO: Remove system path when deepspeech can be installed via pip.
sys.path.append(os.path.join(MODULE_HOME, 'deepspeech2_aishell'))
from deepspeech.exps.deepspeech2.config import get_cfg_defaults
from deepspeech.utils.utility import UpdateConfig
from .deepspeech_tester import DeepSpeech2Tester
LM_URL = 'https://deepspeech.bj.bcebos.com/zh_lm/zh_giga.no_cna_cmn.prune01244.klm'
LM_MD5 = '29e02312deb2e59b3c8686c7966d4fe3'
@moduleinfo(name="deepspeech2_aishell", version="1.0.0", summary="", author="Baidu", author_email="", type="audio/asr")
class DeepSpeech2(paddle.nn.Layer):
def __init__(self):
super(DeepSpeech2, self).__init__()
# resource
res_dir = os.path.join(MODULE_HOME, 'deepspeech2_aishell', 'assets')
conf_file = os.path.join(res_dir, 'conf/deepspeech2.yaml')
checkpoint = os.path.join(res_dir, 'checkpoints/avg_1.pdparams')
# Download the LM manually because of its large size.
lm_path = os.path.join(res_dir, 'data', 'lm')
lm_file = os.path.join(lm_path, LM_URL.split('/')[-1])
if not os.path.isfile(lm_file):
logger.info(f'Downloading lm from {LM_URL}.')
get_path_from_url(url=LM_URL, root_dir=lm_path, md5sum=LM_MD5)
# config
self.model_type = 'offline'
self.config = get_cfg_defaults(self.model_type)
self.config.merge_from_file(conf_file)
# TODO: Remove path updating snippet.
with UpdateConfig(self.config):
self.config.collator.mean_std_filepath = os.path.join(res_dir, self.config.collator.mean_std_filepath)
self.config.collator.vocab_filepath = os.path.join(res_dir, self.config.collator.vocab_filepath)
self.config.collator.augmentation_config = os.path.join(res_dir, self.config.collator.augmentation_config)
self.config.decoding.lang_model_path = os.path.join(res_dir, self.config.decoding.lang_model_path)
# model
self.tester = DeepSpeech2Tester(self.config)
self.tester.setup_model()
self.tester.resume(checkpoint)
@staticmethod
def check_audio(audio_file):
sig, sample_rate = sf.read(audio_file)
assert sample_rate == 16000, 'Expected sample rate of input audio to be 16000, but got {}'.format(sample_rate)
@serving
def speech_recognize(self, audio_file, device='cpu'):
assert os.path.isfile(audio_file), 'File not exists: {}'.format(audio_file)
self.check_audio(audio_file)
paddle.set_device(device)
return self.tester.test(audio_file)[0]
# system-level dependencies: libsndfile, swig
loguru
yacs
jsonlines
scipy==1.2.1
sentencepiece
resampy==0.2.2
SoundFile==0.9.0.post1
soxbindings
kaldiio
typeguard
editdistance
# deepspeech2_librispeech

|Model Name|deepspeech2_librispeech|
| :--- | :---: |
|Category|Speech - Speech Recognition|
|Network|DeepSpeech2|
|Dataset|LibriSpeech|
|Fine-tuning Supported|No|
|Model Size|518MB|
|Latest Update|2021-10-20|
|Metric|English WER 0.072|

## 1. Basic Model Information

### Model Introduction

DeepSpeech2 is an end-to-end speech recognition model for English and Mandarin proposed by Baidu in 2015. deepspeech2_librispeech uses the offline DeepSpeech2 model structure, consisting mainly of 2 convolutional layers and 3 GRU layers. It is pre-trained on the open-source English speech dataset [LibriSpeech ASR corpus](http://www.openslr.org/12/), on whose test set it reaches a WER of 0.072.
<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/DeepSpeech/Hub/docs/images/ds2offlineModel.png" hspace='10'/> <br />
</p>
For more details, see [Deep Speech 2: End-to-End Speech Recognition in English and Mandarin](https://arxiv.org/abs/1512.02595).
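For reference, the reported WER is the word error rate, the word-level analogue of CER:

$$\mathrm{WER} = \frac{S + D + I}{N}$$

where $S$, $D$, and $I$ count substituted, deleted, and inserted words, and $N$ is the number of words in the reference transcript.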
## 2. Installation

- ### 1. System Dependencies

  - libsndfile, swig >= 3.0
    - Linux
      ```shell
      $ sudo apt-get install libsndfile swig
      # or
      $ sudo yum install libsndfile swig
      ```
    - macOS
      ```
      $ brew install libsndfile swig
      ```

- ### 2. Environment Dependencies

  - swig_decoder:
    ```
    git clone https://github.com/PaddlePaddle/DeepSpeech.git && cd DeepSpeech && git reset --hard b53171694e7b87abe7ea96870b2f4d8e0e2b1485 && cd deepspeech/decoders/ctcdecoder/swig && sh setup.sh
    ```
  - paddlepaddle >= 2.1.0
  - paddlehub >= 2.1.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)

- ### 3. Installation

  - ```shell
    $ hub install deepspeech2_librispeech
    ```
  - If you run into problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 3. Model API Prediction

- ### 1. Prediction Code Example

```python
import paddlehub as hub

# An English speech audio file in wav format, sampled at 16 kHz
wav_file = '/PATH/TO/AUDIO'

model = hub.Module(
    name='deepspeech2_librispeech',
    version='1.0.0')
text = model.speech_recognize(wav_file)

print(text)
```
- ### 2. API

- ```python
  def check_audio(audio_file)
  ```
  - Checks whether the format and sample rate of the input audio meet the required 16000 Hz.
  - **Parameters**
    - `audio_file`: path of a local audio file (*.wav), e.g. `/path/to/input.wav`

- ```python
  def speech_recognize(
      audio_file,
      device='cpu',
  )
  ```
  - Recognizes the input audio as text.
  - **Parameters**
    - `audio_file`: path of a local audio file (*.wav), e.g. `/path/to/input.wav`
    - `device`: device used for prediction, defaulting to `cpu`; set it to `gpu` to predict on GPU.
  - **Returns**
    - `text`: str, the recognized text of the input audio.
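Since the module expects 16 kHz wav input, an upstream resampling step is sometimes needed; a sketch using soundfile and resampy (both appear in this module's requirements), with the file names as hypothetical examples:

```python
# Sketch: convert an arbitrary wav to the 16 kHz mono input the module expects.
import resampy
import soundfile as sf

sig, sr = sf.read('/PATH/TO/AUDIO')     # hypothetical input path
if sig.ndim > 1:
    sig = sig.mean(axis=1)              # downmix to mono
if sr != 16000:
    sig = resampy.resample(sig, sr, 16000)
sf.write('input_16k.wav', sig, 16000)   # ready for speech_recognize()
```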
## 4. Serving Deployment

- PaddleHub Serving can deploy an online speech recognition service.

- ### Step 1: Start PaddleHub Serving

- ```shell
  $ hub serving start -m deepspeech2_librispeech
  ```
- This deploys a speech recognition API service, listening on port 8866 by default.
- **NOTE:** To predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a Prediction Request

- With the server configured, the few lines of code below send a prediction request and fetch the result

- ```python
  import requests
  import json

  # Path of the audio to recognize; make sure the machine serving the model can access it
  file = '/path/to/input.wav'

  # Pass arguments to the prediction method by key; here the key is "audio_file"
  data = {"audio_file": file}
  # Send a POST request; the content type should be JSON, and the IP in the url should be that of the serving machine
  url = "http://127.0.0.1:8866/predict/deepspeech2_librispeech"
  # Set the POST request headers to application/json
  headers = {"Content-Type": "application/json"}

  r = requests.post(url=url, headers=headers, data=json.dumps(data))
  print(r.json())
  ```
## 5. Release Notes

* 1.0.0

  Initial release

  ```shell
  $ hub install deepspeech2_librispeech
  ```
# https://yaml.org/type/float.html
data:
train_manifest: data/manifest.train
dev_manifest: data/manifest.dev-clean
test_manifest: data/manifest.test-clean
min_input_len: 0.0
max_input_len: 30.0 # second
min_output_len: 0.0
max_output_len: .inf
min_output_input_ratio: 0.00
max_output_input_ratio: .inf
collator:
batch_size: 20
mean_std_filepath: data/mean_std.json
unit_type: char
vocab_filepath: data/vocab.txt
augmentation_config: conf/augmentation.json
random_seed: 0
spm_model_prefix:
spectrum_type: linear
target_sample_rate: 16000
max_freq: None
n_fft: None
stride_ms: 10.0
window_ms: 20.0
delta_delta: False
dither: 1.0
use_dB_normalization: True
target_dB: -20
keep_transcription_text: False
sortagrad: True
shuffle_method: batch_shuffle
num_workers: 2
model:
num_conv_layers: 2
num_rnn_layers: 3
rnn_layer_size: 2048
use_gru: False
share_rnn_weights: True
blank_id: 0
ctc_grad_norm_type: instance
training:
n_epoch: 50
accum_grad: 1
lr: 1e-3
lr_decay: 0.83
weight_decay: 1e-06
global_grad_clip: 5.0
log_interval: 100
checkpoint:
kbest_n: 50
latest_n: 5
decoding:
batch_size: 128
error_rate_type: wer
decoding_method: ctc_beam_search
lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
alpha: 1.9
beta: 0.3
beam_size: 500
cutoff_prob: 1.0
cutoff_top_n: 40
num_proc_bsearch: 8
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Evaluation for DeepSpeech2 model."""
import os
import sys
from pathlib import Path
import paddle
from deepspeech.frontend.featurizer.text_featurizer import TextFeaturizer
from deepspeech.io.collator import SpeechCollator
from deepspeech.models.ds2 import DeepSpeech2Model
from deepspeech.utils import mp_tools
from deepspeech.utils.utility import UpdateConfig
class DeepSpeech2Tester:
def __init__(self, config):
self.config = config
self.collate_fn_test = SpeechCollator.from_config(config)
self._text_featurizer = TextFeaturizer(unit_type=config.collator.unit_type, vocab_filepath=None)
def compute_result_transcripts(self, audio, audio_len, vocab_list, cfg):
result_transcripts = self.model.decode(
audio,
audio_len,
vocab_list,
decoding_method=cfg.decoding_method,
lang_model_path=cfg.lang_model_path,
beam_alpha=cfg.alpha,
beam_beta=cfg.beta,
beam_size=cfg.beam_size,
cutoff_prob=cfg.cutoff_prob,
cutoff_top_n=cfg.cutoff_top_n,
num_processes=cfg.num_proc_bsearch)
# Replace the '<space>' token with ' '
result_transcripts = [self._text_featurizer.detokenize(sentence) for sentence in result_transcripts]
return result_transcripts
@mp_tools.rank_zero_only
@paddle.no_grad()
def test(self, audio_file):
self.model.eval()
cfg = self.config
collate_fn_test = self.collate_fn_test
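# Featurize the utterance with a dummy transcript, then add a batch dimension for decoding.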
audio, _ = collate_fn_test.process_utterance(audio_file=audio_file, transcript=" ")
audio_len = audio.shape[0]
audio = paddle.to_tensor(audio, dtype='float32')
audio_len = paddle.to_tensor(audio_len)
audio = paddle.unsqueeze(audio, axis=0)
vocab_list = collate_fn_test.vocab_list
result_transcripts = self.compute_result_transcripts(audio, audio_len, vocab_list, cfg.decoding)
return result_transcripts
def setup_model(self):
config = self.config.clone()
with UpdateConfig(config):
config.model.feat_size = self.collate_fn_test.feature_size
config.model.dict_size = self.collate_fn_test.vocab_size
model = DeepSpeech2Model.from_config(config.model)
self.model = model
def resume(self, checkpoint):
"""Resume from the checkpoint at checkpoints in the output
directory or load a specified checkpoint.
"""
model_dict = paddle.load(checkpoint)
self.model.set_state_dict(model_dict)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from pathlib import Path
import sys
import numpy as np
from paddlehub.env import MODULE_HOME
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddle.utils.download import get_path_from_url
try:
import swig_decoders
except ModuleNotFoundError as e:
logger.error(e)
logger.info('The module requires additional dependencies: swig_decoders. '
'please install via:\n\'git clone https://github.com/PaddlePaddle/DeepSpeech.git '
'&& cd DeepSpeech && git reset --hard b53171694e7b87abe7ea96870b2f4d8e0e2b1485 '
'&& cd deepspeech/decoders/ctcdecoder/swig && sh setup.sh\'')
sys.exit(1)
import paddle
import soundfile as sf
# TODO: Remove system path when deepspeech can be installed via pip.
sys.path.append(os.path.join(MODULE_HOME, 'deepspeech2_librispeech'))
from deepspeech.exps.deepspeech2.config import get_cfg_defaults
from deepspeech.utils.utility import UpdateConfig
from .deepspeech_tester import DeepSpeech2Tester
LM_URL = 'https://deepspeech.bj.bcebos.com/en_lm/common_crawl_00.prune01111.trie.klm'
LM_MD5 = '099a601759d467cd0a8523ff939819c5'
@moduleinfo(
name="deepspeech2_librispeech", version="1.0.0", summary="", author="Baidu", author_email="", type="audio/asr")
class DeepSpeech2(paddle.nn.Layer):
def __init__(self):
super(DeepSpeech2, self).__init__()
# resource
res_dir = os.path.join(MODULE_HOME, 'deepspeech2_librispeech', 'assets')
conf_file = os.path.join(res_dir, 'conf/deepspeech2.yaml')
checkpoint = os.path.join(res_dir, 'checkpoints/avg_1.pdparams')
# Download the LM manually because of its large size.
lm_path = os.path.join(res_dir, 'data', 'lm')
lm_file = os.path.join(lm_path, LM_URL.split('/')[-1])
if not os.path.isfile(lm_file):
logger.info(f'Downloading lm from {LM_URL}.')
get_path_from_url(url=LM_URL, root_dir=lm_path, md5sum=LM_MD5)
# config
self.model_type = 'offline'
self.config = get_cfg_defaults(self.model_type)
self.config.merge_from_file(conf_file)
# TODO: Remove path updating snippet.
with UpdateConfig(self.config):
self.config.collator.mean_std_filepath = os.path.join(res_dir, self.config.collator.mean_std_filepath)
self.config.collator.vocab_filepath = os.path.join(res_dir, self.config.collator.vocab_filepath)
self.config.collator.augmentation_config = os.path.join(res_dir, self.config.collator.augmentation_config)
self.config.decoding.lang_model_path = os.path.join(res_dir, self.config.decoding.lang_model_path)
# model
self.tester = DeepSpeech2Tester(self.config)
self.tester.setup_model()
self.tester.resume(checkpoint)
@staticmethod
def check_audio(audio_file):
sig, sample_rate = sf.read(audio_file)
assert sample_rate == 16000, 'Expected sample rate of input audio to be 16000, but got {}'.format(sample_rate)
@serving
def speech_recognize(self, audio_file, device='cpu'):
assert os.path.isfile(audio_file), 'File not exists: {}'.format(audio_file)
self.check_audio(audio_file)
paddle.set_device(device)
return self.tester.test(audio_file)[0]
loguru
yacs
jsonlines
scipy==1.2.1
sentencepiece
resampy==0.2.2
SoundFile==0.9.0.post1
soxbindings
kaldiio
typeguard
editdistance
# u2_conformer_aishell

|Model Name|u2_conformer_aishell|
| :--- | :---: |
|Category|Speech - Speech Recognition|
|Network|Conformer|
|Dataset|AISHELL-1|
|Fine-tuning Supported|No|
|Model Size|284MB|
|Latest Update|2021-11-01|
|Metric|Chinese CER 0.055|

## 1. Basic Model Information

### Model Introduction

U2 Conformer is an end-to-end speech recognition model for English and Mandarin. u2_conformer_aishell combines a Conformer encoder with a Transformer decoder: a first decoding pass scores candidates with CTC prefix beam search, and a second pass rescores them with the attention decoder to produce the final result (see the sketch after the references below).

u2_conformer_aishell is pre-trained on the open-source Mandarin speech dataset [AISHELL-1](http://www.aishelltech.com/kysjcp), on whose test set it reaches a CER of 0.055257.
<p align="center">
<img src="https://paddlehub.bj.bcebos.com/paddlehub-img/conformer.png" hspace='10'/> <br />
</p>
<p align="center">
<img src="https://paddlehub.bj.bcebos.com/paddlehub-img/u2_conformer.png" hspace='10'/> <br />
</p>
For more details, see:
- [Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition](https://arxiv.org/abs/2012.05481)
- [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100)
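To make the two-pass decoding concrete, here is a minimal, self-contained sketch of the rescoring step; the candidates and scores are hypothetical stand-ins for what CTC prefix beam search and the attention decoder actually produce (cf. `ctc_weight` in the decoding config below), not the module's real API.

```python
# Hypothetical sketch of U2-style second-pass rescoring.
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    ctc_score: float   # first-pass score from CTC prefix beam search
    att_score: float   # second-pass score from the attention decoder

def rescore(candidates, ctc_weight=0.0):
    # Interpolate both scores; ctc_weight=0.0 ranks by the attention score alone.
    return max(candidates, key=lambda c: c.att_score + ctc_weight * c.ctc_score)

best = rescore([Candidate('今天天气不错', -1.2, -0.4),
                Candidate('今天天汽不错', -1.0, -2.1)])
print(best.text)  # '今天天气不错'
```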
## II. Installation
- ### 1. System Dependencies
- libsndfile
- Linux
```shell
$ sudo apt-get install libsndfile
or
$ sudo yum install libsndfile
```
- MacOS
```
$ brew install libsndfile
```
- ### 2. Environment Dependencies
- paddlepaddle >= 2.1.0
- paddlehub >= 2.1.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 3. Installation
- ```shell
$ hub install u2_conformer_aishell
```
- If you encounter problems during installation, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API and Prediction
- ### 1. Prediction Code Example
- ```python
import paddlehub as hub
# Chinese speech audio in wav format, sampled at 16 kHz
wav_file = '/PATH/TO/AUDIO'
model = hub.Module(
name='u2_conformer_aishell',
version='1.0.0')
text = model.speech_recognize(wav_file)
print(text)
```
- ### 2. API
- ```python
def check_audio(audio_file)
```
- Checks that the input audio is in wav format with a sample rate of 16000.
- **Parameters**
- `audio_file`: path to a local audio file (*.wav), e.g. `/path/to/input.wav`.
- ```python
def speech_recognize(
audio_file,
device='cpu',
)
```
- Transcribes the input audio into text.
- **Parameters**
- `audio_file`: path to a local audio file (*.wav), e.g. `/path/to/input.wav`.
- `device`: device used for prediction; defaults to `cpu`. Set it to `gpu` to predict on GPU.
- **Returns**
- `text`: str, the transcription of the input audio.
## IV. Service Deployment
- PaddleHub Serving can deploy an online speech recognition service.
- ### Step 1: Start the PaddleHub Serving
- ```shell
$ hub serving start -m u2_conformer_aishell
```
- This deploys the speech recognition API service; the default port is 8866.
- **NOTE:** To predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the few lines of code below send a prediction request and fetch the result.
- ```python
import requests
import json
# Path of the audio to recognize; make sure the serving machine can access it
file = '/path/to/input.wav'
# Arguments are passed to the prediction method by key; here the key is "audio_file"
data = {"audio_file": file}
# Send a POST request; the content type should be JSON, and the IP in the URL should be replaced with that of the serving machine
url = "http://127.0.0.1:8866/predict/u2_conformer_aishell"
# Set the POST request headers to application/json
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## V. Release Note
* 1.0.0
First release
```shell
$ hub install u2_conformer_aishell
```
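For reference, the CER metric quoted above is the character error rate: the edit distance between hypothesis and reference characters divided by the reference length. A minimal illustrative sketch, using the `editdistance` package from this module's requirements:
```python
import editdistance  # listed in the module requirements

def cer(reference: str, hypothesis: str) -> float:
    # Character error rate over the reference length, ignoring spaces.
    ref = list(reference.replace(' ', ''))
    hyp = list(hypothesis.replace(' ', ''))
    return editdistance.eval(ref, hyp) / len(ref)

print(cer('今天天气不错', '今天天汽不错'))  # 1 substitution / 6 chars ≈ 0.167
```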
data:
train_manifest: data/manifest.train
dev_manifest: data/manifest.dev
test_manifest: data/manifest.test
min_input_len: 0.5
max_input_len: 20.0 # seconds
min_output_len: 0.0
max_output_len: 400.0
min_output_input_ratio: 0.05
max_output_input_ratio: 10.0
collator:
vocab_filepath: data/vocab.txt
unit_type: 'char'
spm_model_prefix: ''
augmentation_config: conf/augmentation.json
batch_size: 64
raw_wav: True # use raw_wav or kaldi feature
spectrum_type: fbank #linear, mfcc, fbank
feat_dim: 80
delta_delta: False
dither: 1.0
target_sample_rate: 16000
max_freq: None
n_fft: None
stride_ms: 10.0
window_ms: 25.0
use_dB_normalization: False
target_dB: -20
random_seed: 0
keep_transcription_text: False
sortagrad: True
shuffle_method: batch_shuffle
num_workers: 2
decoding:
alpha: 2.5
batch_size: 128
beam_size: 10
beta: 0.3
ctc_weight: 0.0
cutoff_prob: 1.0
cutoff_top_n: 0
decoding_chunk_size: -1
decoding_method: attention
error_rate_type: cer
lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
num_decoding_left_chunks: -1
num_proc_bsearch: 8
simulate_streaming: False
model:
cmvn_file: data/mean_std.json
cmvn_file_type: json
decoder: transformer
decoder_conf:
attention_heads: 4
dropout_rate: 0.1
linear_units: 2048
num_blocks: 6
positional_dropout_rate: 0.1
self_attention_dropout_rate: 0.0
src_attention_dropout_rate: 0.0
encoder: conformer
encoder_conf:
activation_type: swish
attention_dropout_rate: 0.0
attention_heads: 4
cnn_module_kernel: 15
dropout_rate: 0.1
input_layer: conv2d
linear_units: 2048
normalize_before: True
num_blocks: 12
output_size: 256
pos_enc_layer_type: rel_pos
positional_dropout_rate: 0.1
selfattention_layer_type: rel_selfattn
use_cnn_module: True
input_dim: 0
model_conf:
ctc_weight: 0.3
ctc_dropoutrate: 0.0
ctc_grad_norm_type: instance
length_normalized_loss: False
lsm_weight: 0.1
output_dim: 0
training:
accum_grad: 2
global_grad_clip: 5.0
log_interval: 100
n_epoch: 300
optim: adam
optim_conf:
lr: 0.002
weight_decay: 1e-06
scheduler: warmuplr
scheduler_conf:
lr_decay: 1.0
warmup_steps: 25000
checkpoint:
kbest_n: 50
latest_n: 5
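This configuration is consumed through yacs in the module code shown elsewhere in this dump (`get_cfg_defaults()` followed by `merge_from_file`). A minimal sketch, assuming the DeepSpeech repository is importable and the paths are placeholders:
```python
from deepspeech.exps.u2.config import get_cfg_defaults
from deepspeech.utils.utility import UpdateConfig

config = get_cfg_defaults()
config.merge_from_file('conf/conformer.yaml')  # the YAML shown above
with UpdateConfig(config):  # yacs configs are immutable by default; unlock to edit
    config.decoding.decoding_method = 'attention_rescoring'
    config.decoding.batch_size = 1
print(config.model.encoder)  # conformer
```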
{"mean_stat": [533749178.75492024, 537379151.9412827, 553560684.251823, 587164297.7995199, 631868827.5506272, 662598279.7375823, 684377628.7270963, 695391900.076011, 692470493.5234187, 679434068.1698124, 666124153.9164762, 656323498.7897255, 665750586.0282139, 678693518.7836165, 681921713.5434498, 679622373.0941861, 669891550.4909347, 656595089.7941492, 653838531.0994304, 637678601.7858486, 628412248.7348012, 644835299.462052, 638840698.1892803, 646181879.4332589, 639724189.2981818, 642757470.3933163, 637471382.8647255, 642368839.4687729, 643414999.4559816, 647384269.1630985, 649348352.9727564, 649293860.0141628, 650234047.7200857, 654485430.6703687, 660474314.9996675, 667417041.2224753, 673157601.3226709, 675674470.304284, 675124085.6890339, 668017589.4583111, 670061307.6169846, 662625614.6886193, 663144526.4351237, 662504003.7634674, 666413530.1149732, 672263295.5639057, 678483738.2530766, 685387098.3034457, 692570857.529439, 699066050.4399202, 700784878.5879861, 701201520.50868, 702666292.305144, 705443439.2278953, 706070270.9023902, 705988909.8337733, 702843339.0362502, 699318566.4701376, 696089900.3030818, 687559674.541517, 675279201.9502573, 663676352.2301354, 662963751.7464145, 664300133.8414352, 666095384.4212626, 671682092.7777623, 676652386.6696675, 680097668.2490273, 683810023.0071762, 688701544.3655603, 692082724.9923568, 695788849.6782106, 701085780.0070009, 706389529.7959046, 711492753.1344281, 717637923.73355, 719691678.2081754, 715810733.4964175, 696362890.4862831, 604649423.9932467], "var_stat": [5413314850.92017, 5559847287.933615, 6150990253.613769, 6921242242.585692, 7999776708.347419, 8789877370.390867, 9405801233.462742, 9768050110.323652, 9759783206.942099, 9430647265.679018, 9090547056.72849, 8873147345.425886, 9155912918.518642, 9542539953.84679, 9653547618.806402, 9593434792.936714, 9316633026.420147, 8959273999.588833, 8863548125.445953, 8450615911.730164, 8211598033.615433, 8587083872.162145, 8432613574.987708, 8583943640.722399, 8401731458.393406, 8439359231.367369, 8293779802.711447, 8401506934.147289, 8427506949.839874, 8525176341.071184, 8577080109.482346, 8575106681.347283, 8594987363.896849, 8701703698.13697, 8854967559.695303, 9029484499.828356, 9168774993.437275, 9221457044.693224, 9194525496.858181, 8997085233.031223, 9024585998.805922, 8819398159.92156, 8807895653.788486, 8777245867.886335, 8869681168.825321, 9017397167.041729, 9173402827.38027, 9345595113.30765, 9530638054.282673, 9701241750.610865, 9749002220.142677, 9762753891.356327, 9802020174.527405, 9874432300.977995, 9883303068.689241, 9873499335.610315, 9780680890.924107, 9672603363.913414, 9569436761.47915, 9321842521.985804, 8968140697.297707, 8646348638.918655, 8616965457.523136, 8648620220.395298, 8702086138.675117, 8859213220.99842, 8999405313.087536, 9105949447.399998, 9220413227.016796, 9358601578.269663, 9451405873.00428, 9552727080.824707, 9695443509.54488, 9836687193.669691, 9970962418.410656, 10135881535.317768, 10189390919.400673, 10070483257.345238, 9532953296.22076, 7261219636.045063], "frame_num": 54068199}
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from pathlib import Path
import sys
import numpy as np
from paddlehub.env import MODULE_HOME
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
import paddle
import soundfile as sf
# TODO: Remove system path when deepspeech can be installed via pip.
sys.path.append(os.path.join(MODULE_HOME, 'u2_conformer_aishell'))
from deepspeech.exps.u2.config import get_cfg_defaults
from deepspeech.utils.utility import UpdateConfig
from .u2_conformer_tester import U2ConformerTester
@moduleinfo(name="u2_conformer_aishell", version="1.0.0", summary="", author="Baidu", author_email="", type="audio/asr")
class U2Conformer(paddle.nn.Layer):
def __init__(self):
super(U2Conformer, self).__init__()
# resource
res_dir = os.path.join(MODULE_HOME, 'u2_conformer_aishell', 'assets')
conf_file = os.path.join(res_dir, 'conf/conformer.yaml')
checkpoint = os.path.join(res_dir, 'checkpoints/avg_20.pdparams')
# config
self.config = get_cfg_defaults()
self.config.merge_from_file(conf_file)
# TODO: Remove path updating snippet.
with UpdateConfig(self.config):
self.config.collator.vocab_filepath = os.path.join(res_dir, self.config.collator.vocab_filepath)
# self.config.collator.spm_model_prefix = os.path.join(res_dir, self.config.collator.spm_model_prefix)
self.config.collator.augmentation_config = os.path.join(res_dir, self.config.collator.augmentation_config)
self.config.model.cmvn_file = os.path.join(res_dir, self.config.model.cmvn_file)
self.config.decoding.decoding_method = 'attention_rescoring'
self.config.decoding.batch_size = 1
# model
self.tester = U2ConformerTester(self.config)
self.tester.setup_model()
self.tester.resume(checkpoint)
@staticmethod
def check_audio(audio_file):
sig, sample_rate = sf.read(audio_file)
assert sample_rate == 16000, 'Expected sample rate of input audio to be 16000, but got {}'.format(sample_rate)
@serving
def speech_recognize(self, audio_file, device='cpu'):
assert os.path.isfile(audio_file), 'File does not exist: {}'.format(audio_file)
self.check_audio(audio_file)
paddle.set_device(device)
return self.tester.test(audio_file)[0][0]
loguru
yacs
jsonlines
scipy==1.2.1
sentencepiece
resampy==0.2.2
SoundFile==0.9.0.post1
soxbindings
kaldiio
typeguard
editdistance
textgrid
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Evaluation for U2 model."""
import os
import sys
import paddle
from deepspeech.frontend.featurizer.text_featurizer import TextFeaturizer
from deepspeech.io.collator import SpeechCollator
from deepspeech.models.u2 import U2Model
from deepspeech.utils import mp_tools
from deepspeech.utils.utility import UpdateConfig
class U2ConformerTester:
def __init__(self, config):
self.config = config
self.collate_fn_test = SpeechCollator.from_config(config)
self._text_featurizer = TextFeaturizer(
unit_type=config.collator.unit_type, vocab_filepath=None, spm_model_prefix=config.collator.spm_model_prefix)
@mp_tools.rank_zero_only
@paddle.no_grad()
def test(self, audio_file):
self.model.eval()
cfg = self.config.decoding
collate_fn_test = self.collate_fn_test
audio, _ = collate_fn_test.process_utterance(audio_file=audio_file, transcript="Hello")
audio_len = audio.shape[0]
audio = paddle.to_tensor(audio, dtype='float32')
audio_len = paddle.to_tensor(audio_len)
audio = paddle.unsqueeze(audio, axis=0)
vocab_list = collate_fn_test.vocab_list
text_feature = self.collate_fn_test.text_feature
result_transcripts = self.model.decode(
audio,
audio_len,
text_feature=text_feature,
decoding_method=cfg.decoding_method,
lang_model_path=cfg.lang_model_path,
beam_alpha=cfg.alpha,
beam_beta=cfg.beta,
beam_size=cfg.beam_size,
cutoff_prob=cfg.cutoff_prob,
cutoff_top_n=cfg.cutoff_top_n,
num_processes=cfg.num_proc_bsearch,
ctc_weight=cfg.ctc_weight,
decoding_chunk_size=cfg.decoding_chunk_size,
num_decoding_left_chunks=cfg.num_decoding_left_chunks,
simulate_streaming=cfg.simulate_streaming)
return result_transcripts
def setup_model(self):
config = self.config.clone()
with UpdateConfig(config):
config.model.input_dim = self.collate_fn_test.feature_size
config.model.output_dim = self.collate_fn_test.vocab_size
self.model = U2Model.from_config(config.model)
def resume(self, checkpoint):
"""Resume from the checkpoint at checkpoints in the output
directory or load a specified checkpoint.
"""
model_dict = paddle.load(checkpoint)
self.model.set_state_dict(model_dict)
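For reference, the module class above drives this tester as follows; this is a condensed restatement of the `U2Conformer.__init__` and `speech_recognize` flow, with placeholder paths:
```python
from deepspeech.exps.u2.config import get_cfg_defaults

config = get_cfg_defaults()
config.merge_from_file('assets/conf/conformer.yaml')   # placeholder path

tester = U2ConformerTester(config)
tester.setup_model()                                   # builds U2Model from the merged config
tester.resume('assets/checkpoints/avg_20.pdparams')    # placeholder checkpoint
print(tester.test('/PATH/TO/AUDIO.wav')[0][0])         # best transcript
```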
# u2_conformer_librispeech
|Model Name|u2_conformer_librispeech|
| :--- | :---: |
|Category|Speech - Speech Recognition|
|Network|Conformer|
|Dataset|LibriSpeech|
|Fine-tuning supported|No|
|Module Size|191MB|
|Latest update date|2021-11-01|
|Data metric|English WER 0.034|
## I. Basic Information
### Model Introduction
The U2 Conformer model is an end-to-end speech recognition model for both English and Chinese. u2_conformer_librispeech combines a Conformer encoder with a Transformer decoder: decoding first scores candidates with CTC prefix beam search, then rescores them with the attention decoder to obtain the final result.
u2_conformer_librispeech was pre-trained on the open-source English speech dataset [LibriSpeech ASR corpus](http://www.openslr.org/12/) and reaches a WER of 0.034655 on its test set.
<p align="center">
<img src="https://paddlehub.bj.bcebos.com/paddlehub-img/conformer.png" hspace='10'/> <br />
</p>
<p align="center">
<img src="https://paddlehub.bj.bcebos.com/paddlehub-img/u2_conformer.png" hspace='10'/> <br />
</p>
For more details, please refer to:
- [Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition](https://arxiv.org/abs/2012.05481)
- [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100)
## II. Installation
- ### 1. System Dependencies
- libsndfile
- Linux
```shell
$ sudo apt-get install libsndfile
or
$ sudo yum install libsndfile
```
- MacOS
```
$ brew install libsndfile
```
- ### 2. Environment Dependencies
- paddlepaddle >= 2.1.0
- paddlehub >= 2.1.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 3. Installation
- ```shell
$ hub install u2_conformer_librispeech
```
- If you encounter problems during installation, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API and Prediction
- ### 1. Prediction Code Example
- ```python
import paddlehub as hub
# English speech audio in wav format, sampled at 16 kHz
wav_file = '/PATH/TO/AUDIO'
model = hub.Module(
name='u2_conformer_librispeech',
version='1.0.0')
text = model.speech_recognize(wav_file)
print(text)
```
- ### 2. API
- ```python
def check_audio(audio_file)
```
- Checks that the input audio is in wav format with a sample rate of 16000.
- **Parameters**
- `audio_file`: path to a local audio file (*.wav), e.g. `/path/to/input.wav`.
- ```python
def speech_recognize(
audio_file,
device='cpu',
)
```
- Transcribes the input audio into text.
- **Parameters**
- `audio_file`: path to a local audio file (*.wav), e.g. `/path/to/input.wav`.
- `device`: device used for prediction; defaults to `cpu`. Set it to `gpu` to predict on GPU.
- **Returns**
- `text`: str, the transcription of the input audio.
## IV. Service Deployment
- PaddleHub Serving can deploy an online speech recognition service.
- ### Step 1: Start the PaddleHub Serving
- ```shell
$ hub serving start -m u2_conformer_librispeech
```
- This deploys the speech recognition API service; the default port is 8866.
- **NOTE:** To predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the few lines of code below send a prediction request and fetch the result.
- ```python
import requests
import json
# Path of the audio to recognize; make sure the serving machine can access it
file = '/path/to/input.wav'
# Arguments are passed to the prediction method by key; here the key is "audio_file"
data = {"audio_file": file}
# Send a POST request; the content type should be JSON, and the IP in the URL should be replaced with that of the serving machine
url = "http://127.0.0.1:8866/predict/u2_conformer_librispeech"
# Set the POST request headers to application/json
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## V. Release Note
* 1.0.0
First release
```shell
$ hub install u2_conformer_librispeech
```
# https://yaml.org/type/float.html
data:
train_manifest: data/manifest.test-clean
dev_manifest: data/manifest.test-clean
test_manifest: data/manifest.test-clean
min_input_len: 0.5 # seconds
max_input_len: 30.0 # seconds
min_output_len: 0.0 # tokens
max_output_len: 400.0 # tokens
min_output_input_ratio: 0.05
max_output_input_ratio: 100.0
collator:
vocab_filepath: data/vocab.txt
unit_type: 'spm'
spm_model_prefix: 'data/bpe_unigram_5000'
mean_std_filepath: ""
augmentation_config: conf/augmentation.json
batch_size: 16
raw_wav: True # use raw_wav or kaldi feature
spectrum_type: fbank #linear, mfcc, fbank
feat_dim: 80
delta_delta: False
dither: 1.0
target_sample_rate: 16000
max_freq: None
n_fft: None
stride_ms: 10.0
window_ms: 25.0
use_dB_normalization: True
target_dB: -20
random_seed: 0
keep_transcription_text: False
sortagrad: True
shuffle_method: batch_shuffle
num_workers: 2
# network architecture
model:
cmvn_file: "data/mean_std.json"
cmvn_file_type: "json"
# encoder related
encoder: conformer
encoder_conf:
output_size: 256 # dimension of attention
attention_heads: 4
linear_units: 2048 # the number of units of position-wise feed forward
num_blocks: 12 # the number of encoder blocks
dropout_rate: 0.1
positional_dropout_rate: 0.1
attention_dropout_rate: 0.0
input_layer: conv2d # encoder input type, you can chose conv2d, conv2d6 and conv2d8
normalize_before: True
use_cnn_module: True
cnn_module_kernel: 15
activation_type: 'swish'
pos_enc_layer_type: 'rel_pos'
selfattention_layer_type: 'rel_selfattn'
# decoder related
decoder: transformer
decoder_conf:
attention_heads: 4
linear_units: 2048
num_blocks: 6
dropout_rate: 0.1
positional_dropout_rate: 0.1
self_attention_dropout_rate: 0.0
src_attention_dropout_rate: 0.0
# hybrid CTC/attention
model_conf:
ctc_weight: 0.3
ctc_dropoutrate: 0.0
ctc_grad_norm_type: instance
lsm_weight: 0.1 # label smoothing option
length_normalized_loss: false
training:
n_epoch: 120
accum_grad: 8
global_grad_clip: 3.0
optim: adam
optim_conf:
lr: 0.004
weight_decay: 1e-06
scheduler: warmuplr # pytorch v1.1.0+ required
scheduler_conf:
warmup_steps: 25000
lr_decay: 1.0
log_interval: 100
checkpoint:
kbest_n: 50
latest_n: 5
decoding:
batch_size: 64
error_rate_type: wer
decoding_method: attention # 'attention', 'ctc_greedy_search', 'ctc_prefix_beam_search', 'attention_rescoring'
lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
alpha: 2.5
beta: 0.3
beam_size: 10
cutoff_prob: 1.0
cutoff_top_n: 0
num_proc_bsearch: 8
ctc_weight: 0.5 # ctc weight for attention rescoring decode mode.
decoding_chunk_size: -1 # decoding chunk size. Defaults to -1.
# <0: for decoding, use full chunk.
# >0: for decoding, use fixed chunk size as set.
# 0: used for training, it's prohibited here.
num_decoding_left_chunks: -1 # number of left chunks for decoding. Defaults to -1.
simulate_streaming: False # simulate streaming inference. Defaults to False.
{"mean_stat": [3419817384.9589553, 3554070049.1888413, 3818511309.9166613, 4066044518.3850017, 4291564631.2871633, 4447813845.146345, 4533096457.680424, 4535743891.989957, 4529762966.952207, 4506798370.255702, 4563810141.721841, 4621582319.277632, 4717208210.814803, 4782916961.295261, 4800534153.252695, 4816978042.979026, 4813370098.242317, 4783029495.131413, 4797780594.144404, 4697681126.278327, 4615891408.325888, 4660549391.6024275, 4576180438.146472, 4609080513.250168, 4575296489.058092, 4602504837.872262, 4568039825.650208, 4596829549.204861, 4590634987.343898, 4604371982.549804, 4623782318.317643, 4643582410.8842745, 4681460771.788484, 4759470876.31175, 4808639788.683043, 4828470941.416027, 4868984035.113543, 4906503986.801533, 4945995579.443381, 4936645225.986488, 4975902400.919519, 4960230208.656678, 4986734786.199859, 4983472199.8246765, 5002204376.162232, 5030432036.352981, 5060386169.086892, 5093482058.577236, 5118330657.308789, 5137270836.326198, 5140137363.319094, 5144296534.330122, 5158812605.654329, 5166263515.51458, 5156261604.282723, 5155820011.532965, 5154511256.8968, 5152063882.193671, 5153425524.412178, 5149000486.683038, 5154587156.35868, 5134412165.07972, 5092874838.792056, 5062281231.5140915, 5029059442.072953, 4996045017.917702, 4962203662.170533, 4928110046.282831, 4900476581.092096, 4881407033.533021, 4859626116.955097, 4851430742.3865795, 4850317443.454599, 4848197040.155383, 4837178106.464577, 4818448202.7298765, 4803345264.527405, 4765785994.104498, 4735296707.352132, 4699957946.40757], "var_stat": [39487786239.20539, 42865198005.60155, 49718916704.468704, 55953639455.490585, 62156293826.00315, 66738657819.12445, 69416921986.47835, 69657873431.17258, 69240303799.53061, 68286972351.43054, 69718367152.18843, 71405427710.7103, 74174200331.87572, 76047347951.43869, 76478048614.40665, 76810929560.19212, 76540466184.85634, 75538479521.34026, 75775624554.07217, 72775991318.16557, 70350402972.93352, 71358602366.48341, 68872845697.9878, 69552396791.49916, 68471390455.59991, 69022047288.07498, 67982260910.11236, 68656154716.71916, 68461419064.9241, 68795285460.65717, 69270474608.52791, 69754495937.76433, 70596044579.14969, 72207936275.97945, 73629619360.65047, 74746445259.57487, 75925168496.81197, 76973508692.04265, 78074337163.3413, 77765963787.96971, 78839167623.49733, 78328768943.2287, 79016127287.03778, 78922638306.99306, 79489768324.9408, 80354861037.44005, 81311991408.12526, 82368205917.26112, 83134782296.1741, 83667769421.23245, 83673751953.46239, 83806087685.62842, 84193971202.07523, 84424752763.34825, 84092846117.64104, 84039114093.08766, 83982515225.7085, 83909645482.75613, 83947278563.15077, 83800767707.19617, 83851106027.8772, 83089292432.37892, 82056425825.3622, 81138570746.92316, 80131843258.75557, 79130160837.19037, 78092166878.71533, 77104785522.79205, 76308548392.10454, 75709445890.58063, 75084778641.6033, 74795849006.19067, 74725807683.832, 74645651838.2169, 74300193368.39339, 73696619147.86806, 73212785808.97992, 72240491743.0697, 71420246227.32545, 70457076435.4593], "frame_num": 345484372}
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from pathlib import Path
import sys
import numpy as np
from paddlehub.env import MODULE_HOME
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
import paddle
import soundfile as sf
# TODO: Remove system path when deepspeech can be installed via pip.
sys.path.append(os.path.join(MODULE_HOME, 'u2_conformer_librispeech'))
from deepspeech.exps.u2.config import get_cfg_defaults
from deepspeech.utils.utility import UpdateConfig
from .u2_conformer_tester import U2ConformerTester
@moduleinfo(
name="u2_conformer_librispeech", version="1.0.0", summary="", author="Baidu", author_email="", type="audio/asr")
class U2Conformer(paddle.nn.Layer):
def __init__(self):
super(U2Conformer, self).__init__()
# resource
res_dir = os.path.join(MODULE_HOME, 'u2_conformer_librispeech', 'assets')
conf_file = os.path.join(res_dir, 'conf/conformer.yaml')
checkpoint = os.path.join(res_dir, 'checkpoints/avg_30.pdparams')
# config
self.config = get_cfg_defaults()
self.config.merge_from_file(conf_file)
# TODO: Remove path updating snippet.
with UpdateConfig(self.config):
self.config.collator.vocab_filepath = os.path.join(res_dir, self.config.collator.vocab_filepath)
self.config.collator.spm_model_prefix = os.path.join(res_dir, self.config.collator.spm_model_prefix)
self.config.collator.augmentation_config = os.path.join(res_dir, self.config.collator.augmentation_config)
self.config.model.cmvn_file = os.path.join(res_dir, self.config.model.cmvn_file)
self.config.decoding.decoding_method = 'attention_rescoring'
self.config.decoding.batch_size = 1
# model
self.tester = U2ConformerTester(self.config)
self.tester.setup_model()
self.tester.resume(checkpoint)
@staticmethod
def check_audio(audio_file):
sig, sample_rate = sf.read(audio_file)
assert sample_rate == 16000, 'Expected sample rate of input audio to be 16000, but got {}'.format(sample_rate)
@serving
def speech_recognize(self, audio_file, device='cpu'):
assert os.path.isfile(audio_file), 'File does not exist: {}'.format(audio_file)
self.check_audio(audio_file)
paddle.set_device(device)
return self.tester.test(audio_file)[0][0]
loguru
yacs
jsonlines
scipy==1.2.1
sentencepiece
resampy==0.2.2
SoundFile==0.9.0.post1
soxbindings
kaldiio
typeguard
editdistance
textgrid
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Evaluation for U2 model."""
import os
import sys
import paddle
from deepspeech.frontend.featurizer.text_featurizer import TextFeaturizer
from deepspeech.io.collator import SpeechCollator
from deepspeech.models.u2 import U2Model
from deepspeech.utils import mp_tools
from deepspeech.utils.utility import UpdateConfig
class U2ConformerTester:
def __init__(self, config):
self.config = config
self.collate_fn_test = SpeechCollator.from_config(config)
self._text_featurizer = TextFeaturizer(
unit_type=config.collator.unit_type, vocab_filepath=None, spm_model_prefix=config.collator.spm_model_prefix)
@mp_tools.rank_zero_only
@paddle.no_grad()
def test(self, audio_file):
self.model.eval()
cfg = self.config.decoding
collate_fn_test = self.collate_fn_test
audio, _ = collate_fn_test.process_utterance(audio_file=audio_file, transcript="Hello")
audio_len = audio.shape[0]
audio = paddle.to_tensor(audio, dtype='float32')
audio_len = paddle.to_tensor(audio_len)
audio = paddle.unsqueeze(audio, axis=0)
vocab_list = collate_fn_test.vocab_list
text_feature = self.collate_fn_test.text_feature
result_transcripts = self.model.decode(
audio,
audio_len,
text_feature=text_feature,
decoding_method=cfg.decoding_method,
lang_model_path=cfg.lang_model_path,
beam_alpha=cfg.alpha,
beam_beta=cfg.beta,
beam_size=cfg.beam_size,
cutoff_prob=cfg.cutoff_prob,
cutoff_top_n=cfg.cutoff_top_n,
num_processes=cfg.num_proc_bsearch,
ctc_weight=cfg.ctc_weight,
decoding_chunk_size=cfg.decoding_chunk_size,
num_decoding_left_chunks=cfg.num_decoding_left_chunks,
simulate_streaming=cfg.simulate_streaming)
return result_transcripts
def setup_model(self):
config = self.config.clone()
with UpdateConfig(config):
config.model.input_dim = self.collate_fn_test.feature_size
config.model.output_dim = self.collate_fn_test.vocab_size
self.model = U2Model.from_config(config.model)
def resume(self, checkpoint):
"""Resume from the checkpoint at checkpoints in the output
directory or load a specified checkpoint.
"""
model_dict = paddle.load(checkpoint)
self.model.set_state_dict(model_dict)
# u2_conformer_wenetspeech
|Model Name|u2_conformer_wenetspeech|
| :--- | :---: |
|Category|Speech - Speech Recognition|
|Network|Conformer|
|Dataset|WenetSpeech|
|Fine-tuning supported|No|
|Module Size|494MB|
|Latest update date|2021-12-10|
|Data metric|Chinese CER 0.087|
## I. Basic Information
### Model Introduction
The U2 Conformer model is an end-to-end speech recognition model for both English and Chinese. u2_conformer_wenetspeech combines a Conformer encoder with a Transformer decoder: decoding first scores candidates with CTC prefix beam search, then rescores them with the attention decoder to obtain the final result.
u2_conformer_wenetspeech was pre-trained on [WenetSpeech](https://wenet-e2e.github.io/WenetSpeech/), an open-source Mandarin Chinese speech dataset, and reaches a CER of 0.087 on its DEV set.
<p align="center">
<img src="https://paddlehub.bj.bcebos.com/paddlehub-img/conformer.png" hspace='10'/> <br />
</p>
<p align="center">
<img src="https://paddlehub.bj.bcebos.com/paddlehub-img/u2_conformer.png" hspace='10'/> <br />
</p>
For more details, please refer to:
- [Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition](https://arxiv.org/abs/2012.05481)
- [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100)
- [WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition](https://arxiv.org/abs/2110.03370)
## II. Installation
- ### 1. System Dependencies
- libsndfile
- Linux
```shell
$ sudo apt-get install libsndfile
or
$ sudo yum install libsndfile
```
- MacOS
```
$ brew install libsndfile
```
- ### 2. Environment Dependencies
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 3. Installation
- ```shell
$ hub install u2_conformer_wenetspeech
```
- If you encounter problems during installation, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API and Prediction
- ### 1. Prediction Code Example
- ```python
import paddlehub as hub
# Chinese speech audio in wav format, sampled at 16 kHz
wav_file = '/PATH/TO/AUDIO'
model = hub.Module(
name='u2_conformer_wenetspeech',
version='1.0.0')
text = model.speech_recognize(wav_file)
print(text)
```
- ### 2. API
- ```python
def check_audio(audio_file)
```
- Checks whether the input audio is in wav format with a sample rate of 16000; if not, the audio is resampled to 16000 and the new audio file is saved to the same directory.
- **Parameters**
- `audio_file`: path to a local audio file (*.wav), e.g. `/path/to/input.wav`.
- ```python
def speech_recognize(
audio_file,
device='cpu',
)
```
- Transcribes the input audio into text.
- **Parameters**
- `audio_file`: path to a local audio file (*.wav), e.g. `/path/to/input.wav`.
- `device`: device used for prediction; defaults to `cpu`. Set it to `gpu` to predict on GPU.
- **Returns**
- `text`: str, the transcription of the input audio.
## IV. Service Deployment
- PaddleHub Serving can deploy an online speech recognition service.
- ### Step 1: Start the PaddleHub Serving
- ```shell
$ hub serving start -m u2_conformer_wenetspeech
```
- This deploys the speech recognition API service; the default port is 8866.
- **NOTE:** To predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the few lines of code below send a prediction request and fetch the result.
- ```python
import requests
import json
# Path of the audio to recognize; make sure the serving machine can access it
file = '/path/to/input.wav'
# Arguments are passed to the prediction method by key; here the key is "audio_file"
data = {"audio_file": file}
# Send a POST request; the content type should be JSON, and the IP in the URL should be replaced with that of the serving machine
url = "http://127.0.0.1:8866/predict/u2_conformer_wenetspeech"
# Set the POST request headers to application/json
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## V. Release Note
* 1.0.0
First release
```shell
$ hub install u2_conformer_wenetspeech
```
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import paddle
from paddleaudio import load, save_wav
from paddlespeech.cli import ASRExecutor
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
@moduleinfo(
name="u2_conformer_wenetspeech", version="1.0.0", summary="", author="Wenet", author_email="", type="audio/asr")
class U2Conformer(paddle.nn.Layer):
def __init__(self):
super(U2Conformer, self).__init__()
self.asr_executor = ASRExecutor()
self.asr_kw_args = {
'model': 'conformer_wenetspeech',
'lang': 'zh',
'sample_rate': 16000,
'config': None, # Set `config` and `ckpt_path` to None to use pretrained model.
'ckpt_path': None,
}
@staticmethod
def check_audio(audio_file):
assert audio_file.endswith('.wav'), 'Input file must be a wave file `*.wav`.'
sig, sample_rate = load(audio_file)
if sample_rate != 16000:
sig, _ = load(audio_file, 16000)
audio_file_16k = audio_file[:audio_file.rindex('.')] + '_16k.wav'
logger.info('Resampling to a 16000 Hz sample rate and saving to a new audio file: {}'.format(audio_file_16k))
save_wav(sig, 16000, audio_file_16k)
return audio_file_16k
else:
return audio_file
@serving
def speech_recognize(self, audio_file, device='cpu'):
assert os.path.isfile(audio_file), 'File does not exist: {}'.format(audio_file)
audio_file = self.check_audio(audio_file)
text = self.asr_executor(audio_file=audio_file, device=device, **self.asr_kw_args)
return text
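Since this module is a thin wrapper over the PaddleSpeech CLI, the equivalent direct call is a sketch away; it uses only the keyword arguments the wrapper itself passes (the audio path is a placeholder):
```python
from paddlespeech.cli import ASRExecutor

asr = ASRExecutor()
# Same keyword arguments the module passes; config/ckpt_path left as None
# select the pretrained conformer_wenetspeech model.
text = asr(
    audio_file='/PATH/TO/AUDIO.wav',
    model='conformer_wenetspeech',
    lang='zh',
    sample_rate=16000,
    config=None,
    ckpt_path=None,
    device='cpu')
print(text)
```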
# panns_cnn10
|Model Name|panns_cnn10|
| :--- | :---: |
|Category|Speech - Sound Classification|
|Network|PANNs|
|Dataset|Google Audioset|
|Fine-tuning supported|Yes|
|Module Size|31MB|
|Latest update date|2021-06-15|
|Data metric|mAP 0.380|
## I. Basic Information
### Model Introduction
`panns_cnn10` is a sound classification/recognition model trained on the [Google Audioset](https://research.google.com/audioset/) dataset. The model mainly consists of 8 convolutional layers and 2 fully connected layers, with 4.9M parameters. After pre-training, it can be used to extract audio embeddings of dimension 512.
For more details, please refer to the paper: [PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition](https://arxiv.org/pdf/1912.10211.pdf)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2. Installation
- ```shell
$ hub install panns_cnn10
```
- If you encounter problems during installation, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API and Prediction
- ### 1. Prediction Code Example
- ```python
# ESC50 sound classification prediction
import librosa
import paddlehub as hub
from paddlehub.datasets import ESC50
sr = 44100 # sample rate of the audio file
wav_file = '/PATH/TO/AUDIO' # path of the audio file to predict
checkpoint = 'model.pdparams' # model parameters used for prediction
label_map = {idx: label for idx, label in enumerate(ESC50.label_list)}
...@@ -86,8 +70,8 @@ def predict(
print('File: {}\tLabel: {}'.format(wav_file, result[0]))
```
- ```python
# Audioset Tagging
import librosa
import numpy as np
import paddlehub as hub
...@@ -105,7 +89,7 @@ def predict(
print(msg)
sr = 44100 # sample rate of the audio file
wav_file = '/PATH/TO/AUDIO' # path of the audio file to predict
label_file = './audioset_labels.txt' # audioset label text file
topk = 10 # number of top-k results to show
...@@ -130,23 +114,58 @@ def predict(
show_topk(topk, label_map, wav_file, result[0])
```
- ### 2. API
- ```python
def __init__(
task,
num_class=None,
label_map=None,
load_checkpoint=None,
**kwargs,
)
```
- Creates a Module object.
- **Parameters**
- `task`: task name, either `sound-cls` or `None`. `sound-cls` stands for the sound classification task and allows fine-tuning on sound classification datasets; with `None`, the pre-trained model can be used for audio classification/tagging directly.
- `num_classes`: number of classes for the sound classification task; must be set when fine-tuning and match the dataset in use.
- `label_map`: class mapping table used at prediction time.
- `load_checkpoint`: path to the model parameter file saved with the PaddleHub Fine-tune API.
- `**kwargs`: additional user-specified keyword arguments.
- ```python
def predict(
data,
sample_rate,
batch_size=1,
feat_type='mel',
use_gpu=False
)
```
- Model prediction: takes audio waveform data as input and outputs classification labels.
- **Parameters**
- `data`: data to predict, in the format \[waveform1, waveform2, …\], where each element is a one-dimensional numpy array of waveform samples.
- `sample_rate`: sample rate of the audio files.
- `feat_type`: type of audio feature; currently supports `'mel'` (see [Mel-frequency cepstrum](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum)) and the raw waveform feature `'raw'`.
- `batch_size`: batch size for the model.
- `use_gpu`: whether to use GPU; defaults to False. GPU users are advised to enable it.
- **Returns**
- `results`: list; the content depends on the task type:
- Sound classification (task set to `sound-cls`): the list contains the predicted label of each audio file.
- Tagging (task set to `None`): the list contains, for each audio file, the scores of the 527 classes ([Audioset labels](https://research.google.com/audioset/)).
For more details, see the PaddleHub example:
- [AudioClassification](https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0/demo/audio_classification)
## IV. Release Note
* 1.0.0
First release: dynamic-graph model, supporting fine-tuning for the `sound-cls` sound classification task and Audioset tagging prediction.
```shell
$ hub install panns_cnn10
```
# panns_cnn14
|Model Name|panns_cnn14|
| :--- | :---: |
|Category|Speech - Sound Classification|
|Network|PANNs|
|Dataset|Google Audioset|
|Fine-tuning supported|Yes|
|Module Size|469MB|
|Latest update date|2021-06-15|
|Data metric|mAP 0.431|
## I. Basic Information
### Model Introduction
`panns_cnn14` is a sound classification/recognition model trained on the [Google Audioset](https://research.google.com/audioset/) dataset. The model mainly consists of 12 convolutional layers and 2 fully connected layers, with 79.6M parameters. After pre-training, it can be used to extract audio embeddings of dimension 2048.
For more details, please refer to the paper: [PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition](https://arxiv.org/pdf/1912.10211.pdf)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2. Installation
- ```shell
$ hub install panns_cnn14
```
- If you encounter problems during installation, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API and Prediction
- ### 1. Prediction Code Example
- ```python
# ESC50 sound classification prediction
import librosa
import paddlehub as hub
from paddlehub.datasets import ESC50
sr = 44100 # sample rate of the audio file
wav_file = '/PATH/TO/AUDIO' # path of the audio file to predict
checkpoint = 'model.pdparams' # model parameters used for prediction
label_map = {idx: label for idx, label in enumerate(ESC50.label_list)}
...@@ -86,8 +70,8 @@ def predict(
print('File: {}\tLabel: {}'.format(wav_file, result[0]))
```
- ```python
# Audioset Tagging
import librosa
import numpy as np
import paddlehub as hub
...@@ -105,7 +89,7 @@ def predict(
print(msg)
sr = 44100 # sample rate of the audio file
wav_file = '/PATH/TO/AUDIO' # path of the audio file to predict
label_file = './audioset_labels.txt' # audioset label text file
topk = 10 # number of top-k results to show
...@@ -130,23 +114,58 @@ def predict(
show_topk(topk, label_map, wav_file, result[0])
```
- ### 2. API
- ```python
def __init__(
task,
num_class=None,
label_map=None,
load_checkpoint=None,
**kwargs,
)
```
- Creates a Module object.
- **Parameters**
- `task`: task name, either `sound-cls` or `None`. `sound-cls` stands for the sound classification task and allows fine-tuning on sound classification datasets; with `None`, the pre-trained model can be used for audio classification/tagging directly.
- `num_classes`: number of classes for the sound classification task; must be set when fine-tuning and match the dataset in use.
- `label_map`: class mapping table used at prediction time.
- `load_checkpoint`: path to the model parameter file saved with the PaddleHub Fine-tune API.
- `**kwargs`: additional user-specified keyword arguments.
- ```python
def predict(
data,
sample_rate,
batch_size=1,
feat_type='mel',
use_gpu=False
)
```
- Model prediction: takes audio waveform data as input and outputs classification labels.
- **Parameters**
- `data`: data to predict, in the format \[waveform1, waveform2, …\], where each element is a one-dimensional numpy array of waveform samples.
- `sample_rate`: sample rate of the audio files.
- `feat_type`: type of audio feature; currently supports `'mel'` (see [Mel-frequency cepstrum](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum)) and the raw waveform feature `'raw'`.
- `batch_size`: batch size for the model.
- `use_gpu`: whether to use GPU; defaults to False. GPU users are advised to enable it.
- **Returns**
- `results`: list; the content depends on the task type:
- Sound classification (task set to `sound-cls`): the list contains the predicted label of each audio file.
- Tagging (task set to `None`): the list contains, for each audio file, the scores of the 527 classes ([Audioset labels](https://research.google.com/audioset/)).
For more details, see the PaddleHub example:
- [AudioClassification](https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0/demo/audio_classification)
## IV. Release Note
* 1.0.0
First release: dynamic-graph model, supporting fine-tuning for the `sound-cls` sound classification task and Audioset tagging prediction.
```shell
$ hub install panns_cnn14
```
# panns_cnn6
|Model Name|panns_cnn6|
| :--- | :---: |
|Category|Speech - Sound Classification|
|Network|PANNs|
|Dataset|Google Audioset|
|Fine-tuning supported|Yes|
|Module Size|29MB|
|Latest update date|2021-06-15|
|Data metric|mAP 0.343|
## I. Basic Information
### Model Introduction
`panns_cnn6` is a sound classification/recognition model trained on the [Google Audioset](https://research.google.com/audioset/) dataset. The model mainly consists of 4 convolutional layers and 2 fully connected layers, with 4.5M parameters. After pre-training, it can be used to extract audio embeddings of dimension 512.
For more details, please refer to: [PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition](https://arxiv.org/pdf/1912.10211.pdf)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2. Installation
- ```shell
$ hub install panns_cnn6
```
- If you encounter problems during installation, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API and Prediction
- ### 1. Prediction Code Example
- ```python
# ESC50 sound classification prediction
import librosa
import paddlehub as hub
from paddlehub.datasets import ESC50
sr = 44100 # sample rate of the audio file
wav_file = '/PATH/TO/AUDIO' # path of the audio file to predict
checkpoint = 'model.pdparams' # model parameters used for prediction
label_map = {idx: label for idx, label in enumerate(ESC50.label_list)}
...@@ -86,8 +70,8 @@ def predict(
print('File: {}\tLabel: {}'.format(wav_file, result[0]))
```
- ```python
# Audioset Tagging
import librosa
import numpy as np
import paddlehub as hub
...@@ -105,7 +89,7 @@ def predict(
print(msg)
sr = 44100 # sample rate of the audio file
wav_file = '/PATH/TO/AUDIO' # path of the audio file to predict
label_file = './audioset_labels.txt' # audioset label text file
topk = 10 # number of top-k results to show
...@@ -130,23 +114,58 @@ def predict(
show_topk(topk, label_map, wav_file, result[0])
```
- ### 2. API
- ```python
def __init__(
task,
num_class=None,
label_map=None,
load_checkpoint=None,
**kwargs,
)
```
- Creates a Module object.
- **Parameters**
- `task`: task name, either `sound-cls` or `None`. `sound-cls` stands for the sound classification task and allows fine-tuning on sound classification datasets; with `None`, the pre-trained model can be used for audio classification/tagging directly.
- `num_classes`: number of classes for the sound classification task; must be set when fine-tuning and match the dataset in use.
- `label_map`: class mapping table used at prediction time.
- `load_checkpoint`: path to the model parameter file saved with the PaddleHub Fine-tune API.
- `**kwargs`: additional user-specified keyword arguments.
- ```python
def predict(
data,
sample_rate,
batch_size=1,
feat_type='mel',
use_gpu=False
)
```
- Model prediction: takes audio waveform data as input and outputs classification labels.
- **Parameters**
- `data`: data to predict, in the format \[waveform1, waveform2, …\], where each element is a one-dimensional numpy array of waveform samples.
- `sample_rate`: sample rate of the audio files.
- `feat_type`: type of audio feature; currently supports `'mel'` (see [Mel-frequency cepstrum](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum)) and the raw waveform feature `'raw'`.
- `batch_size`: batch size for the model.
- `use_gpu`: whether to use GPU; defaults to False. GPU users are advised to enable it.
- **Returns**
- `results`: list; the content depends on the task type:
- Sound classification (task set to `sound-cls`): the list contains the predicted label of each audio file.
- Tagging (task set to `None`): the list contains, for each audio file, the scores of the 527 classes ([Audioset labels](https://research.google.com/audioset/)).
For more details, see the PaddleHub example:
- [AudioClassification](https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.0/demo/audio_classification)
## IV. Release Note
* 1.0.0
First release: dynamic-graph model, supporting fine-tuning for the `sound-cls` sound classification task and Audioset tagging prediction.
```shell
$ hub install panns_cnn6
```
# deepvoice3_ljspeech
|Model Name|deepvoice3_ljspeech|
| :--- | :---: |
|Category|Speech - Text-to-Speech|
|Network|DeepVoice3|
|Dataset|LJSpeech-1.1|
|Fine-tuning supported|No|
|Module Size|58MB|
|Latest update date|2020-10-27|
|Data metric|-|
## I. Basic Information
### Model Introduction
Deep Voice 3 is an end-to-end TTS model released by Baidu Research in 2017 (the paper was accepted at ICLR 2018). It is a seq2seq model based on convolutional neural networks and attention; since it contains no recurrent networks, it can be trained in parallel and is far faster than RNN-based models. Deep Voice 3 can learn the characteristics of multiple speakers and can be paired with several vocoders. deepvoice3_ljspeech is an English TTS model pre-trained on the ljspeech English speech dataset; it supports prediction only.
<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/Parakeet/release/v0.1/examples/deepvoice3/images/model_architecture.png" hspace='10'/> <br/>
</p>
For more details, refer to the paper [Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning](https://arxiv.org/abs/1710.07654)
## II. Installation
- ### 1. System Dependencies
For Ubuntu users, run:
```
sudo apt-get install libsndfile1
```
For CentOS users, run:
```
sudo yum install libsndfile
```
- ### 2. Environment Dependencies
- 2.0.0 > paddlepaddle >= 1.8.2
- 2.0.0 > paddlehub >= 1.7.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 3. Installation
- ```shell
$ hub install deepvoice3_ljspeech
```
- If you encounter problems during installation, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API and Prediction
- ### 1. Command-line Prediction
- ```shell
$ hub run deepvoice3_ljspeech --input_text='Simple as this proposition is, it is necessary to be stated' --use_gpu True --vocoder griffin-lim
```
- This invokes the TTS model from the command line; for more, see [PaddleHub command-line usage](https://github.com/shinichiye/PaddleHub/blob/release/v2.1/docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
import soundfile as sf
# Load deepvoice3_ljspeech module.
module = hub.Module(name="deepvoice3_ljspeech")
# Synthesize speech for the input texts.
test_texts = ['Simple as this proposition is, it is necessary to be stated',
'Parakeet stands for Paddle PARAllel text-to-speech toolkit']
wavs, sample_rate = module.synthesize(texts=test_texts)
for index, wav in enumerate(wavs):
sf.write(f"{index}.wav", wav, sample_rate)
```
- ### 3. API
- ```python
def synthesize(texts, use_gpu=False, vocoder="griffin-lim"):
```
- Prediction API: synthesizes the audio waveform for the input texts.
- **Parameters**
- texts (list\[str\]): texts to predict;
- use\_gpu (bool): whether to use GPU; **if you use GPU, set the CUDA\_VISIBLE\_DEVICES environment variable first**;
- vocoder: the vocoder to use, either "griffin-lim" or "waveflow"
- **Returns**
- wavs (list): synthesis results; each element is the audio waveform of the corresponding input text and can be further processed or saved with `soundfile.write`.
- sample\_rate (int): sample rate of the synthesized audio.
## IV. Service Deployment
- PaddleHub Serving can deploy an online text-to-speech service, which can be used by online web applications.
- ### Step 1: Start the PaddleHub Serving
- Run the start command
- ```shell
$ hub serving start -m deepvoice3_ljspeech
```
- This deploys the API service; the default port is 8866.
- **NOTE:** To predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the few lines of code below send a prediction request and fetch the result
- ```python
import requests
import json
import soundfile as sf
# Send an HTTP request
data = {'texts':['Simple as this proposition is, it is necessary to be stated',
'Parakeet stands for Paddle PARAllel text-to-speech toolkit'],
'use_gpu':False}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/deepvoice3_ljspeech"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Save the results
result = r.json()["results"]
wavs = result["wavs"]
sample_rate = result["sample_rate"]
for index, wav in enumerate(wavs):
sf.write(f"{index}.wav", wav, sample_rate)
```
## V. Release Note
* 1.0.0
First release
```shell
$ hub install deepvoice3_ljspeech
```
# fastspeech2_baker
|Model Name|fastspeech2_baker|
| :--- | :---: |
|Category|Speech - Text-to-Speech|
|Network|FastSpeech2|
|Dataset|Chinese Standard Mandarin Speech Corpus|
|Fine-tuning supported|No|
|Module Size|621MB|
|Latest update date|2021-10-20|
|Data metric|-|
## I. Basic Information
### Model Introduction
FastSpeech2 is a text-to-speech (TTS) model proposed in 2020 by Microsoft Research Asia and the Microsoft Azure Speech team together with Zhejiang University. It improves on FastSpeech by removing its dependence on a teacher-student knowledge-distillation framework, whose training pipeline was complex and whose training target lost information relative to real speech.
As shown below, FastSpeech2 keeps the Feed-Forward Transformer (FFT) architecture introduced in FastSpeech, but inserts a variance adaptor between the phoneme encoder and the mel-spectrogram decoder. The adaptor injects additional sources of variation in speech, such as duration, pitch, and volume (spectral energy), to address the one-to-many mapping problem in speech synthesis (a toy sketch of this data flow follows the references below).
<p align="center">
<img src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/images/fastspeech2.png" hspace='10'/> <br />
</p>
Parallel WaveGAN is a fast, small-footprint waveform generation method based on a distillation-free generative adversarial network. It trains a non-autoregressive WaveNet by jointly optimizing multi-resolution spectrogram and adversarial losses, which effectively captures the time-frequency distribution of real speech waveforms. Its structure is shown below:
<p align="center">
<img src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/images/pwg.png" hspace='10'/> <br />
</p>
fastspeech2_baker uses FastSpeech2 as the acoustic model and Parallel WaveGAN as the vocoder, pre-trained on the [Chinese Standard Mandarin Speech Corpus](https://www.data-baker.com/open_source.html) dataset; it can be used directly to synthesize audio.
For more details, please refer to:
- [FastSpeech 2: Fast and High-Quality End-to-End Text-to-Speech](https://arxiv.org/abs/2006.04558)
- [Technical upgrade of the FastSpeech TTS system: Microsoft and Zhejiang University propose FastSpeech2](https://www.msra.cn/zh-cn/news/features/fastspeech2)
- [Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram](https://arxiv.org/abs/1910.11480)
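As a purely illustrative sketch of the variance-adaptor data flow described above (every function here is a toy stand-in invented for the sketch, not the real model):
```python
# Toy sketch of the FastSpeech2 variance-adaptor flow; illustrative only.
def variance_adaptor(phoneme_hidden, duration_predictor, pitch_predictor, energy_predictor):
    durations = duration_predictor(phoneme_hidden)  # predicted frames per phoneme
    # Length regulator: repeat each phoneme state to frame level.
    frame_hidden = [h for h, d in zip(phoneme_hidden, durations) for _ in range(d)]
    # Add predicted pitch/energy information before the mel decoder.
    return [h + pitch_predictor(h) + energy_predictor(h) for h in frame_hidden]

# Toy usage with scalar "hidden states" and constant predictors:
out = variance_adaptor([1.0, 2.0], lambda hs: [2, 3], lambda h: 0.1, lambda h: 0.01)
print(out)  # ≈ [1.11, 1.11, 2.11, 2.11, 2.11]
```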
## 二、安装
- ### 1、环境依赖
- paddlepaddle >= 2.1.0
- paddlehub >= 2.1.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
- ```shell
$ hub install fastspeech2_baker
```
- 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
- ### 1、预测代码示例
```python
import paddlehub as hub
# 需要合成语音的文本
sentences = ['这是一段测试语音合成的音频。']
model = hub.Module(
name='fastspeech2_baker',
version='1.0.0')
wav_files = model.generate(sentences)
# 打印合成的音频文件的路径
print(wav_files)
```
详情可参考PaddleHub示例:
- [语音合成](../../../../demo/text_to_speech)
- ### 2. API

  - ```python
    def __init__(output_dir)
    ```

    - Create a Module object (dygraph version).

    - **Parameters**

      - `output_dir`: output directory for the synthesized audio files.

  - ```python
    def generate(
        sentences,
        device='cpu',
    )
    ```

    - Synthesize the input texts into audio files and save them to the output directory.

    - **Parameters**

      - `sentences`: texts to synthesize, of type `List[str]`.
      - `device`: device used for prediction; defaults to `cpu`. Set it to `gpu` to predict on GPU.

    - **Return**

      - `wav_files`: `List[str]`, the paths of the synthesized audio files.
## IV. Server Deployment

- PaddleHub Serving can deploy an online text-to-speech service.

- ### Step 1: Start PaddleHub Serving

  - ```shell
    $ hub serving start -m fastspeech2_baker
    ```

  - This deploys the text-to-speech API service, listening on port 8866 by default.

  - **NOTE:** To predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it need not be set.

- ### Step 2: Send a prediction request

  - With the server configured, the few lines of code below send a prediction request and retrieve the result:
  - ```python
    import requests
    import json

    # Text to be synthesized
    sentences = [
        '这是第一段测试语音合成的音频。',
        '这是第二段测试语音合成的音频。',
    ]

    # Pass the texts under the key expected by the prediction method, here "sentences"
    data = {"sentences": sentences}
    # Send a POST request; the content type must be JSON, and the IP in the url
    # should be replaced with the address of the serving machine
    url = "http://127.0.0.1:8866/predict/fastspeech2_baker"
    # Set the POST request headers to application/json
    headers = {"Content-Type": "application/json"}
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    print(r.json())
    ```
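  - Note that the synthesized files are written on the serving machine. A sketch of reading their paths out of the response, continuing from the request above and assuming the standard PaddleHub Serving JSON wrapper (which places the method's return value under `"results"`):

  - ```python
    # Illustrative only: list the server-side wav paths from the JSON response
    resp = r.json()
    for wav_path in resp.get("results", []):
        print("synthesized on the server:", wav_path)
    ```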
## V. Release Note

* 1.0.0

  First release

  ```shell
  $ hub install fastspeech2_baker
  ```
###########################################################
# FEATURE EXTRACTION SETTING #
###########################################################
fs: 24000                # Sampling rate.
n_fft: 2048 # FFT size.
n_shift: 300 # Hop size.
win_length: 1200 # Window length.
# If set to null, it will be the same as fft_size.
window: "hann" # Window function.
# Only used for feats_type != raw
fmin: 80 # Minimum frequency of Mel basis.
fmax: 7600 # Maximum frequency of Mel basis.
n_mels: 80 # The number of mel basis.
# Only used for the model using pitch features (e.g. FastSpeech2)
f0min: 80                # Minimum f0 for pitch extraction.
f0max: 400               # Maximum f0 for pitch extraction.
###########################################################
# DATA SETTING #
###########################################################
batch_size: 64
num_workers: 4
###########################################################
# MODEL SETTING #
###########################################################
model:
adim: 384 # attention dimension
aheads: 2 # number of attention heads
elayers: 4 # number of encoder layers
eunits: 1536 # number of encoder ff units
dlayers: 4 # number of decoder layers
dunits: 1536 # number of decoder ff units
positionwise_layer_type: conv1d # type of position-wise layer
positionwise_conv_kernel_size: 3 # kernel size of position wise conv layer
duration_predictor_layers: 2 # number of layers of duration predictor
duration_predictor_chans: 256 # number of channels of duration predictor
duration_predictor_kernel_size: 3 # filter size of duration predictor
    postnet_layers: 5                     # number of layers of postnet
postnet_filts: 5 # filter size of conv layers in postnet
postnet_chans: 256 # number of channels of conv layers in postnet
use_masking: True # whether to apply masking for padded part in loss calculation
use_scaled_pos_enc: True # whether to use scaled positional encoding
encoder_normalize_before: True # whether to perform layer normalization before the input
decoder_normalize_before: True # whether to perform layer normalization before the input
reduction_factor: 1 # reduction factor
init_type: xavier_uniform # initialization type
init_enc_alpha: 1.0 # initial value of alpha of encoder scaled position encoding
init_dec_alpha: 1.0 # initial value of alpha of decoder scaled position encoding
transformer_enc_dropout_rate: 0.2 # dropout rate for transformer encoder layer
transformer_enc_positional_dropout_rate: 0.2 # dropout rate for transformer encoder positional encoding
transformer_enc_attn_dropout_rate: 0.2 # dropout rate for transformer encoder attention layer
transformer_dec_dropout_rate: 0.2 # dropout rate for transformer decoder layer
transformer_dec_positional_dropout_rate: 0.2 # dropout rate for transformer decoder positional encoding
transformer_dec_attn_dropout_rate: 0.2 # dropout rate for transformer decoder attention layer
pitch_predictor_layers: 5 # number of conv layers in pitch predictor
pitch_predictor_chans: 256 # number of channels of conv layers in pitch predictor
    pitch_predictor_kernel_size: 5        # kernel size of conv layers in pitch predictor
pitch_predictor_dropout: 0.5 # dropout rate in pitch predictor
pitch_embed_kernel_size: 1 # kernel size of conv embedding layer for pitch
pitch_embed_dropout: 0.0 # dropout rate after conv embedding layer for pitch
stop_gradient_from_pitch_predictor: true # whether to stop the gradient from pitch predictor to encoder
energy_predictor_layers: 2 # number of conv layers in energy predictor
energy_predictor_chans: 256 # number of channels of conv layers in energy predictor
    energy_predictor_kernel_size: 3       # kernel size of conv layers in energy predictor
energy_predictor_dropout: 0.5 # dropout rate in energy predictor
energy_embed_kernel_size: 1 # kernel size of conv embedding layer for energy
energy_embed_dropout: 0.0 # dropout rate after conv embedding layer for energy
stop_gradient_from_energy_predictor: false # whether to stop the gradient from energy predictor to encoder
###########################################################
# UPDATER SETTING #
###########################################################
updater:
use_masking: True # whether to apply masking for padded part in loss calculation
###########################################################
# OPTIMIZER SETTING #
###########################################################
optimizer:
optim: adam # optimizer type
learning_rate: 0.001 # learning rate
###########################################################
# TRAINING SETTING #
###########################################################
max_epoch: 1000
num_snapshots: 5
###########################################################
# OTHER SETTING #
###########################################################
seed: 10086
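The module code further down loads this file with yacs, roughly as in the minimal sketch below (the file name `default.yaml` matches the checkpoint assets; the prints are illustrative):

```python
# Load the FastSpeech2 config above into a yacs CfgNode
import yaml
from yacs.config import CfgNode

with open('default.yaml') as f:
    cfg = CfgNode(yaml.safe_load(f))

print(cfg.fs)          # 24000 -- sampling rate used when writing wav files
print(cfg.n_mels)      # 80    -- passed to FastSpeech2 as odim
print(cfg.model.adim)  # 384   -- attention dimension of the acoustic model
```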
<pad> 0
<unk> 1
a1 2
a2 3
a3 4
a4 5
a5 6
ai1 7
ai2 8
ai3 9
ai4 10
ai5 11
air2 12
air4 13
an1 14
an2 15
an3 16
an4 17
an5 18
ang1 19
ang2 20
ang3 21
ang4 22
ang5 23
anr1 24
anr3 25
anr4 26
ao1 27
ao2 28
ao3 29
ao4 30
ao5 31
aor3 32
aor4 33
ar2 34
ar3 35
ar4 36
b 37
c 38
ch 39
d 40
e1 41
e2 42
e3 43
e4 44
e5 45
ei1 46
ei2 47
ei3 48
ei4 49
ei5 50
en1 51
en2 52
en3 53
en4 54
en5 55
eng1 56
eng2 57
eng3 58
eng4 59
eng5 60
enr1 61
enr2 62
enr4 63
enr5 64
er2 65
er3 66
er4 67
er5 68
f 69
g 70
h 71
i1 72
i2 73
i3 74
i4 75
i5 76
ia1 77
ia2 78
ia3 79
ia4 80
ia5 81
ian1 82
ian2 83
ian3 84
ian4 85
ian5 86
iang1 87
iang2 88
iang3 89
iang4 90
iang5 91
iangr4 92
ianr1 93
ianr2 94
ianr3 95
iao1 96
iao2 97
iao3 98
iao4 99
iao5 100
iar1 101
iar3 102
ie1 103
ie2 104
ie3 105
ie4 106
ie5 107
ii1 108
ii2 109
ii3 110
ii4 111
ii5 112
iii1 113
iii2 114
iii3 115
iii4 116
iii5 117
iiir4 118
iir2 119
in1 120
in2 121
in3 122
in4 123
in5 124
ing1 125
ing2 126
ing3 127
ing4 128
ing5 129
ingr2 130
ingr3 131
inr4 132
io1 133
io5 134
iong1 135
iong2 136
iong3 137
iong4 138
iong5 139
iou1 140
iou2 141
iou3 142
iou4 143
iou5 144
iour1 145
ir1 146
ir2 147
ir3 148
ir4 149
ir5 150
j 151
k 152
l 153
m 154
n 155
o1 156
o2 157
o3 158
o4 159
o5 160
ong1 161
ong2 162
ong3 163
ong4 164
ong5 165
ongr4 166
ou1 167
ou2 168
ou3 169
ou4 170
ou5 171
our2 172
p 173
q 174
r 175
s 176
sh 177
sil 178
sp 179
spl 180
spn 181
t 182
u1 183
u2 184
u3 185
u4 186
u5 187
ua1 188
ua2 189
ua3 190
ua4 191
ua5 192
uai1 193
uai2 194
uai3 195
uai4 196
uai5 197
uair4 198
uan1 199
uan2 200
uan3 201
uan4 202
uan5 203
uang1 204
uang2 205
uang3 206
uang4 207
uang5 208
uanr1 209
uanr2 210
uei1 211
uei2 212
uei3 213
uei4 214
uei5 215
ueir1 216
ueir3 217
ueir4 218
uen1 219
uen2 220
uen3 221
uen4 222
uen5 223
ueng1 224
ueng2 225
ueng3 226
ueng4 227
uenr3 228
uenr4 229
uo1 230
uo2 231
uo3 232
uo4 233
uo5 234
uor2 235
uor3 236
ur3 237
ur4 238
v1 239
v2 240
v3 241
v4 242
v5 243
van1 244
van2 245
van3 246
van4 247
van5 248
vanr4 249
ve1 250
ve2 251
ve3 252
ve4 253
ve5 254
vn1 255
vn2 256
vn3 257
vn4 258
vn5 259
x 260
z 261
zh 262
, 263
。 264
? 265
! 266
<eos> 267
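Each line of this phone-to-id map is simply `<phone> <id>`; the module code below derives the model's vocabulary size from it, roughly as follows (file name per the checkpoint assets):

```python
# Parse phone_id_map.txt: one "<phone> <id>" pair per line
with open('phone_id_map.txt') as f:
    phn_id = [line.strip().split() for line in f]

vocab_size = len(phn_id)
print(vocab_size)  # 268 entries for this map, from <pad> (0) to <eos> (267)
```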
# This is the hyperparameter configuration file for Parallel WaveGAN.
# Please make sure this is adjusted for the CSMSC dataset. If you want to
# apply to the other dataset, you might need to carefully change some parameters.
# This configuration requires 12 GB GPU memory and takes ~3 days on RTX TITAN.
###########################################################
# FEATURE EXTRACTION SETTING #
###########################################################
fs: 24000 # Sampling rate.
n_fft: 2048 # FFT size. (in samples)
n_shift: 300 # Hop size. (in samples)
win_length: 1200 # Window length. (in samples)
# If set to null, it will be the same as fft_size.
window: "hann" # Window function.
n_mels: 80 # Number of mel basis.
fmin: 80 # Minimum freq in mel basis calculation.
fmax: 7600 # Maximum frequency in mel basis calculation.
# global_gain_scale: 1.0 # Will be multiplied to all of waveform.
trim_silence: false # Whether to trim the start and end of silence.
top_db: 60 # Need to tune carefully if the recording is not good.
trim_frame_length: 2048  # Frame size in trimming. (in samples)
trim_hop_length: 512     # Hop size in trimming. (in samples)
###########################################################
# GENERATOR NETWORK ARCHITECTURE SETTING #
###########################################################
generator_params:
in_channels: 1 # Number of input channels.
out_channels: 1 # Number of output channels.
kernel_size: 3 # Kernel size of dilated convolution.
layers: 30 # Number of residual block layers.
stacks: 3 # Number of stacks i.e., dilation cycles.
residual_channels: 64 # Number of channels in residual conv.
gate_channels: 128 # Number of channels in gated conv.
skip_channels: 64 # Number of channels in skip conv.
aux_channels: 80 # Number of channels for auxiliary feature conv.
# Must be the same as num_mels.
aux_context_window: 2 # Context window size for auxiliary feature.
# If set to 2, previous 2 and future 2 frames will be considered.
dropout: 0.0 # Dropout rate. 0.0 means no dropout applied.
bias: true # use bias in residual blocks
use_weight_norm: true # Whether to use weight norm.
# If set to true, it will be applied to all of the conv layers.
use_causal_conv: false # use causal conv in residual blocks and upsample layers
# upsample_net: "ConvInUpsampleNetwork" # Upsampling network architecture.
    upsample_scales: [4, 5, 3, 5]        # Upsampling scales. Product of these must be the same as hop size.
interpolate_mode: "nearest" # upsample net interpolate mode
    freq_axis_kernel_size: 1             # upsampling net: convolution kernel size in frequency axis
nonlinear_activation: null
nonlinear_activation_params: {}
###########################################################
# DISCRIMINATOR NETWORK ARCHITECTURE SETTING #
###########################################################
discriminator_params:
in_channels: 1 # Number of input channels.
out_channels: 1 # Number of output channels.
    kernel_size: 3        # Kernel size of conv layers.
    layers: 10            # Number of conv layers.
    conv_channels: 64     # Number of channels in conv layers.
bias: true # Whether to use bias parameter in conv.
use_weight_norm: true # Whether to use weight norm.
# If set to true, it will be applied to all of the conv layers.
nonlinear_activation: "LeakyReLU" # Nonlinear function after each conv.
nonlinear_activation_params: # Nonlinear function parameters
negative_slope: 0.2 # Alpha in LeakyReLU.
###########################################################
# STFT LOSS SETTING #
###########################################################
stft_loss_params:
fft_sizes: [1024, 2048, 512] # List of FFT size for STFT-based loss.
hop_sizes: [120, 240, 50] # List of hop size for STFT-based loss
win_lengths: [600, 1200, 240] # List of window length for STFT-based loss.
window: "hann" # Window function for STFT-based loss
###########################################################
# ADVERSARIAL LOSS SETTING #
###########################################################
lambda_adv: 4.0 # Loss balancing coefficient.
###########################################################
# DATA LOADER SETTING #
###########################################################
batch_size: 6 # Batch size.
batch_max_steps: 25500      # Length of each audio in batch. Make sure it is divisible by hop_size.
pin_memory: true            # Whether to pin memory in the DataLoader.
num_workers: 4              # Number of workers in the DataLoader.
remove_short_samples: true  # Whether to remove samples whose length is less than batch_max_steps.
allow_cache: true # Whether to allow cache in dataset. If true, it requires cpu memory.
###########################################################
# OPTIMIZER & SCHEDULER SETTING #
###########################################################
generator_optimizer_params:
epsilon: 1.0e-6 # Generator's epsilon.
weight_decay: 0.0 # Generator's weight decay coefficient.
generator_scheduler_params:
learning_rate: 0.0001 # Generator's learning rate.
step_size: 200000 # Generator's scheduler step size.
gamma: 0.5 # Generator's scheduler gamma.
# At each step size, lr will be multiplied by this parameter.
generator_grad_norm: 10 # Generator's gradient norm.
discriminator_optimizer_params:
epsilon: 1.0e-6 # Discriminator's epsilon.
weight_decay: 0.0 # Discriminator's weight decay coefficient.
discriminator_scheduler_params:
learning_rate: 0.00005 # Discriminator's learning rate.
step_size: 200000 # Discriminator's scheduler step size.
gamma: 0.5 # Discriminator's scheduler gamma.
# At each step size, lr will be multiplied by this parameter.
discriminator_grad_norm: 1 # Discriminator's gradient norm.
###########################################################
# INTERVAL SETTING #
###########################################################
discriminator_train_start_steps: 100000 # Number of steps to start to train discriminator.
train_max_steps: 400000 # Number of training steps.
save_interval_steps: 5000 # Interval steps to save checkpoint.
eval_interval_steps: 1000 # Interval steps to evaluate the network.
###########################################################
# OTHER SETTING #
###########################################################
num_save_intermediate_results: 4 # Number of results to be saved as intermediate results.
num_snapshots: 10 # max number of snapshots to keep while training
seed: 42 # random seed for paddle, random, and np.random
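Two consistency constraints are buried in the comments above: the product of the generator's upsampling scales must equal the hop size, and `batch_max_steps` must be divisible by it. A quick check (illustrative, not part of the repo):

```python
import numpy as np

upsample_scales = [4, 5, 3, 5]
n_shift = 300            # hop size from the feature-extraction section
batch_max_steps = 25500

assert int(np.prod(upsample_scales)) == n_shift  # 4*5*3*5 == 300
assert batch_max_steps % n_shift == 0            # clip length is a whole number of frames
print(batch_max_steps // n_shift)                # 85 mel frames per training clip
```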
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from pathlib import Path
from typing import List
import numpy as np
import paddle
from paddlehub.env import MODULE_HOME
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from parakeet.frontend.zh_frontend import Frontend
from parakeet.models.fastspeech2 import FastSpeech2
from parakeet.models.fastspeech2 import FastSpeech2Inference
from parakeet.models.parallel_wavegan import PWGGenerator
from parakeet.models.parallel_wavegan import PWGInference
from parakeet.modules.normalizer import ZScore
import soundfile as sf
from yacs.config import CfgNode
import yaml
@moduleinfo(name="fastspeech2_baker", version="1.0.0", summary="", author="Baidu", author_email="", type="audio/tts")
class FastSpeech(paddle.nn.Layer):
def __init__(self, output_dir='./wavs'):
super(FastSpeech, self).__init__()
fastspeech2_res_dir = os.path.join(MODULE_HOME, 'fastspeech2_baker', 'assets/fastspeech2_nosil_baker_ckpt_0.4')
pwg_res_dir = os.path.join(MODULE_HOME, 'fastspeech2_baker', 'assets/pwg_baker_ckpt_0.4')
phones_dict = os.path.join(fastspeech2_res_dir, 'phone_id_map.txt')
with open(phones_dict, "r") as f:
phn_id = [line.strip().split() for line in f.readlines()]
vocab_size = len(phn_id)
# fastspeech2
fastspeech2_config = os.path.join(fastspeech2_res_dir, 'default.yaml')
with open(fastspeech2_config) as f:
fastspeech2_config = CfgNode(yaml.safe_load(f))
self.samplerate = fastspeech2_config.fs
fastspeech2_checkpoint = os.path.join(fastspeech2_res_dir, 'snapshot_iter_76000.pdz')
model = FastSpeech2(idim=vocab_size, odim=fastspeech2_config.n_mels, **fastspeech2_config["model"])
model.set_state_dict(paddle.load(fastspeech2_checkpoint)["main_params"])
logger.info('Load fastspeech2 params from %s' % os.path.abspath(fastspeech2_checkpoint))
model.eval()
# vocoder
pwg_config = os.path.join(pwg_res_dir, 'pwg_default.yaml')
with open(pwg_config) as f:
pwg_config = CfgNode(yaml.safe_load(f))
pwg_checkpoint = os.path.join(pwg_res_dir, 'pwg_snapshot_iter_400000.pdz')
vocoder = PWGGenerator(**pwg_config["generator_params"])
vocoder.set_state_dict(paddle.load(pwg_checkpoint)["generator_params"])
logger.info('Load vocoder params from %s' % os.path.abspath(pwg_checkpoint))
vocoder.remove_weight_norm()
vocoder.eval()
# frontend
self.frontend = Frontend(phone_vocab_path=phones_dict)
# stat
fastspeech2_stat = os.path.join(fastspeech2_res_dir, 'speech_stats.npy')
stat = np.load(fastspeech2_stat)
mu, std = stat
mu = paddle.to_tensor(mu)
std = paddle.to_tensor(std)
fastspeech2_normalizer = ZScore(mu, std)
pwg_stat = os.path.join(pwg_res_dir, 'pwg_stats.npy')
stat = np.load(pwg_stat)
mu, std = stat
mu = paddle.to_tensor(mu)
std = paddle.to_tensor(std)
pwg_normalizer = ZScore(mu, std)
# inference
self.fastspeech2_inference = FastSpeech2Inference(fastspeech2_normalizer, model)
self.pwg_inference = PWGInference(pwg_normalizer, vocoder)
self.output_dir = Path(output_dir)
self.output_dir.mkdir(parents=True, exist_ok=True)
def forward(self, text: str):
wav = None
input_ids = self.frontend.get_input_ids(text, merge_sentences=True)
phone_ids = input_ids["phone_ids"]
for part_phone_ids in phone_ids:
with paddle.no_grad():
mel = self.fastspeech2_inference(part_phone_ids)
temp_wav = self.pwg_inference(mel)
if wav is None:
wav = temp_wav
else:
wav = paddle.concat([wav, temp_wav])
return wav
@serving
def generate(self, sentences: List[str], device='cpu'):
assert isinstance(sentences, list) and isinstance(sentences[0], str), \
'Input data should be List[str], but got {}'.format(type(sentences))
paddle.set_device(device)
wav_files = []
for i, sentence in enumerate(sentences):
wav = self(sentence)
wav_file = str(self.output_dir.absolute() / (str(i + 1) + ".wav"))
sf.write(wav_file, wav.numpy(), samplerate=self.samplerate)
wav_files.append(wav_file)
logger.info('{} wave files have been generated in {}'.format(len(sentences), self.output_dir.absolute()))
return wav_files
git+https://github.com/PaddlePaddle/Parakeet@8040cb0#egg=paddle-parakeet
# fastspeech2_ljspeech
|Model Name|fastspeech2_ljspeech|
| :--- | :---: |
|Category|Speech / Text-to-Speech|
|Network|FastSpeech2|
|Dataset|LJSpeech-1.1|
|Fine-tuning Supported|No|
|Model Size|425MB|
|Latest Update|2021-10-20|
|Data Metrics|-|
## I. Basic Information

### Model Introduction

FastSpeech2 is a text-to-speech (TTS) model proposed in 2020 by Microsoft Research Asia and the Microsoft Azure Speech team together with Zhejiang University. As an improved version of FastSpeech, it removes FastSpeech's reliance on a teacher-student knowledge-distillation framework, which made training complicated, and fixes the information loss of FastSpeech's training targets relative to real speech.

The FastSpeech2 architecture is shown below. It keeps the Feed-Forward Transformer (FFT) blocks introduced in FastSpeech, but adds a Variance Adaptor between the phoneme encoder and the mel-spectrogram decoder. The adaptor injects additional sources of variation in speech, such as duration, pitch, and energy (spectral magnitude), to ease the one-to-many mapping problem in speech synthesis.

<p align="center">
<img src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/images/fastspeech2.png" hspace='10'/> <br />
</p>

Parallel WaveGAN is a fast, compact waveform-generation method built on a distillation-free generative adversarial network. It trains a non-autoregressive WaveNet by jointly optimizing multi-resolution spectrogram losses and an adversarial loss, which lets it capture the time-frequency distribution of real speech waveforms effectively. The structure of Parallel WaveGAN is shown below:

<p align="center">
<img src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/images/pwg.png" hspace='10'/> <br />
</p>

fastspeech2_ljspeech uses FastSpeech2 as the acoustic model and Parallel WaveGAN as the vocoder, pre-trained on [The LJ Speech Dataset](https://keithito.com/LJ-Speech-Dataset/); it can be used directly to synthesize audio.

For more details, see:
- [FastSpeech 2: Fast and High-Quality End-to-End Text-to-Speech](https://arxiv.org/abs/2006.04558)
- [MSRA and Zhejiang University announce FastSpeech2 (Chinese)](https://www.msra.cn/zh-cn/news/features/fastspeech2)
- [Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram](https://arxiv.org/abs/1910.11480)
## II. Installation

- ### 1. Environment Dependencies

  - paddlepaddle >= 2.1.0

  - paddlehub >= 2.1.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)

- ### 2. Installation

  - ```shell
    $ hub install fastspeech2_ljspeech
    ```
  - If you run into problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction

- ### 1. Prediction Code Example

  - ```python
    import paddlehub as hub

    # Text to be synthesized
    sentences = ['The quick brown fox jumps over a lazy dog.']

    model = hub.Module(
        name='fastspeech2_ljspeech',
        version='1.0.0')
    wav_files = model.generate(sentences)

    # Print the paths of the synthesized audio files
    print(wav_files)
    ```

  - For details, see the PaddleHub demo:
    - [Text-to-speech](../../../../demo/text_to_speech)
- ### 2. API

  - ```python
    def __init__(output_dir)
    ```

    - Create a Module object (dygraph version).

    - **Parameters**

      - `output_dir`: output directory for the synthesized audio files.

  - ```python
    def generate(
        sentences,
        device='cpu',
    )
    ```

    - Synthesize the input texts into audio files and save them to the output directory.

    - **Parameters**

      - `sentences`: texts to synthesize, of type `List[str]`.
      - `device`: device used for prediction; defaults to `cpu`. Set it to `gpu` to predict on GPU.

    - **Return**

      - `wav_files`: `List[str]`, the paths of the synthesized audio files.
## IV. Server Deployment

- PaddleHub Serving can deploy an online text-to-speech service.

- ### Step 1: Start PaddleHub Serving

  - ```shell
    $ hub serving start -m fastspeech2_ljspeech
    ```

  - This deploys the text-to-speech API service, listening on port 8866 by default.

  - **NOTE:** To predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it need not be set.

- ### Step 2: Send a prediction request

  - With the server configured, the few lines of code below send a prediction request and retrieve the result:
  - ```python
    import requests
    import json

    # Text to be synthesized
    sentences = [
        'The quick brown fox jumps over a lazy dog.',
        'Today is a good day!',
    ]

    # Pass the texts under the key expected by the prediction method, here "sentences"
    data = {"sentences": sentences}
    # Send a POST request; the content type must be JSON, and the IP in the url
    # should be replaced with the address of the serving machine
    url = "http://127.0.0.1:8866/predict/fastspeech2_ljspeech"
    # Set the POST request headers to application/json
    headers = {"Content-Type": "application/json"}
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    print(r.json())
    ```
## V. Release Note

* 1.0.0

  First release

  ```shell
  $ hub install fastspeech2_ljspeech
  ```
git+https://github.com/PaddlePaddle/Parakeet@8040cb0#egg=paddle-parakeet
import base64
import cv2
import numpy as np
def base64_to_cv2(b64str):
    # Decode a base64-encoded string into an OpenCV BGR image array
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)  # np.fromstring is deprecated
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data
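The helper above decodes a base64 payload into an image; a serving client usually needs the inverse as well. A sketch of that counterpart (not part of this file):

```python
def cv2_to_base64(image):
    # Encode an OpenCV image as a base64 JPEG string (inverse of base64_to_cv2)
    ok, buf = cv2.imencode('.jpg', image)
    assert ok, 'image encoding failed'
    return base64.b64encode(buf.tobytes()).decode('utf8')
```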