diff --git a/README.md b/README.md index ceef15af62c033a6c08d7f7792a73e9249c813e0..1144d3ab52ed7b8f2d6ae4cb7d8f50b6602a4ca2 100644 --- a/README.md +++ b/README.md @@ -7,6 +7,7 @@

Quick Start + | Quick Start Server | Documents | Models List @@ -178,6 +179,8 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision +- 👏🏻 2022.03.28: PaddleSpeech Server is available for Audio Classification, Automatic Speech Recognition and Text-to-Speech. +- 👏🏻 2022.03.28: PaddleSpeech CLI is available for Speaker Verification. - 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available! - 👏🏻 2021.12.10: PaddleSpeech CLI is available for Audio Classification, Automatic Speech Recognition, Speech Translation (English to Chinese) and Text-to-Speech. @@ -203,6 +206,11 @@ Developers can have a try of our models with [PaddleSpeech Command Line](./paddl paddlespeech cls --input input.wav ``` +**Speaker Verification** +```shell +paddlespeech vector --task spk --input input_16k.wav +``` + **Automatic Speech Recognition** ```shell paddlespeech asr --lang zh --input input_16k.wav @@ -242,6 +250,36 @@ For more command lines, please see: [demos](https://github.com/PaddlePaddle/Padd If you want to try more functions like training and tuning, please have a look at [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-to-Speech Quick Start](./docs/source/tts/quick_start.md). + + +## Quick Start Server + +Developers can try our speech server with [PaddleSpeech Server Command Line](./paddlespeech/server/README.md). 
+ +**Start server** +```shell +paddlespeech_server start --config_file ./paddlespeech/server/conf/application.yaml +``` + +**Access Speech Recognition Services** +```shell +paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input input_16k.wav +``` + +**Access Text to Speech Services** +```shell +paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav +``` + +**Access Audio Classification Services** +```shell +paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav +``` + + +For more information about server command lines, please see: [speech server demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server) + + ## Model List PaddleSpeech supports a series of most popular models. They are summarized in [released models](./docs/source/released_model.md) and attached with available pretrained models. @@ -458,6 +496,29 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r +**Speaker Verification** + + + + + + + + + + + + + + + + + + +
+| Task | Dataset | Model Type | Link |
+| :---: | :---: | :---: | :---: |
+| Speaker Verification | VoxCeleb12 | ECAPA-TDNN | ecapa-tdnn-voxceleb12 |
+ **Punctuation Restoration** @@ -499,6 +560,7 @@ Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](ht - [Chinese Rule Based Text Frontend](./docs/source/tts/zh_text_frontend.md) - [Test Audio Samples](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html) - [Audio Classification](./demos/audio_tagging/README.md) + - [Speaker Verification](./demos/speaker_verification/README.md) - [Speech Translation](./demos/speech_translation/README.md) - [Released Models](./docs/source/released_model.md) - [Community](#Community) diff --git a/README_cn.md b/README_cn.md index 8ea91e98d42662c3ee3afcab52228d98191c19fc..ab4ce6e6b878626011ac5cbcfb5c82b4b03ef5d6 100644 --- a/README_cn.md +++ b/README_cn.md @@ -6,6 +6,7 @@

快速开始 + | 快速使用服务 | 教程文档 | 模型列表 @@ -179,7 +180,9 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme -- 🤗 2021.12.14: 我们在 Hugging Face Spaces 上的 [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) 以及 [TTS](https://huggingface.co/spaces/akhaliq/paddlespeech) Demos 上线啦! +- 👏🏻 2022.03.28: PaddleSpeech Server 上线! 覆盖了声音分类、语音识别、以及语音合成。 +- 👏🏻 2022.03.28: PaddleSpeech CLI 上线声纹验证。 +- 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available! - 👏🏻 2021.12.10: PaddleSpeech CLI 上线!覆盖了声音分类、语音识别、语音翻译(英译中)以及语音合成。 ### 技术交流群 @@ -202,6 +205,10 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme ```shell paddlespeech cls --input input.wav ``` +**声纹识别** +```shell +paddlespeech vector --task spk --input input_16k.wav +``` **语音识别** ```shell paddlespeech asr --lang zh --input input_16k.wav @@ -236,6 +243,33 @@ paddlespeech asr --input ./zh.wav | paddlespeech text --task punc 更多命令行命令请参考 [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos) > Note: 如果需要训练或者微调,请查看[语音识别](./docs/source/asr/quick_start.md), [语音合成](./docs/source/tts/quick_start.md)。 + +## 快速使用服务 +安装完成后,开发者可以通过命令行快速使用服务。 + +**启动服务** +```shell +paddlespeech_server start --config_file ./paddlespeech/server/conf/application.yaml +``` + +**访问语音识别服务** +```shell +paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input input_16k.wav +``` + +**访问语音合成服务** +```shell +paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav +``` + +**访问音频分类服务** +```shell +paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav +``` + +更多服务相关的命令行使用信息,请参考 [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server) + + ## 模型列表 PaddleSpeech 支持很多主流的模型,并提供了预训练模型,详情请见[模型列表](./docs/source/released_model.md)。 @@ -453,6 +487,30 @@ 
PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声

+ +**声纹识别** + + + + + + + + + + + + + + + + + + +
+| Task | Dataset | Model Type | Link |
+| :---: | :---: | :---: | :---: |
+| Speaker Verification | VoxCeleb12 | ECAPA-TDNN | ecapa-tdnn-voxceleb12 |
+ **标点恢复** @@ -499,6 +557,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声 - [中文文本前端](./docs/source/tts/zh_text_frontend.md) - [测试语音样本](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html) - [声音分类](./demos/audio_tagging/README_cn.md) + - [声纹识别](./demos/speaker_verification/README_cn.md) - [语音翻译](./demos/speech_translation/README_cn.md) - [模型列表](#模型列表) - [语音识别](#语音识别模型) @@ -521,6 +580,15 @@ author={PaddlePaddle Authors}, howpublished = {\url{https://github.com/PaddlePaddle/PaddleSpeech}}, year={2021} } + +@inproceedings{zheng2021fused, + title={Fused acoustic and text encoding for multimodal bilingual pretraining and speech translation}, + author={Zheng, Renjie and Chen, Junkun and Ma, Mingbo and Huang, Liang}, + booktitle={International Conference on Machine Learning}, + pages={12736--12746}, + year={2021}, + organization={PMLR} +} ``` @@ -568,7 +636,6 @@ year={2021} ## 致谢 - 非常感谢 [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) 多年来的关注和建议,以及在诸多问题上的帮助。 -- 非常感谢 [AK391](https://github.com/AK391) 在 Huggingface Spaces 上使用 Gradio 对我们的语音合成功能进行网页版演示。 - 非常感谢 [mymagicpower](https://github.com/mymagicpower) 采用PaddleSpeech 对 ASR 的[短语音](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk)及[长语音](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk)进行 Java 实现。 - 非常感谢 [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) 采用 PaddleSpeech 语音合成功能实现 Virtual Uploader(VUP)/Virtual YouTuber(VTuber) 虚拟主播。 - 非常感谢 [745165806](https://github.com/745165806)/[PaddleSpeechTask](https://github.com/745165806/PaddleSpeechTask) 贡献标点重建相关模型。 diff --git 
a/demos/speaker_verification/README.md b/demos/speaker_verification/README.md new file mode 100644 index 0000000000000000000000000000000000000000..c4d10ccf22f55ea31afe2303d7c2fcd9c0213d22 --- /dev/null +++ b/demos/speaker_verification/README.md @@ -0,0 +1,178 @@ +([简体中文](./README_cn.md)|English) +# Speaker Verification + +## Introduction + +Speaker verification here refers to the problem of extracting a speaker embedding from an audio clip. + +This demo extracts a speaker embedding from a given audio file. It can be done with a single command or a few lines of Python using `PaddleSpeech`. + +## Usage +### 1. Installation +See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). + +You can install paddlespeech in one of three ways: easy, medium, or hard. + +### 2. Prepare Input File +The input of this demo should be a WAV file (`.wav`), and its sample rate must match the model's sample rate. + +A sample file for this demo can be downloaded: +```bash +wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav +``` + +### 3. Usage +- Command Line (Recommended) + ```bash + paddlespeech vector --task spk --input 85236145389.wav + + echo -e "demo1 85236145389.wav" > vec.job + paddlespeech vector --task spk --input vec.job + + echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk + ``` + + Usage: + ```bash + paddlespeech vector --help + ``` + Arguments: + - `input` (required): Audio file to extract the embedding from. + - `model`: Model type of vector task. Default: `ecapatdnn_voxceleb12`. + - `sample_rate`: Sample rate of the model. Default: `16000`. + - `config`: Config of vector task. Use pretrained model when it is None. Default: `None`. + - `ckpt_path`: Model checkpoint. Use pretrained model when it is None. Default: `None`. + - `device`: Choose device to execute model inference. Default: default device of paddlepaddle in current environment. 
+ + Output: + +```bash + demo {'dim': 192, 'embedding': array([ -5.749211 , 9.505463 , -8.200284 , -5.2075014 , + 5.3940268 , -3.04878 , 1.611095 , 10.127234 , + -10.534177 , -15.821609 , 1.2032688 , -0.35080156, + 1.2629458 , -12.643498 , -2.5758228 , -11.343508 , + 2.3385992 , -8.719341 , 14.213509 , 15.404744 , + -0.39327756, 6.338786 , 2.688887 , 8.7104025 , + 17.469526 , -8.77959 , 7.0576906 , 4.648855 , + -1.3089896 , -23.294737 , 8.013747 , 13.891729 , + -9.926753 , 5.655307 , -5.9422326 , -22.842539 , + 0.6293588 , -18.46266 , -10.811862 , 9.8192625 , + 3.0070958 , 3.8072643 , -2.3861165 , 3.0821571 , + -14.739942 , 1.7594414 , -0.6485091 , 4.485623 , + 2.0207152 , 7.264915 , -6.40137 , 23.63524 , + 2.9711294 , -22.708025 , 9.93719 , 20.354511 , + -10.324688 , -0.700492 , -8.783211 , -5.27593 , + 15.999649 , 3.3004563 , 12.747926 , 15.429879 , + 4.7849145 , 5.6699696 , -2.3826702 , 10.605882 , + 3.9112158 , 3.1500628 , 15.859915 , -2.1832209 , + -23.908653 , -6.4799504 , -4.5365124 , -9.224193 , + 14.568347 , -10.568833 , 4.982321 , -4.342062 , + 0.0914714 , 12.645902 , -5.74285 , -3.2141201 , + -2.7173362 , -6.680575 , 0.4757669 , -5.035051 , + -6.7964664 , 16.865469 , -11.54324 , 7.681869 , + 0.44475392, 9.708182 , -8.932846 , 0.4123232 , + -4.361452 , 1.3948607 , 9.511665 , 0.11667654, + 2.9079323 , 6.049952 , 9.275183 , -18.078873 , + 6.2983274 , -0.7500531 , -2.725033 , -7.6027865 , + 3.3404543 , 2.990815 , 4.010979 , 11.000591 , + -2.8873312 , 7.1352735 , -16.79663 , 18.495346 , + -14.293832 , 7.89578 , 2.2714825 , 22.976387 , + -4.875734 , -3.0836344 , -2.9999814 , 13.751918 , + 6.448228 , -11.924197 , 2.171869 , 2.0423572 , + -6.173772 , 10.778437 , 25.77281 , -4.9495463 , + 14.57806 , 0.3044315 , 2.6132357 , -7.591999 , + -2.076944 , 9.025118 , 1.7834753 , -3.1799617 , + -4.9401326 , 23.465864 , 5.1685796 , -9.018578 , + 9.037825 , -4.4150195 , 6.859591 , -12.274467 , + -0.88911164, 5.186309 , -3.9988663 , -13.638606 , + -9.925445 , -0.06329413, 
-3.6709652 , -12.397416 , + -12.719869 , -1.395601 , 2.1150916 , 5.7381287 , + -4.4691963 , -3.82819 , -0.84233856, -1.1604277 , + -13.490127 , 8.731719 , -20.778936 , -11.495662 , + 5.8033476 , -4.752041 , 10.833007 , -6.717991 , + 4.504732 , 13.4244375 , 1.1306485 , 7.3435574 , + 1.400918 , 14.704036 , -9.501399 , 7.2315617 , + -6.417456 , 1.3333273 , 11.872697 , -0.30664724, + 8.8845 , 6.5569253 , 4.7948146 , 0.03662816, + -8.704245 , 6.224871 , -3.2701402 , -11.508579 ], + dtype=float32)} + ``` + +- Python API + ```python + import paddle + from paddlespeech.cli import VectorExecutor + + vector_executor = VectorExecutor() + audio_emb = vector_executor( + model='ecapatdnn_voxceleb12', + sample_rate=16000, + config=None, + ckpt_path=None, + audio_file='./85236145389.wav', + device=paddle.get_device()) + print('Audio embedding Result: \n{}'.format(audio_emb)) + ``` + + Output: + ```bash + # Audio embedding Result: + {'dim': 192, 'embedding': array([ -5.749211 , 9.505463 , -8.200284 , -5.2075014 , + 5.3940268 , -3.04878 , 1.611095 , 10.127234 , + -10.534177 , -15.821609 , 1.2032688 , -0.35080156, + 1.2629458 , -12.643498 , -2.5758228 , -11.343508 , + 2.3385992 , -8.719341 , 14.213509 , 15.404744 , + -0.39327756, 6.338786 , 2.688887 , 8.7104025 , + 17.469526 , -8.77959 , 7.0576906 , 4.648855 , + -1.3089896 , -23.294737 , 8.013747 , 13.891729 , + -9.926753 , 5.655307 , -5.9422326 , -22.842539 , + 0.6293588 , -18.46266 , -10.811862 , 9.8192625 , + 3.0070958 , 3.8072643 , -2.3861165 , 3.0821571 , + -14.739942 , 1.7594414 , -0.6485091 , 4.485623 , + 2.0207152 , 7.264915 , -6.40137 , 23.63524 , + 2.9711294 , -22.708025 , 9.93719 , 20.354511 , + -10.324688 , -0.700492 , -8.783211 , -5.27593 , + 15.999649 , 3.3004563 , 12.747926 , 15.429879 , + 4.7849145 , 5.6699696 , -2.3826702 , 10.605882 , + 3.9112158 , 3.1500628 , 15.859915 , -2.1832209 , + -23.908653 , -6.4799504 , -4.5365124 , -9.224193 , + 14.568347 , -10.568833 , 4.982321 , -4.342062 , + 0.0914714 , 
12.645902 , -5.74285 , -3.2141201 , + -2.7173362 , -6.680575 , 0.4757669 , -5.035051 , + -6.7964664 , 16.865469 , -11.54324 , 7.681869 , + 0.44475392, 9.708182 , -8.932846 , 0.4123232 , + -4.361452 , 1.3948607 , 9.511665 , 0.11667654, + 2.9079323 , 6.049952 , 9.275183 , -18.078873 , + 6.2983274 , -0.7500531 , -2.725033 , -7.6027865 , + 3.3404543 , 2.990815 , 4.010979 , 11.000591 , + -2.8873312 , 7.1352735 , -16.79663 , 18.495346 , + -14.293832 , 7.89578 , 2.2714825 , 22.976387 , + -4.875734 , -3.0836344 , -2.9999814 , 13.751918 , + 6.448228 , -11.924197 , 2.171869 , 2.0423572 , + -6.173772 , 10.778437 , 25.77281 , -4.9495463 , + 14.57806 , 0.3044315 , 2.6132357 , -7.591999 , + -2.076944 , 9.025118 , 1.7834753 , -3.1799617 , + -4.9401326 , 23.465864 , 5.1685796 , -9.018578 , + 9.037825 , -4.4150195 , 6.859591 , -12.274467 , + -0.88911164, 5.186309 , -3.9988663 , -13.638606 , + -9.925445 , -0.06329413, -3.6709652 , -12.397416 , + -12.719869 , -1.395601 , 2.1150916 , 5.7381287 , + -4.4691963 , -3.82819 , -0.84233856, -1.1604277 , + -13.490127 , 8.731719 , -20.778936 , -11.495662 , + 5.8033476 , -4.752041 , 10.833007 , -6.717991 , + 4.504732 , 13.4244375 , 1.1306485 , 7.3435574 , + 1.400918 , 14.704036 , -9.501399 , 7.2315617 , + -6.417456 , 1.3333273 , 11.872697 , -0.30664724, + 8.8845 , 6.5569253 , 4.7948146 , 0.03662816, + -8.704245 , 6.224871 , -3.2701402 , -11.508579 ], + dtype=float32)} + ``` + +### 4.Pretrained Models + +Here is a list of pretrained models released by PaddleSpeech that can be used by command and python API: + +| Model | Sample Rate +| :--- | :---: | +| ecapatdnn_voxceleb12 | 16k diff --git a/demos/speaker_verification/README_cn.md b/demos/speaker_verification/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..e2799b75e921035b71353abe506d6daf40e6a7ff --- /dev/null +++ b/demos/speaker_verification/README_cn.md @@ -0,0 +1,175 @@ +(简体中文|[English](./README.md)) + +# 声纹识别 +## 介绍 +声纹识别是一项用计算机程序自动提取说话人特征的技术。 + +这个 demo 
是一个从给定音频文件提取说话人特征,它可以通过使用 `PaddleSpeech` 的单个命令或 python 中的几行代码来实现。 + +## 使用方法 +### 1. 安装 +请看[安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md)。 + +你可以从 easy,medium,hard 三中方式中选择一种方式安装。 + +### 2. 准备输入 +这个 demo 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。 + +可以下载此 demo 的示例音频: +```bash +# 该音频的内容是数字串 85236145389 +wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav +``` +### 3. 使用方法 +- 命令行 (推荐使用) + ```bash + paddlespeech vector --task spk --input 85236145389.wav + + echo -e "demo1 85236145389.wav" > vec.job + paddlespeech vector --task spk --input vec.job + + echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk + ``` + + 使用方法: + ```bash + paddlespeech vector --help + ``` + 参数: + - `input`(必须输入):用于识别的音频文件。 + - `model`:声纹任务的模型,默认值:`ecapatdnn_voxceleb12`。 + - `sample_rate`:音频采样率,默认值:`16000`。 + - `config`:声纹任务的参数文件,若不设置则使用预训练模型中的默认配置,默认值:`None`。 + - `ckpt_path`:模型参数文件,若不设置则下载预训练模型使用,默认值:`None`。 + - `device`:执行预测的设备,默认值:当前系统下 paddlepaddle 的默认 device。 + + 输出: + ```bash + demo {'dim': 192, 'embedding': array([ -5.749211 , 9.505463 , -8.200284 , -5.2075014 , + 5.3940268 , -3.04878 , 1.611095 , 10.127234 , + -10.534177 , -15.821609 , 1.2032688 , -0.35080156, + 1.2629458 , -12.643498 , -2.5758228 , -11.343508 , + 2.3385992 , -8.719341 , 14.213509 , 15.404744 , + -0.39327756, 6.338786 , 2.688887 , 8.7104025 , + 17.469526 , -8.77959 , 7.0576906 , 4.648855 , + -1.3089896 , -23.294737 , 8.013747 , 13.891729 , + -9.926753 , 5.655307 , -5.9422326 , -22.842539 , + 0.6293588 , -18.46266 , -10.811862 , 9.8192625 , + 3.0070958 , 3.8072643 , -2.3861165 , 3.0821571 , + -14.739942 , 1.7594414 , -0.6485091 , 4.485623 , + 2.0207152 , 7.264915 , -6.40137 , 23.63524 , + 2.9711294 , -22.708025 , 9.93719 , 20.354511 , + -10.324688 , -0.700492 , -8.783211 , -5.27593 , + 15.999649 , 3.3004563 , 12.747926 , 15.429879 , + 4.7849145 , 5.6699696 , -2.3826702 , 10.605882 , + 3.9112158 , 3.1500628 , 15.859915 , 
-2.1832209 , + -23.908653 , -6.4799504 , -4.5365124 , -9.224193 , + 14.568347 , -10.568833 , 4.982321 , -4.342062 , + 0.0914714 , 12.645902 , -5.74285 , -3.2141201 , + -2.7173362 , -6.680575 , 0.4757669 , -5.035051 , + -6.7964664 , 16.865469 , -11.54324 , 7.681869 , + 0.44475392, 9.708182 , -8.932846 , 0.4123232 , + -4.361452 , 1.3948607 , 9.511665 , 0.11667654, + 2.9079323 , 6.049952 , 9.275183 , -18.078873 , + 6.2983274 , -0.7500531 , -2.725033 , -7.6027865 , + 3.3404543 , 2.990815 , 4.010979 , 11.000591 , + -2.8873312 , 7.1352735 , -16.79663 , 18.495346 , + -14.293832 , 7.89578 , 2.2714825 , 22.976387 , + -4.875734 , -3.0836344 , -2.9999814 , 13.751918 , + 6.448228 , -11.924197 , 2.171869 , 2.0423572 , + -6.173772 , 10.778437 , 25.77281 , -4.9495463 , + 14.57806 , 0.3044315 , 2.6132357 , -7.591999 , + -2.076944 , 9.025118 , 1.7834753 , -3.1799617 , + -4.9401326 , 23.465864 , 5.1685796 , -9.018578 , + 9.037825 , -4.4150195 , 6.859591 , -12.274467 , + -0.88911164, 5.186309 , -3.9988663 , -13.638606 , + -9.925445 , -0.06329413, -3.6709652 , -12.397416 , + -12.719869 , -1.395601 , 2.1150916 , 5.7381287 , + -4.4691963 , -3.82819 , -0.84233856, -1.1604277 , + -13.490127 , 8.731719 , -20.778936 , -11.495662 , + 5.8033476 , -4.752041 , 10.833007 , -6.717991 , + 4.504732 , 13.4244375 , 1.1306485 , 7.3435574 , + 1.400918 , 14.704036 , -9.501399 , 7.2315617 , + -6.417456 , 1.3333273 , 11.872697 , -0.30664724, + 8.8845 , 6.5569253 , 4.7948146 , 0.03662816, + -8.704245 , 6.224871 , -3.2701402 , -11.508579 ], + dtype=float32)} + ``` + +- Python API + ```python + import paddle + from paddlespeech.cli import VectorExecutor + + vector_executor = VectorExecutor() + audio_emb = vector_executor( + model='ecapatdnn_voxceleb12', + sample_rate=16000, + config=None, # Set `config` and `ckpt_path` to None to use pretrained model. 
+ ckpt_path=None, + audio_file='./85236145389.wav', + device=paddle.get_device()) + print('Audio embedding Result: \n{}'.format(audio_emb)) + ``` + + 输出: + ```bash + # Audio embedding Result: + {'dim': 192, 'embedding': array([ -5.749211 , 9.505463 , -8.200284 , -5.2075014 , + 5.3940268 , -3.04878 , 1.611095 , 10.127234 , + -10.534177 , -15.821609 , 1.2032688 , -0.35080156, + 1.2629458 , -12.643498 , -2.5758228 , -11.343508 , + 2.3385992 , -8.719341 , 14.213509 , 15.404744 , + -0.39327756, 6.338786 , 2.688887 , 8.7104025 , + 17.469526 , -8.77959 , 7.0576906 , 4.648855 , + -1.3089896 , -23.294737 , 8.013747 , 13.891729 , + -9.926753 , 5.655307 , -5.9422326 , -22.842539 , + 0.6293588 , -18.46266 , -10.811862 , 9.8192625 , + 3.0070958 , 3.8072643 , -2.3861165 , 3.0821571 , + -14.739942 , 1.7594414 , -0.6485091 , 4.485623 , + 2.0207152 , 7.264915 , -6.40137 , 23.63524 , + 2.9711294 , -22.708025 , 9.93719 , 20.354511 , + -10.324688 , -0.700492 , -8.783211 , -5.27593 , + 15.999649 , 3.3004563 , 12.747926 , 15.429879 , + 4.7849145 , 5.6699696 , -2.3826702 , 10.605882 , + 3.9112158 , 3.1500628 , 15.859915 , -2.1832209 , + -23.908653 , -6.4799504 , -4.5365124 , -9.224193 , + 14.568347 , -10.568833 , 4.982321 , -4.342062 , + 0.0914714 , 12.645902 , -5.74285 , -3.2141201 , + -2.7173362 , -6.680575 , 0.4757669 , -5.035051 , + -6.7964664 , 16.865469 , -11.54324 , 7.681869 , + 0.44475392, 9.708182 , -8.932846 , 0.4123232 , + -4.361452 , 1.3948607 , 9.511665 , 0.11667654, + 2.9079323 , 6.049952 , 9.275183 , -18.078873 , + 6.2983274 , -0.7500531 , -2.725033 , -7.6027865 , + 3.3404543 , 2.990815 , 4.010979 , 11.000591 , + -2.8873312 , 7.1352735 , -16.79663 , 18.495346 , + -14.293832 , 7.89578 , 2.2714825 , 22.976387 , + -4.875734 , -3.0836344 , -2.9999814 , 13.751918 , + 6.448228 , -11.924197 , 2.171869 , 2.0423572 , + -6.173772 , 10.778437 , 25.77281 , -4.9495463 , + 14.57806 , 0.3044315 , 2.6132357 , -7.591999 , + -2.076944 , 9.025118 , 1.7834753 , -3.1799617 , + 
-4.9401326 , 23.465864 , 5.1685796 , -9.018578 , + 9.037825 , -4.4150195 , 6.859591 , -12.274467 , + -0.88911164, 5.186309 , -3.9988663 , -13.638606 , + -9.925445 , -0.06329413, -3.6709652 , -12.397416 , + -12.719869 , -1.395601 , 2.1150916 , 5.7381287 , + -4.4691963 , -3.82819 , -0.84233856, -1.1604277 , + -13.490127 , 8.731719 , -20.778936 , -11.495662 , + 5.8033476 , -4.752041 , 10.833007 , -6.717991 , + 4.504732 , 13.4244375 , 1.1306485 , 7.3435574 , + 1.400918 , 14.704036 , -9.501399 , 7.2315617 , + -6.417456 , 1.3333273 , 11.872697 , -0.30664724, + 8.8845 , 6.5569253 , 4.7948146 , 0.03662816, + -8.704245 , 6.224871 , -3.2701402 , -11.508579 ], + dtype=float32)} + ``` + +### 4.预训练模型 +以下是 PaddleSpeech 提供的可以被命令行和 python API 使用的预训练模型列表: + +| 模型 | 采样率 +| :--- | :---: | +| ecapatdnn_voxceleb12 | 16k diff --git a/demos/speaker_verification/run.sh b/demos/speaker_verification/run.sh new file mode 100644 index 0000000000000000000000000000000000000000..856886d333cd30f983576875e809ed2016a51f50 --- /dev/null +++ b/demos/speaker_verification/run.sh @@ -0,0 +1,6 @@ +#!/bin/bash + +wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav + +# asr +paddlespeech vector --task spk --input ./85236145389.wav \ No newline at end of file diff --git a/demos/speech_server/README.md b/demos/speech_server/README.md index 10489e7131408ac8c074797f543e8e0edefa289e..0323d3983ab58f40285f81f135dedf2f9f019b7e 100644 --- a/demos/speech_server/README.md +++ b/demos/speech_server/README.md @@ -15,8 +15,8 @@ You can choose one way from meduim and hard to install paddlespeech. ### 2. Prepare config File The configuration file can be found in `conf/application.yaml` . -Among them, `engine_list` indicates the speech engine that will be included in the service to be started, in the format of _. -At present, the speech tasks integrated by the service include: asr (speech recognition) and tts (speech synthesis). 
+Among them, `engine_list` indicates the speech engines that will be included in the service to be started, in the format of `<speech task>_<engine type>`. +At present, the speech tasks integrated by the service include: asr (speech recognition), tts (text to speech) and cls (audio classification). Currently the engine type supports two forms: python and inference (Paddle Inference) diff --git a/demos/speech_server/README_cn.md b/demos/speech_server/README_cn.md index 2bd8af6c91f88045cad2aed643ebe524148f6184..687b51f10aca14936b20f6d6667d13644049c380 100644 --- a/demos/speech_server/README_cn.md +++ b/demos/speech_server/README_cn.md @@ -17,7 +17,7 @@ ### 2. 准备配置文件 配置文件可参见 `conf/application.yaml` 。 其中,`engine_list`表示即将启动的服务将会包含的语音引擎,格式为 <语音任务>_<引擎类型>。 -目前服务集成的语音任务有: asr(语音识别)、tts(语音合成)。 +目前服务集成的语音任务有: asr(语音识别)、tts(语音合成)以及cls(音频分类)。 目前引擎类型支持两种形式:python 及 inference (Paddle Inference) diff --git a/docs/source/released_model.md b/docs/source/released_model.md index a6092d558cadabcfae2f357c4ccf40b8dfab13f4..826279e6a316e4b1d4f665b05706aa1f0939d7fa 100644 --- a/docs/source/released_model.md +++ b/docs/source/released_model.md @@ -75,6 +75,12 @@ Model Type | Dataset| Example Link | Pretrained Models | Static Models PANN | Audioset| [audioset_tagging_cnn](https://github.com/qiuqiangkong/audioset_tagging_cnn) | [panns_cnn6.pdparams](https://bj.bcebos.com/paddleaudio/models/panns_cnn6.pdparams), [panns_cnn10.pdparams](https://bj.bcebos.com/paddleaudio/models/panns_cnn10.pdparams), [panns_cnn14.pdparams](https://bj.bcebos.com/paddleaudio/models/panns_cnn14.pdparams) | [panns_cnn6_static.tar.gz](https://paddlespeech.bj.bcebos.com/cls/inference_model/panns_cnn6_static.tar.gz)(18M), [panns_cnn10_static.tar.gz](https://paddlespeech.bj.bcebos.com/cls/inference_model/panns_cnn10_static.tar.gz)(19M), [panns_cnn14_static.tar.gz](https://paddlespeech.bj.bcebos.com/cls/inference_model/panns_cnn14_static.tar.gz)(289M) PANN | ESC-50 
|[pann-esc50](../../examples/esc50/cls0)|[esc50_cnn6.tar.gz](https://paddlespeech.bj.bcebos.com/cls/esc50/esc50_cnn6.tar.gz), [esc50_cnn10.tar.gz](https://paddlespeech.bj.bcebos.com/cls/esc50/esc50_cnn10.tar.gz), [esc50_cnn14.tar.gz](https://paddlespeech.bj.bcebos.com/cls/esc50/esc50_cnn14.tar.gz) +## Speaker Verification Models + +Model Type | Dataset| Example Link | Pretrained Models | Static Models +:-------------:| :------------:| :-----: | :-----: | :-----: +ECAPA-TDNN | VoxCeleb| [voxceleb_ecapatdnn](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/voxceleb/sv0) | [ecapatdnn.tar.gz](https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_1.tar.gz) | - + ## Punctuation Restoration Models Model Type | Dataset| Example Link | Pretrained Models :-------------:| :------------:| :-----: | :-----: diff --git a/examples/voxceleb/sv0/RESULT.md b/examples/voxceleb/sv0/RESULT.md new file mode 100644 index 0000000000000000000000000000000000000000..c37bcecef9b4276adcd7eb05b14893c48c3bdf96 --- /dev/null +++ b/examples/voxceleb/sv0/RESULT.md @@ -0,0 +1,7 @@ +# VoxCeleb + +## ECAPA-TDNN + +| Model | Number of Params | Release | Config | dim | Test set | Cosine | Cosine + S-Norm | +| --- | --- | --- | --- | --- | --- | --- | ---- | +| ECAPA-TDNN | 85M | 0.1.1 | conf/ecapa_tdnn.yaml |192 | test | 1.15 | 1.06 | diff --git a/paddlespeech/cli/README.md b/paddlespeech/cli/README.md index 5ac7a3bcaf1709b94020715d4480c08cf98cc3f0..19c822040de6699123781f14b6eac5bcf3ca15a6 100644 --- a/paddlespeech/cli/README.md +++ b/paddlespeech/cli/README.md @@ -13,6 +13,12 @@ paddlespeech cls --input input.wav ``` + ## Speaker Verification + + ```bash + paddlespeech vector --task spk --input input_16k.wav + ``` + ## Automatic Speech Recognition ``` paddlespeech asr --lang zh --input input_16k.wav diff --git a/paddlespeech/cli/README_cn.md b/paddlespeech/cli/README_cn.md index 
75ab9e41b10152446db762b1b4ed1c180cd49967..4b15d6c7bc68a39075aba7efb37a04e687b5ab35 100644 --- a/paddlespeech/cli/README_cn.md +++ b/paddlespeech/cli/README_cn.md @@ -12,6 +12,12 @@ ## 声音分类 ```bash paddlespeech cls --input input.wav + ``` + + ## 声纹识别 + + ```bash + paddlespeech vector --task spk --input input_16k.wav ``` ## 语音识别 diff --git a/paddlespeech/cli/vector/infer.py b/paddlespeech/cli/vector/infer.py index 56eccd133f9a5ef75d18963356dd67d176191d68..79d3b5dba1de53591df41fe06c28500d62144151 100644 --- a/paddlespeech/cli/vector/infer.py +++ b/paddlespeech/cli/vector/infer.py @@ -42,13 +42,15 @@ pretrained_models = { # "paddlespeech vector --task spk --model ecapatdnn_voxceleb12-16k --sr 16000 --input ./input.wav" "ecapatdnn_voxceleb12-16k": { 'url': - 'https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_0.tar.gz', + 'https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_1.tar.gz', 'md5': - '85ff08ce0ef406b8c6d7b5ffc5b2b48f', + 'a1c0dba7d4de997187786ff517d5b4ec', 'cfg_path': - 'conf/model.yaml', + 'conf/model.yaml', # the yaml config path 'ckpt_path': - 'model/model', + 'model/model', # the format is ${dir}/{model_name}, + # so the first 'model' is dir, the second 'model' is the name + # this means we have a model stored as model/model.pdparams }, } @@ -66,12 +68,13 @@ class VectorExecutor(BaseExecutor): self.parser = argparse.ArgumentParser( prog="paddlespeech.vector", add_help=True) + self.parser.add_argument( "--model", type=str, default="ecapatdnn_voxceleb12", choices=["ecapatdnn_voxceleb12"], - help="Choose model type of asr task.") + help="Choose model type of vector task.") self.parser.add_argument( "--task", type=str, @@ -79,7 +82,7 @@ class VectorExecutor(BaseExecutor): choices=["spk"], help="task type in vector domain") self.parser.add_argument( - "--input", type=str, default=None, help="Audio file to recognize.") + "--input", type=str, default=None, help="Audio file to extract 
embedding.") self.parser.add_argument( "--sample_rate", type=int, @@ -173,22 +176,55 @@ class VectorExecutor(BaseExecutor): sample_rate: int=16000, config: os.PathLike=None, ckpt_path: os.PathLike=None, - force_yes: bool=False, device=paddle.get_device()): + """Extract the audio embedding + + Args: + audio_file (os.PathLike): audio path, + which must be a wav file whose sample rate matches the model + model (str, optional): model type, which is loaded from the pretrained model list. + Defaults to 'ecapatdnn_voxceleb12'. + sample_rate (int, optional): model sample rate. Defaults to 16000. + config (os.PathLike, optional): yaml config. Defaults to None. + ckpt_path (os.PathLike, optional): pretrained model path. Defaults to None. + device (optional): paddle running host device. Defaults to paddle.get_device(). + + Returns: + dict: the audio embedding and its dimension + """ + # stage 0: check the audio format audio_file = os.path.abspath(audio_file) if not self._check(audio_file, sample_rate): sys.exit(-1) + # stage 1: set the paddle runtime host device logger.info(f"device type: {device}") paddle.device.set_device(device) + + # stage 2: read the specific pretrained model self._init_from_path(model, sample_rate, config, ckpt_path) + + # stage 3: preprocess the audio and get the audio feat self.preprocess(model, audio_file) + + # stage 4: infer the model and get the audio embedding self.infer(model) + + # stage 5: post-process the result into the output dict res = self.postprocess() return res def _get_pretrained_path(self, tag: str) -> os.PathLike: + """Get the neural network path from the pretrained model list. + All pretrained models are stored in the variable `pretrained_models` + + Args: + tag (str): model tag in the pretrained model list + + Returns: + os.PathLike: the downloaded pretrained model path on disk + """ support_models = list(pretrained_models.keys()) assert tag in pretrained_models, \ 'The model "{}" you want to use has 
not been supported,'\ @@ -210,15 +246,33 @@ class VectorExecutor(BaseExecutor): sample_rate: int=16000, cfg_path: Optional[os.PathLike]=None, ckpt_path: Optional[os.PathLike]=None): + """Init the neural network from the model path + + Args: + model_type (str, optional): model tag in the pretrained model list. + Defaults to 'ecapatdnn_voxceleb12'. + sample_rate (int, optional): model sample rate. + Defaults to 16000. + cfg_path (Optional[os.PathLike], optional): yaml config file path. + Defaults to None. + ckpt_path (Optional[os.PathLike], optional): the pretrained model path, stored on disk. + Defaults to None. + """ + # stage 0: avoid initializing the model again if hasattr(self, "model"): logger.info("Model has been initialized") return # stage 1: get the model and config path + # to init the network from a model stored on disk, + # we must pass the config path and the ckpt model path if cfg_path is None or ckpt_path is None: + # get the model from the pretrained list sample_rate_str = "16k" if sample_rate == 16000 else "8k" tag = model_type + "-" + sample_rate_str logger.info(f"load the pretrained model: {tag}") + # get the model from the pretrained list + # we download the pretrained model and store it in the res_path res_path = self._get_pretrained_path(tag) self.res_path = res_path @@ -227,6 +281,7 @@ class VectorExecutor(BaseExecutor): self.ckpt_path = os.path.join( res_path, pretrained_models[tag]['ckpt_path'] + '.pdparams') else: + # get the model from disk self.cfg_path = os.path.abspath(cfg_path) self.ckpt_path = os.path.abspath(ckpt_path + ".pdparams") self.res_path = os.path.dirname( @@ -241,7 +296,6 @@ class VectorExecutor(BaseExecutor): self.config.merge_from_file(self.cfg_path) # stage 3: get the model name to instance the model network with dynamic_import - # Noet: we use the '-' to get the model name instead of '_' logger.info("start to dynamic import the model class") model_name = model_type[:model_type.rindex('_')] 
logger.info(f"model name {model_name}") @@ -262,31 +316,55 @@ class VectorExecutor(BaseExecutor): @paddle.no_grad() def infer(self, model_type: str): + """Infer the model to get the embedding + Args: + model_type (str): speaker verification model type + """ + # stage 0: get the feat and length from _inputs feats = self._inputs["feats"] lengths = self._inputs["lengths"] logger.info("start to do backbone network model forward") logger.info( f"feats shape:{feats.shape}, lengths shape: {lengths.shape}") + + # stage 1: get the audio embedding # embedding from (1, emb_size, 1) -> (emb_size) embedding = self.model.backbone(feats, lengths).squeeze().numpy() logger.info(f"embedding size: {embedding.shape}") + # stage 2: put the embedding into the _outputs property + # the embedding type is numpy.ndarray self._outputs["embedding"] = embedding def postprocess(self) -> Union[str, os.PathLike]: - return self._outputs["embedding"] + """Return the audio embedding info + + Returns: + dict: audio embedding info, with keys `dim` and `embedding` + """ + embedding = self._outputs["embedding"] + dim = embedding.shape[0] + return {"dim": dim, "embedding": embedding} def preprocess(self, model_type: str, input_file: Union[str, os.PathLike]): + """Extract the audio feat + + Args: + model_type (str): speaker verification model type + input_file (Union[str, os.PathLike]): audio file path + """ audio_file = input_file if isinstance(audio_file, (str, os.PathLike)): logger.info(f"Preprocess audio file: {audio_file}") - # stage 1: load the audio + # stage 1: load the audio sample points + # Note: this process must match the training process waveform, sr = load_audio(audio_file) logger.info(f"load the audio sample points, shape is: {waveform.shape}") # stage 2: get the audio feat + # Note: Now we only support fbank feature try: feat = melspectrogram( x=waveform, @@ -302,8 +380,13 @@ class VectorExecutor(BaseExecutor): feat = paddle.to_tensor(feat).unsqueeze(0) # in inference period, the lengths is all
one without padding lengths = paddle.ones([1]) + + # stage 3: normalize the feature; + # for now we assume the feat must always be normalized feat = feature_normalize(feat, mean_norm=True, std_norm=False) + # stage 4: store the feat and length in the _inputs, + # which will be used by other functions logger.info(f"feats shape: {feat.shape}") self._inputs["feats"] = feat self._inputs["lengths"] = lengths @@ -311,6 +394,15 @@ class VectorExecutor(BaseExecutor): logger.info("audio extract the feat success") def _check(self, audio_file: str, sample_rate: int): + """Check if the model sample rate matches the audio sample rate + + Args: + audio_file (str): audio file path, from which the embedding will be extracted + sample_rate (int): the desired model sample rate + + Returns: + bool: whether the audio sample rate matches the model sample rate + """ self.sample_rate = sample_rate if self.sample_rate != 16000 and self.sample_rate != 8000: logger.error( diff --git a/paddlespeech/server/README.md b/paddlespeech/server/README.md index 4ce9605d62a0c411840f9f861a5f251b146110ab..819fe440d220c1f4b06b2557978c9205ede804e0 100644 --- a/paddlespeech/server/README.md +++ b/paddlespeech/server/README.md @@ -10,7 +10,7 @@ paddlespeech_server help ``` ### Start the server - First set the service-related configuration parameters, similar to `./conf/application.yaml`, + First set the service-related configuration parameters, similar to `./conf/application.yaml`.
Set `engine_list`, which represents the speech tasks included in the service to be started Then start the service: ```bash paddlespeech_server start --config_file ./conf/application.yaml @@ -23,7 +23,7 @@ ``` ### Access speech recognition services ``` - paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./tests/16_audio.wav + paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input input_16k.wav ``` ### Access text to speech services @@ -31,3 +31,7 @@ paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "你好,欢迎使用百度飞桨深度学习框架!" --output output.wav ``` + ### Access audio classification services + ```bash + paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav + ``` diff --git a/paddlespeech/server/README_cn.md b/paddlespeech/server/README_cn.md index 2dfd9474ba6490dedbb8d984c5ba9810506fa415..c0a4a7336700c642efc2172dfa14416dff0ef5ec 100644 --- a/paddlespeech/server/README_cn.md +++ b/paddlespeech/server/README_cn.md @@ -10,7 +10,7 @@ paddlespeech_server help ``` ### 启动服务 - 首先设置服务相关配置文件,类似于 `./conf/application.yaml`,同时设置服务配置中的语音任务模型相关配置,类似于 `./conf/tts/tts.yaml`。 + 首先设置服务相关配置文件,类似于 `./conf/application.yaml`,设置 `engine_list`,该值表示即将启动的服务中包含的语音任务。 然后启动服务: ```bash paddlespeech_server start --config_file ./conf/application.yaml @@ -30,3 +30,8 @@ ```bash paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "你好,欢迎使用百度飞桨深度学习框架!" --output output.wav ``` + + ### 访问音频分类服务 + ```bash + paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav + ``` diff --git a/paddlespeech/vector/cluster/__init__.py b/paddlespeech/vector/cluster/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..97043fd7ba6885aac81cad5a49924c23c67d4d47 --- /dev/null +++ b/paddlespeech/vector/cluster/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/paddlespeech/vector/io/__init__.py b/paddlespeech/vector/io/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..97043fd7ba6885aac81cad5a49924c23c67d4d47 --- /dev/null +++ b/paddlespeech/vector/io/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/paddlespeech/vector/modules/__init__.py b/paddlespeech/vector/modules/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..97043fd7ba6885aac81cad5a49924c23c67d4d47 --- /dev/null +++ b/paddlespeech/vector/modules/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/paddlespeech/vector/training/__init__.py b/paddlespeech/vector/training/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..97043fd7ba6885aac81cad5a49924c23c67d4d47 --- /dev/null +++ b/paddlespeech/vector/training/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/paddlespeech/vector/utils/__init__.py b/paddlespeech/vector/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..97043fd7ba6885aac81cad5a49924c23c67d4d47 --- /dev/null +++ b/paddlespeech/vector/utils/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.