Commit c76c4800, authored by Yang Zhou

Merge branch 'develop' of github.com:SmileGoat/PaddleSpeech into refactor_file_struct

([简体中文](./README_cn.md)|English) ([简体中文](./README_cn.md)|English)
<p align="center"> <p align="center">
<img src="./docs/images/PaddleSpeech_logo.png" /> <img src="./docs/images/PaddleSpeech_logo.png" />
</p> </p>
...@@ -20,20 +17,17 @@ ...@@ -20,20 +17,17 @@
<a href="https://huggingface.co/spaces"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a> <a href="https://huggingface.co/spaces"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a>
</p> </p>
<div align="center"> <div align="center">
<h3> <h4>
| <a href="#quick-start"> Quick Start </a> <a href="#quick-start"> Quick Start </a>
| <a href="#quick-start-server"> Quick Start Server </a> | <a href="#quick-start-server"> Quick Start Server </a>
| <a href="#quick-start-streaming-server"> Quick Start Streaming Server</a> | <a href="#quick-start-streaming-server"> Quick Start Streaming Server</a>
|
</br>
| <a href="#documents"> Documents </a> | <a href="#documents"> Documents </a>
| <a href="#model-list"> Models List </a> | <a href="#model-list"> Models List </a>
| | <a href="https://aistudio.baidu.com/aistudio/education/group/info/25130"> AIStudio Courses </a>
</h3> </h4>
</div> </div>
------------------------------------------------------------------------------------
**PaddleSpeech** is an open-source toolkit on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models. **PaddleSpeech** is an open-source toolkit on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.
...@@ -170,23 +164,12 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision ...@@ -170,23 +164,12 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🤗 2021.12.14: [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available! - 🤗 2021.12.14: [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
- 👏🏻 2021.12.10: `CLI` is available for `Audio Classification`, `Automatic Speech Recognition`, `Speech Translation (English to Chinese)` and `Text-to-Speech`. - 👏🏻 2021.12.10: `CLI` is available for `Audio Classification`, `Automatic Speech Recognition`, `Speech Translation (English to Chinese)` and `Text-to-Speech`.
### 🔥 Hot Activities
<!---
2021.12.14: We would like to have an online courses to introduce basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live).
--->
- 2021.12.21~12.24
4-day live course: an in-depth look at PaddleSpeech!
**Course videos and related materials: https://aistudio.baidu.com/aistudio/education/group/info/25130**
### Community ### Community
- Scan the QR code below with WeChat (reply【语音】after your friend request is approved) to join the official technical exchange group. We look forward to your participation. - Scan the QR code below with WeChat to join the official technical exchange group and receive the bonus (more than 20 GB of learning materials, such as papers, code, and videos) and the live link to the lessons. We look forward to your participation.
<div align="center"> <div align="center">
<img src="https://raw.githubusercontent.com/yt605155624/lanceTest/main/images/wechat_4.jpg" width = "300" /> <img src="https://user-images.githubusercontent.com/23690325/169763015-cbd8e28d-602c-4723-810d-dbc6da49441e.jpg" width = "200" />
</div> </div>
## Installation ## Installation
......
...@@ -18,40 +18,19 @@ ...@@ -18,40 +18,19 @@
<a href="https://huggingface.co/spaces"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a> <a href="https://huggingface.co/spaces"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a>
</p> </p>
<div align="center"> <div align="center">
<h3> <h4>
<a href="#quick-start"> Quick Start </a> <a href="#快速开始"> Quick Start </a>
| <a href="#quick-start-server"> Quick Start Server </a> | <a href="#快速使用服务"> Quick Start Server </a>
| <a href="#quick-start-streaming-server"> Quick Start Streaming Server</a> | <a href="#快速使用流式服务"> Quick Start Streaming Server </a>
</br> | <a href="#教程文档"> Documents </a>
<a href="#documents"> Documents </a> | <a href="#模型列表"> Models List </a>
| <a href="#model-list"> Models List </a> | <a href="https://aistudio.baidu.com/aistudio/education/group/info/25130"> AIStudio Courses </a>
</h3> </h4>
</div> </div>
------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------
<div align="center">
<h3>
<a href="#quick-start"> Quick Start </a>
| <a href="#quick-start-server"> Quick Start Server </a>
| <a href="#quick-start-streaming-server"> Quick Start Streaming Server </a>
| <a href="#documents"> Documents </a>
| <a href="#model-list"> Models List </a>
</div>
<!---
from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readmes-readable.md
1.What is this repo or project? (You can reuse the repo description you used earlier because this section doesn’t have to be long.)
2.How does it work?
3.Who will use this repo or project?
4.What is the goal of this project?
-->
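**PaddleSpeech** is an open-source speech model library built on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle). It supports development of a variety of critical speech and audio tasks and includes many cutting-edge and influential deep-learning models. Some typical application examples follow: **PaddleSpeech** is an open-source speech model library built on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle). It supports development of a variety of critical speech and audio tasks and includes many cutting-edge and influential deep-learning models. Some typical application examples follow:
##### Speech Recognition ##### Speech Recognition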
...@@ -178,39 +157,30 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme ...@@ -178,39 +157,30 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
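### Recent Updates ### Recent Updates
- 👑 2022.05.13: PaddleSpeech releases the [PP-ASR](./docs/source/asr/PPASR_cn.md) streaming speech recognition system, the [PP-TTS](./docs/source/tts/PPTTS_cn.md) streaming speech synthesis system, and the [PP-VPR](docs/source/vpr/PPVPR_cn.md) full-pipeline speaker recognition system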
<!---
2021.12.14: We would like to have an online courses to introduce basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live).
--->
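- 👑 2022.05.13: PaddleSpeech releases [PP-ASR](./docs/source/asr/PPASR_cn.md)[PP-TTS](./docs/source/tts/PPTTS_cn.md)[PP-VPR](docs/source/vpr/PPVPR_cn.md)
- 👏🏻 2022.05.06: PaddleSpeech Streaming Server is available! It covers speech recognition (with punctuation restoration and timestamps) and speech synthesis. - 👏🏻 2022.05.06: PaddleSpeech Streaming Server is available! It covers speech recognition (with punctuation restoration and timestamps) and speech synthesis.
- 👏🏻 2022.05.06: PaddleSpeech Server is available! It covers audio classification, speech recognition, speech synthesis, speaker recognition, and punctuation restoration. - 👏🏻 2022.05.06: PaddleSpeech Server is available! It covers audio classification, speech recognition, speech synthesis, speaker recognition, and punctuation restoration.
- 👏🏻 2022.03.28: PaddleSpeech CLI covers audio classification, speech recognition, speech translation (English to Chinese), speech synthesis, and speaker verification. - 👏🏻 2022.03.28: PaddleSpeech CLI covers audio classification, speech recognition, speech translation (English to Chinese), speech synthesis, and speaker verification.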
- 🤗 2021.12.14: PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available! - 🤗 2021.12.14: PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
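### 🔥 Hot Activities
- 2021.12.21~12.24
4-day live course: an in-depth look at PaddleSpeech speech technology! ### 🔥 Join the technical exchange group for member benefits
**Live replays and course materials: https://aistudio.baidu.com/aistudio/education/group/info/25130** - Link to the 3-day live course: an in-depth look at the key technologies behind the three core speech systems PP-TTS, PP-ASR, and PP-VPR
- 20 GB learning bundle: video courses, cutting-edge papers, and study materials
Scan the QR code with WeChat to follow the official account, then tap “马上报名” ("Sign up now") and fill in the questionnaire to join the official exchange group, where you can get questions answered more efficiently and talk with developers from all industries. We look forward to having you.
### Technical Exchange Group
Scan the QR code with WeChat (reply【语音】after your friend request is approved) to join the official exchange group, where you can get questions answered more efficiently and talk with developers from all industries. We look forward to having you.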
<div align="center"> <div align="center">
<img src="https://raw.githubusercontent.com/yt605155624/lanceTest/main/images/wechat_4.jpg" width = "300" /> <img src="https://user-images.githubusercontent.com/23690325/169763015-cbd8e28d-602c-4723-810d-dbc6da49441e.jpg" width = "200" />
</div> </div>
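## Installation ## Installation
We strongly recommend installing PaddleSpeech on **Linux** with *python* *3.7* or later. We strongly recommend installing PaddleSpeech on **Linux** with *python* *3.7* or later.
So far, **Linux** supports four features: audio classification, speech recognition, speech synthesis, and speech translation; **Mac OSX and Windows** do not yet support speech translation. For installation details, see the [installation guide](./docs/source/install_cn.md) So far, **Linux** supports four features: audio classification, speech recognition, speech synthesis, and speech translation; **Mac OSX and Windows** do not yet support speech translation. For installation details, see the [installation guide](./docs/source/install_cn.md)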
<a name="快速开始"></a>
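## Quick Start ## Quick Start
After installation, developers can get started quickly from the command line; change `--input` to test with your own audio or text. After installation, developers can get started quickly from the command line; change `--input` to test with your own audio or text.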
...@@ -257,7 +227,7 @@ paddlespeech asr --input ./zh.wav | paddlespeech text --task punc ...@@ -257,7 +227,7 @@ paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
For more command-line commands, see [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos) For more command-line commands, see [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos)
> Note: If you need to train or fine-tune models, see [Speech Recognition](./docs/source/asr/quick_start.md) and [Speech Synthesis](./docs/source/tts/quick_start.md). > Note: If you need to train or fine-tune models, see [Speech Recognition](./docs/source/asr/quick_start.md) and [Speech Synthesis](./docs/source/tts/quick_start.md).
<a name="快速使用服务"></a>
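## Quick Start Server ## Quick Start Server
After installation, developers can quickly use the server from the command line. After installation, developers can quickly use the server from the command line.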
...@@ -283,30 +253,30 @@ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav ...@@ -283,30 +253,30 @@ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav
For more server-related command-line usage, see [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server) For more server-related command-line usage, see [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server)
<a name="quickstartstreamingserver"></a> <a name="快速使用流式服务"></a>
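## Quick Start Streaming Server ## Quick Start Streaming Server
Developers can try the [streaming ASR](./demos/streaming_asr_server/README.md) and [streaming TTS](./demos/streaming_tts_server/README.md) services. Developers can try the [streaming ASR](./demos/streaming_asr_server/README.md) and [streaming TTS](./demos/streaming_tts_server/README.md) services.
**Start the streaming ASR server** **Start the streaming ASR server**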
``` ```
paddlespeech_server start --config_file ./demos/streaming_asr_server/conf/application.yaml paddlespeech_server start --config_file ./demos/streaming_asr_server/conf/application.yaml
``` ```
**Access the streaming ASR service** **Access the streaming ASR service**
``` ```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
``` ```
**Start the streaming TTS server** **Start the streaming TTS server**
``` ```
paddlespeech_server start --config_file ./demos/streaming_tts_server/conf/tts_online_application.yaml paddlespeech_server start --config_file ./demos/streaming_tts_server/conf/tts_online_application.yaml
``` ```
**Access the streaming TTS service** **Access the streaming TTS service**
``` ```
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
...@@ -314,8 +284,7 @@ paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http ...@@ -314,8 +284,7 @@ paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http
For more information, see [streaming ASR](./demos/streaming_asr_server/README.md) and [streaming TTS](./demos/streaming_tts_server/README.md) For more information, see [streaming ASR](./demos/streaming_asr_server/README.md) and [streaming TTS](./demos/streaming_tts_server/README.md)
<a name="modulelist"></a> <a name="模型列表"></a>
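## Model List ## Model List
PaddleSpeech supports many mainstream models and provides pretrained models; for details, see the [model list](./docs/source/released_model.md) PaddleSpeech supports many mainstream models and provides pretrained models; for details, see the [model list](./docs/source/released_model.md)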
...@@ -587,6 +556,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声 ...@@ -587,6 +556,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
</tbody> </tbody>
</table> </table>
<a name="教程文档"></a>
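## Tutorials ## Tutorials
For the tasks PaddleSpeech focuses on, the following guides help developers get started quickly and understand the core ideas of speech processing. For the tasks PaddleSpeech focuses on, the following guides help developers get started quickly and understand the core ideas of speech processing.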
...@@ -668,7 +638,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声 ...@@ -668,7 +638,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
<a name="欢迎贡献"></a> <a name="欢迎贡献"></a>
## Contributing to PaddleSpeech ## Contributing to PaddleSpeech
You are warmly welcome to ask questions in [Discussions](https://github.com/PaddlePaddle/PaddleSpeech/discussions) and to report bugs in [Issues](https://github.com/PaddlePaddle/PaddleSpeech/issues). We would also greatly appreciate your participation in PaddleSpeech development! You are warmly welcome to ask questions in [Discussions](https://github.com/PaddlePaddle/PaddleSpeech/discussions) and to report bugs in [Issues](https://github.com/PaddlePaddle/PaddleSpeech/issues). We would also greatly appreciate your participation in PaddleSpeech development!
### Contributors ### Contributors
<p align="center"> <p align="center">
......
...@@ -16,7 +16,12 @@ see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/doc ...@@ -16,7 +16,12 @@ see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/doc
You can choose either the medium or the hard way to install paddlespeech. You can choose either the medium or the hard way to install paddlespeech.
The dependency refers to the requirements.txt The dependency refers to the requirements.txt, and install the dependency as follows:
```
pip install -r requirements.txt
```
### 2. Prepare Input File ### 2. Prepare Input File
The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model. The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
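One way to check the sample-rate requirement before running the demo is to read the WAV header. The sketch below uses Python's standard `wave` module, which handles uncompressed PCM WAV files. The helper names and the 16000 Hz default are assumptions for illustration (16000 matches the `sample_rate` value in this demo's config); they are not part of PaddleSpeech.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz recorded in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(path, model_sample_rate=16000):
    """Check whether the file's rate equals the model's rate (assumed 16000 Hz)."""
    return wav_sample_rate(path) == model_sample_rate
```

If `matches_model_rate` returns `False`, the file would need to be resampled before it is passed to the demo.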
......
...@@ -16,7 +16,11 @@ ...@@ -16,7 +16,11 @@
See the [installation guide](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md) See the [installation guide](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md)
You can install using either the medium or the hard method. You can install using either the medium or the hard method.
For dependencies, see requirements.txt For dependencies, see requirements.txt, and install them:
```
pip install -r requirements.txt
```
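### 2. Prepare Input ### 2. Prepare Input
The input of this demo should be a WAV file (`.wav`), and its sample rate must match the model's sample rate. The input of this demo should be a WAV file (`.wav`), and its sample rate must match the model's sample rate.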
......
...@@ -28,6 +28,7 @@ acs_python: ...@@ -28,6 +28,7 @@ acs_python:
word_list: "./conf/words.txt" word_list: "./conf/words.txt"
sample_rate: 16000 sample_rate: 16000
device: 'cpu' # set 'gpu:id' or 'cpu' device: 'cpu' # set 'gpu:id' or 'cpu'
ping_timeout: 100 # seconds
......
websocket-client
\ No newline at end of file
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse

from paddlespeech.cli.log import logger
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        prog='paddlespeech_server.start', add_help=True)
    parser.add_argument(
        "--config_file",
        action="store",
        help="yaml file of the app",
        default=None,
        required=True)
    parser.add_argument(
        "--log_file",
        action="store",
        help="log file",
        default="./log/paddlespeech.log")
    logger.info("start to parse the args")
    args = parser.parse_args()

    logger.info("start to launch the streaming asr server")
    streaming_asr_server = ServerExecutor()
    streaming_asr_server(config_file=args.config_file, log_file=args.log_file)
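The argument handling in the script above can be tried without PaddleSpeech installed. The following is a minimal sketch that rebuilds only the parser; `build_parser` and the path `./conf/application.yaml` are hypothetical names chosen for illustration, and the `ServerExecutor` call is deliberately left out.

```python
import argparse


def build_parser():
    """Build a parser with the same two options as the start script."""
    parser = argparse.ArgumentParser(
        prog="paddlespeech_server.start", add_help=True)
    # --config_file is mandatory, so argparse exits with an error if it is absent.
    parser.add_argument(
        "--config_file",
        action="store",
        help="yaml file of the app",
        required=True)
    # --log_file is optional and falls back to the default path.
    parser.add_argument(
        "--log_file",
        action="store",
        help="log file",
        default="./log/paddlespeech.log")
    return parser


# Passing an explicit list avoids reading sys.argv, which is convenient in tests.
args = build_parser().parse_args(["--config_file", "./conf/application.yaml"])
print(args.config_file, args.log_file)
```

Because `--config_file` is declared with `required=True`, its `default=None` in the original script never takes effect; the sketch omits it for that reason.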
...@@ -26,8 +26,7 @@ def get_audios(path): ...@@ -26,8 +26,7 @@ def get_audios(path):
""" """
supported_formats = [".wav", ".mp3", ".ogg", ".flac", ".m4a"] supported_formats = [".wav", ".mp3", ".ogg", ".flac", ".m4a"]
return [ return [
item item for sublist in [[os.path.join(dir, file) for file in files]
for sublist in [[os.path.join(dir, file) for file in files]
for dir, _, files in list(os.walk(path))] for dir, _, files in list(os.walk(path))]
for item in sublist if os.path.splitext(item)[1] in supported_formats for item in sublist if os.path.splitext(item)[1] in supported_formats
] ]
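The nested comprehension in `get_audios` flattens the result of `os.walk` and filters by extension. An equivalent, more explicit sketch is given here for readability; `list_audio_files` and `SUPPORTED_FORMATS` are hypothetical names, and this is not proposed as the repository's implementation.

```python
import os

SUPPORTED_FORMATS = (".wav", ".mp3", ".ogg", ".flac", ".m4a")


def list_audio_files(path):
    """Walk `path` recursively and return files with a supported extension."""
    audio_files = []
    for directory, _, files in os.walk(path):
        for name in files:
            full_path = os.path.join(directory, name)
            # The comparison is case-sensitive, as in the original comprehension.
            if os.path.splitext(full_path)[1] in SUPPORTED_FORMATS:
                audio_files.append(full_path)
    return audio_files
```

The explicit loop visits directories and files in the same order as the comprehension, so both forms should return the same list for the same directory tree.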
......
...@@ -53,50 +53,49 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav ...@@ -53,50 +53,49 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
Output: Output:
```bash ```bash
demo [ 1.4217498 5.626253 -5.342073 1.1773866 3.308055 demo [ -1.3251206 7.8606825 -4.620626 0.3000721 2.2648535
1.756596 5.167894 10.80636 -3.8226728 -5.6141334 -1.1931441 3.0647137 7.673595 -6.0044727 -12.02426
2.623845 -0.8072968 1.9635103 -7.3128724 0.01103897 -1.9496069 3.1269536 1.618838 -7.6383104 -1.2299773
-9.723131 0.6619743 -6.976803 10.213478 7.494748 -12.338331 2.1373026 -5.3957124 9.717328 5.6752305
2.9105635 3.8949256 3.7999806 7.1061673 16.905321 3.7805123 3.0597172 3.429692 8.97601 13.174125
-7.1493764 8.733103 3.4230042 -4.831653 -11.403367 -0.53132284 8.9424715 4.46511 -4.4262476 -9.726503
11.232214 7.1274667 -4.2828417 2.452362 -5.130748 8.399328 7.2239175 -7.435854 2.9441683 -4.3430395
-18.177666 -2.6116815 -11.000337 -6.7314315 1.6564683 -13.886965 -1.6346735 -10.9027405 -5.311245 3.8007221
0.7618269 1.1253023 -2.083836 4.725744 -8.782597 3.8976038 -2.1230774 -2.3521194 4.151031 -7.4048667
-3.539873 3.814236 5.1420674 2.162061 4.096431 0.13911647 2.4626107 4.9664545 0.9897574 5.4839754
-6.4162116 12.747448 1.9429878 -15.152943 6.417416 -3.3574002 10.1340065 -0.6120171 -10.403095 4.6007543
16.097002 -9.716668 -1.9920526 -3.3649497 -1.871939 16.00935 -7.7836914 -4.1945305 -6.9368606 1.1789556
11.567354 3.69788 11.258265 7.442363 9.183411 11.490801 4.2380238 9.550931 8.375046 7.5089145
4.5281515 -1.2417862 4.3959084 6.6727695 5.8898783 -0.65707296 -0.30051577 2.8406055 3.0828028 0.730817
7.627124 -0.66919386 -11.889693 -9.208865 -7.4274073 6.148354 0.13766119 -13.424735 -7.7461405 -2.3227983
-3.7776625 6.917234 -9.848748 -2.0944717 -5.135116 -8.305252 2.9879124 -10.995229 0.15211068 -2.3820348
0.49563864 9.317534 -5.9141874 -1.8098574 -0.11738578 -1.7984174 8.495629 -5.8522367 -3.755498 0.6989711
-7.169265 -1.0578263 -5.7216787 -5.1173844 16.137651 -5.2702994 -2.6188622 -1.8828466 -4.64665 14.078544
-4.473626 7.6624317 -0.55381083 9.631587 -6.4704556 -0.5495333 10.579158 -3.2160501 9.349004 -4.381078
-8.548508 4.3716145 -0.79702514 4.478997 -2.9758704 -11.675817 -2.8630207 4.5721755 2.246612 -4.574342
3.272176 2.8382776 5.134597 -9.190781 -0.5657382 1.8610188 2.3767874 5.6257877 -9.784078 0.64967257
-4.8745747 2.3165567 -5.984303 -2.1798875 0.35541576 -1.4579505 0.4263264 -4.9211264 -2.454784 3.4869802
-0.31784213 9.493548 2.1144536 4.358092 -12.089823 -0.42654222 8.341269 1.356552 7.0966883 -13.102829
8.451689 -7.925461 4.6242585 4.4289427 18.692003 8.016734 -7.1159344 1.8699781 0.208721 14.699384
-2.6204622 -5.149185 -0.35821092 8.488551 4.981496 -1.025278 -2.6107233 -2.5082312 8.427193 6.9138527
-9.32683 -2.2544234 6.6417594 1.2119585 10.977129 -6.2912464 0.6157366 2.489688 -3.4668267 9.921763
16.555033 3.3238444 9.551863 -1.6676947 -0.79539716 11.200815 -0.1966403 7.4916005 -0.62312716 -0.25848144
-8.605674 -0.47356385 2.6741948 -5.359179 -2.6673796 -9.947997 -0.9611041 1.1649219 -2.1907122 -1.5028487
0.66607 15.443222 4.740594 -3.4725387 11.592567 -0.51926106 15.165954 2.4649463 -0.9980445 7.4416637
-2.054497 1.7361217 -8.265324 -9.30447 5.4068313 -2.0768049 3.5896823 -7.3055434 -7.5620847 4.323335
-1.5180256 -7.746615 -6.089606 0.07112726 -0.34904733 0.0804418 -6.56401 -2.3148053 -1.7642345 -2.4708817
-8.649895 -9.998958 -2.564841 -0.53999114 2.601808 -7.675618 -9.548878 -1.0177554 0.16986446 2.5877135
-0.31927416 -1.8815292 -2.07215 -3.4105783 -8.2998085 -1.8752296 -0.36614323 -6.0493784 -2.3965611 -5.9453387
1.483641 -15.365992 -8.288208 3.8847756 -3.4876456 0.9424033 -13.155974 -7.457801 0.14658108 -3.742797
7.3629923 0.4657332 3.132599 12.438889 -1.8337058 5.8414927 -1.2872906 5.5694313 12.57059 1.0939219
4.532936 2.7264361 10.145339 -6.521951 2.897153 2.2142086 1.9181576 6.9914207 -5.888139 3.1409824
-3.3925855 5.079156 7.759716 4.677565 5.8457737 -2.003628 2.4434285 9.973139 5.03668 2.0051203
2.402413 7.7071047 3.9711342 -6.390043 6.1268735 2.8615603 5.860224 2.9176188 -1.6311141 2.0292206
-3.7760346 -11.118123 ] -4.070415 -6.831437 ]
``` ```
- Python API - Python API
```python ```python
import paddle
from paddlespeech.cli import VectorExecutor from paddlespeech.cli import VectorExecutor
vector_executor = VectorExecutor() vector_executor = VectorExecutor()
...@@ -128,88 +127,88 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav ...@@ -128,88 +127,88 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
```bash ```bash
# Vector Result: # Vector Result:
Audio embedding Result: Audio embedding Result:
[ 1.4217498 5.626253 -5.342073 1.1773866 3.308055 [ -1.3251206 7.8606825 -4.620626 0.3000721 2.2648535
1.756596 5.167894 10.80636 -3.8226728 -5.6141334 -1.1931441 3.0647137 7.673595 -6.0044727 -12.02426
2.623845 -0.8072968 1.9635103 -7.3128724 0.01103897 -1.9496069 3.1269536 1.618838 -7.6383104 -1.2299773
-9.723131 0.6619743 -6.976803 10.213478 7.494748 -12.338331 2.1373026 -5.3957124 9.717328 5.6752305
2.9105635 3.8949256 3.7999806 7.1061673 16.905321 3.7805123 3.0597172 3.429692 8.97601 13.174125
-7.1493764 8.733103 3.4230042 -4.831653 -11.403367 -0.53132284 8.9424715 4.46511 -4.4262476 -9.726503
11.232214 7.1274667 -4.2828417 2.452362 -5.130748 8.399328 7.2239175 -7.435854 2.9441683 -4.3430395
-18.177666 -2.6116815 -11.000337 -6.7314315 1.6564683 -13.886965 -1.6346735 -10.9027405 -5.311245 3.8007221
0.7618269 1.1253023 -2.083836 4.725744 -8.782597 3.8976038 -2.1230774 -2.3521194 4.151031 -7.4048667
-3.539873 3.814236 5.1420674 2.162061 4.096431 0.13911647 2.4626107 4.9664545 0.9897574 5.4839754
-6.4162116 12.747448 1.9429878 -15.152943 6.417416 -3.3574002 10.1340065 -0.6120171 -10.403095 4.6007543
16.097002 -9.716668 -1.9920526 -3.3649497 -1.871939 16.00935 -7.7836914 -4.1945305 -6.9368606 1.1789556
11.567354 3.69788 11.258265 7.442363 9.183411 11.490801 4.2380238 9.550931 8.375046 7.5089145
4.5281515 -1.2417862 4.3959084 6.6727695 5.8898783 -0.65707296 -0.30051577 2.8406055 3.0828028 0.730817
7.627124 -0.66919386 -11.889693 -9.208865 -7.4274073 6.148354 0.13766119 -13.424735 -7.7461405 -2.3227983
-3.7776625 6.917234 -9.848748 -2.0944717 -5.135116 -8.305252 2.9879124 -10.995229 0.15211068 -2.3820348
0.49563864 9.317534 -5.9141874 -1.8098574 -0.11738578 -1.7984174 8.495629 -5.8522367 -3.755498 0.6989711
-7.169265 -1.0578263 -5.7216787 -5.1173844 16.137651 -5.2702994 -2.6188622 -1.8828466 -4.64665 14.078544
-4.473626 7.6624317 -0.55381083 9.631587 -6.4704556 -0.5495333 10.579158 -3.2160501 9.349004 -4.381078
-8.548508 4.3716145 -0.79702514 4.478997 -2.9758704 -11.675817 -2.8630207 4.5721755 2.246612 -4.574342
3.272176 2.8382776 5.134597 -9.190781 -0.5657382 1.8610188 2.3767874 5.6257877 -9.784078 0.64967257
-4.8745747 2.3165567 -5.984303 -2.1798875 0.35541576 -1.4579505 0.4263264 -4.9211264 -2.454784 3.4869802
-0.31784213 9.493548 2.1144536 4.358092 -12.089823 -0.42654222 8.341269 1.356552 7.0966883 -13.102829
8.451689 -7.925461 4.6242585 4.4289427 18.692003 8.016734 -7.1159344 1.8699781 0.208721 14.699384
-2.6204622 -5.149185 -0.35821092 8.488551 4.981496 -1.025278 -2.6107233 -2.5082312 8.427193 6.9138527
-9.32683 -2.2544234 6.6417594 1.2119585 10.977129 -6.2912464 0.6157366 2.489688 -3.4668267 9.921763
16.555033 3.3238444 9.551863 -1.6676947 -0.79539716 11.200815 -0.1966403 7.4916005 -0.62312716 -0.25848144
-8.605674 -0.47356385 2.6741948 -5.359179 -2.6673796 -9.947997 -0.9611041 1.1649219 -2.1907122 -1.5028487
0.66607 15.443222 4.740594 -3.4725387 11.592567 -0.51926106 15.165954 2.4649463 -0.9980445 7.4416637
-2.054497 1.7361217 -8.265324 -9.30447 5.4068313 -2.0768049 3.5896823 -7.3055434 -7.5620847 4.323335
-1.5180256 -7.746615 -6.089606 0.07112726 -0.34904733 0.0804418 -6.56401 -2.3148053 -1.7642345 -2.4708817
-8.649895 -9.998958 -2.564841 -0.53999114 2.601808 -7.675618 -9.548878 -1.0177554 0.16986446 2.5877135
-0.31927416 -1.8815292 -2.07215 -3.4105783 -8.2998085 -1.8752296 -0.36614323 -6.0493784 -2.3965611 -5.9453387
1.483641 -15.365992 -8.288208 3.8847756 -3.4876456 0.9424033 -13.155974 -7.457801 0.14658108 -3.742797
7.3629923 0.4657332 3.132599 12.438889 -1.8337058 5.8414927 -1.2872906 5.5694313 12.57059 1.0939219
4.532936 2.7264361 10.145339 -6.521951 2.897153 2.2142086 1.9181576 6.9914207 -5.888139 3.1409824
-3.3925855 5.079156 7.759716 4.677565 5.8457737 -2.003628 2.4434285 9.973139 5.03668 2.0051203
2.402413 7.7071047 3.9711342 -6.390043 6.1268735 2.8615603 5.860224 2.9176188 -1.6311141 2.0292206
-3.7760346 -11.118123 ] -4.070415 -6.831437 ]
# get the test embedding # get the test embedding
Test embedding Result: Test embedding Result:
[ -1.902964 2.0690894 -8.034194 3.5472693 0.18089125 [ 2.5247195 5.119042 -4.335273 4.4583654 5.047907
6.9085927 1.4097427 -1.9487704 -10.021278 -0.20755845 3.5059214 1.6159848 0.49364898 -11.6899185 -3.1014526
-8.04332 4.344489 2.3200977 -14.306299 5.184692 -5.6589785 -0.42684984 2.674276 -11.937654 6.2248464
-11.55602 -3.8497238 0.6444722 1.2833948 2.6766639 -10.776924 -5.694543 1.112041 1.5709964 1.0961034
0.5878921 0.7946299 1.7207596 2.5791872 14.998469 1.3976512 2.324352 1.339981 5.279319 13.734659
-1.3385371 15.031221 -0.8006958 1.99287 -9.52007 -2.5753925 13.651442 -2.2357535 5.1575427 -3.251567
2.435466 4.003221 -4.33817 -4.898601 -5.304714 1.4023279 6.1191974 -6.0845175 -1.3646189 -2.6789894
-18.033886 10.790787 -12.784645 -5.641755 2.9761686 -15.220778 9.779349 -9.411551 -6.388947 6.8313975
-10.566622 1.4839455 6.152458 -5.7195854 2.8603241 -9.245996 0.31196198 2.5509644 -4.413065 6.1649427
6.112133 8.489869 5.5958056 1.2836679 -1.2293907 6.793837 2.6328635 8.620976 3.4832475 0.52491665
0.89927405 7.0288725 -2.854029 -0.9782962 5.8255906 2.9115407 5.8392377 0.6702376 -3.2726715 2.6694255
14.905906 -5.025907 0.7866458 -4.2444224 -16.354029 16.91701 -5.5811176 0.23362345 -4.5573606 -11.801059
10.521315 0.9604709 -3.3257897 7.144871 -13.592733 14.728292 -0.5198082 -3.999922 7.0927105 -7.0459595
-8.568869 -1.7953678 0.26313916 10.916714 -6.9374123 -5.4389 -0.46420583 -5.1085467 10.376568 -8.889225
1.857403 -6.2746415 2.8154466 -7.2338667 -2.293357 -0.37705845 -1.659806 2.6731026 -7.1909504 1.4608804
-0.05452765 5.4287076 5.0849075 -6.690375 -1.6183422 -2.163136 -0.17949677 4.0241547 0.11319201 0.601279
3.654291 0.94352573 -9.200294 -5.4749465 -3.5235846 2.039692 3.1910992 -11.649526 -8.121584 -4.8707457
1.3420814 4.240421 -2.772944 -2.8451524 16.311104 0.3851982 1.4231744 -2.3321972 0.99332285 14.121717
4.2969875 -1.762936 -12.5758915 8.595198 -0.8835239 5.899413 0.7384519 -17.760096 10.555021 4.1366534
-1.5708797 1.568961 1.1413603 3.5032008 -0.45251232 -0.3391071 -0.20792882 3.208204 0.8847948 -8.721497
-6.786333 16.89443 5.3366146 -8.789056 0.6355629 -6.432868 13.006379 4.8956 -9.155822 -1.9441519
3.2579517 -3.328322 7.5969577 0.66025066 -6.550468 5.7815638 -2.066733 10.425042 -0.8802383 -2.4314315
-9.148656 2.020372 -0.4615173 1.1965656 -3.8764873 -9.869258 0.35095334 -5.3549943 2.1076174 -8.290468
11.6562195 -6.0750933 12.182899 3.2218833 0.81969476 8.4433365 -4.689333 9.334139 -2.172678 -3.0250976
5.570001 -3.8459578 -7.205299 7.9262037 -7.6611166 8.394216 -3.2110903 -7.93868 2.3960824 -2.3213403
-5.249467 -2.2671914 7.2658715 -13.298164 4.821147 -1.4963245 -3.476059 4.132903 -10.893354 4.362673
-2.7263982 11.691089 -3.8918593 -2.838112 -1.0336838 -0.45456508 10.258634 -1.1655927 -6.7799754 0.22885278
-3.8034165 2.8536487 -5.60398 -1.1972581 1.3455094 -4.399287 2.333433 -4.84745 -4.2752337 -1.3577863
-3.4903061 2.2408795 5.5010734 -3.970756 11.99696 -1.0685898 9.505196 7.3062205 0.08708266 12.927811
-7.8858757 0.43160373 -5.5059714 4.3426995 16.322706 -9.57974 1.3936648 -1.9444873 5.776769 15.251903
11.635366 0.72157705 -9.245714 -3.91465 -4.449838 10.6118355 -1.4903594 -9.535318 -3.6553776 -1.6699586
-1.5716927 7.713747 -2.2430465 -6.198303 -13.481864 -0.5933151 7.600357 -4.8815503 -8.698617 -15.855757
2.8156567 -5.7812386 5.1456156 2.7289324 -14.505571 0.25632986 -7.2235737 0.9506656 0.7128582 -9.051738
13.270688 3.448231 -7.0659585 4.5886116 -4.466099 8.74869 -1.6426028 -6.5762258 2.506905 -6.7431564
-0.296428 -11.463529 -2.6076477 14.110243 -6.9725137 5.129912 -12.189555 -3.6435068 12.068113 -6.0059533
-1.9962958 2.7119343 19.391657 0.01961198 14.607133 -2.3535995 2.9014351 22.3082 -1.5563312 13.193291
-1.6695905 -4.391516 1.3131028 -6.670972 -5.888604 2.7583609 -7.468798 1.3407065 -4.599617 -6.2345777
12.0612335 5.9285784 3.3715196 1.492534 10.723728 10.7689295 7.137627 5.099476 0.3473359 9.647881
-0.95514804 -12.085431 ] -2.0484571 -5.8549366 ]
# get the score between enroll and test # get the score between enroll and test
Eembeddings Score: 0.4292638301849365 Eembeddings Score: 0.45332613587379456
``` ```
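The final score compares the enrollment embedding with the test embedding. The output above does not name the metric, so the following is a minimal sketch under the assumption that the score is the cosine similarity of the two vectors. `cosine_similarity` is a hypothetical helper written in plain Python, not the PaddleSpeech API.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)
```

Under this assumption, the score lies between -1 and 1, and a higher value suggests the two utterances are more likely to come from the same speaker. Any decision threshold would have to be chosen on validation data.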
### 4.Pretrained Models ### 4.Pretrained Models
......
...@@ -51,45 +51,45 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav ...@@ -51,45 +51,45 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
Output: Output:
```bash ```bash
demo [ 1.4217498 5.626253 -5.342073 1.1773866 3.308055 [ -1.3251206 7.8606825 -4.620626 0.3000721 2.2648535
1.756596 5.167894 10.80636 -3.8226728 -5.6141334 -1.1931441 3.0647137 7.673595 -6.0044727 -12.02426
2.623845 -0.8072968 1.9635103 -7.3128724 0.01103897 -1.9496069 3.1269536 1.618838 -7.6383104 -1.2299773
-9.723131 0.6619743 -6.976803 10.213478 7.494748 -12.338331 2.1373026 -5.3957124 9.717328 5.6752305
2.9105635 3.8949256 3.7999806 7.1061673 16.905321 3.7805123 3.0597172 3.429692 8.97601 13.174125
-7.1493764 8.733103 3.4230042 -4.831653 -11.403367 -0.53132284 8.9424715 4.46511 -4.4262476 -9.726503
11.232214 7.1274667 -4.2828417 2.452362 -5.130748 8.399328 7.2239175 -7.435854 2.9441683 -4.3430395
-18.177666 -2.6116815 -11.000337 -6.7314315 1.6564683 -13.886965 -1.6346735 -10.9027405 -5.311245 3.8007221
0.7618269 1.1253023 -2.083836 4.725744 -8.782597 3.8976038 -2.1230774 -2.3521194 4.151031 -7.4048667
-3.539873 3.814236 5.1420674 2.162061 4.096431 0.13911647 2.4626107 4.9664545 0.9897574 5.4839754
-6.4162116 12.747448 1.9429878 -15.152943 6.417416 -3.3574002 10.1340065 -0.6120171 -10.403095 4.6007543
16.097002 -9.716668 -1.9920526 -3.3649497 -1.871939 16.00935 -7.7836914 -4.1945305 -6.9368606 1.1789556
11.567354 3.69788 11.258265 7.442363 9.183411 11.490801 4.2380238 9.550931 8.375046 7.5089145
4.5281515 -1.2417862 4.3959084 6.6727695 5.8898783 -0.65707296 -0.30051577 2.8406055 3.0828028 0.730817
7.627124 -0.66919386 -11.889693 -9.208865 -7.4274073 6.148354 0.13766119 -13.424735 -7.7461405 -2.3227983
-3.7776625 6.917234 -9.848748 -2.0944717 -5.135116 -8.305252 2.9879124 -10.995229 0.15211068 -2.3820348
0.49563864 9.317534 -5.9141874 -1.8098574 -0.11738578 -1.7984174 8.495629 -5.8522367 -3.755498 0.6989711
-7.169265 -1.0578263 -5.7216787 -5.1173844 16.137651 -5.2702994 -2.6188622 -1.8828466 -4.64665 14.078544
-4.473626 7.6624317 -0.55381083 9.631587 -6.4704556 -0.5495333 10.579158 -3.2160501 9.349004 -4.381078
-8.548508 4.3716145 -0.79702514 4.478997 -2.9758704 -11.675817 -2.8630207 4.5721755 2.246612 -4.574342
3.272176 2.8382776 5.134597 -9.190781 -0.5657382 1.8610188 2.3767874 5.6257877 -9.784078 0.64967257
-4.8745747 2.3165567 -5.984303 -2.1798875 0.35541576 -1.4579505 0.4263264 -4.9211264 -2.454784 3.4869802
-0.31784213 9.493548 2.1144536 4.358092 -12.089823 -0.42654222 8.341269 1.356552 7.0966883 -13.102829
8.451689 -7.925461 4.6242585 4.4289427 18.692003 8.016734 -7.1159344 1.8699781 0.208721 14.699384
-2.6204622 -5.149185 -0.35821092 8.488551 4.981496 -1.025278 -2.6107233 -2.5082312 8.427193 6.9138527
-9.32683 -2.2544234 6.6417594 1.2119585 10.977129 -6.2912464 0.6157366 2.489688 -3.4668267 9.921763
16.555033 3.3238444 9.551863 -1.6676947 -0.79539716 11.200815 -0.1966403 7.4916005 -0.62312716 -0.25848144
-8.605674 -0.47356385 2.6741948 -5.359179 -2.6673796 -9.947997 -0.9611041 1.1649219 -2.1907122 -1.5028487
0.66607 15.443222 4.740594 -3.4725387 11.592567 -0.51926106 15.165954 2.4649463 -0.9980445 7.4416637
-2.054497 1.7361217 -8.265324 -9.30447 5.4068313 -2.0768049 3.5896823 -7.3055434 -7.5620847 4.323335
-1.5180256 -7.746615 -6.089606 0.07112726 -0.34904733 0.0804418 -6.56401 -2.3148053 -1.7642345 -2.4708817
-8.649895 -9.998958 -2.564841 -0.53999114 2.601808 -7.675618 -9.548878 -1.0177554 0.16986446 2.5877135
-0.31927416 -1.8815292 -2.07215 -3.4105783 -8.2998085 -1.8752296 -0.36614323 -6.0493784 -2.3965611 -5.9453387
1.483641 -15.365992 -8.288208 3.8847756 -3.4876456 0.9424033 -13.155974 -7.457801 0.14658108 -3.742797
7.3629923 0.4657332 3.132599 12.438889 -1.8337058 5.8414927 -1.2872906 5.5694313 12.57059 1.0939219
4.532936 2.7264361 10.145339 -6.521951 2.897153 2.2142086 1.9181576 6.9914207 -5.888139 3.1409824
-3.3925855 5.079156 7.759716 4.677565 5.8457737 -2.003628 2.4434285 9.973139 5.03668 2.0051203
2.402413 7.7071047 3.9711342 -6.390043 6.1268735 2.8615603 5.860224 2.9176188 -1.6311141 2.0292206
-3.7760346 -11.118123 ] -4.070415 -6.831437 ]
``` ```
- Python API - Python API
...@@ -125,88 +125,88 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav ...@@ -125,88 +125,88 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
```bash ```bash
# Vector Result: # Vector Result:
Audio embedding Result: Audio embedding Result:
[ 1.4217498 5.626253 -5.342073 1.1773866 3.308055 [ -1.3251206 7.8606825 -4.620626 0.3000721 2.2648535
1.756596 5.167894 10.80636 -3.8226728 -5.6141334 -1.1931441 3.0647137 7.673595 -6.0044727 -12.02426
2.623845 -0.8072968 1.9635103 -7.3128724 0.01103897 -1.9496069 3.1269536 1.618838 -7.6383104 -1.2299773
-9.723131 0.6619743 -6.976803 10.213478 7.494748 -12.338331 2.1373026 -5.3957124 9.717328 5.6752305
2.9105635 3.8949256 3.7999806 7.1061673 16.905321 3.7805123 3.0597172 3.429692 8.97601 13.174125
-7.1493764 8.733103 3.4230042 -4.831653 -11.403367 -0.53132284 8.9424715 4.46511 -4.4262476 -9.726503
11.232214 7.1274667 -4.2828417 2.452362 -5.130748 8.399328 7.2239175 -7.435854 2.9441683 -4.3430395
-18.177666 -2.6116815 -11.000337 -6.7314315 1.6564683 -13.886965 -1.6346735 -10.9027405 -5.311245 3.8007221
0.7618269 1.1253023 -2.083836 4.725744 -8.782597 3.8976038 -2.1230774 -2.3521194 4.151031 -7.4048667
-3.539873 3.814236 5.1420674 2.162061 4.096431 0.13911647 2.4626107 4.9664545 0.9897574 5.4839754
-6.4162116 12.747448 1.9429878 -15.152943 6.417416 -3.3574002 10.1340065 -0.6120171 -10.403095 4.6007543
16.097002 -9.716668 -1.9920526 -3.3649497 -1.871939 16.00935 -7.7836914 -4.1945305 -6.9368606 1.1789556
11.567354 3.69788 11.258265 7.442363 9.183411 11.490801 4.2380238 9.550931 8.375046 7.5089145
4.5281515 -1.2417862 4.3959084 6.6727695 5.8898783 -0.65707296 -0.30051577 2.8406055 3.0828028 0.730817
7.627124 -0.66919386 -11.889693 -9.208865 -7.4274073 6.148354 0.13766119 -13.424735 -7.7461405 -2.3227983
-3.7776625 6.917234 -9.848748 -2.0944717 -5.135116 -8.305252 2.9879124 -10.995229 0.15211068 -2.3820348
0.49563864 9.317534 -5.9141874 -1.8098574 -0.11738578 -1.7984174 8.495629 -5.8522367 -3.755498 0.6989711
-7.169265 -1.0578263 -5.7216787 -5.1173844 16.137651 -5.2702994 -2.6188622 -1.8828466 -4.64665 14.078544
-4.473626 7.6624317 -0.55381083 9.631587 -6.4704556 -0.5495333 10.579158 -3.2160501 9.349004 -4.381078
-8.548508 4.3716145 -0.79702514 4.478997 -2.9758704 -11.675817 -2.8630207 4.5721755 2.246612 -4.574342
3.272176 2.8382776 5.134597 -9.190781 -0.5657382 1.8610188 2.3767874 5.6257877 -9.784078 0.64967257
-4.8745747 2.3165567 -5.984303 -2.1798875 0.35541576 -1.4579505 0.4263264 -4.9211264 -2.454784 3.4869802
-0.31784213 9.493548 2.1144536 4.358092 -12.089823 -0.42654222 8.341269 1.356552 7.0966883 -13.102829
8.451689 -7.925461 4.6242585 4.4289427 18.692003 8.016734 -7.1159344 1.8699781 0.208721 14.699384
-2.6204622 -5.149185 -0.35821092 8.488551 4.981496 -1.025278 -2.6107233 -2.5082312 8.427193 6.9138527
-9.32683 -2.2544234 6.6417594 1.2119585 10.977129 -6.2912464 0.6157366 2.489688 -3.4668267 9.921763
16.555033 3.3238444 9.551863 -1.6676947 -0.79539716 11.200815 -0.1966403 7.4916005 -0.62312716 -0.25848144
-8.605674 -0.47356385 2.6741948 -5.359179 -2.6673796 -9.947997 -0.9611041 1.1649219 -2.1907122 -1.5028487
0.66607 15.443222 4.740594 -3.4725387 11.592567 -0.51926106 15.165954 2.4649463 -0.9980445 7.4416637
-2.054497 1.7361217 -8.265324 -9.30447 5.4068313 -2.0768049 3.5896823 -7.3055434 -7.5620847 4.323335
-1.5180256 -7.746615 -6.089606 0.07112726 -0.34904733 0.0804418 -6.56401 -2.3148053 -1.7642345 -2.4708817
-8.649895 -9.998958 -2.564841 -0.53999114 2.601808 -7.675618 -9.548878 -1.0177554 0.16986446 2.5877135
-0.31927416 -1.8815292 -2.07215 -3.4105783 -8.2998085 -1.8752296 -0.36614323 -6.0493784 -2.3965611 -5.9453387
1.483641 -15.365992 -8.288208 3.8847756 -3.4876456 0.9424033 -13.155974 -7.457801 0.14658108 -3.742797
7.3629923 0.4657332 3.132599 12.438889 -1.8337058 5.8414927 -1.2872906 5.5694313 12.57059 1.0939219
4.532936 2.7264361 10.145339 -6.521951 2.897153 2.2142086 1.9181576 6.9914207 -5.888139 3.1409824
-3.3925855 5.079156 7.759716 4.677565 5.8457737 -2.003628 2.4434285 9.973139 5.03668 2.0051203
2.402413 7.7071047 3.9711342 -6.390043 6.1268735 2.8615603 5.860224 2.9176188 -1.6311141 2.0292206
-3.7760346 -11.118123 ] -4.070415 -6.831437 ]
# get the test embedding # get the test embedding
Test embedding Result: Test embedding Result:
[ -1.902964 2.0690894 -8.034194 3.5472693 0.18089125 [ 2.5247195 5.119042 -4.335273 4.4583654 5.047907
6.9085927 1.4097427 -1.9487704 -10.021278 -0.20755845 3.5059214 1.6159848 0.49364898 -11.6899185 -3.1014526
-8.04332 4.344489 2.3200977 -14.306299 5.184692 -5.6589785 -0.42684984 2.674276 -11.937654 6.2248464
-11.55602 -3.8497238 0.6444722 1.2833948 2.6766639 -10.776924 -5.694543 1.112041 1.5709964 1.0961034
0.5878921 0.7946299 1.7207596 2.5791872 14.998469 1.3976512 2.324352 1.339981 5.279319 13.734659
-1.3385371 15.031221 -0.8006958 1.99287 -9.52007 -2.5753925 13.651442 -2.2357535 5.1575427 -3.251567
2.435466 4.003221 -4.33817 -4.898601 -5.304714 1.4023279 6.1191974 -6.0845175 -1.3646189 -2.6789894
-18.033886 10.790787 -12.784645 -5.641755 2.9761686 -15.220778 9.779349 -9.411551 -6.388947 6.8313975
-10.566622 1.4839455 6.152458 -5.7195854 2.8603241 -9.245996 0.31196198 2.5509644 -4.413065 6.1649427
6.112133 8.489869 5.5958056 1.2836679 -1.2293907 6.793837 2.6328635 8.620976 3.4832475 0.52491665
0.89927405 7.0288725 -2.854029 -0.9782962 5.8255906 2.9115407 5.8392377 0.6702376 -3.2726715 2.6694255
14.905906 -5.025907 0.7866458 -4.2444224 -16.354029 16.91701 -5.5811176 0.23362345 -4.5573606 -11.801059
10.521315 0.9604709 -3.3257897 7.144871 -13.592733 14.728292 -0.5198082 -3.999922 7.0927105 -7.0459595
-8.568869 -1.7953678 0.26313916 10.916714 -6.9374123 -5.4389 -0.46420583 -5.1085467 10.376568 -8.889225
1.857403 -6.2746415 2.8154466 -7.2338667 -2.293357 -0.37705845 -1.659806 2.6731026 -7.1909504 1.4608804
-0.05452765 5.4287076 5.0849075 -6.690375 -1.6183422 -2.163136 -0.17949677 4.0241547 0.11319201 0.601279
3.654291 0.94352573 -9.200294 -5.4749465 -3.5235846 2.039692 3.1910992 -11.649526 -8.121584 -4.8707457
1.3420814 4.240421 -2.772944 -2.8451524 16.311104 0.3851982 1.4231744 -2.3321972 0.99332285 14.121717
4.2969875 -1.762936 -12.5758915 8.595198 -0.8835239 5.899413 0.7384519 -17.760096 10.555021 4.1366534
-1.5708797 1.568961 1.1413603 3.5032008 -0.45251232 -0.3391071 -0.20792882 3.208204 0.8847948 -8.721497
-6.786333 16.89443 5.3366146 -8.789056 0.6355629 -6.432868 13.006379 4.8956 -9.155822 -1.9441519
3.2579517 -3.328322 7.5969577 0.66025066 -6.550468 5.7815638 -2.066733 10.425042 -0.8802383 -2.4314315
-9.148656 2.020372 -0.4615173 1.1965656 -3.8764873 -9.869258 0.35095334 -5.3549943 2.1076174 -8.290468
11.6562195 -6.0750933 12.182899 3.2218833 0.81969476 8.4433365 -4.689333 9.334139 -2.172678 -3.0250976
5.570001 -3.8459578 -7.205299 7.9262037 -7.6611166 8.394216 -3.2110903 -7.93868 2.3960824 -2.3213403
-5.249467 -2.2671914 7.2658715 -13.298164 4.821147 -1.4963245 -3.476059 4.132903 -10.893354 4.362673
-2.7263982 11.691089 -3.8918593 -2.838112 -1.0336838 -0.45456508 10.258634 -1.1655927 -6.7799754 0.22885278
-3.8034165 2.8536487 -5.60398 -1.1972581 1.3455094 -4.399287 2.333433 -4.84745 -4.2752337 -1.3577863
-3.4903061 2.2408795 5.5010734 -3.970756 11.99696 -1.0685898 9.505196 7.3062205 0.08708266 12.927811
-7.8858757 0.43160373 -5.5059714 4.3426995 16.322706 -9.57974 1.3936648 -1.9444873 5.776769 15.251903
11.635366 0.72157705 -9.245714 -3.91465 -4.449838 10.6118355 -1.4903594 -9.535318 -3.6553776 -1.6699586
-1.5716927 7.713747 -2.2430465 -6.198303 -13.481864 -0.5933151 7.600357 -4.8815503 -8.698617 -15.855757
2.8156567 -5.7812386 5.1456156 2.7289324 -14.505571 0.25632986 -7.2235737 0.9506656 0.7128582 -9.051738
13.270688 3.448231 -7.0659585 4.5886116 -4.466099 8.74869 -1.6426028 -6.5762258 2.506905 -6.7431564
-0.296428 -11.463529 -2.6076477 14.110243 -6.9725137 5.129912 -12.189555 -3.6435068 12.068113 -6.0059533
-1.9962958 2.7119343 19.391657 0.01961198 14.607133 -2.3535995 2.9014351 22.3082 -1.5563312 13.193291
-1.6695905 -4.391516 1.3131028 -6.670972 -5.888604 2.7583609 -7.468798 1.3407065 -4.599617 -6.2345777
12.0612335 5.9285784 3.3715196 1.492534 10.723728 10.7689295 7.137627 5.099476 0.3473359 9.647881
-0.95514804 -12.085431 ] -2.0484571 -5.8549366 ]
# get the score between enroll and test # get the score between enroll and test
Eembeddings Score: 0.4292638301849365 Eembeddings Score: 0.45332613587379456
``` ```
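The last line of the output above is a single score comparing the enroll and test embeddings. A minimal sketch of such a comparison follows, assuming the score is the cosine similarity of the two vectors; the output shows only the resulting number, so this is an assumption. The helper name `cosine_score` is hypothetical, and the short toy vectors stand in for the real embeddings.

```python
import math


def cosine_score(enroll, test):
    """Cosine similarity between two equal-length embedding vectors.

    Hypothetical helper for illustration; not part of the PaddleSpeech API.
    """
    if len(enroll) != len(test):
        raise ValueError("embeddings must have the same length")
    dot = sum(a * b for a, b in zip(enroll, test))
    norm_enroll = math.sqrt(sum(a * a for a in enroll))
    norm_test = math.sqrt(sum(b * b for b in test))
    if norm_enroll == 0.0 or norm_test == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_enroll * norm_test)


# Toy vectors (not real speaker embeddings): dot = 8, both norms = 3.
enroll_vec = [1.0, 2.0, 2.0]
test_vec = [2.0, 1.0, 2.0]
print(f"score: {cosine_score(enroll_vec, test_vec):.4f}")  # score: 0.8889
```

A cosine score lies in [-1, 1] and ignores vector magnitude, so it depends only on the direction of the two embeddings.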
### 4. Pretrained Models ### 4. Pretrained Models
......
...@@ -274,12 +274,12 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee ...@@ -274,12 +274,12 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
Output: Output:
```bash ```bash
[2022-05-08 00:18:44,249] [ INFO] - vector http client start [2022-05-25 12:25:36,165] [ INFO] - vector http client start
[2022-05-08 00:18:44,250] [ INFO] - the input audio: 85236145389.wav [2022-05-25 12:25:36,165] [ INFO] - the input audio: 85236145389.wav
[2022-05-08 00:18:44,250] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/vector [2022-05-25 12:25:36,165] [ INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector
[2022-05-08 00:18:44,250] [ INFO] - http://127.0.0.1:8590/paddlespeech/vector [2022-05-25 12:25:36,166] [ INFO] - http://127.0.0.1:8790/paddlespeech/vector
[2022-05-08 00:18:44,406] [ INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [1.421751856803894, 5.626245498657227, -5.342077255249023, 1.1773887872695923, 3.3080549240112305, 1.7565933465957642, 5.167886257171631, 10.806358337402344, -3.8226819038391113, -5.614140033721924, 2.6238479614257812, -0.8072972893714905, 1.9635076522827148, -7.312870025634766, 0.011035939678549767, -9.723129272460938, 0.6619706153869629, -6.976806163787842, 10.213476181030273, 7.494769096374512, 2.9105682373046875, 3.8949244022369385, 3.799983501434326, 7.106168746948242, 16.90532875061035, -7.149388313293457, 8.733108520507812, 3.423006296157837, -4.831653594970703, -11.403363227844238, 11.232224464416504, 7.127461910247803, -4.282842636108398, 2.452359437942505, -5.130749702453613, -18.17766761779785, -2.6116831302642822, -11.000344276428223, -6.731433391571045, 1.6564682722091675, 0.7618281245231628, 1.125300407409668, -2.0838370323181152, 4.725743293762207, -8.782588005065918, -3.5398752689361572, 3.8142364025115967, 5.142068862915039, 2.1620609760284424, 4.09643030166626, -6.416214942932129, 12.747446060180664, 1.9429892301559448, -15.15294361114502, 6.417416095733643, 16.09701156616211, -9.716667175292969, -1.9920575618743896, -3.36494779586792, -1.8719440698623657, 11.567351341247559, 3.6978814601898193, 11.258262634277344, 7.442368507385254, 9.183408737182617, 4.528149127960205, -1.2417854070663452, 4.395912170410156, 6.6727728843688965, 5.88988733291626, 7.627128601074219, -0.6691966652870178, -11.889698028564453, -9.20886516571045, -7.42740535736084, -3.777663230895996, 6.917238712310791, -9.848755836486816, -2.0944676399230957, -5.1351165771484375, 0.4956451654434204, 9.317537307739258, -5.914181232452393, -1.809860348701477, -0.11738915741443634, -7.1692705154418945, -1.057827353477478, -5.721670627593994, -5.117385387420654, 16.13765525817871, -4.473617076873779, 7.6624321937561035, -0.55381840467453, 
9.631585121154785, -6.470459461212158, -8.548508644104004, 4.371616840362549, -0.7970245480537415, 4.4789886474609375, -2.975860834121704, 3.2721822261810303, 2.838287830352783, 5.134591102600098, -9.19079875946045, -0.5657302737236023, -4.8745832443237305, 2.3165574073791504, -5.984319686889648, -2.1798853874206543, 0.3554139733314514, -0.3178512752056122, 9.493552207946777, 2.1144471168518066, 4.358094692230225, -12.089824676513672, 8.451693534851074, -7.925466537475586, 4.624246597290039, 4.428936958312988, 18.69200897216797, -2.6204581260681152, -5.14918851852417, -0.3582090139389038, 8.488558769226074, 4.98148775100708, -9.326835632324219, -2.2544219493865967, 6.641760349273682, 1.2119598388671875, 10.977124214172363, 16.555034637451172, 3.3238420486450195, 9.551861763000488, -1.6676981449127197, -0.7953944206237793, -8.605667114257812, -0.4735655188560486, 2.674196243286133, -5.359177112579346, -2.66738224029541, 0.6660683155059814, 15.44322681427002, 4.740593433380127, -3.472534418106079, 11.592567443847656, -2.0544962882995605, 1.736127495765686, -8.265326499938965, -9.30447769165039, 5.406829833984375, -1.518022894859314, -7.746612548828125, -6.089611053466797, 0.07112743705511093, -0.3490503430366516, -8.64989185333252, -9.998957633972168, -2.564845085144043, -0.5399947762489319, 2.6018123626708984, -0.3192799389362335, -1.8815255165100098, -2.0721492767333984, -3.410574436187744, -8.29980754852295, 1.483638048171997, -15.365986824035645, -8.288211822509766, 3.884779930114746, -3.4876468181610107, 7.362999439239502, 0.4657334089279175, 3.1326050758361816, 12.438895225524902, -1.8337041139602661, 4.532927989959717, 2.7264339923858643, 10.14534854888916, -6.521963596343994, 2.897155523300171, -3.392582654953003, 5.079153060913086, 7.7597246170043945, 4.677570819854736, 5.845779895782471, 2.402411460876465, 7.7071051597595215, 3.9711380004882812, -6.39003849029541, 6.12687873840332, -3.776029348373413, -11.118121147155762]}} [2022-05-25 12:25:36,324] [ INFO] 
- The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [-1.3251205682754517, 7.860682487487793, -4.620625972747803, 0.3000721037387848, 2.2648534774780273, -1.1931440830230713, 3.064713716506958, 7.673594951629639, -6.004472732543945, -12.024259567260742, -1.9496068954467773, 3.126953601837158, 1.6188379526138306, -7.638310432434082, -1.2299772500991821, -12.33833122253418, 2.1373026371002197, -5.395712375640869, 9.717328071594238, 5.675230503082275, 3.7805123329162598, 3.0597171783447266, 3.429692029953003, 8.9760103225708, 13.174124717712402, -0.5313228368759155, 8.942471504211426, 4.465109825134277, -4.426247596740723, -9.726503372192383, 8.399328231811523, 7.223917484283447, -7.435853958129883, 2.9441683292388916, -4.343039512634277, -13.886964797973633, -1.6346734762191772, -10.902740478515625, -5.311244964599609, 3.800722122192383, 3.897603750228882, -2.123077392578125, -2.3521194458007812, 4.151031017303467, -7.404866695404053, 0.13911646604537964, 2.4626107215881348, 4.96645450592041, 0.9897574186325073, 5.483975410461426, -3.3574001789093018, 10.13400650024414, -0.6120170950889587, -10.403095245361328, 4.600754261016846, 16.009349822998047, -7.78369140625, -4.194530487060547, -6.93686056137085, 1.1789555549621582, 11.490800857543945, 4.23802375793457, 9.550930976867676, 8.375045776367188, 7.508914470672607, -0.6570729613304138, -0.3005157709121704, 2.8406054973602295, 3.0828027725219727, 0.7308170199394226, 6.1483540534973145, 0.1376611888408661, -13.424735069274902, -7.746140480041504, -2.322798252105713, -8.305252075195312, 2.98791241645813, -10.99522876739502, 0.15211068093776703, -2.3820347785949707, -1.7984174489974976, 8.49562931060791, -5.852236747741699, -3.755497932434082, 0.6989710927009583, -5.270299434661865, -2.6188621520996094, -1.8828465938568115, -4.6466498374938965, 14.078543663024902, -0.5495333075523376, 10.579157829284668, -3.216050148010254, 9.349003791809082, -4.381077766418457, 
-11.675816535949707, -2.863020658493042, 4.5721755027771, 2.246612071990967, -4.574341773986816, 1.8610187768936157, 2.3767874240875244, 5.625787734985352, -9.784077644348145, 0.6496725678443909, -1.457950472831726, 0.4263263940811157, -4.921126365661621, -2.4547839164733887, 3.4869801998138428, -0.4265422224998474, 8.341268539428711, 1.356552004814148, 7.096688270568848, -13.102828979492188, 8.01673412322998, -7.115934371948242, 1.8699780702590942, 0.20872099697589874, 14.699383735656738, -1.0252779722213745, -2.6107232570648193, -2.5082311630249023, 8.427192687988281, 6.913852691650391, -6.29124641418457, 0.6157366037368774, 2.489687919616699, -3.4668266773223877, 9.92176342010498, 11.200815200805664, -0.19664029777050018, 7.491600513458252, -0.6231271624565125, -0.2584814429283142, -9.947997093200684, -0.9611040949821472, 1.1649218797683716, -2.1907122135162354, -1.502848744392395, -0.5192610621452332, 15.165953636169434, 2.4649462699890137, -0.998044490814209, 7.44166374206543, -2.0768048763275146, 3.5896823406219482, -7.305543422698975, -7.562084674835205, 4.32333517074585, 0.08044180274009705, -6.564010143280029, -2.314805269241333, -1.7642345428466797, -2.470881700515747, -7.6756181716918945, -9.548877716064453, -1.017755389213562, 0.1698644608259201, 2.5877134799957275, -1.8752295970916748, -0.36614322662353516, -6.049378395080566, -2.3965611457824707, -5.945338726043701, 0.9424033164978027, -13.155974388122559, -7.45780086517334, 0.14658108353614807, -3.7427968978881836, 5.841492652893066, -1.2872905731201172, 5.569431304931641, 12.570590019226074, 1.0939218997955322, 2.2142086029052734, 1.9181575775146484, 6.991420745849609, -5.888138771057129, 3.1409823894500732, -2.0036280155181885, 2.4434285163879395, 9.973138809204102, 5.036680221557617, 2.005120277404785, 2.861560344696045, 5.860223770141602, 2.917618751525879, -1.63111412525177, 2.0292205810546875, -4.070415019989014, -6.831437110900879]}}
[2022-05-08 00:18:44,406] [ INFO] - Response time 0.156481 s. [2022-05-25 12:25:36,324] [ INFO] - Response time 0.159053 s.
``` ```
* Python API * Python API
...@@ -299,7 +299,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee ...@@ -299,7 +299,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
Output: Output:
``` bash ``` bash
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [1.421751856803894, 5.626245498657227, -5.342077255249023, 1.1773887872695923, 3.3080549240112305, 1.7565933465957642, 5.167886257171631, 10.806358337402344, -3.8226819038391113, -5.614140033721924, 2.6238479614257812, -0.8072972893714905, 1.9635076522827148, -7.312870025634766, 0.011035939678549767, -9.723129272460938, 0.6619706153869629, -6.976806163787842, 10.213476181030273, 7.494769096374512, 2.9105682373046875, 3.8949244022369385, 3.799983501434326, 7.106168746948242, 16.90532875061035, -7.149388313293457, 8.733108520507812, 3.423006296157837, -4.831653594970703, -11.403363227844238, 11.232224464416504, 7.127461910247803, -4.282842636108398, 2.452359437942505, -5.130749702453613, -18.17766761779785, -2.6116831302642822, -11.000344276428223, -6.731433391571045, 1.6564682722091675, 0.7618281245231628, 1.125300407409668, -2.0838370323181152, 4.725743293762207, -8.782588005065918, -3.5398752689361572, 3.8142364025115967, 5.142068862915039, 2.1620609760284424, 4.09643030166626, -6.416214942932129, 12.747446060180664, 1.9429892301559448, -15.15294361114502, 6.417416095733643, 16.09701156616211, -9.716667175292969, -1.9920575618743896, -3.36494779586792, -1.8719440698623657, 11.567351341247559, 3.6978814601898193, 11.258262634277344, 7.442368507385254, 9.183408737182617, 4.528149127960205, -1.2417854070663452, 4.395912170410156, 6.6727728843688965, 5.88988733291626, 7.627128601074219, -0.6691966652870178, -11.889698028564453, -9.20886516571045, -7.42740535736084, -3.777663230895996, 6.917238712310791, -9.848755836486816, -2.0944676399230957, -5.1351165771484375, 0.4956451654434204, 9.317537307739258, -5.914181232452393, -1.809860348701477, -0.11738915741443634, -7.1692705154418945, -1.057827353477478, -5.721670627593994, -5.117385387420654, 16.13765525817871, -4.473617076873779, 7.6624321937561035, -0.55381840467453, 9.631585121154785, -6.470459461212158, -8.548508644104004, 
4.371616840362549, -0.7970245480537415, 4.4789886474609375, -2.975860834121704, 3.2721822261810303, 2.838287830352783, 5.134591102600098, -9.19079875946045, -0.5657302737236023, -4.8745832443237305, 2.3165574073791504, -5.984319686889648, -2.1798853874206543, 0.3554139733314514, -0.3178512752056122, 9.493552207946777, 2.1144471168518066, 4.358094692230225, -12.089824676513672, 8.451693534851074, -7.925466537475586, 4.624246597290039, 4.428936958312988, 18.69200897216797, -2.6204581260681152, -5.14918851852417, -0.3582090139389038, 8.488558769226074, 4.98148775100708, -9.326835632324219, -2.2544219493865967, 6.641760349273682, 1.2119598388671875, 10.977124214172363, 16.555034637451172, 3.3238420486450195, 9.551861763000488, -1.6676981449127197, -0.7953944206237793, -8.605667114257812, -0.4735655188560486, 2.674196243286133, -5.359177112579346, -2.66738224029541, 0.6660683155059814, 15.44322681427002, 4.740593433380127, -3.472534418106079, 11.592567443847656, -2.0544962882995605, 1.736127495765686, -8.265326499938965, -9.30447769165039, 5.406829833984375, -1.518022894859314, -7.746612548828125, -6.089611053466797, 0.07112743705511093, -0.3490503430366516, -8.64989185333252, -9.998957633972168, -2.564845085144043, -0.5399947762489319, 2.6018123626708984, -0.3192799389362335, -1.8815255165100098, -2.0721492767333984, -3.410574436187744, -8.29980754852295, 1.483638048171997, -15.365986824035645, -8.288211822509766, 3.884779930114746, -3.4876468181610107, 7.362999439239502, 0.4657334089279175, 3.1326050758361816, 12.438895225524902, -1.8337041139602661, 4.532927989959717, 2.7264339923858643, 10.14534854888916, -6.521963596343994, 2.897155523300171, -3.392582654953003, 5.079153060913086, 7.7597246170043945, 4.677570819854736, 5.845779895782471, 2.402411460876465, 7.7071051597595215, 3.9711380004882812, -6.39003849029541, 6.12687873840332, -3.776029348373413, -11.118121147155762]}} {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': 
[-1.3251205682754517, 7.860682487487793, -4.620625972747803, 0.3000721037387848, 2.2648534774780273, -1.1931440830230713, 3.064713716506958, 7.673594951629639, -6.004472732543945, -12.024259567260742, -1.9496068954467773, 3.126953601837158, 1.6188379526138306, -7.638310432434082, -1.2299772500991821, -12.33833122253418, 2.1373026371002197, -5.395712375640869, 9.717328071594238, 5.675230503082275, 3.7805123329162598, 3.0597171783447266, 3.429692029953003, 8.9760103225708, 13.174124717712402, -0.5313228368759155, 8.942471504211426, 4.465109825134277, -4.426247596740723, -9.726503372192383, 8.399328231811523, 7.223917484283447, -7.435853958129883, 2.9441683292388916, -4.343039512634277, -13.886964797973633, -1.6346734762191772, -10.902740478515625, -5.311244964599609, 3.800722122192383, 3.897603750228882, -2.123077392578125, -2.3521194458007812, 4.151031017303467, -7.404866695404053, 0.13911646604537964, 2.4626107215881348, 4.96645450592041, 0.9897574186325073, 5.483975410461426, -3.3574001789093018, 10.13400650024414, -0.6120170950889587, -10.403095245361328, 4.600754261016846, 16.009349822998047, -7.78369140625, -4.194530487060547, -6.93686056137085, 1.1789555549621582, 11.490800857543945, 4.23802375793457, 9.550930976867676, 8.375045776367188, 7.508914470672607, -0.6570729613304138, -0.3005157709121704, 2.8406054973602295, 3.0828027725219727, 0.7308170199394226, 6.1483540534973145, 0.1376611888408661, -13.424735069274902, -7.746140480041504, -2.322798252105713, -8.305252075195312, 2.98791241645813, -10.99522876739502, 0.15211068093776703, -2.3820347785949707, -1.7984174489974976, 8.49562931060791, -5.852236747741699, -3.755497932434082, 0.6989710927009583, -5.270299434661865, -2.6188621520996094, -1.8828465938568115, -4.6466498374938965, 14.078543663024902, -0.5495333075523376, 10.579157829284668, -3.216050148010254, 9.349003791809082, -4.381077766418457, -11.675816535949707, -2.863020658493042, 4.5721755027771, 2.246612071990967, -4.574341773986816, 
1.8610187768936157, 2.3767874240875244, 5.625787734985352, -9.784077644348145, 0.6496725678443909, -1.457950472831726, 0.4263263940811157, -4.921126365661621, -2.4547839164733887, 3.4869801998138428, -0.4265422224998474, 8.341268539428711, 1.356552004814148, 7.096688270568848, -13.102828979492188, 8.01673412322998, -7.115934371948242, 1.8699780702590942, 0.20872099697589874, 14.699383735656738, -1.0252779722213745, -2.6107232570648193, -2.5082311630249023, 8.427192687988281, 6.913852691650391, -6.29124641418457, 0.6157366037368774, 2.489687919616699, -3.4668266773223877, 9.92176342010498, 11.200815200805664, -0.19664029777050018, 7.491600513458252, -0.6231271624565125, -0.2584814429283142, -9.947997093200684, -0.9611040949821472, 1.1649218797683716, -2.1907122135162354, -1.502848744392395, -0.5192610621452332, 15.165953636169434, 2.4649462699890137, -0.998044490814209, 7.44166374206543, -2.0768048763275146, 3.5896823406219482, -7.305543422698975, -7.562084674835205, 4.32333517074585, 0.08044180274009705, -6.564010143280029, -2.314805269241333, -1.7642345428466797, -2.470881700515747, -7.6756181716918945, -9.548877716064453, -1.017755389213562, 0.1698644608259201, 2.5877134799957275, -1.8752295970916748, -0.36614322662353516, -6.049378395080566, -2.3965611457824707, -5.945338726043701, 0.9424033164978027, -13.155974388122559, -7.45780086517334, 0.14658108353614807, -3.7427968978881836, 5.841492652893066, -1.2872905731201172, 5.569431304931641, 12.570590019226074, 1.0939218997955322, 2.2142086029052734, 1.9181575775146484, 6.991420745849609, -5.888138771057129, 3.1409823894500732, -2.0036280155181885, 2.4434285163879395, 9.973138809204102, 5.036680221557617, 2.005120277404785, 2.861560344696045, 5.860223770141602, 2.917618751525879, -1.63111412525177, 2.0292205810546875, -4.070415019989014, -6.831437110900879]}}
``` ```
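The response shown above is a dictionary with `success`, `code`, `message`, and a `result` that holds the embedding under `vec`. As a rough sketch, the embedding could be pulled out as below. The helper name `extract_vector` is hypothetical, the sample vector is cut to three values for brevity, and the shape of a failed response is an assumption, since only a successful one appears in the output.

```python
def extract_vector(response):
    """Return the embedding list from a vector-service response dict.

    Hypothetical helper; the failure handling assumes an unsuccessful
    response keeps the same top-level keys as the successful one.
    """
    if not response.get("success") or response.get("code") != 200:
        description = response.get("message", {}).get("description", "unknown error")
        raise RuntimeError(f"vector request failed: {description}")
    return response["result"]["vec"]


# Same structure as the output above, with the vector shortened to three values.
sample = {
    "success": True,
    "code": 200,
    "message": {"description": "success"},
    "result": {"vec": [-1.3251205682754517, 7.860682487487793, -4.620625972747803]},
}
vec = extract_vector(sample)
print(len(vec), vec[0])
```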
#### 7.2 Get the score between speaker audio embeddings #### 7.2 Get the score between speaker audio embeddings
...@@ -331,12 +331,12 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee ...@@ -331,12 +331,12 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
Output: Output:
``` bash ``` bash
[2022-05-09 10:28:40,556] [ INFO] - vector score http client start [2022-05-25 12:33:24,527] [ INFO] - vector score http client start
[2022-05-09 10:28:40,556] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav [2022-05-25 12:33:24,527] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
[2022-05-09 10:28:40,556] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/vector/score [2022-05-25 12:33:24,528] [ INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector/score
[2022-05-09 10:28:40,731] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.4292638897895813}} [2022-05-25 12:33:24,695] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
[2022-05-09 10:28:40,731] [ INFO] - The vector: None [2022-05-25 12:33:24,696] [ INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
[2022-05-09 10:28:40,731] [ INFO] - Response time 0.175514 s. [2022-05-25 12:33:24,696] [ INFO] - Response time 0.168271 s.
``` ```
* Python API * Python API
...@@ -358,10 +358,11 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee ...@@ -358,10 +358,11 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
Output: Output:
``` bash ``` bash
[2022-05-09 10:34:54,769] [ INFO] - vector score http client start [2022-05-25 12:30:14,143] [ INFO] - vector score http client start
[2022-05-09 10:34:54,771] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav [2022-05-25 12:30:14,143] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
[2022-05-09 10:34:54,771] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/vector/score [2022-05-25 12:30:14,143] [ INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector/score
[2022-05-09 10:34:55,026] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.4292638897895813}} [2022-05-25 12:30:14,363] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
``` ```
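The client logs the score response as a Python-style dictionary (single-quoted keys, `True` rather than `true`), so it is not valid JSON. One possible way to recover the score from such a log line is sketched below with `ast.literal_eval` from the standard library. The helper name `parse_score_line` is hypothetical, and the log format is assumed to match the lines shown above.

```python
import ast


def parse_score_line(line):
    """Extract the score from a 'The vector score is: {...}' log line.

    Hypothetical helper; assumes the payload is a Python dict literal,
    as in the log output above.
    """
    marker = "The vector score is: "
    _, found, payload = line.partition(marker)
    if not found:
        raise ValueError("line does not contain a vector score")
    # literal_eval only evaluates literals, so it is safer than eval here.
    response = ast.literal_eval(payload.strip())
    return response["result"]["score"]


log_line = (
    "[2022-05-25 12:30:14,363] [    INFO] - The vector score is: "
    "{'success': True, 'code': 200, 'message': {'description': 'success'}, "
    "'result': {'score': 0.45332613587379456}}"
)
print(parse_score_line(log_line))
```

When calling the Python API directly, the returned dictionary can be indexed as `result['score']` without any log parsing.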
### 8. Punctuation prediction ### 8. Punctuation prediction
......
...@@ -277,12 +277,12 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee ...@@ -277,12 +277,12 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
Output: Output:
``` bash ``` bash
[2022-05-08 00:18:44,249] [ INFO] - vector http client start [2022-05-25 12:25:36,165] [ INFO] - vector http client start
[2022-05-08 00:18:44,250] [ INFO] - the input audio: 85236145389.wav [2022-05-25 12:25:36,165] [ INFO] - the input audio: 85236145389.wav
[2022-05-08 00:18:44,250] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/vector [2022-05-25 12:25:36,165] [ INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector
[2022-05-08 00:18:44,250] [ INFO] - http://127.0.0.1:8590/paddlespeech/vector [2022-05-25 12:25:36,166] [ INFO] - http://127.0.0.1:8790/paddlespeech/vector
[2022-05-08 00:18:44,406] [ INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [1.421751856803894, 5.626245498657227, -5.342077255249023, 1.1773887872695923, 3.3080549240112305, 1.7565933465957642, 5.167886257171631, 10.806358337402344, -3.8226819038391113, -5.614140033721924, 2.6238479614257812, -0.8072972893714905, 1.9635076522827148, -7.312870025634766, 0.011035939678549767, -9.723129272460938, 0.6619706153869629, -6.976806163787842, 10.213476181030273, 7.494769096374512, 2.9105682373046875, 3.8949244022369385, 3.799983501434326, 7.106168746948242, 16.90532875061035, -7.149388313293457, 8.733108520507812, 3.423006296157837, -4.831653594970703, -11.403363227844238, 11.232224464416504, 7.127461910247803, -4.282842636108398, 2.452359437942505, -5.130749702453613, -18.17766761779785, -2.6116831302642822, -11.000344276428223, -6.731433391571045, 1.6564682722091675, 0.7618281245231628, 1.125300407409668, -2.0838370323181152, 4.725743293762207, -8.782588005065918, -3.5398752689361572, 3.8142364025115967, 5.142068862915039, 2.1620609760284424, 4.09643030166626, -6.416214942932129, 12.747446060180664, 1.9429892301559448, -15.15294361114502, 6.417416095733643, 16.09701156616211, -9.716667175292969, -1.9920575618743896, -3.36494779586792, -1.8719440698623657, 11.567351341247559, 3.6978814601898193, 11.258262634277344, 7.442368507385254, 9.183408737182617, 4.528149127960205, -1.2417854070663452, 4.395912170410156, 6.6727728843688965, 5.88988733291626, 7.627128601074219, -0.6691966652870178, -11.889698028564453, -9.20886516571045, -7.42740535736084, -3.777663230895996, 6.917238712310791, -9.848755836486816, -2.0944676399230957, -5.1351165771484375, 0.4956451654434204, 9.317537307739258, -5.914181232452393, -1.809860348701477, -0.11738915741443634, -7.1692705154418945, -1.057827353477478, -5.721670627593994, -5.117385387420654, 16.13765525817871, -4.473617076873779, 7.6624321937561035, -0.55381840467453, 
9.631585121154785, -6.470459461212158, -8.548508644104004, 4.371616840362549, -0.7970245480537415, 4.4789886474609375, -2.975860834121704, 3.2721822261810303, 2.838287830352783, 5.134591102600098, -9.19079875946045, -0.5657302737236023, -4.8745832443237305, 2.3165574073791504, -5.984319686889648, -2.1798853874206543, 0.3554139733314514, -0.3178512752056122, 9.493552207946777, 2.1144471168518066, 4.358094692230225, -12.089824676513672, 8.451693534851074, -7.925466537475586, 4.624246597290039, 4.428936958312988, 18.69200897216797, -2.6204581260681152, -5.14918851852417, -0.3582090139389038, 8.488558769226074, 4.98148775100708, -9.326835632324219, -2.2544219493865967, 6.641760349273682, 1.2119598388671875, 10.977124214172363, 16.555034637451172, 3.3238420486450195, 9.551861763000488, -1.6676981449127197, -0.7953944206237793, -8.605667114257812, -0.4735655188560486, 2.674196243286133, -5.359177112579346, -2.66738224029541, 0.6660683155059814, 15.44322681427002, 4.740593433380127, -3.472534418106079, 11.592567443847656, -2.0544962882995605, 1.736127495765686, -8.265326499938965, -9.30447769165039, 5.406829833984375, -1.518022894859314, -7.746612548828125, -6.089611053466797, 0.07112743705511093, -0.3490503430366516, -8.64989185333252, -9.998957633972168, -2.564845085144043, -0.5399947762489319, 2.6018123626708984, -0.3192799389362335, -1.8815255165100098, -2.0721492767333984, -3.410574436187744, -8.29980754852295, 1.483638048171997, -15.365986824035645, -8.288211822509766, 3.884779930114746, -3.4876468181610107, 7.362999439239502, 0.4657334089279175, 3.1326050758361816, 12.438895225524902, -1.8337041139602661, 4.532927989959717, 2.7264339923858643, 10.14534854888916, -6.521963596343994, 2.897155523300171, -3.392582654953003, 5.079153060913086, 7.7597246170043945, 4.677570819854736, 5.845779895782471, 2.402411460876465, 7.7071051597595215, 3.9711380004882812, -6.39003849029541, 6.12687873840332, -3.776029348373413, -11.118121147155762]}} [2022-05-25 12:25:36,324] [ INFO] 
- The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [-1.3251205682754517, 7.860682487487793, -4.620625972747803, 0.3000721037387848, 2.2648534774780273, -1.1931440830230713, 3.064713716506958, 7.673594951629639, -6.004472732543945, -12.024259567260742, -1.9496068954467773, 3.126953601837158, 1.6188379526138306, -7.638310432434082, -1.2299772500991821, -12.33833122253418, 2.1373026371002197, -5.395712375640869, 9.717328071594238, 5.675230503082275, 3.7805123329162598, 3.0597171783447266, 3.429692029953003, 8.9760103225708, 13.174124717712402, -0.5313228368759155, 8.942471504211426, 4.465109825134277, -4.426247596740723, -9.726503372192383, 8.399328231811523, 7.223917484283447, -7.435853958129883, 2.9441683292388916, -4.343039512634277, -13.886964797973633, -1.6346734762191772, -10.902740478515625, -5.311244964599609, 3.800722122192383, 3.897603750228882, -2.123077392578125, -2.3521194458007812, 4.151031017303467, -7.404866695404053, 0.13911646604537964, 2.4626107215881348, 4.96645450592041, 0.9897574186325073, 5.483975410461426, -3.3574001789093018, 10.13400650024414, -0.6120170950889587, -10.403095245361328, 4.600754261016846, 16.009349822998047, -7.78369140625, -4.194530487060547, -6.93686056137085, 1.1789555549621582, 11.490800857543945, 4.23802375793457, 9.550930976867676, 8.375045776367188, 7.508914470672607, -0.6570729613304138, -0.3005157709121704, 2.8406054973602295, 3.0828027725219727, 0.7308170199394226, 6.1483540534973145, 0.1376611888408661, -13.424735069274902, -7.746140480041504, -2.322798252105713, -8.305252075195312, 2.98791241645813, -10.99522876739502, 0.15211068093776703, -2.3820347785949707, -1.7984174489974976, 8.49562931060791, -5.852236747741699, -3.755497932434082, 0.6989710927009583, -5.270299434661865, -2.6188621520996094, -1.8828465938568115, -4.6466498374938965, 14.078543663024902, -0.5495333075523376, 10.579157829284668, -3.216050148010254, 9.349003791809082, -4.381077766418457, 
-11.675816535949707, -2.863020658493042, 4.5721755027771, 2.246612071990967, -4.574341773986816, 1.8610187768936157, 2.3767874240875244, 5.625787734985352, -9.784077644348145, 0.6496725678443909, -1.457950472831726, 0.4263263940811157, -4.921126365661621, -2.4547839164733887, 3.4869801998138428, -0.4265422224998474, 8.341268539428711, 1.356552004814148, 7.096688270568848, -13.102828979492188, 8.01673412322998, -7.115934371948242, 1.8699780702590942, 0.20872099697589874, 14.699383735656738, -1.0252779722213745, -2.6107232570648193, -2.5082311630249023, 8.427192687988281, 6.913852691650391, -6.29124641418457, 0.6157366037368774, 2.489687919616699, -3.4668266773223877, 9.92176342010498, 11.200815200805664, -0.19664029777050018, 7.491600513458252, -0.6231271624565125, -0.2584814429283142, -9.947997093200684, -0.9611040949821472, 1.1649218797683716, -2.1907122135162354, -1.502848744392395, -0.5192610621452332, 15.165953636169434, 2.4649462699890137, -0.998044490814209, 7.44166374206543, -2.0768048763275146, 3.5896823406219482, -7.305543422698975, -7.562084674835205, 4.32333517074585, 0.08044180274009705, -6.564010143280029, -2.314805269241333, -1.7642345428466797, -2.470881700515747, -7.6756181716918945, -9.548877716064453, -1.017755389213562, 0.1698644608259201, 2.5877134799957275, -1.8752295970916748, -0.36614322662353516, -6.049378395080566, -2.3965611457824707, -5.945338726043701, 0.9424033164978027, -13.155974388122559, -7.45780086517334, 0.14658108353614807, -3.7427968978881836, 5.841492652893066, -1.2872905731201172, 5.569431304931641, 12.570590019226074, 1.0939218997955322, 2.2142086029052734, 1.9181575775146484, 6.991420745849609, -5.888138771057129, 3.1409823894500732, -2.0036280155181885, 2.4434285163879395, 9.973138809204102, 5.036680221557617, 2.005120277404785, 2.861560344696045, 5.860223770141602, 2.917618751525879, -1.63111412525177, 2.0292205810546875, -4.070415019989014, -6.831437110900879]}}
[2022-05-08 00:18:44,406] [ INFO] - Response time 0.156481 s. [2022-05-25 12:25:36,324] [ INFO] - Response time 0.159053 s.
``` ```
* Python API * Python API
...@@ -302,7 +302,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee ...@@ -302,7 +302,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
Output: Output:
``` bash ``` bash
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [1.421751856803894, 5.626245498657227, -5.342077255249023, 1.1773887872695923, 3.3080549240112305, 1.7565933465957642, 5.167886257171631, 10.806358337402344, -3.8226819038391113, -5.614140033721924, 2.6238479614257812, -0.8072972893714905, 1.9635076522827148, -7.312870025634766, 0.011035939678549767, -9.723129272460938, 0.6619706153869629, -6.976806163787842, 10.213476181030273, 7.494769096374512, 2.9105682373046875, 3.8949244022369385, 3.799983501434326, 7.106168746948242, 16.90532875061035, -7.149388313293457, 8.733108520507812, 3.423006296157837, -4.831653594970703, -11.403363227844238, 11.232224464416504, 7.127461910247803, -4.282842636108398, 2.452359437942505, -5.130749702453613, -18.17766761779785, -2.6116831302642822, -11.000344276428223, -6.731433391571045, 1.6564682722091675, 0.7618281245231628, 1.125300407409668, -2.0838370323181152, 4.725743293762207, -8.782588005065918, -3.5398752689361572, 3.8142364025115967, 5.142068862915039, 2.1620609760284424, 4.09643030166626, -6.416214942932129, 12.747446060180664, 1.9429892301559448, -15.15294361114502, 6.417416095733643, 16.09701156616211, -9.716667175292969, -1.9920575618743896, -3.36494779586792, -1.8719440698623657, 11.567351341247559, 3.6978814601898193, 11.258262634277344, 7.442368507385254, 9.183408737182617, 4.528149127960205, -1.2417854070663452, 4.395912170410156, 6.6727728843688965, 5.88988733291626, 7.627128601074219, -0.6691966652870178, -11.889698028564453, -9.20886516571045, -7.42740535736084, -3.777663230895996, 6.917238712310791, -9.848755836486816, -2.0944676399230957, -5.1351165771484375, 0.4956451654434204, 9.317537307739258, -5.914181232452393, -1.809860348701477, -0.11738915741443634, -7.1692705154418945, -1.057827353477478, -5.721670627593994, -5.117385387420654, 16.13765525817871, -4.473617076873779, 7.6624321937561035, -0.55381840467453, 9.631585121154785, -6.470459461212158, -8.548508644104004, 
4.371616840362549, -0.7970245480537415, 4.4789886474609375, -2.975860834121704, 3.2721822261810303, 2.838287830352783, 5.134591102600098, -9.19079875946045, -0.5657302737236023, -4.8745832443237305, 2.3165574073791504, -5.984319686889648, -2.1798853874206543, 0.3554139733314514, -0.3178512752056122, 9.493552207946777, 2.1144471168518066, 4.358094692230225, -12.089824676513672, 8.451693534851074, -7.925466537475586, 4.624246597290039, 4.428936958312988, 18.69200897216797, -2.6204581260681152, -5.14918851852417, -0.3582090139389038, 8.488558769226074, 4.98148775100708, -9.326835632324219, -2.2544219493865967, 6.641760349273682, 1.2119598388671875, 10.977124214172363, 16.555034637451172, 3.3238420486450195, 9.551861763000488, -1.6676981449127197, -0.7953944206237793, -8.605667114257812, -0.4735655188560486, 2.674196243286133, -5.359177112579346, -2.66738224029541, 0.6660683155059814, 15.44322681427002, 4.740593433380127, -3.472534418106079, 11.592567443847656, -2.0544962882995605, 1.736127495765686, -8.265326499938965, -9.30447769165039, 5.406829833984375, -1.518022894859314, -7.746612548828125, -6.089611053466797, 0.07112743705511093, -0.3490503430366516, -8.64989185333252, -9.998957633972168, -2.564845085144043, -0.5399947762489319, 2.6018123626708984, -0.3192799389362335, -1.8815255165100098, -2.0721492767333984, -3.410574436187744, -8.29980754852295, 1.483638048171997, -15.365986824035645, -8.288211822509766, 3.884779930114746, -3.4876468181610107, 7.362999439239502, 0.4657334089279175, 3.1326050758361816, 12.438895225524902, -1.8337041139602661, 4.532927989959717, 2.7264339923858643, 10.14534854888916, -6.521963596343994, 2.897155523300171, -3.392582654953003, 5.079153060913086, 7.7597246170043945, 4.677570819854736, 5.845779895782471, 2.402411460876465, 7.7071051597595215, 3.9711380004882812, -6.39003849029541, 6.12687873840332, -3.776029348373413, -11.118121147155762]}} {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': 
[-1.3251205682754517, 7.860682487487793, -4.620625972747803, 0.3000721037387848, 2.2648534774780273, -1.1931440830230713, 3.064713716506958, 7.673594951629639, -6.004472732543945, -12.024259567260742, -1.9496068954467773, 3.126953601837158, 1.6188379526138306, -7.638310432434082, -1.2299772500991821, -12.33833122253418, 2.1373026371002197, -5.395712375640869, 9.717328071594238, 5.675230503082275, 3.7805123329162598, 3.0597171783447266, 3.429692029953003, 8.9760103225708, 13.174124717712402, -0.5313228368759155, 8.942471504211426, 4.465109825134277, -4.426247596740723, -9.726503372192383, 8.399328231811523, 7.223917484283447, -7.435853958129883, 2.9441683292388916, -4.343039512634277, -13.886964797973633, -1.6346734762191772, -10.902740478515625, -5.311244964599609, 3.800722122192383, 3.897603750228882, -2.123077392578125, -2.3521194458007812, 4.151031017303467, -7.404866695404053, 0.13911646604537964, 2.4626107215881348, 4.96645450592041, 0.9897574186325073, 5.483975410461426, -3.3574001789093018, 10.13400650024414, -0.6120170950889587, -10.403095245361328, 4.600754261016846, 16.009349822998047, -7.78369140625, -4.194530487060547, -6.93686056137085, 1.1789555549621582, 11.490800857543945, 4.23802375793457, 9.550930976867676, 8.375045776367188, 7.508914470672607, -0.6570729613304138, -0.3005157709121704, 2.8406054973602295, 3.0828027725219727, 0.7308170199394226, 6.1483540534973145, 0.1376611888408661, -13.424735069274902, -7.746140480041504, -2.322798252105713, -8.305252075195312, 2.98791241645813, -10.99522876739502, 0.15211068093776703, -2.3820347785949707, -1.7984174489974976, 8.49562931060791, -5.852236747741699, -3.755497932434082, 0.6989710927009583, -5.270299434661865, -2.6188621520996094, -1.8828465938568115, -4.6466498374938965, 14.078543663024902, -0.5495333075523376, 10.579157829284668, -3.216050148010254, 9.349003791809082, -4.381077766418457, -11.675816535949707, -2.863020658493042, 4.5721755027771, 2.246612071990967, -4.574341773986816, 
1.8610187768936157, 2.3767874240875244, 5.625787734985352, -9.784077644348145, 0.6496725678443909, -1.457950472831726, 0.4263263940811157, -4.921126365661621, -2.4547839164733887, 3.4869801998138428, -0.4265422224998474, 8.341268539428711, 1.356552004814148, 7.096688270568848, -13.102828979492188, 8.01673412322998, -7.115934371948242, 1.8699780702590942, 0.20872099697589874, 14.699383735656738, -1.0252779722213745, -2.6107232570648193, -2.5082311630249023, 8.427192687988281, 6.913852691650391, -6.29124641418457, 0.6157366037368774, 2.489687919616699, -3.4668266773223877, 9.92176342010498, 11.200815200805664, -0.19664029777050018, 7.491600513458252, -0.6231271624565125, -0.2584814429283142, -9.947997093200684, -0.9611040949821472, 1.1649218797683716, -2.1907122135162354, -1.502848744392395, -0.5192610621452332, 15.165953636169434, 2.4649462699890137, -0.998044490814209, 7.44166374206543, -2.0768048763275146, 3.5896823406219482, -7.305543422698975, -7.562084674835205, 4.32333517074585, 0.08044180274009705, -6.564010143280029, -2.314805269241333, -1.7642345428466797, -2.470881700515747, -7.6756181716918945, -9.548877716064453, -1.017755389213562, 0.1698644608259201, 2.5877134799957275, -1.8752295970916748, -0.36614322662353516, -6.049378395080566, -2.3965611457824707, -5.945338726043701, 0.9424033164978027, -13.155974388122559, -7.45780086517334, 0.14658108353614807, -3.7427968978881836, 5.841492652893066, -1.2872905731201172, 5.569431304931641, 12.570590019226074, 1.0939218997955322, 2.2142086029052734, 1.9181575775146484, 6.991420745849609, -5.888138771057129, 3.1409823894500732, -2.0036280155181885, 2.4434285163879395, 9.973138809204102, 5.036680221557617, 2.005120277404785, 2.861560344696045, 5.860223770141602, 2.917618751525879, -1.63111412525177, 2.0292205810546875, -4.070415019989014, -6.831437110900879]}}
``` ```
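The `vec` field in the response above is a plain list of floats (the speaker embedding). As a minimal sketch of how two such embeddings could be compared locally, the snippet below computes their cosine similarity with the standard library only. Treating cosine similarity as the comparison measure is an assumption, and `cosine_similarity`, `embedding_from_response` and the shortened example vectors are hypothetical helpers for illustration, not part of PaddleSpeech.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def embedding_from_response(response: dict) -> List[float]:
    """Pull the `vec` list out of a response shaped like the output above."""
    if not response.get("success"):
        raise RuntimeError(f"request failed: {response.get('message')}")
    return list(response["result"]["vec"])


# Hypothetical, shortened responses; real embeddings are much longer.
resp_a = {"success": True, "code": 200,
          "message": {"description": "success"},
          "result": {"vec": [1.0, 2.0, 3.0]}}
resp_b = {"success": True, "code": 200,
          "message": {"description": "success"},
          "result": {"vec": [2.0, 4.0, 6.0]}}

print(cosine_similarity(embedding_from_response(resp_a),
                        embedding_from_response(resp_b)))
```

Values close to 1 indicate embeddings that point in nearly the same direction; values near 0 indicate unrelated directions.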
#### 7.2 Audio Voiceprint Scoring #### 7.2 Audio Voiceprint Scoring
...@@ -333,12 +333,12 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee ...@@ -333,12 +333,12 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
Output: Output:
``` bash ``` bash
[2022-05-09 10:28:40,556] [ INFO] - vector score http client start [2022-05-25 12:33:24,527] [ INFO] - vector score http client start
[2022-05-09 10:28:40,556] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav [2022-05-25 12:33:24,527] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
[2022-05-09 10:28:40,556] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/vector/score [2022-05-25 12:33:24,528] [ INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector/score
[2022-05-09 10:28:40,731] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.4292638897895813}} [2022-05-25 12:33:24,695] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
[2022-05-09 10:28:40,731] [ INFO] - The vector: None [2022-05-25 12:33:24,696] [ INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
[2022-05-09 10:28:40,731] [ INFO] - Response time 0.175514 s. [2022-05-25 12:33:24,696] [ INFO] - Response time 0.168271 s.
``` ```
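The score response logged above is a nested dictionary. A small sketch of reading it defensively is shown below. The helper names `extract_score` and `same_speaker` and the `threshold` value are hypothetical and chosen only for illustration; a suitable decision threshold depends on the model and data and is not given in the source.

```python
def extract_score(response: dict) -> float:
    """Return result.score from a response shaped like the log above."""
    if not response.get("success") or response.get("code") != 200:
        raise RuntimeError(f"scoring failed: {response.get('message')}")
    return float(response["result"]["score"])


def same_speaker(response: dict, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: accept when the score reaches threshold."""
    return extract_score(response) >= threshold


# Response copied from the newer log output above.
response = {
    "success": True,
    "code": 200,
    "message": {"description": "success"},
    "result": {"score": 0.45332613587379456},
}

print(extract_score(response))
print(same_speaker(response, threshold=0.4))  # threshold is an assumption
```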
* Python API * Python API
...@@ -360,10 +360,11 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee ...@@ -360,10 +360,11 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
Output: Output:
``` bash ``` bash
[2022-05-09 10:34:54,769] [ INFO] - vector score http client start [2022-05-25 12:30:14,143] [ INFO] - vector score http client start
[2022-05-09 10:34:54,771] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav [2022-05-25 12:30:14,143] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
[2022-05-09 10:34:54,771] [ INFO] - endpoint: http://127.0.0.1:8590/paddlespeech/vector/score [2022-05-25 12:30:14,143] [ INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector/score
[2022-05-09 10:34:55,026] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.4292638897895813}} [2022-05-25 12:30:14,363] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
``` ```
......
...@@ -4,7 +4,7 @@ ...@@ -4,7 +4,7 @@
# SERVER SETTING # # SERVER SETTING #
################################################################################# #################################################################################
host: 0.0.0.0 host: 0.0.0.0
port: 8090 port: 8091
# The task format in the engine_list is: <speech task>_<engine type> # The task format in the engine_list is: <speech task>_<engine type>
# task choices = ['asr_online'] # task choices = ['asr_online']
......
...@@ -13,9 +13,7 @@ ...@@ -13,9 +13,7 @@
# limitations under the License. # limitations under the License.
#!/usr/bin/python #!/usr/bin/python
# -*- coding: UTF-8 -*- # -*- coding: UTF-8 -*-
# script for calc RTF: grep -rn RTF log.txt | awk '{print $NF}' | awk -F "=" '{sum += $NF} END {print "all time",sum, "audio num", NR, "RTF", sum/NR}' # script for calc RTF: grep -rn RTF log.txt | awk '{print $NF}' | awk -F "=" '{sum += $NF} END {print "all time",sum, "audio num", NR, "RTF", sum/NR}'
import argparse import argparse
import asyncio import asyncio
import codecs import codecs
......
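The comment in the script above computes the average RTF with a `grep`/`awk` pipeline. The same arithmetic can be sketched in Python as follows. It assumes, as the pipeline does, that each relevant log line ends with a token of the form `RTF=<number>`; the function name `average_rtf` and the sample log lines are hypothetical.

```python
from typing import Iterable, Tuple


def average_rtf(lines: Iterable[str]) -> Tuple[float, int, float]:
    """Return (total, count, mean) of the RTF values found in log lines.

    Mirrors the shell pipeline: keep lines containing "RTF", take the last
    whitespace-separated field, then the part after the last "=".
    """
    total = 0.0
    count = 0
    for line in lines:
        if "RTF" not in line:
            continue
        last_field = line.split()[-1]
        value = last_field.split("=")[-1]
        total += float(value)  # unlike awk, this raises on non-numeric text
        count += 1
    mean = total / count if count else 0.0
    return total, count, mean


# Hypothetical log lines in the assumed format.
sample = [
    "[INFO] audio a.wav RTF=0.25",
    "[INFO] unrelated message",
    "[INFO] audio b.wav RTF=0.75",
]
total, count, mean = average_rtf(sample)
print("all time", total, "audio num", count, "RTF", mean)
```

To run it on a real log, pass an open file object, for example `average_rtf(open("log.txt", encoding="utf-8"))`.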
...@@ -92,5 +92,3 @@ server 的 demo: [streaming_asr_server](https://github.com/PaddlePaddle/Paddle ...@@ -92,5 +92,3 @@ server 的 demo: [streaming_asr_server](https://github.com/PaddlePaddle/Paddle
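## 4. Quick Start ## 4. Quick Start
To learn how to use PP-ASR, see [install](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md), which provides three installation methods: **Easy**, **Medium**, and **Hard**. To try the inference features of paddlespeech, the **Easy** method is sufficient. To learn how to use PP-ASR, see [install](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md), which provides three installation methods: **Easy**, **Medium**, and **Hard**. To try the inference features of paddlespeech, the **Easy** method is sufficient.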
...@@ -4,7 +4,7 @@ There are 3 ways to use `PaddleSpeech`. According to the degree of difficulty, t ...@@ -4,7 +4,7 @@ There are 3 ways to use `PaddleSpeech`. According to the degree of difficulty, t
| Way | Function | Support| | Way | Function | Support|
|:---- |:----------------------------------------------------------- |:----| |:---- |:----------------------------------------------------------- |:----|
| Easy | (1) Use the command-line functions of PaddleSpeech. <br> (2) Try PaddleSpeech on AI Studio. | Linux, Mac (M1 chip not supported), Windows | | Easy | (1) Use the command-line functions of PaddleSpeech. <br> (2) Try PaddleSpeech on AI Studio. | Linux, Mac (M1 chip not supported), Windows (for more information about installation, see [#1195](https://github.com/PaddlePaddle/PaddleSpeech/discussions/1195)) |
| Medium | Supports the major functions, such as using the ready-made examples and using PaddleSpeech to train your own model. | Linux | | Medium | Supports the major functions, such as using the ready-made examples and using PaddleSpeech to train your own model. | Linux |
| Hard | Supports the full functionality of PaddleSpeech, including using the join ctc decoder with kaldi, training n-gram language models, Montreal-Forced-Aligner, and so on. This way also suits you best if you want to become a developer! | Ubuntu | | Hard | Supports the full functionality of PaddleSpeech, including using the join ctc decoder with kaldi, training n-gram language models, Montreal-Forced-Aligner, and so on. This way also suits you best if you want to become a developer! | Ubuntu |
......
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
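There are three ways to install `PaddleSpeech`. By installation difficulty, they can be divided into **Easy**, **Medium**, and **Hard**. There are three ways to install `PaddleSpeech`. By installation difficulty, they can be divided into **Easy**, **Medium**, and **Hard**.
| Way | Function | Supported Systems | | Way | Function | Supported Systems |
| :--- | :----------------------------------------------------------- | :------------------ | | :--- | :----------------------------------------------------------- | :------------------ |
| Easy | (1) Use the command-line functions of PaddleSpeech. <br> (2) Try PaddleSpeech on Aistudio. | Linux, Mac (M1 chip not supported), Windows | | Easy | (1) Use the command-line functions of PaddleSpeech. <br> (2) Try PaddleSpeech on Aistudio. | Linux, Mac (M1 chip not supported), Windows (for installation details, see [#1195](https://github.com/PaddlePaddle/PaddleSpeech/discussions/1195)) |
| Medium | Supports the main functions of PaddleSpeech, such as using the models in the existing examples and using PaddleSpeech to train your own model. | Linux | | Medium | Supports the main functions of PaddleSpeech, such as using the models in the existing examples and using PaddleSpeech to train your own model. | Linux |
| Hard | Supports all functions of PaddleSpeech, including decoding with the join ctc decoder combined with kaldi, training language models, using forced alignment, and so on. It also puts you in a better position to become a developer! | Ubuntu | | Hard | Supports all functions of PaddleSpeech, including decoding with the join ctc decoder combined with kaldi, training language models, using forced alignment, and so on. It also puts you in a better position to become a developer! | Ubuntu |
## Prerequisites ## Prerequisites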
......
...@@ -82,7 +82,7 @@ PANN | ESC-50 |[pann-esc50](../../examples/esc50/cls0)|[esc50_cnn6.tar.gz](https ...@@ -82,7 +82,7 @@ PANN | ESC-50 |[pann-esc50](../../examples/esc50/cls0)|[esc50_cnn6.tar.gz](https
Model Type | Dataset| Example Link | Pretrained Models | Static Models Model Type | Dataset| Example Link | Pretrained Models | Static Models
:-------------:| :------------:| :-----: | :-----: | :-----: :-------------:| :------------:| :-----: | :-----: | :-----:
PANN | VoxCeleb| [voxceleb_ecapatdnn](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/voxceleb/sv0) | [ecapatdnn.tar.gz](https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_0.tar.gz) | - ECAPA-TDNN | VoxCeleb| [voxceleb_ecapatdnn](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/voxceleb/sv0) | [ecapatdnn.tar.gz](https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_1.tar.gz) | -
## Punctuation Restoration Models ## Punctuation Restoration Models
Model Type | Dataset| Example Link | Pretrained Models Model Type | Dataset| Example Link | Pretrained Models
......
...@@ -6,15 +6,8 @@ AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpu ...@@ -6,15 +6,8 @@ AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpu
We use AISHELL-3 to train a multi-speaker fastspeech2 model here. We use AISHELL-3 to train a multi-speaker fastspeech2 model here.
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download AISHELL-3. Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. The dataset will then be in the directory `~/datasets/data_aishell3`.
```bash
wget https://www.openslr.org/resources/93/data_aishell3.tgz
```
Extract AISHELL-3.
```bash
mkdir data_aishell3
tar zxvf data_aishell3.tgz -C data_aishell3
```
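The revised instructions ask for the archive to be extracted under `~/datasets` so that the data ends up in `~/datasets/data_aishell3`. A minimal Python sketch of that step follows. It assumes the archive is named `data_aishell3.tgz` and unpacks directly into the target directory, as in the earlier `tar zxvf data_aishell3.tgz -C data_aishell3` command; `extract_dataset` is a hypothetical helper, not part of PaddleSpeech.

```python
import tarfile
from pathlib import Path


def extract_dataset(archive: Path, target_dir: Path) -> Path:
    """Extract a (possibly compressed) tar archive into target_dir."""
    target_dir.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (gzip for .tgz files).
    with tarfile.open(archive, "r:*") as tar:
        # Only extract archives obtained from a source you trust.
        tar.extractall(path=target_dir)
    return target_dir


if __name__ == "__main__":
    # Assumed archive name and location; adjust to where the download lives.
    archive = Path("data_aishell3.tgz")
    if archive.exists():
        out = extract_dataset(archive, Path.home() / "datasets" / "data_aishell3")
        print("extracted to", out)
    else:
        print("archive not found:", archive)
```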
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2. We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download it here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) in our repo (which currently uses MFA1.x). You can download it here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) in our repo (which currently uses MFA1.x).
......
...@@ -6,15 +6,8 @@ This example contains code used to train a [Tacotron2](https://arxiv.org/abs/171 ...@@ -6,15 +6,8 @@ This example contains code used to train a [Tacotron2](https://arxiv.org/abs/171
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download AISHELL-3. Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. The dataset will then be in the directory `~/datasets/data_aishell3`.
```bash
wget https://www.openslr.org/resources/93/data_aishell3.tgz
```
Extract AISHELL-3.
```bash
mkdir data_aishell3
tar zxvf data_aishell3.tgz -C data_aishell3
```
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get phonemes for Tacotron2; the durations from MFA are not needed here. We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get phonemes for Tacotron2; the durations from MFA are not needed here.
You can download it here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) in our repo (which currently uses MFA1.x). You can download it here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) in our repo (which currently uses MFA1.x).
......
...@@ -6,15 +6,8 @@ This example contains code used to train a [FastSpeech2](https://arxiv.org/abs/2 ...@@ -6,15 +6,8 @@ This example contains code used to train a [FastSpeech2](https://arxiv.org/abs/2
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download AISHELL-3. Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. The dataset will then be in the directory `~/datasets/data_aishell3`.
```bash
wget https://www.openslr.org/resources/93/data_aishell3.tgz
```
Extract AISHELL-3.
```bash
mkdir data_aishell3
tar zxvf data_aishell3.tgz -C data_aishell3
```
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2. We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download it here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) in our repo (which currently uses MFA1.x). You can download it here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) in our repo (which currently uses MFA1.x).
......
...@@ -4,15 +4,8 @@ This example contains code used to train a [parallel wavegan](http://arxiv.org/a ...@@ -4,15 +4,8 @@ This example contains code used to train a [parallel wavegan](http://arxiv.org/a
AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpus that could be used to train multi-speaker Text-to-Speech (TTS) systems. AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpus that could be used to train multi-speaker Text-to-Speech (TTS) systems.
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download AISHELL-3. Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. The dataset will then be in the directory `~/datasets/data_aishell3`.
```bash
wget https://www.openslr.org/resources/93/data_aishell3.tgz
```
Extract AISHELL-3.
```bash
mkdir data_aishell3
tar zxvf data_aishell3.tgz -C data_aishell3
```
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2. We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download it here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) in our repo (which currently uses MFA1.x). You can download it here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) in our repo (which currently uses MFA1.x).
......
...@@ -4,15 +4,7 @@ This example contains code used to train a [HiFiGAN](https://arxiv.org/abs/2010. ...@@ -4,15 +4,7 @@ This example contains code used to train a [HiFiGAN](https://arxiv.org/abs/2010.
AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpus that could be used to train multi-speaker Text-to-Speech (TTS) systems. AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpus that could be used to train multi-speaker Text-to-Speech (TTS) systems.
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download AISHELL-3. Download AISHELL-3 from its [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. The dataset will then be in the directory `~/datasets/data_aishell3`.
```bash
wget https://www.openslr.org/resources/93/data_aishell3.tgz
```
Extract AISHELL-3.
```bash
mkdir data_aishell3
tar zxvf data_aishell3.tgz -C data_aishell3
```
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2. We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download it here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) in our repo (which currently uses MFA1.x). You can download it here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) in our repo (which currently uses MFA1.x).
......
...@@ -26,4 +26,7 @@ Use the following command to run diarization on AMI corpus. ...@@ -26,4 +26,7 @@ Use the following command to run diarization on AMI corpus.
./run.sh --data_folder ./amicorpus --manual_annot_folder ./ami_public_manual_1.6.2 ./run.sh --data_folder ./amicorpus --manual_annot_folder ./ami_public_manual_1.6.2
``` ```
## Results (DER) coming soon! :) ## Best performance in terms of Diarization Error Rate (DER).
| System | Mic. |Orcl. (Dev)|Orcl. (Eval)| Est. (Dev) |Est. (Eval)|
| --------|-------- | ---------|----------- | --------|-----------|
| ECAPA-TDNN + SC | HeadsetMix| 1.54 % | 3.07 %| 1.56 %| 3.28 % |
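For readers unfamiliar with the metric in the table above, Diarization Error Rate is conventionally defined as the sum of false-alarm, missed-speech and speaker-confusion durations divided by the total reference speech duration. The sketch below computes it from those four durations. The function name and the example durations are hypothetical and are not taken from the AMI results; details such as forgiveness collars are left out.

```python
def diarization_error_rate(false_alarm: float,
                           missed: float,
                           confusion: float,
                           total_speech: float) -> float:
    """DER = (false alarm + missed speech + speaker confusion) / total speech.

    All arguments are durations in the same unit (for example seconds).
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    if min(false_alarm, missed, confusion) < 0:
        raise ValueError("error durations must be non-negative")
    return (false_alarm + missed + confusion) / total_speech


# Hypothetical durations in seconds, for illustration only.
der = diarization_error_rate(false_alarm=1.0, missed=2.0,
                             confusion=0.5, total_speech=100.0)
print(f"DER = {der * 100:.2f} %")
```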
...@@ -3,7 +3,7 @@ This example contains code used to train a [Tacotron2](https://arxiv.org/abs/171 ...@@ -3,7 +3,7 @@ This example contains code used to train a [Tacotron2](https://arxiv.org/abs/171
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download CSMSC from its [Official Website](https://test.data-baker.com/data/index/source). Download CSMSC from its [Official Website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. The dataset will then be in the directory `~/datasets/BZNSYP`.
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get phonemes for Tacotron2; the durations from MFA are not needed here. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get phonemes for Tacotron2; the durations from MFA are not needed here.
......
...@@ -3,7 +3,7 @@ This example contains code used to train a [SpeedySpeech](http://arxiv.org/abs/2 ...@@ -3,7 +3,7 @@ This example contains code used to train a [SpeedySpeech](http://arxiv.org/abs/2
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download CSMSC from its [Official Website](https://test.data-baker.com/data/index/source). Download CSMSC from its [Official Website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for SPEEDYSPEECH. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for SPEEDYSPEECH.
......
...@@ -4,7 +4,7 @@ This example contains code used to train a [Fastspeech2](https://arxiv.org/abs/2 ...@@ -4,7 +4,7 @@ This example contains code used to train a [Fastspeech2](https://arxiv.org/abs/2
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download CSMSC from its [Official Website](https://test.data-baker.com/data/index/source). Download CSMSC from its [Official Website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for fastspeech2. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for fastspeech2.
......
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download the dataset from the [official website](https://test.data-baker.com/data/index/source). Download the dataset from the [official website](https://test.data-baker.com/data/index/TNtts/).
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get the phoneme durations for fastspeech2. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get the phoneme durations for fastspeech2.
......
# This configuration was tested on 4 GPUs (V100) with 32GB GPU
# memory. It takes around 2 weeks to finish the training,
# but the 100k-iteration model should already generate reasonable results.
###########################################################
# FEATURE EXTRACTION SETTING #
###########################################################
fs: 22050 # sr
n_fft: 1024 # FFT size (samples).
n_shift: 256 # Hop size (samples). About 11.6 ms at 22050 Hz.
win_length: null # Window length (samples). About 46.4 ms at 22050 Hz when it falls back to n_fft.
# If set to null, it will be the same as n_fft.
window: "hann" # Window function.
##########################################################
# TTS MODEL SETTING #
##########################################################
model:
    # generator related
    generator_type: vits_generator
    generator_params:
        hidden_channels: 192
        spks: -1
        global_channels: -1
        segment_size: 32
        text_encoder_attention_heads: 2
        text_encoder_ffn_expand: 4
        text_encoder_blocks: 6
        text_encoder_positionwise_layer_type: "conv1d"
        text_encoder_positionwise_conv_kernel_size: 3
        text_encoder_positional_encoding_layer_type: "rel_pos"
        text_encoder_self_attention_layer_type: "rel_selfattn"
        text_encoder_activation_type: "swish"
        text_encoder_normalize_before: True
        text_encoder_dropout_rate: 0.1
        text_encoder_positional_dropout_rate: 0.0
        text_encoder_attention_dropout_rate: 0.1
        use_macaron_style_in_text_encoder: True
        use_conformer_conv_in_text_encoder: False
        text_encoder_conformer_kernel_size: -1
        decoder_kernel_size: 7
        decoder_channels: 512
        decoder_upsample_scales: [8, 8, 2, 2]
        decoder_upsample_kernel_sizes: [16, 16, 4, 4]
        decoder_resblock_kernel_sizes: [3, 7, 11]
        decoder_resblock_dilations: [[1, 3, 5], [1, 3, 5], [1, 3, 5]]
        use_weight_norm_in_decoder: True
        posterior_encoder_kernel_size: 5
        posterior_encoder_layers: 16
        posterior_encoder_stacks: 1
        posterior_encoder_base_dilation: 1
        posterior_encoder_dropout_rate: 0.0
        use_weight_norm_in_posterior_encoder: True
        flow_flows: 4
        flow_kernel_size: 5
        flow_base_dilation: 1
        flow_layers: 4
        flow_dropout_rate: 0.0
        use_weight_norm_in_flow: True
        use_only_mean_in_flow: True
        stochastic_duration_predictor_kernel_size: 3
        stochastic_duration_predictor_dropout_rate: 0.5
        stochastic_duration_predictor_flows: 4
        stochastic_duration_predictor_dds_conv_layers: 3
    # discriminator related
    discriminator_type: hifigan_multi_scale_multi_period_discriminator
    discriminator_params:
        scales: 1
        scale_downsample_pooling: "AvgPool1D"
        scale_downsample_pooling_params:
            kernel_size: 4
            stride: 2
            padding: 2
        scale_discriminator_params:
            in_channels: 1
            out_channels: 1
            kernel_sizes: [15, 41, 5, 3]
            channels: 128
            max_downsample_channels: 1024
            max_groups: 16
            bias: True
            downsample_scales: [2, 2, 4, 4, 1]
            nonlinear_activation: "leakyrelu"
            nonlinear_activation_params:
                negative_slope: 0.1
            use_weight_norm: True
            use_spectral_norm: False
        follow_official_norm: False
        periods: [2, 3, 5, 7, 11]
        period_discriminator_params:
            in_channels: 1
            out_channels: 1
            kernel_sizes: [5, 3]
            channels: 32
            downsample_scales: [3, 3, 3, 3, 1]
            max_downsample_channels: 1024
            bias: True
            nonlinear_activation: "leakyrelu"
            nonlinear_activation_params:
                negative_slope: 0.1
            use_weight_norm: True
            use_spectral_norm: False
    # others
    sampling_rate: 22050 # needed in the inference for saving wav
    cache_generator_outputs: True # whether to cache generator outputs in the training
###########################################################
# LOSS SETTING #
###########################################################
# loss function related
generator_adv_loss_params:
    average_by_discriminators: False # whether to average loss value by #discriminators
    loss_type: mse # loss type, "mse" or "hinge"
discriminator_adv_loss_params:
    average_by_discriminators: False # whether to average loss value by #discriminators
    loss_type: mse # loss type, "mse" or "hinge"
feat_match_loss_params:
    average_by_discriminators: False # whether to average loss value by #discriminators
    average_by_layers: False # whether to average loss value by #layers of each discriminator
    include_final_outputs: True # whether to include final outputs for loss calculation
mel_loss_params:
    fs: 22050 # must be the same as the training data
    fft_size: 1024 # fft points
    hop_size: 256 # hop size
    win_length: null # window length
    window: hann # window type
    num_mels: 80 # number of Mel basis
    fmin: 0 # minimum frequency for Mel basis
    fmax: null # maximum frequency for Mel basis
    log_base: null # null represent natural log
###########################################################
# ADVERSARIAL LOSS SETTING #
###########################################################
lambda_adv: 1.0 # loss scaling coefficient for adversarial loss
lambda_mel: 45.0 # loss scaling coefficient for Mel loss
lambda_feat_match: 2.0 # loss scaling coefficient for feat match loss
lambda_dur: 1.0 # loss scaling coefficient for duration loss
lambda_kl: 1.0 # loss scaling coefficient for KL divergence loss
# others
sampling_rate: 22050 # needed in the inference for saving wav
cache_generator_outputs: True # whether to cache generator outputs in the training
###########################################################
# DATA LOADER SETTING #
###########################################################
batch_size: 64 # Batch size.
num_workers: 4 # Number of workers in DataLoader.
##########################################################
# OPTIMIZER & SCHEDULER SETTING #
##########################################################
# optimizer setting for generator
generator_optimizer_params:
    beta1: 0.8
    beta2: 0.99
    epsilon: 1.0e-9
    weight_decay: 0.0
generator_scheduler: exponential_decay
generator_scheduler_params:
    learning_rate: 2.0e-4
    gamma: 0.999875
# optimizer setting for discriminator
discriminator_optimizer_params:
    beta1: 0.8
    beta2: 0.99
    epsilon: 1.0e-9
    weight_decay: 0.0
discriminator_scheduler: exponential_decay
discriminator_scheduler_params:
    learning_rate: 2.0e-4
    gamma: 0.999875
generator_first: False # whether to start updating generator first
##########################################################
# OTHER TRAINING SETTING #
##########################################################
max_epoch: 1000 # number of epochs
num_snapshots: 10 # max number of snapshots to keep while training
seed: 777 # random seed number
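A few values in this config are tied together by simple arithmetic, which the following Python sketch checks. It is only an illustration under stated assumptions: the numbers are copied from the config, `decayed_lr` is a hypothetical helper, and the config does not say whether the scheduler steps once per epoch or once per iteration. In a HiFi-GAN-style decoder the product of the upsampling factors is normally expected to equal the hop size, so that each feature frame expands to `n_shift` waveform samples.

```python
import math

# Values copied from the config above.
fs = 22050
n_fft = 1024
n_shift = 256
decoder_upsample_scales = [8, 8, 2, 2]

hop_ms = 1000.0 * n_shift / fs   # duration of one hop in milliseconds
win_ms = 1000.0 * n_fft / fs     # window duration when win_length falls back to n_fft
total_upsampling = math.prod(decoder_upsample_scales)


def decayed_lr(base_lr, gamma, steps):
    """Learning rate after `steps` scheduler steps of exponential decay (hypothetical helper)."""
    return base_lr * gamma ** steps


print(f"hop: {hop_ms:.1f} ms, window: {win_ms:.1f} ms")
print(f"decoder upsampling factor: {total_upsampling} (n_shift is {n_shift})")
print(f"lr after 1000 scheduler steps: {decayed_lr(2.0e-4, 0.999875, 1000):.3e}")
```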
#!/bin/bash
stage=0
stop_stage=100
config_path=$1
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
# get durations from MFA's result
echo "Generate durations.txt from MFA results ..."
python3 ${MAIN_ROOT}/utils/gen_duration_from_textgrid.py \
--inputdir=./baker_alignment_tone \
--output=durations.txt \
--config=${config_path}
fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
# extract features
echo "Extract features ..."
python3 ${BIN_DIR}/preprocess.py \
--dataset=baker \
--rootdir=~/datasets/BZNSYP/ \
--dumpdir=dump \
--dur-file=durations.txt \
--config=${config_path} \
--num-cpu=20 \
--cut-sil=True
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
# get features' stats(mean and std)
echo "Get features' stats ..."
python3 ${MAIN_ROOT}/utils/compute_statistics.py \
--metadata=dump/train/raw/metadata.jsonl \
--field-name="feats"
fi
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
# normalize and convert phones/speakers to ids; dev and test should use the train set's stats
echo "Normalize ..."
python3 ${BIN_DIR}/normalize.py \
--metadata=dump/train/raw/metadata.jsonl \
--dumpdir=dump/train/norm \
--feats-stats=dump/train/feats_stats.npy \
--phones-dict=dump/phone_id_map.txt \
--speaker-dict=dump/speaker_id_map.txt \
--skip-wav-copy
python3 ${BIN_DIR}/normalize.py \
--metadata=dump/dev/raw/metadata.jsonl \
--dumpdir=dump/dev/norm \
--feats-stats=dump/train/feats_stats.npy \
--phones-dict=dump/phone_id_map.txt \
--speaker-dict=dump/speaker_id_map.txt \
--skip-wav-copy
python3 ${BIN_DIR}/normalize.py \
--metadata=dump/test/raw/metadata.jsonl \
--dumpdir=dump/test/norm \
--feats-stats=dump/train/feats_stats.npy \
--phones-dict=dump/phone_id_map.txt \
--speaker-dict=dump/speaker_id_map.txt \
--skip-wav-copy
fi
#!/bin/bash
config_path=$1
train_output_path=$2
ckpt_name=$3
stage=0
stop_stage=0
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
FLAGS_allocator_strategy=naive_best_fit \
FLAGS_fraction_of_gpu_memory_to_use=0.01 \
python3 ${BIN_DIR}/synthesize.py \
--config=${config_path} \
--ckpt=${train_output_path}/checkpoints/${ckpt_name} \
--phones_dict=dump/phone_id_map.txt \
--test_metadata=dump/test/norm/metadata.jsonl \
--output_dir=${train_output_path}/test
fi
\ No newline at end of file
#!/bin/bash
config_path=$1
train_output_path=$2
ckpt_name=$3
stage=0
stop_stage=0
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
FLAGS_allocator_strategy=naive_best_fit \
FLAGS_fraction_of_gpu_memory_to_use=0.01 \
python3 ${BIN_DIR}/synthesize_e2e.py \
--config=${config_path} \
--ckpt=${train_output_path}/checkpoints/${ckpt_name} \
--phones_dict=dump/phone_id_map.txt \
--output_dir=${train_output_path}/test_e2e \
--text=${BIN_DIR}/../sentences.txt
fi
#!/bin/bash
config_path=$1
train_output_path=$2
python3 ${BIN_DIR}/train.py \
--train-metadata=dump/train/norm/metadata.jsonl \
--dev-metadata=dump/dev/norm/metadata.jsonl \
--config=${config_path} \
--output-dir=${train_output_path} \
--ngpu=4 \
--phones-dict=dump/phone_id_map.txt
#!/bin/bash
export MAIN_ROOT=`realpath ${PWD}/../../../`
export PATH=${MAIN_ROOT}:${MAIN_ROOT}/utils:${PATH}
export LC_ALL=C
export PYTHONDONTWRITEBYTECODE=1
# Use UTF-8 in Python to avoid UnicodeDecodeError when LC_ALL=C
export PYTHONIOENCODING=UTF-8
export PYTHONPATH=${MAIN_ROOT}:${PYTHONPATH}
MODEL=vits
export BIN_DIR=${MAIN_ROOT}/paddlespeech/t2s/exps/${MODEL}
\ No newline at end of file
#!/bin/bash
set -e
source path.sh
gpus=0,1
stage=0
stop_stage=100
conf_path=conf/default.yaml
train_output_path=exp/default
ckpt_name=snapshot_iter_153.pdz
# with the following command, you can choose the stage range you want to run
# such as `./run.sh --stage 0 --stop-stage 0`
# this cannot be combined with positional arguments such as `$1`, `$2` ...
source ${MAIN_ROOT}/utils/parse_options.sh || exit 1
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
# prepare data
./local/preprocess.sh ${conf_path} || exit -1
fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
# train model, all `ckpt` under `train_output_path/checkpoints/` dir
CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${train_output_path} || exit -1
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
# synthesize_e2e: synthesize waveforms from raw text; VITS is end-to-end, so no separate vocoder is passed
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi
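The `stage` / `stop_stage` gating used throughout `run.sh` can be modelled in a few lines of Python. This is a sketch of the selection logic only, not part of the repository; the `STAGES` mapping and the `stages_to_run` helper are hypothetical names, with the stage numbers and script paths taken from the script above.

```python
# Stage numbers and script names as they appear in run.sh above.
STAGES = {
    0: "local/preprocess.sh",
    1: "local/train.sh",
    2: "local/synthesize.sh",
    3: "local/synthesize_e2e.sh",
}


def stages_to_run(stage, stop_stage):
    """Mirror the shell test `[ ${stage} -le N ] && [ ${stop_stage} -ge N ]`."""
    return [STAGES[n] for n in sorted(STAGES) if stage <= n <= stop_stage]


# Defaults in run.sh (stage=0, stop_stage=100) select every stage.
print(stages_to_run(0, 100))
# `./run.sh --stage 1 --stop-stage 1` would select training only.
print(stages_to_run(1, 1))
```

A stage runs only when its number lies in the closed interval from `stage` to `stop_stage`, so an empty interval (for example `stage=2`, `stop_stage=0`) runs nothing.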
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
This example contains code used to train a [parallel wavegan](http://arxiv.org/abs/1910.11480) model with [Chinese Standard Mandarin Speech Corpus](https://www.data-baker.com/open_source.html). This example contains code used to train a [parallel wavegan](http://arxiv.org/abs/1910.11480) model with [Chinese Standard Mandarin Speech Corpus](https://www.data-baker.com/open_source.html).
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download CSMSC from the [official website](https://www.data-baker.com/data/index/source) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`. Download CSMSC from its [official website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut silence at the edge of audio. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut silence at the edge of audio.
......
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
This example contains code used to train a [Multi Band MelGAN](https://arxiv.org/abs/2005.05106) model with [Chinese Standard Mandarin Speech Corpus](https://www.data-baker.com/open_source.html). This example contains code used to train a [Multi Band MelGAN](https://arxiv.org/abs/2005.05106) model with [Chinese Standard Mandarin Speech Corpus](https://www.data-baker.com/open_source.html).
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download CSMSC from the [official website](https://www.data-baker.com/data/index/source) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`. Download CSMSC from its [official website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut the silence at the edges of the audio. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut the silence at the edges of the audio.
......
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
This example contains code used to train a [Style MelGAN](https://arxiv.org/abs/2011.01557) model with [Chinese Standard Mandarin Speech Corpus](https://www.data-baker.com/open_source.html). This example contains code used to train a [Style MelGAN](https://arxiv.org/abs/2011.01557) model with [Chinese Standard Mandarin Speech Corpus](https://www.data-baker.com/open_source.html).
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download CSMSC from the [official website](https://www.data-baker.com/data/index/source) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`. Download CSMSC from its [official website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut the silence at the edges of the audio. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut the silence at the edges of the audio.
......
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
This example contains code used to train a [HiFiGAN](https://arxiv.org/abs/2010.05646) model with [Chinese Standard Mandarin Speech Corpus](https://www.data-baker.com/open_source.html). This example contains code used to train a [HiFiGAN](https://arxiv.org/abs/2010.05646) model with [Chinese Standard Mandarin Speech Corpus](https://www.data-baker.com/open_source.html).
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download CSMSC from the [official website](https://www.data-baker.com/data/index/source) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`. Download CSMSC from its [official website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut silence at the edge of audio. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut silence at the edge of audio.
......
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
This example contains code used to train a [WaveRNN](https://arxiv.org/abs/1802.08435) model with [Chinese Standard Mandarin Speech Corpus](https://www.data-baker.com/open_source.html). This example contains code used to train a [WaveRNN](https://arxiv.org/abs/1802.08435) model with [Chinese Standard Mandarin Speech Corpus](https://www.data-baker.com/open_source.html).
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download CSMSC from the [official website](https://www.data-baker.com/data/index/source) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`. Download CSMSC from its [official website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut silence at the edge of audio. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut silence at the edge of audio.
......
...@@ -3,7 +3,7 @@ This example contains code used to train a [Tacotron2](https://arxiv.org/abs/171 ...@@ -3,7 +3,7 @@ This example contains code used to train a [Tacotron2](https://arxiv.org/abs/171
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download LJSpeech-1.1 from the [official website](https://keithito.com/LJ-Speech-Dataset/). Download LJSpeech-1.1 from its [Official Website](https://keithito.com/LJ-Speech-Dataset/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/LJSpeech-1.1`.
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get phonemes for Tacotron2, the durations of MFA are not needed here. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get phonemes for Tacotron2, the durations of MFA are not needed here.
......
# TransformerTTS with LJSpeech # TransformerTTS with LJSpeech
## Dataset ## Dataset
We experiment with the LJSpeech dataset. Download and unzip [LJSpeech](https://keithito.com/LJ-Speech-Dataset/). ### Download and Extract
Download LJSpeech-1.1 from its [Official Website](https://keithito.com/LJ-Speech-Dataset/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/LJSpeech-1.1`.
```bash
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2
```
## Get Started ## Get Started
Assume the path to the dataset is `~/datasets/LJSpeech-1.1`. Assume the dataset has been extracted to `~/datasets`, so that its path is `~/datasets/LJSpeech-1.1`.
Run the command below to Run the command below to
1. **source path**. 1. **source path**.
2. preprocess the dataset. 2. preprocess the dataset.
......
...@@ -3,7 +3,7 @@ This example contains code used to train a [Fastspeech2](https://arxiv.org/abs/2 ...@@ -3,7 +3,7 @@ This example contains code used to train a [Fastspeech2](https://arxiv.org/abs/2
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download LJSpeech-1.1 from the [official website](https://keithito.com/LJ-Speech-Dataset/). Download LJSpeech-1.1 from its [Official Website](https://keithito.com/LJ-Speech-Dataset/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/LJSpeech-1.1`.
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for fastspeech2. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for fastspeech2.
......
# WaveFlow with LJSpeech # WaveFlow with LJSpeech
## Dataset ## Dataset
We experiment with the LJSpeech dataset. Download and unzip [LJSpeech](https://keithito.com/LJ-Speech-Dataset/). ### Download and Extract
Download LJSpeech-1.1 from its [Official Website](https://keithito.com/LJ-Speech-Dataset/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/LJSpeech-1.1`.
```bash
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2
```
## Get Started ## Get Started
Assume the path to the dataset is `~/datasets/LJSpeech-1.1`. Assume the path to the dataset is `~/datasets/LJSpeech-1.1`.
Assume the path to the Tacotron2 generated mels is `../tts0/output/test`. Assume the path to the Tacotron2 generated mels is `../tts0/output/test`.
......
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
This example contains code used to train a [parallel wavegan](http://arxiv.org/abs/1910.11480) model with [LJSpeech-1.1](https://keithito.com/LJ-Speech-Dataset/). This example contains code used to train a [parallel wavegan](http://arxiv.org/abs/1910.11480) model with [LJSpeech-1.1](https://keithito.com/LJ-Speech-Dataset/).
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download LJSpeech-1.1 from the [official website](https://keithito.com/LJ-Speech-Dataset/). Download LJSpeech-1.1 from its [Official Website](https://keithito.com/LJ-Speech-Dataset/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/LJSpeech-1.1`.
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut the silence at the edges of the audio. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut the silence at the edges of the audio.
You can download [ljspeech_alignment.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/LJSpeech-1.1/ljspeech_alignment.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) in our repo. You can download [ljspeech_alignment.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/LJSpeech-1.1/ljspeech_alignment.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) in our repo.
......
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
This example contains code used to train a [HiFiGAN](https://arxiv.org/abs/2010.05646) model with [LJSpeech-1.1](https://keithito.com/LJ-Speech-Dataset/). This example contains code used to train a [HiFiGAN](https://arxiv.org/abs/2010.05646) model with [LJSpeech-1.1](https://keithito.com/LJ-Speech-Dataset/).
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download LJSpeech-1.1 from the [official website](https://keithito.com/LJ-Speech-Dataset/). Download LJSpeech-1.1 from its [Official Website](https://keithito.com/LJ-Speech-Dataset/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/LJSpeech-1.1`.
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut the silence at the edges of the audio. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut the silence at the edges of the audio.
You can download [ljspeech_alignment.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/LJSpeech-1.1/ljspeech_alignment.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) in our repo. You can download [ljspeech_alignment.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/LJSpeech-1.1/ljspeech_alignment.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) in our repo.
......
...@@ -3,7 +3,7 @@ This example contains code used to train a [Fastspeech2](https://arxiv.org/abs/2 ...@@ -3,7 +3,7 @@ This example contains code used to train a [Fastspeech2](https://arxiv.org/abs/2
## Dataset ## Dataset
### Download and Extract the dataset ### Download and Extract the dataset
Download VCTK-0.92 from the [official website](https://datashare.ed.ac.uk/handle/10283/3443). Download VCTK-0.92 from its [Official Website](https://datashare.ed.ac.uk/handle/10283/3443) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/VCTK-Corpus-0.92`.
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for fastspeech2. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for fastspeech2.
......
...@@ -3,7 +3,7 @@ This example contains code used to train a [parallel wavegan](http://arxiv.org/a ...@@ -3,7 +3,7 @@ This example contains code used to train a [parallel wavegan](http://arxiv.org/a
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download VCTK-0.92 from the [official website](https://datashare.ed.ac.uk/handle/10283/3443) and extract it to `~/datasets`. Then the dataset is in directory `~/datasets/VCTK-Corpus-0.92`. Download VCTK-0.92 from its [Official Website](https://datashare.ed.ac.uk/handle/10283/3443) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/VCTK-Corpus-0.92`.
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut the silence at the edges of the audio. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut the silence at the edges of the audio.
......
...@@ -3,7 +3,7 @@ This example contains code used to train a [HiFiGAN](https://arxiv.org/abs/2010. ...@@ -3,7 +3,7 @@ This example contains code used to train a [HiFiGAN](https://arxiv.org/abs/2010.
## Dataset ## Dataset
### Download and Extract ### Download and Extract
Download VCTK-0.92 from the [official website](https://datashare.ed.ac.uk/handle/10283/3443) and extract it to `~/datasets`. Then the dataset is in directory `~/datasets/VCTK-Corpus-0.92`. Download VCTK-0.92 from its [Official Website](https://datashare.ed.ac.uk/handle/10283/3443) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/VCTK-Corpus-0.92`.
### Get MFA Result and Extract ### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut the silence at the edges of the audio. We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut the silence at the edges of the audio.
......
...@@ -141,11 +141,11 @@ using the `tar` scripts to unpack the model and then you can use the script to t ...@@ -141,11 +141,11 @@ using the `tar` scripts to unpack the model and then you can use the script to t
For example: For example:
``` ```
wget https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_0.tar.gz wget https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_1.tar.gz
tar -xvf sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_0.tar.gz tar -xvf sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_1.tar.gz
source path.sh source path.sh
# If you have already processed the data and have the manifest file, you can skip the following 2 steps # If you have already processed the data and have the manifest file, you can skip the following 2 steps
CUDA_VISIBLE_DEVICES= bash ./local/test.sh ./data sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_2/model/ conf/ecapa_tdnn.yaml CUDA_VISIBLE_DEVICES= bash ./local/test.sh ./data sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_1/model/ conf/ecapa_tdnn.yaml
``` ```
The performance of the released models is shown in [RESULTS.md](./RESULTS.md) The performance of the released models is shown in [RESULTS.md](./RESULTS.md)
...@@ -4,4 +4,4 @@ ...@@ -4,4 +4,4 @@
| Model | Number of Params | Release | Config | dim | Test set | Cosine | Cosine + S-Norm | | Model | Number of Params | Release | Config | dim | Test set | Cosine | Cosine + S-Norm |
| --- | --- | --- | --- | --- | --- | --- | ---- | | --- | --- | --- | --- | --- | --- | --- | ---- |
| ECAPA-TDNN | 85M | 0.2.0 | conf/ecapa_tdnn.yaml |192 | test | 1.02 | 0.95 | | ECAPA-TDNN | 85M | 0.2.1 | conf/ecapa_tdnn.yaml | 192 | test | 0.8188 | 0.7815|
...@@ -59,3 +59,11 @@ global_embedding_norm: True ...@@ -59,3 +59,11 @@ global_embedding_norm: True
embedding_mean_norm: True embedding_mean_norm: True
embedding_std_norm: False embedding_std_norm: False
###########################################
# score-norm #
###########################################
score_norm: s-norm
cohort_size: 20000 # number of impostor utterances in the normalization cohort
n_train_snts: 400000 # used for normalization stats
...@@ -58,3 +58,10 @@ global_embedding_norm: True ...@@ -58,3 +58,10 @@ global_embedding_norm: True
embedding_mean_norm: True embedding_mean_norm: True
embedding_std_norm: False embedding_std_norm: False
###########################################
# score-norm #
###########################################
score_norm: s-norm
cohort_size: 20000 # number of impostor utterances in the normalization cohort
n_train_snts: 400000 # used for normalization stats
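As background for the `score_norm: s-norm` option, symmetric score normalization (S-norm) is commonly defined as the average of two z-normalized scores, one using the enrollment side's cohort statistics and one using the test side's. The sketch below is a minimal pure-Python illustration of that definition, not the code added by this commit; `s_norm` and the toy scores are hypothetical, and the cohort lists stand in for the scores of each embedding against the `cohort_size` impostor utterances.

```python
from statistics import mean, pstdev


def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric normalization: average of the two z-normalized scores.

    Assumes each cohort has non-zero spread; a constant cohort would
    divide by zero.
    """
    z_enroll = (raw_score - mean(enroll_cohort_scores)) / pstdev(enroll_cohort_scores)
    z_test = (raw_score - mean(test_cohort_scores)) / pstdev(test_cohort_scores)
    return 0.5 * (z_enroll + z_test)


# Toy cosine scores, for illustration only.
normalized = s_norm(
    raw_score=0.7,
    enroll_cohort_scores=[0.1, 0.2, 0.3],
    test_cohort_scores=[0.0, 0.2, 0.4],
)
print(f"normalized score: {normalized:.4f}")
```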
...@@ -181,7 +181,7 @@ class ASRExecutor(BaseExecutor): ...@@ -181,7 +181,7 @@ class ASRExecutor(BaseExecutor):
lm_url, lm_url,
os.path.dirname(self.config.decode.lang_model_path), lm_md5) os.path.dirname(self.config.decode.lang_model_path), lm_md5)
elif "conformer" in model_type or "transformer" in model_type or "wenetspeech" in model_type: elif "conformer" in model_type or "transformer" in model_type:
self.config.spm_model_prefix = os.path.join( self.config.spm_model_prefix = os.path.join(
self.res_path, self.config.spm_model_prefix) self.res_path, self.config.spm_model_prefix)
self.text_feature = TextFeaturizer( self.text_feature = TextFeaturizer(
...@@ -205,7 +205,7 @@ class ASRExecutor(BaseExecutor): ...@@ -205,7 +205,7 @@ class ASRExecutor(BaseExecutor):
self.model.set_state_dict(model_dict) self.model.set_state_dict(model_dict)
# compute the max len limit # compute the max len limit
if "conformer" in model_type or "transformer" in model_type or "wenetspeech" in model_type: if "conformer" in model_type or "transformer" in model_type:
# in transformer like model, we may use the subsample rate cnn network # in transformer like model, we may use the subsample rate cnn network
subsample_rate = self.model.subsampling_rate() subsample_rate = self.model.subsampling_rate()
frame_shift_ms = self.config.preprocess_config.process[0][ frame_shift_ms = self.config.preprocess_config.process[0][
...@@ -242,7 +242,7 @@ class ASRExecutor(BaseExecutor): ...@@ -242,7 +242,7 @@ class ASRExecutor(BaseExecutor):
self._inputs["audio_len"] = audio_len self._inputs["audio_len"] = audio_len
logger.info(f"audio feat shape: {audio.shape}") logger.info(f"audio feat shape: {audio.shape}")
elif "conformer" in model_type or "transformer" in model_type or "wenetspeech" in model_type: elif "conformer" in model_type or "transformer" in model_type:
logger.info("get the preprocess conf") logger.info("get the preprocess conf")
preprocess_conf = self.config.preprocess_config preprocess_conf = self.config.preprocess_config
preprocess_args = {"train": False} preprocess_args = {"train": False}
......
...@@ -23,6 +23,7 @@ import paddle ...@@ -23,6 +23,7 @@ import paddle
import yaml import yaml
from paddleaudio import load from paddleaudio import load
from paddleaudio.features import LogMelSpectrogram from paddleaudio.features import LogMelSpectrogram
from paddlespeech.utils.dynamic_import import dynamic_import
from ..executor import BaseExecutor from ..executor import BaseExecutor
from ..log import logger from ..log import logger
...@@ -30,7 +31,7 @@ from ..utils import cli_register ...@@ -30,7 +31,7 @@ from ..utils import cli_register
from ..utils import stats_wrapper from ..utils import stats_wrapper
from .pretrained_models import model_alias from .pretrained_models import model_alias
from .pretrained_models import pretrained_models from .pretrained_models import pretrained_models
from paddlespeech.s2t.utils.dynamic_import import dynamic_import
__all__ = ['CLSExecutor'] __all__ = ['CLSExecutor']
......
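Several hunks in this commit only move the `dynamic_import` import from `paddlespeech.s2t.utils` to `paddlespeech.utils`. For readers unfamiliar with this kind of helper, here is a hedged sketch of what such a function typically does, assuming a `"module:object"` string convention and an optional alias table. The name `dynamic_import_sketch` is hypothetical and the diff does not show the real implementation.

```python
import importlib


def dynamic_import_sketch(import_path, alias=None):
    """Resolve a 'package.module:ObjectName' string to the object itself.

    alias maps short names to full 'module:object' paths (an assumption
    made for illustration).
    """
    alias = alias or {}
    if ":" not in import_path:
        if import_path not in alias:
            raise ValueError(
                "import_path must be an alias or look like 'module:object', "
                "got {!r}".format(import_path))
        import_path = alias[import_path]
    module_name, obj_name = import_path.split(":")
    module = importlib.import_module(module_name)
    return getattr(module, obj_name)
```

With a helper of this shape, the executor can pick a model class from a config string without importing every model module up front.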
...@@ -86,7 +86,7 @@ def get_path_from_url(url, ...@@ -86,7 +86,7 @@ def get_path_from_url(url,
str: a local path to save downloaded models & weights & datasets. str: a local path to save downloaded models & weights & datasets.
""" """
from paddle.fluid.dygraph.parallel import ParallelEnv from paddle.distributed import ParallelEnv
assert _is_url(url), "downloading from {} not a url".format(url) assert _is_url(url), "downloading from {} not a url".format(url)
# parse path after download to decompress under root_dir # parse path after download to decompress under root_dir
......
...@@ -36,8 +36,8 @@ from .pretrained_models import kaldi_bins ...@@ -36,8 +36,8 @@ from .pretrained_models import kaldi_bins
from .pretrained_models import model_alias from .pretrained_models import model_alias
from .pretrained_models import pretrained_models from .pretrained_models import pretrained_models
from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer
from paddlespeech.s2t.utils.dynamic_import import dynamic_import
from paddlespeech.s2t.utils.utility import UpdateConfig from paddlespeech.s2t.utils.utility import UpdateConfig
from paddlespeech.utils.dynamic_import import dynamic_import
__all__ = ["STExecutor"] __all__ = ["STExecutor"]
......
...@@ -21,7 +21,6 @@ from typing import Union ...@@ -21,7 +21,6 @@ from typing import Union
import paddle import paddle
from ...s2t.utils.dynamic_import import dynamic_import
from ..executor import BaseExecutor from ..executor import BaseExecutor
from ..log import logger from ..log import logger
from ..utils import cli_register from ..utils import cli_register
...@@ -29,6 +28,7 @@ from ..utils import stats_wrapper ...@@ -29,6 +28,7 @@ from ..utils import stats_wrapper
from .pretrained_models import model_alias from .pretrained_models import model_alias
from .pretrained_models import pretrained_models from .pretrained_models import pretrained_models
from .pretrained_models import tokenizer_alias from .pretrained_models import tokenizer_alias
from paddlespeech.utils.dynamic_import import dynamic_import
__all__ = ['TextExecutor'] __all__ = ['TextExecutor']
......
...@@ -32,10 +32,10 @@ from ..utils import cli_register ...@@ -32,10 +32,10 @@ from ..utils import cli_register
from ..utils import stats_wrapper from ..utils import stats_wrapper
from .pretrained_models import model_alias from .pretrained_models import model_alias
from .pretrained_models import pretrained_models from .pretrained_models import pretrained_models
from paddlespeech.s2t.utils.dynamic_import import dynamic_import
from paddlespeech.t2s.frontend import English from paddlespeech.t2s.frontend import English
from paddlespeech.t2s.frontend.zh_frontend import Frontend from paddlespeech.t2s.frontend.zh_frontend import Frontend
from paddlespeech.t2s.modules.normalizer import ZScore from paddlespeech.t2s.modules.normalizer import ZScore
from paddlespeech.utils.dynamic_import import dynamic_import
__all__ = ['TTSExecutor'] __all__ = ['TTSExecutor']
......
...@@ -24,11 +24,11 @@ from typing import Any ...@@ -24,11 +24,11 @@ from typing import Any
from typing import Dict from typing import Dict
import paddle import paddle
import paddleaudio
import requests import requests
import yaml import yaml
from paddle.framework import load from paddle.framework import load
import paddleaudio
from . import download from . import download
from .entry import commands from .entry import commands
try: try:
......
...@@ -32,7 +32,7 @@ from ..utils import cli_register ...@@ -32,7 +32,7 @@ from ..utils import cli_register
from ..utils import stats_wrapper from ..utils import stats_wrapper
from .pretrained_models import model_alias from .pretrained_models import model_alias
from .pretrained_models import pretrained_models from .pretrained_models import pretrained_models
from paddlespeech.s2t.utils.dynamic_import import dynamic_import from paddlespeech.utils.dynamic_import import dynamic_import
from paddlespeech.vector.io.batch import feature_normalize from paddlespeech.vector.io.batch import feature_normalize
from paddlespeech.vector.modules.sid_model import SpeakerIdetification from paddlespeech.vector.modules.sid_model import SpeakerIdetification
......
...@@ -19,9 +19,9 @@ pretrained_models = { ...@@ -19,9 +19,9 @@ pretrained_models = {
# "paddlespeech vector --task spk --model ecapatdnn_voxceleb12-16k --sr 16000 --input ./input.wav" # "paddlespeech vector --task spk --model ecapatdnn_voxceleb12-16k --sr 16000 --input ./input.wav"
"ecapatdnn_voxceleb12-16k": { "ecapatdnn_voxceleb12-16k": {
'url': 'url':
'https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_0.tar.gz', 'https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_1.tar.gz',
'md5': 'md5':
'cc33023c54ab346cd318408f43fcaf95', '67c7ff8885d5246bd16e0f5ac1cba99f',
'cfg_path': 'cfg_path':
'conf/model.yaml', # the yaml config path 'conf/model.yaml', # the yaml config path
'ckpt_path': 'ckpt_path':
......
...@@ -22,7 +22,7 @@ from paddleaudio.features import LogMelSpectrogram ...@@ -22,7 +22,7 @@ from paddleaudio.features import LogMelSpectrogram
from paddleaudio.utils import logger from paddleaudio.utils import logger
from paddlespeech.cls.models import SoundClassifier from paddlespeech.cls.models import SoundClassifier
from paddlespeech.s2t.utils.dynamic_import import dynamic_import from paddlespeech.utils.dynamic_import import dynamic_import
# yapf: disable # yapf: disable
parser = argparse.ArgumentParser(__doc__) parser = argparse.ArgumentParser(__doc__)
......
...@@ -21,7 +21,7 @@ from paddleaudio.utils import logger ...@@ -21,7 +21,7 @@ from paddleaudio.utils import logger
from paddleaudio.utils import Timer from paddleaudio.utils import Timer
from paddlespeech.cls.models import SoundClassifier from paddlespeech.cls.models import SoundClassifier
from paddlespeech.s2t.utils.dynamic_import import dynamic_import from paddlespeech.utils.dynamic_import import dynamic_import
# yapf: disable # yapf: disable
parser = argparse.ArgumentParser(__doc__) parser = argparse.ArgumentParser(__doc__)
......
...@@ -37,6 +37,12 @@ if __name__ == "__main__": ...@@ -37,6 +37,12 @@ if __name__ == "__main__":
"--export_path", type=str, help="path of the jit model to save") "--export_path", type=str, help="path of the jit model to save")
parser.add_argument( parser.add_argument(
"--model_type", type=str, default='offline', help="offline/online") "--model_type", type=str, default='offline', help="offline/online")
parser.add_argument(
'--nxpu',
type=int,
default=0,
choices=[0, 1],
help="if nxpu == 0 and ngpu == 0, use cpu.")
args = parser.parse_args() args = parser.parse_args()
print("model_type:{}".format(args.model_type)) print("model_type:{}".format(args.model_type))
print_arguments(args) print_arguments(args)
......
...@@ -37,6 +37,12 @@ if __name__ == "__main__": ...@@ -37,6 +37,12 @@ if __name__ == "__main__":
# save asr result to # save asr result to
parser.add_argument( parser.add_argument(
"--result_file", type=str, help="path of save the asr result") "--result_file", type=str, help="path of save the asr result")
parser.add_argument(
'--nxpu',
type=int,
default=0,
choices=[0, 1],
help="if nxpu == 0 and ngpu == 0, use cpu.")
args = parser.parse_args() args = parser.parse_args()
print_arguments(args, globals()) print_arguments(args, globals())
print("model_type:{}".format(args.model_type)) print("model_type:{}".format(args.model_type))
......
...@@ -40,6 +40,12 @@ if __name__ == "__main__": ...@@ -40,6 +40,12 @@ if __name__ == "__main__":
"--export_path", type=str, help="path of the jit model to save") "--export_path", type=str, help="path of the jit model to save")
parser.add_argument( parser.add_argument(
"--model_type", type=str, default='offline', help='offline/online') "--model_type", type=str, default='offline', help='offline/online')
parser.add_argument(
'--nxpu',
type=int,
default=0,
choices=[0, 1],
help="if nxpu == 0 and ngpu == 0, use cpu.")
parser.add_argument( parser.add_argument(
"--enable-auto-log", action="store_true", help="use auto log") "--enable-auto-log", action="store_true", help="use auto log")
args = parser.parse_args() args = parser.parse_args()
......
...@@ -33,6 +33,12 @@ if __name__ == "__main__": ...@@ -33,6 +33,12 @@ if __name__ == "__main__":
parser = default_argument_parser() parser = default_argument_parser()
parser.add_argument( parser.add_argument(
"--model_type", type=str, default='offline', help='offline/online') "--model_type", type=str, default='offline', help='offline/online')
parser.add_argument(
'--nxpu',
type=int,
default=0,
choices=[0, 1],
help="if nxpu == 0 and ngpu == 0, use cpu.")
args = parser.parse_args() args = parser.parse_args()
print("model_type:{}".format(args.model_type)) print("model_type:{}".format(args.model_type))
print_arguments(args, globals()) print_arguments(args, globals())
......
...@@ -51,7 +51,7 @@ def _batch_shuffle(indices, batch_size, epoch, clipped=False): ...@@ -51,7 +51,7 @@ def _batch_shuffle(indices, batch_size, epoch, clipped=False):
""" """
rng = np.random.RandomState(epoch) rng = np.random.RandomState(epoch)
shift_len = rng.randint(0, batch_size - 1) shift_len = rng.randint(0, batch_size - 1)
batch_indices = list(zip(*[iter(indices[shift_len:])] * batch_size)) batch_indices = list(zip(* [iter(indices[shift_len:])] * batch_size))
rng.shuffle(batch_indices) rng.shuffle(batch_indices)
batch_indices = [item for batch in batch_indices for item in batch] batch_indices = [item for batch in batch_indices for item in batch]
assert clipped is False assert clipped is False
......
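The `_batch_shuffle` hunk above changes only whitespace, but the `zip(*[iter(...)] * batch_size)` idiom it touches is easy to misread. A small standalone sketch of the idiom follows; `group_into_batches` is an illustrative name, not a function from the repository.

```python
def group_into_batches(indices, batch_size):
    """Group indices into tuples of exactly batch_size items.

    The list holds batch_size references to the SAME iterator, so zip pulls
    consecutive items into each tuple. A trailing remainder shorter than
    batch_size is dropped because zip stops at the first exhausted input.
    """
    it = iter(indices)
    return list(zip(*[it] * batch_size))
```

For example, seven indices with `batch_size=3` give two full batches and drop the last index.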
...@@ -112,7 +112,16 @@ class Trainer(): ...@@ -112,7 +112,16 @@ class Trainer():
logger.info(f"Rank: {self.rank}/{self.world_size}") logger.info(f"Rank: {self.rank}/{self.world_size}")
# set device # set device
paddle.set_device('gpu' if self.args.ngpu > 0 else 'cpu') if self.args.ngpu == 0:
if self.args.nxpu == 0:
paddle.set_device('cpu')
else:
paddle.set_device('xpu')
elif self.args.ngpu > 0:
paddle.set_device("gpu")
else:
raise Exception("invalid device")
if self.parallel: if self.parallel:
self.init_parallel() self.init_parallel()
......
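The new device branch in `Trainer` can be read as a small pure function. The sketch below mirrors the logic of the hunk without calling Paddle, so it can be checked in isolation; `select_device` is a hypothetical helper name, and its return value is what would be passed to `paddle.set_device`.

```python
def select_device(ngpu, nxpu=0):
    """Return the device string implied by the ngpu / nxpu arguments.

    ngpu > 0 selects the GPU; ngpu == 0 falls back to XPU when nxpu is
    non-zero and to CPU otherwise; a negative ngpu is rejected.
    """
    if ngpu == 0:
        return "cpu" if nxpu == 0 else "xpu"
    if ngpu > 0:
        return "gpu"
    raise ValueError("invalid device: ngpu={}, nxpu={}".format(ngpu, nxpu))
```

Note that a positive `ngpu` takes precedence over `nxpu`, as in the hunk.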
...@@ -752,6 +752,7 @@ class VectorClientExecutor(BaseExecutor): ...@@ -752,6 +752,7 @@ class VectorClientExecutor(BaseExecutor):
res = handler.run(enroll_audio, test_audio, audio_format, res = handler.run(enroll_audio, test_audio, audio_format,
sample_rate) sample_rate)
logger.info(f"The vector score is: {res}") logger.info(f"The vector score is: {res}")
return res
else: else:
logger.error(f"Sorry, we have not support such task {task}") logger.error(f"Sorry, we have not support such task {task}")
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import hashlib
import os
import os.path as osp
import shutil
import subprocess
import tarfile
import time
import zipfile
import requests
from tqdm import tqdm
from paddlespeech.cli.log import logger
__all__ = ['get_path_from_url']
DOWNLOAD_RETRY_LIMIT = 3
def _is_url(path):
    """
    Whether path is URL.
    Args:
        path (string): URL string or not.
    """
    return path.startswith('http://') or path.startswith('https://')


def _map_path(url, root_dir):
    # parse path after download under root_dir
    fname = osp.split(url)[-1]
    fpath = fname
    return osp.join(root_dir, fpath)


def _get_unique_endpoints(trainer_endpoints):
    # Sorting is to avoid different environmental variables for each card
    trainer_endpoints.sort()
    ips = set()
    unique_endpoints = set()
    for endpoint in trainer_endpoints:
        ip = endpoint.split(":")[0]
        if ip in ips:
            continue
        ips.add(ip)
        unique_endpoints.add(endpoint)
    logger.info("unique_endpoints {}".format(unique_endpoints))
    return unique_endpoints
def get_path_from_url(url,
                      root_dir,
                      md5sum=None,
                      check_exist=True,
                      decompress=True,
                      method='get'):
    """ Download from given url to root_dir.
    If the file or directory specified by url already exists under
    root_dir, return the path directly; otherwise download
    from url, decompress it, and return the path.
    Args:
        url (str): download url
        root_dir (str): root dir for downloading, it should be
                        WEIGHTS_HOME or DATASET_HOME
        md5sum (str): md5 sum of download package
        decompress (bool): decompress zip or tar file. Default is `True`
        method (str): which download method to use. Support `wget` and `get`. Default is `get`.
    Returns:
        str: a local path to save downloaded models & weights & datasets.
    """
    from paddle.fluid.dygraph.parallel import ParallelEnv

    assert _is_url(url), "downloading from {} not a url".format(url)
    # parse path after download to decompress under root_dir
    fullpath = _map_path(url, root_dir)
    # Mainly used to solve the problem of downloading data from different
    # machines in the case of multiple machines. Different ips will download
    # data, and the same ip will only download data once.
    unique_endpoints = _get_unique_endpoints(ParallelEnv().trainer_endpoints[:])
    if osp.exists(fullpath) and check_exist and _md5check(fullpath, md5sum):
        logger.info("Found {}".format(fullpath))
    else:
        if ParallelEnv().current_endpoint in unique_endpoints:
            fullpath = _download(url, root_dir, md5sum, method=method)
        else:
            while not os.path.exists(fullpath):
                time.sleep(1)

    if ParallelEnv().current_endpoint in unique_endpoints:
        if decompress and (tarfile.is_tarfile(fullpath) or
                           zipfile.is_zipfile(fullpath)):
            fullpath = _decompress(fullpath)

    return fullpath
def _get_download(url, fullname):
    # using requests.get method
    fname = osp.basename(fullname)
    try:
        req = requests.get(url, stream=True)
    except Exception as e:  # requests.exceptions.ConnectionError
        logger.info("Downloading {} from {} failed with exception {}".format(
            fname, url, str(e)))
        return False

    if req.status_code != 200:
        raise RuntimeError("Downloading from {} failed with code "
                           "{}!".format(url, req.status_code))

    # To guard against an interrupted download, download to
    # tmp_fullname first, then move tmp_fullname to fullname
    # after the download has finished
    tmp_fullname = fullname + "_tmp"
    total_size = req.headers.get('content-length')
    with open(tmp_fullname, 'wb') as f:
        if total_size:
            with tqdm(total=(int(total_size) + 1023) // 1024) as pbar:
                for chunk in req.iter_content(chunk_size=1024):
                    f.write(chunk)
                    pbar.update(1)
        else:
            for chunk in req.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)
    shutil.move(tmp_fullname, fullname)

    return fullname


def _wget_download(url, fullname):
    # using wget to download url
    tmp_fullname = fullname + "_tmp"
    # --user-agent
    command = 'wget -O {} -t {} {}'.format(tmp_fullname, DOWNLOAD_RETRY_LIMIT,
                                           url)
    subprc = subprocess.Popen(
        command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    _ = subprc.communicate()

    if subprc.returncode != 0:
        raise RuntimeError(
            '{} failed. Please make sure `wget` is installed or {} exists'.
            format(command, url))

    shutil.move(tmp_fullname, fullname)

    return fullname


_download_methods = {
    'get': _get_download,
    'wget': _wget_download,
}
def _download(url, path, md5sum=None, method='get'):
    """
    Download from url, save to path.
    url (str): download url
    path (str): download to given path
    md5sum (str): md5 sum of download package
    method (str): which download method to use. Support `wget` and `get`. Default is `get`.
    """
    assert method in _download_methods, 'make sure `{}` implemented'.format(
        method)

    if not osp.exists(path):
        os.makedirs(path)

    fname = osp.split(url)[-1]
    fullname = osp.join(path, fname)
    retry_cnt = 0

    logger.info("Downloading {} from {}".format(fname, url))
    while not (osp.exists(fullname) and _md5check(fullname, md5sum)):
        if retry_cnt < DOWNLOAD_RETRY_LIMIT:
            retry_cnt += 1
        else:
            raise RuntimeError("Download from {} failed. "
                               "Retry limit reached".format(url))

        if not _download_methods[method](url, fullname):
            time.sleep(1)
            continue

    return fullname


def _md5check(fullname, md5sum=None):
    if md5sum is None:
        return True

    logger.info("File {} md5 checking...".format(fullname))
    md5 = hashlib.md5()
    with open(fullname, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b""):
            md5.update(chunk)
    calc_md5sum = md5.hexdigest()

    if calc_md5sum != md5sum:
        logger.info("File {} md5 check failed, {}(calc) != "
                    "{}(base)".format(fullname, calc_md5sum, md5sum))
        return False
    return True
def _decompress(fname):
    """
    Decompress for zip and tar file
    """
    logger.info("Decompressing {}...".format(fname))

    # To guard against an interrupted decompression,
    # decompress to the fpath_tmp directory first; if decompression
    # succeeds, move the decompressed files to fpath, delete
    # fpath_tmp, and remove the downloaded archive.
    if tarfile.is_tarfile(fname):
        uncompressed_path = _uncompress_file_tar(fname)
    elif zipfile.is_zipfile(fname):
        uncompressed_path = _uncompress_file_zip(fname)
    else:
        raise TypeError("Unsupported compressed file type {}".format(fname))

    return uncompressed_path


def _uncompress_file_zip(filepath):
    files = zipfile.ZipFile(filepath, 'r')
    file_list = files.namelist()

    file_dir = os.path.dirname(filepath)

    if _is_a_single_file(file_list):
        rootpath = file_list[0]
        uncompressed_path = os.path.join(file_dir, rootpath)

        for item in file_list:
            files.extract(item, file_dir)

    elif _is_a_single_dir(file_list):
        rootpath = os.path.splitext(file_list[0])[0].split(os.sep)[0]
        uncompressed_path = os.path.join(file_dir, rootpath)

        for item in file_list:
            files.extract(item, file_dir)

    else:
        rootpath = os.path.splitext(filepath)[0].split(os.sep)[-1]
        uncompressed_path = os.path.join(file_dir, rootpath)
        if not os.path.exists(uncompressed_path):
            os.makedirs(uncompressed_path)
        for item in file_list:
            files.extract(item, os.path.join(file_dir, rootpath))

    files.close()

    return uncompressed_path


def _uncompress_file_tar(filepath, mode="r:*"):
    files = tarfile.open(filepath, mode)
    file_list = files.getnames()

    file_dir = os.path.dirname(filepath)

    if _is_a_single_file(file_list):
        rootpath = file_list[0]
        uncompressed_path = os.path.join(file_dir, rootpath)
        for item in file_list:
            files.extract(item, file_dir)
    elif _is_a_single_dir(file_list):
        rootpath = os.path.splitext(file_list[0])[0].split(os.sep)[-1]
        uncompressed_path = os.path.join(file_dir, rootpath)
        for item in file_list:
            files.extract(item, file_dir)
    else:
        rootpath = os.path.splitext(filepath)[0].split(os.sep)[-1]
        uncompressed_path = os.path.join(file_dir, rootpath)
        if not os.path.exists(uncompressed_path):
            os.makedirs(uncompressed_path)

        for item in file_list:
            files.extract(item, os.path.join(file_dir, rootpath))

    files.close()

    return uncompressed_path
def _is_a_single_file(file_list):
    # str.find returns -1 when the separator is absent, so compare with == -1
    # (the original `< -1` could never be true)
    if len(file_list) == 1 and file_list[0].find(os.sep) == -1:
        return True
    return False


def _is_a_single_dir(file_list):
    new_file_list = []
    for file_path in file_list:
        if '/' in file_path:
            file_path = file_path.replace('/', os.sep)
        elif '\\' in file_path:
            file_path = file_path.replace('\\', os.sep)
        new_file_list.append(file_path)

    file_name = new_file_list[0].split(os.sep)[0]
    for i in range(1, len(new_file_list)):
        if file_name != new_file_list[i].split(os.sep)[0]:
            return False
    return True
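The `_md5check` helper in the file above hashes the download in 4096-byte reads so that large checkpoints never need to fit in memory. A standalone sketch of that pattern, using only the standard library, looks like this; `file_md5` is an illustrative name rather than a function from the repository.

```python
import hashlib


def file_md5(path, chunk_size=4096):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

Comparing the result against the `md5` field of a `pretrained_models` entry is how a stale or truncated archive would be detected before decompression.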
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
...@@ -16,6 +16,7 @@ import json ...@@ -16,6 +16,7 @@ import json
import os import os
import re import re
import numpy as np
import paddle import paddle
import soundfile import soundfile
import websocket import websocket
...@@ -44,11 +45,10 @@ class ACSEngine(BaseEngine): ...@@ -44,11 +45,10 @@ class ACSEngine(BaseEngine):
logger.info("Init the acs engine") logger.info("Init the acs engine")
try: try:
self.config = config self.config = config
if self.config.device: self.device = self.config.get("device", paddle.get_device())
self.device = self.config.device
else:
self.device = paddle.get_device()
# websocket default ping timeout is 20 seconds
self.ping_timeout = self.config.get("ping_timeout", 20)
paddle.set_device(self.device) paddle.set_device(self.device)
logger.info(f"ACS Engine set the device: {self.device}") logger.info(f"ACS Engine set the device: {self.device}")
...@@ -100,8 +100,8 @@ class ACSEngine(BaseEngine): ...@@ -100,8 +100,8 @@ class ACSEngine(BaseEngine):
logger.error("No asr server, please input valid ip and port") logger.error("No asr server, please input valid ip and port")
return "" return ""
ws = websocket.WebSocket() ws = websocket.WebSocket()
ws.connect(self.url) logger.info(f"set the ping timeout: {self.ping_timeout} seconds")
# with websocket.WebSocket.connect(self.url) as ws: ws.connect(self.url, ping_timeout=self.ping_timeout)
audio_info = json.dumps( audio_info = json.dumps(
{ {
"name": "test.wav", "name": "test.wav",
...@@ -116,8 +116,8 @@ class ACSEngine(BaseEngine): ...@@ -116,8 +116,8 @@ class ACSEngine(BaseEngine):
logger.info("client receive msg={}".format(msg)) logger.info("client receive msg={}".format(msg))
# send the total audio data # send the total audio data
samples, sample_rate = soundfile.read(audio_data, dtype='int16') for chunk_data in self.read_wave(audio_data):
ws.send_binary(samples.tobytes()) ws.send_binary(chunk_data.tobytes())
msg = ws.recv() msg = ws.recv()
msg = json.loads(msg) msg = json.loads(msg)
logger.info(f"audio result: {msg}") logger.info(f"audio result: {msg}")
...@@ -142,6 +142,39 @@ class ACSEngine(BaseEngine): ...@@ -142,6 +142,39 @@ class ACSEngine(BaseEngine):
return msg return msg
    def read_wave(self, audio_data: str):
        """Read the audio file from the given wav file path.

        Args:
            audio_data (str): path of the audio file;
                we assume that the audio sample rate matches the model

        Yields:
            numpy.array: a small chunk of audio PCM data
        """
        samples, sample_rate = soundfile.read(audio_data, dtype='int16')
        x_len = len(samples)
        assert sample_rate == 16000
        chunk_size = int(85 * sample_rate / 1000)  # 85ms, sample_rate = 16kHz

        if x_len % chunk_size != 0:
            padding_len_x = chunk_size - x_len % chunk_size
        else:
            padding_len_x = 0

        padding = np.zeros((padding_len_x), dtype=samples.dtype)
        padded_x = np.concatenate([samples, padding], axis=0)

        assert (x_len + padding_len_x) % chunk_size == 0
        num_chunk = (x_len + padding_len_x) / chunk_size
        num_chunk = int(num_chunk)
        for i in range(0, num_chunk):
            start = i * chunk_size
            end = start + chunk_size
            x_chunk = padded_x[start:end]
            yield x_chunk
def get_macthed_word(self, msg): def get_macthed_word(self, msg):
"""Get the matched info in msg """Get the matched info in msg
......
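The `read_wave` method added in this hunk pads the waveform with zeros to a multiple of the chunk size and then yields fixed-size slices. A compact sketch of the same padding and chunking with NumPy alone, so it runs without an audio file, is shown below; `split_into_chunks` and its parameters are illustrative names, and the 85 ms chunk length is taken from the hunk.

```python
import numpy as np


def split_into_chunks(samples, sample_rate=16000, chunk_ms=85):
    """Zero-pad a 1-D sample array and reshape it to (num_chunks, chunk_size)."""
    chunk_size = int(chunk_ms * sample_rate / 1000)  # 1360 samples at 16 kHz
    # Number of zeros needed to reach the next multiple of chunk_size.
    pad_len = (-len(samples)) % chunk_size
    padding = np.zeros(pad_len, dtype=samples.dtype)
    padded = np.concatenate([samples, padding], axis=0)
    return padded.reshape(-1, chunk_size)
```

Each row of the result corresponds to one packet that the hunk sends with `ws.send_binary(chunk_data.tobytes())`.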
...@@ -25,7 +25,6 @@ from yacs.config import CfgNode ...@@ -25,7 +25,6 @@ from yacs.config import CfgNode
from .pretrained_models import pretrained_models from .pretrained_models import pretrained_models
from paddlespeech.cli.log import logger from paddlespeech.cli.log import logger
from paddlespeech.cli.tts.infer import TTSExecutor from paddlespeech.cli.tts.infer import TTSExecutor
from paddlespeech.s2t.utils.dynamic_import import dynamic_import
from paddlespeech.server.engine.base_engine import BaseEngine from paddlespeech.server.engine.base_engine import BaseEngine
from paddlespeech.server.utils.audio_process import float2pcm from paddlespeech.server.utils.audio_process import float2pcm
from paddlespeech.server.utils.util import denorm from paddlespeech.server.utils.util import denorm
...@@ -33,6 +32,7 @@ from paddlespeech.server.utils.util import get_chunks ...@@ -33,6 +32,7 @@ from paddlespeech.server.utils.util import get_chunks
from paddlespeech.t2s.frontend import English from paddlespeech.t2s.frontend import English
from paddlespeech.t2s.frontend.zh_frontend import Frontend from paddlespeech.t2s.frontend.zh_frontend import Frontend
from paddlespeech.t2s.modules.normalizer import ZScore from paddlespeech.t2s.modules.normalizer import ZScore
from paddlespeech.utils.dynamic_import import dynamic_import
__all__ = ['TTSEngine'] __all__ = ['TTSEngine']
......
...@@ -17,12 +17,12 @@ from typing import List ...@@ -17,12 +17,12 @@ from typing import List
from fastapi import APIRouter from fastapi import APIRouter
from paddlespeech.cli.log import logger from paddlespeech.cli.log import logger
from paddlespeech.server.restful.acs_api import router as acs_router
from paddlespeech.server.restful.asr_api import router as asr_router from paddlespeech.server.restful.asr_api import router as asr_router
from paddlespeech.server.restful.cls_api import router as cls_router from paddlespeech.server.restful.cls_api import router as cls_router
from paddlespeech.server.restful.text_api import router as text_router from paddlespeech.server.restful.text_api import router as text_router
from paddlespeech.server.restful.tts_api import router as tts_router from paddlespeech.server.restful.tts_api import router as tts_router
from paddlespeech.server.restful.vector_api import router as vec_router from paddlespeech.server.restful.vector_api import router as vec_router
from paddlespeech.server.restful.acs_api import router as acs_router
_router = APIRouter() _router = APIRouter()
......
...@@ -29,9 +29,9 @@ import requests ...@@ -29,9 +29,9 @@ import requests
import yaml import yaml
from paddle.framework import load from paddle.framework import load
from . import download
from .entry import client_commands from .entry import client_commands
from .entry import server_commands from .entry import server_commands
from paddlespeech.cli import download
try: try:
from .. import __version__ from .. import __version__
except ImportError: except ImportError:
......
...@@ -12,6 +12,7 @@ ...@@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
class Frame(object): class Frame(object):
"""Represents a "frame" of audio data.""" """Represents a "frame" of audio data."""
...@@ -77,8 +78,8 @@ class ChunkBuffer(object): ...@@ -77,8 +78,8 @@ class ChunkBuffer(object):
offset = 0 offset = 0
while offset + self.window_bytes <= len(audio): while offset + self.window_bytes <= len(audio):
yield Frame(audio[offset:offset + self.window_bytes], self.timestamp, yield Frame(audio[offset:offset + self.window_bytes],
self.window_sec) self.timestamp, self.window_sec)
self.timestamp += self.shift_sec self.timestamp += self.shift_sec
offset += self.shift_bytes offset += self.shift_bytes
......
...@@ -293,3 +293,45 @@ def transformer_single_spk_batch_fn(examples): ...@@ -293,3 +293,45 @@ def transformer_single_spk_batch_fn(examples):
"speech_lengths": speech_lengths, "speech_lengths": speech_lengths,
} }
return batch return batch
def vits_single_spk_batch_fn(examples):
    """
    Returns:
        Dict[str, Any]:
            - text (Tensor): Text index tensor (B, T_text).
            - text_lengths (Tensor): Text length tensor (B,).
            - feats (Tensor): Feature tensor (B, T_feats, aux_channels).
            - feats_lengths (Tensor): Feature length tensor (B,).
            - speech (Tensor): Speech waveform tensor (B, T_wav).
    """
    # fields = ["text", "text_lengths", "feats", "feats_lengths", "speech"]
    text = [np.array(item["text"], dtype=np.int64) for item in examples]
    feats = [np.array(item["feats"], dtype=np.float32) for item in examples]
    speech = [np.array(item["wave"], dtype=np.float32) for item in examples]
    text_lengths = [
        np.array(item["text_lengths"], dtype=np.int64) for item in examples
    ]
    feats_lengths = [
        np.array(item["feats_lengths"], dtype=np.int64) for item in examples
    ]

    text = batch_sequences(text)
    feats = batch_sequences(feats)
    speech = batch_sequences(speech)

    # convert each batch to paddle.Tensor
    text = paddle.to_tensor(text)
    feats = paddle.to_tensor(feats)
    text_lengths = paddle.to_tensor(text_lengths)
    feats_lengths = paddle.to_tensor(feats_lengths)

    batch = {
        "text": text,
        "text_lengths": text_lengths,
        "feats": feats,
        "feats_lengths": feats_lengths,
        "speech": speech
    }
    return batch
...@@ -167,7 +167,6 @@ def batch_spec(minibatch, pad_value=0., time_major=False, dtype=np.float32): ...@@ -167,7 +167,6 @@ def batch_spec(minibatch, pad_value=0., time_major=False, dtype=np.float32):
def batch_sequences(sequences, axis=0, pad_value=0): def batch_sequences(sequences, axis=0, pad_value=0):
# import pdb; pdb.set_trace()
seq = sequences[0] seq = sequences[0]
ndim = seq.ndim ndim = seq.ndim
if axis < 0: if axis < 0:
......
...@@ -20,15 +20,14 @@ from scipy.interpolate import interp1d ...@@ -20,15 +20,14 @@ from scipy.interpolate import interp1d
class LogMelFBank(): class LogMelFBank():
def __init__(self, def __init__(self,
sr=24000, sr: int=24000,
n_fft=2048, n_fft: int=2048,
hop_length=300, hop_length: int=300,
win_length=None, win_length: int=None,
window="hann", window: str="hann",
n_mels=80, n_mels: int=80,
fmin=80, fmin: int=80,
fmax=7600, fmax: int=7600):
eps=1e-10):
self.sr = sr self.sr = sr
# stft # stft
self.n_fft = n_fft self.n_fft = n_fft
...@@ -54,7 +53,7 @@ class LogMelFBank(): ...@@ -54,7 +53,7 @@ class LogMelFBank():
fmax=self.fmax) fmax=self.fmax)
return mel_filter return mel_filter
def _stft(self, wav): def _stft(self, wav: np.ndarray):
D = librosa.core.stft( D = librosa.core.stft(
wav, wav,
n_fft=self.n_fft, n_fft=self.n_fft,
...@@ -65,11 +64,11 @@ class LogMelFBank(): ...@@ -65,11 +64,11 @@ class LogMelFBank():
pad_mode=self.pad_mode) pad_mode=self.pad_mode)
return D return D
def _spectrogram(self, wav): def _spectrogram(self, wav: np.ndarray):
D = self._stft(wav) D = self._stft(wav)
return np.abs(D) return np.abs(D)
def _mel_spectrogram(self, wav): def _mel_spectrogram(self, wav: np.ndarray):
S = self._spectrogram(wav) S = self._spectrogram(wav)
mel = np.dot(self.mel_filter, S) mel = np.dot(self.mel_filter, S)
return mel return mel
...@@ -90,14 +89,18 @@ class LogMelFBank(): ...@@ -90,14 +89,18 @@ class LogMelFBank():
class Pitch(): class Pitch():
def __init__(self, sr=24000, hop_length=300, f0min=80, f0max=7600): def __init__(self,
sr: int=24000,
hop_length: int=300,
f0min: int=80,
f0max: int=7600):
self.sr = sr self.sr = sr
self.hop_length = hop_length self.hop_length = hop_length
self.f0min = f0min self.f0min = f0min
self.f0max = f0max self.f0max = f0max
def _convert_to_continuous_f0(self, f0: np.array) -> np.array: def _convert_to_continuous_f0(self, f0: np.ndarray) -> np.ndarray:
if (f0 == 0).all(): if (f0 == 0).all():
print("All frames seems to be unvoiced.") print("All frames seems to be unvoiced.")
return f0 return f0
...@@ -120,9 +123,9 @@ class Pitch(): ...@@ -120,9 +123,9 @@ class Pitch():
return f0 return f0
def _calculate_f0(self, def _calculate_f0(self,
input: np.array, input: np.ndarray,
use_continuous_f0=True, use_continuous_f0: bool=True,
use_log_f0=True) -> np.array: use_log_f0: bool=True) -> np.ndarray:
input = input.astype(np.float) input = input.astype(np.float)
frame_period = 1000 * self.hop_length / self.sr frame_period = 1000 * self.hop_length / self.sr
f0, timeaxis = pyworld.dio( f0, timeaxis = pyworld.dio(
...@@ -139,7 +142,8 @@ class Pitch(): ...@@ -139,7 +142,8 @@ class Pitch():
f0[nonzero_idxs] = np.log(f0[nonzero_idxs]) f0[nonzero_idxs] = np.log(f0[nonzero_idxs])
return f0.reshape(-1) return f0.reshape(-1)
def _average_by_duration(self, input: np.array, d: np.array) -> np.array: def _average_by_duration(self, input: np.ndarray,
d: np.ndarray) -> np.ndarray:
d_cumsum = np.pad(d.cumsum(0), (1, 0), 'constant') d_cumsum = np.pad(d.cumsum(0), (1, 0), 'constant')
arr_list = [] arr_list = []
for start, end in zip(d_cumsum[:-1], d_cumsum[1:]): for start, end in zip(d_cumsum[:-1], d_cumsum[1:]):
...@@ -154,11 +158,11 @@ class Pitch(): ...@@ -154,11 +158,11 @@ class Pitch():
return arr_list return arr_list
def get_pitch(self, def get_pitch(self,
wav, wav: np.ndarray,
use_continuous_f0=True, use_continuous_f0: bool=True,
use_log_f0=True, use_log_f0: bool=True,
use_token_averaged_f0=True, use_token_averaged_f0: bool=True,
duration=None): duration: np.ndarray=None):
f0 = self._calculate_f0(wav, use_continuous_f0, use_log_f0) f0 = self._calculate_f0(wav, use_continuous_f0, use_log_f0)
if use_token_averaged_f0 and duration is not None: if use_token_averaged_f0 and duration is not None:
f0 = self._average_by_duration(f0, duration) f0 = self._average_by_duration(f0, duration)
...@@ -167,15 +171,13 @@ class Pitch(): ...@@ -167,15 +171,13 @@ class Pitch():
class Energy(): class Energy():
def __init__(self, def __init__(self,
sr=24000, n_fft: int=2048,
n_fft=2048, hop_length: int=300,
hop_length=300, win_length: int=None,
win_length=None, window: str="hann",
window="hann", center: bool=True,
center=True, pad_mode: str="reflect"):
pad_mode="reflect"):
self.sr = sr
self.n_fft = n_fft self.n_fft = n_fft
self.win_length = win_length self.win_length = win_length
self.hop_length = hop_length self.hop_length = hop_length
...@@ -183,7 +185,7 @@ class Energy(): ...@@ -183,7 +185,7 @@ class Energy():
self.center = center self.center = center
self.pad_mode = pad_mode self.pad_mode = pad_mode
def _stft(self, wav): def _stft(self, wav: np.ndarray):
D = librosa.core.stft( D = librosa.core.stft(
wav, wav,
n_fft=self.n_fft, n_fft=self.n_fft,
...@@ -194,7 +196,7 @@ class Energy(): ...@@ -194,7 +196,7 @@ class Energy():
pad_mode=self.pad_mode) pad_mode=self.pad_mode)
return D return D
def _calculate_energy(self, input): def _calculate_energy(self, input: np.ndarray):
input = input.astype(np.float32) input = input.astype(np.float32)
input_stft = self._stft(input) input_stft = self._stft(input)
input_power = np.abs(input_stft)**2 input_power = np.abs(input_stft)**2
...@@ -203,7 +205,8 @@ class Energy(): ...@@ -203,7 +205,8 @@ class Energy():
np.sum(input_power, axis=0), a_min=1.0e-10, a_max=float('inf'))) np.sum(input_power, axis=0), a_min=1.0e-10, a_max=float('inf')))
return energy return energy
def _average_by_duration(self, input: np.array, d: np.array) -> np.array: def _average_by_duration(self, input: np.ndarray,
d: np.ndarray) -> np.ndarray:
d_cumsum = np.pad(d.cumsum(0), (1, 0), 'constant') d_cumsum = np.pad(d.cumsum(0), (1, 0), 'constant')
arr_list = [] arr_list = []
for start, end in zip(d_cumsum[:-1], d_cumsum[1:]): for start, end in zip(d_cumsum[:-1], d_cumsum[1:]):
...@@ -214,8 +217,49 @@ class Energy(): ...@@ -214,8 +217,49 @@ class Energy():
arr_list = np.expand_dims(np.array(arr_list), 0).T arr_list = np.expand_dims(np.array(arr_list), 0).T
return arr_list return arr_list
def get_energy(self, wav, use_token_averaged_energy=True, duration=None): def get_energy(self,
wav: np.ndarray,
use_token_averaged_energy: bool=True,
duration: np.ndarray=None):
energy = self._calculate_energy(wav) energy = self._calculate_energy(wav)
if use_token_averaged_energy and duration is not None: if use_token_averaged_energy and duration is not None:
energy = self._average_by_duration(energy, duration) energy = self._average_by_duration(energy, duration)
return energy return energy
class LinearSpectrogram():
def __init__(
self,
n_fft: int=1024,
win_length: int=None,
hop_length: int=256,
window: str="hann",
center: bool=True, ):
self.n_fft = n_fft
self.hop_length = hop_length
self.win_length = win_length
self.window = window
self.center = center
self.n_fft = n_fft
self.pad_mode = "reflect"
def _stft(self, wav: np.ndarray):
D = librosa.core.stft(
wav,
n_fft=self.n_fft,
hop_length=self.hop_length,
win_length=self.win_length,
window=self.window,
center=self.center,
pad_mode=self.pad_mode)
return D
def _spectrogram(self, wav: np.ndarray):
D = self._stft(wav)
return np.abs(D)
def get_linear_spectrogram(self, wav: np.ndarray):
linear_spectrogram = self._spectrogram(wav)
linear_spectrogram = np.clip(
linear_spectrogram, a_min=1e-10, a_max=float("inf"))
return linear_spectrogram.T
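To make the output layout of `get_linear_spectrogram` concrete, here is a NumPy-only sketch of a clipped magnitude spectrogram. It is an approximation under stated assumptions, not the class above: the function name `linear_spectrogram_sketch` is hypothetical, the window is assumed to be a periodic Hann window of length `n_fft`, and centering is emulated by reflect-padding `n_fft // 2` samples on each side. Numerical details of `librosa` may differ.

```python
import numpy as np


def linear_spectrogram_sketch(wav, n_fft=1024, hop_length=256):
    """Return a clipped magnitude spectrogram of shape (num_frames, n_fft // 2 + 1)."""
    # Emulate center=True with reflect padding (assumption, see lead-in).
    padded = np.pad(wav, n_fft // 2, mode="reflect")
    # Periodic Hann window of length n_fft.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    num_frames = 1 + (len(padded) - n_fft) // hop_length
    frames = np.stack([
        padded[i * hop_length:i * hop_length + n_fft]
        for i in range(num_frames)
    ])
    # Magnitude of the one-sided FFT of each windowed frame.
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    # Floor the values so that a later log() stays finite.
    return np.clip(mag, a_min=1e-10, a_max=None)


# A short 440 Hz tone; the sample rate here is an arbitrary choice.
wav = np.sin(2 * np.pi * 440 * np.arange(2560) / 22050).astype(np.float32)
spec = linear_spectrogram_sketch(wav)
```

Frames are on the first axis here, which corresponds to the `.T` in the class above: one row per frame and one column per frequency bin.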
...@@ -147,10 +147,17 @@ def process_sentences(config, ...@@ -147,10 +147,17 @@ def process_sentences(config,
spk_emb_dir: Path=None): spk_emb_dir: Path=None):
if nprocs == 1: if nprocs == 1:
results = [] results = []
for fp in fps: for fp in tqdm.tqdm(fps, total=len(fps)):
record = process_sentence(config, fp, sentences, output_dir, record = process_sentence(
mel_extractor, pitch_extractor, config=config,
energy_extractor, cut_sil, spk_emb_dir) fp=fp,
sentences=sentences,
output_dir=output_dir,
mel_extractor=mel_extractor,
pitch_extractor=pitch_extractor,
energy_extractor=energy_extractor,
cut_sil=cut_sil,
spk_emb_dir=spk_emb_dir)
if record: if record:
results.append(record) results.append(record)
else: else:
...@@ -325,7 +332,6 @@ def main(): ...@@ -325,7 +332,6 @@ def main():
f0min=config.f0min, f0min=config.f0min,
f0max=config.f0max) f0max=config.f0max)
energy_extractor = Energy( energy_extractor = Energy(
sr=config.fs,
n_fft=config.n_fft, n_fft=config.n_fft,
hop_length=config.n_shift, hop_length=config.n_shift,
win_length=config.win_length, win_length=config.win_length,
...@@ -334,36 +340,36 @@ def main(): ...@@ -334,36 +340,36 @@ def main():
# process for the 3 sections # process for the 3 sections
if train_wav_files: if train_wav_files:
process_sentences( process_sentences(
config, config=config,
train_wav_files, fps=train_wav_files,
sentences, sentences=sentences,
train_dump_dir, output_dir=train_dump_dir,
mel_extractor, mel_extractor=mel_extractor,
pitch_extractor, pitch_extractor=pitch_extractor,
energy_extractor, energy_extractor=energy_extractor,
nprocs=args.num_cpu, nprocs=args.num_cpu,
cut_sil=args.cut_sil, cut_sil=args.cut_sil,
spk_emb_dir=spk_emb_dir) spk_emb_dir=spk_emb_dir)
if dev_wav_files: if dev_wav_files:
process_sentences( process_sentences(
config, config=config,
dev_wav_files, fps=dev_wav_files,
sentences, sentences=sentences,
dev_dump_dir, output_dir=dev_dump_dir,
mel_extractor, mel_extractor=mel_extractor,
pitch_extractor, pitch_extractor=pitch_extractor,
energy_extractor, energy_extractor=energy_extractor,
cut_sil=args.cut_sil, cut_sil=args.cut_sil,
spk_emb_dir=spk_emb_dir) spk_emb_dir=spk_emb_dir)
if test_wav_files: if test_wav_files:
process_sentences( process_sentences(
config, config=config,
test_wav_files, fps=test_wav_files,
sentences, sentences=sentences,
test_dump_dir, output_dir=test_dump_dir,
mel_extractor, mel_extractor=mel_extractor,
pitch_extractor, pitch_extractor=pitch_extractor,
energy_extractor, energy_extractor=energy_extractor,
nprocs=args.num_cpu, nprocs=args.num_cpu,
cut_sil=args.cut_sil, cut_sil=args.cut_sil,
spk_emb_dir=spk_emb_dir) spk_emb_dir=spk_emb_dir)
......
...@@ -88,15 +88,17 @@ def process_sentence(config: Dict[str, Any], ...@@ -88,15 +88,17 @@ def process_sentence(config: Dict[str, Any],
y, (0, num_frames * config.n_shift - y.size), mode="reflect") y, (0, num_frames * config.n_shift - y.size), mode="reflect")
else: else:
y = y[:num_frames * config.n_shift] y = y[:num_frames * config.n_shift]
num_sample = y.shape[0] num_samples = y.shape[0]
mel_path = output_dir / (utt_id + "_feats.npy") mel_path = output_dir / (utt_id + "_feats.npy")
wav_path = output_dir / (utt_id + "_wave.npy") wav_path = output_dir / (utt_id + "_wave.npy")
np.save(wav_path, y) # (num_samples, ) # (num_samples, )
np.save(mel_path, logmel) # (num_frames, n_mels) np.save(wav_path, y)
# (num_frames, n_mels)
np.save(mel_path, logmel)
record = { record = {
"utt_id": utt_id, "utt_id": utt_id,
"num_samples": num_sample, "num_samples": num_samples,
"num_frames": num_frames, "num_frames": num_frames,
"feats": str(mel_path), "feats": str(mel_path),
"wave": str(wav_path), "wave": str(wav_path),
...@@ -111,11 +113,17 @@ def process_sentences(config, ...@@ -111,11 +113,17 @@ def process_sentences(config,
mel_extractor=None, mel_extractor=None,
nprocs: int=1, nprocs: int=1,
cut_sil: bool=True): cut_sil: bool=True):
if nprocs == 1: if nprocs == 1:
results = [] results = []
for fp in tqdm.tqdm(fps, total=len(fps)): for fp in tqdm.tqdm(fps, total=len(fps)):
record = process_sentence(config, fp, sentences, output_dir, record = process_sentence(
mel_extractor, cut_sil) config=config,
fp=fp,
sentences=sentences,
output_dir=output_dir,
mel_extractor=mel_extractor,
cut_sil=cut_sil)
if record: if record:
results.append(record) results.append(record)
else: else:
...@@ -150,7 +158,7 @@ def main(): ...@@ -150,7 +158,7 @@ def main():
"--dataset", "--dataset",
default="baker", default="baker",
type=str, type=str,
help="name of dataset, should in {baker, ljspeech, vctk} now") help="name of dataset, should in {baker, aishell3, ljspeech, vctk} now")
parser.add_argument( parser.add_argument(
"--rootdir", default=None, type=str, help="directory to dataset.") "--rootdir", default=None, type=str, help="directory to dataset.")
parser.add_argument( parser.add_argument(
...@@ -264,28 +272,28 @@ def main(): ...@@ -264,28 +272,28 @@ def main():
# process for the 3 sections # process for the 3 sections
if train_wav_files: if train_wav_files:
process_sentences( process_sentences(
config, config=config,
train_wav_files, fps=train_wav_files,
sentences, sentences=sentences,
train_dump_dir, output_dir=train_dump_dir,
mel_extractor=mel_extractor, mel_extractor=mel_extractor,
nprocs=args.num_cpu, nprocs=args.num_cpu,
cut_sil=args.cut_sil) cut_sil=args.cut_sil)
if dev_wav_files: if dev_wav_files:
process_sentences( process_sentences(
config, config=config,
dev_wav_files, fps=dev_wav_files,
sentences, sentences=sentences,
dev_dump_dir, output_dir=dev_dump_dir,
mel_extractor=mel_extractor, mel_extractor=mel_extractor,
nprocs=args.num_cpu, nprocs=args.num_cpu,
cut_sil=args.cut_sil) cut_sil=args.cut_sil)
if test_wav_files: if test_wav_files:
process_sentences( process_sentences(
config, config=config,
test_wav_files, fps=test_wav_files,
sentences, sentences=sentences,
test_dump_dir, output_dir=test_dump_dir,
mel_extractor=mel_extractor, mel_extractor=mel_extractor,
nprocs=args.num_cpu, nprocs=args.num_cpu,
cut_sil=args.cut_sil) cut_sil=args.cut_sil)
......
...@@ -126,11 +126,17 @@ def process_sentences(config, ...@@ -126,11 +126,17 @@ def process_sentences(config,
nprocs: int=1, nprocs: int=1,
cut_sil: bool=True, cut_sil: bool=True,
use_relative_path: bool=False): use_relative_path: bool=False):
if nprocs == 1: if nprocs == 1:
results = [] results = []
for fp in tqdm.tqdm(fps, total=len(fps)): for fp in tqdm.tqdm(fps, total=len(fps)):
record = process_sentence(config, fp, sentences, output_dir, record = process_sentence(
mel_extractor, cut_sil) config=config,
fp=fp,
sentences=sentences,
output_dir=output_dir,
mel_extractor=mel_extractor,
cut_sil=cut_sil)
if record: if record:
results.append(record) results.append(record)
else: else:
...@@ -268,30 +274,30 @@ def main(): ...@@ -268,30 +274,30 @@ def main():
# process for the 3 sections # process for the 3 sections
if train_wav_files: if train_wav_files:
process_sentences( process_sentences(
config, config=config,
train_wav_files, fps=train_wav_files,
sentences, sentences=sentences,
train_dump_dir, output_dir=train_dump_dir,
mel_extractor, mel_extractor=mel_extractor,
nprocs=args.num_cpu, nprocs=args.num_cpu,
cut_sil=args.cut_sil, cut_sil=args.cut_sil,
use_relative_path=args.use_relative_path) use_relative_path=args.use_relative_path)
if dev_wav_files: if dev_wav_files:
process_sentences( process_sentences(
config, config=config,
dev_wav_files, fps=dev_wav_files,
sentences, sentences=sentences,
dev_dump_dir, output_dir=dev_dump_dir,
mel_extractor, mel_extractor=mel_extractor,
cut_sil=args.cut_sil, cut_sil=args.cut_sil,
use_relative_path=args.use_relative_path) use_relative_path=args.use_relative_path)
if test_wav_files: if test_wav_files:
process_sentences( process_sentences(
config, config=config,
test_wav_files, fps=test_wav_files,
sentences, sentences=sentences,
test_dump_dir, output_dir=test_dump_dir,
mel_extractor, mel_extractor=mel_extractor,
nprocs=args.num_cpu, nprocs=args.num_cpu,
cut_sil=args.cut_sil, cut_sil=args.cut_sil,
use_relative_path=args.use_relative_path) use_relative_path=args.use_relative_path)
......
...@@ -176,7 +176,10 @@ def main(): ...@@ -176,7 +176,10 @@ def main():
parser.add_argument( parser.add_argument(
"--ngpu", type=int, default=1, help="if ngpu == 0, use cpu or xpu.") "--ngpu", type=int, default=1, help="if ngpu == 0, use cpu or xpu.")
parser.add_argument( parser.add_argument(
"--nxpu", type=int, default=0, help="if nxpu == 0 and ngpu == 0, use cpu.") "--nxpu",
type=int,
default=0,
help="if nxpu == 0 and ngpu == 0, use cpu.")
args, _ = parser.parse_known_args() args, _ = parser.parse_known_args()
......
...@@ -188,7 +188,10 @@ def main(): ...@@ -188,7 +188,10 @@ def main():
parser.add_argument("--dev-metadata", type=str, help="dev data.") parser.add_argument("--dev-metadata", type=str, help="dev data.")
parser.add_argument("--output-dir", type=str, help="output dir.") parser.add_argument("--output-dir", type=str, help="output dir.")
parser.add_argument( parser.add_argument(
"--nxpu", type=int, default=0, help="if nxpu == 0 and ngpu == 0, use cpu.") "--nxpu",
type=int,
default=0,
help="if nxpu == 0 and ngpu == 0, use cpu.")
parser.add_argument( parser.add_argument(
"--ngpu", type=int, default=1, help="if ngpu == 0, use cpu or xpu") "--ngpu", type=int, default=1, help="if ngpu == 0, use cpu or xpu")
......
...@@ -27,11 +27,11 @@ from paddle import jit ...@@ -27,11 +27,11 @@ from paddle import jit
from paddle.static import InputSpec from paddle.static import InputSpec
from yacs.config import CfgNode from yacs.config import CfgNode
from paddlespeech.s2t.utils.dynamic_import import dynamic_import
from paddlespeech.t2s.datasets.data_table import DataTable from paddlespeech.t2s.datasets.data_table import DataTable
from paddlespeech.t2s.frontend import English from paddlespeech.t2s.frontend import English
from paddlespeech.t2s.frontend.zh_frontend import Frontend from paddlespeech.t2s.frontend.zh_frontend import Frontend
from paddlespeech.t2s.modules.normalizer import ZScore from paddlespeech.t2s.modules.normalizer import ZScore
from paddlespeech.utils.dynamic_import import dynamic_import
model_alias = { model_alias = {
# acoustic model # acoustic model
......
...@@ -125,7 +125,7 @@ def evaluate(args): ...@@ -125,7 +125,7 @@ def evaluate(args):
def parse_args(): def parse_args():
# parse args and config and redirect to train_sp # parse args and config
parser = argparse.ArgumentParser( parser = argparse.ArgumentParser(
description="Synthesize with acoustic model & vocoder") description="Synthesize with acoustic model & vocoder")
# acoustic model # acoustic model
...@@ -143,7 +143,7 @@ def parse_args(): ...@@ -143,7 +143,7 @@ def parse_args():
'--am_config', '--am_config',
type=str, type=str,
default=None, default=None,
help='Config of acoustic model. Use deault config when it is None.') help='Config of acoustic model.')
parser.add_argument( parser.add_argument(
'--am_ckpt', '--am_ckpt',
type=str, type=str,
...@@ -182,7 +182,7 @@ def parse_args(): ...@@ -182,7 +182,7 @@ def parse_args():
'--voc_config', '--voc_config',
type=str, type=str,
default=None, default=None,
help='Config of voc. Use deault config when it is None.') help='Config of voc.')
parser.add_argument( parser.add_argument(
'--voc_ckpt', type=str, default=None, help='Checkpoint file of voc.') '--voc_ckpt', type=str, default=None, help='Checkpoint file of voc.')
parser.add_argument( parser.add_argument(
......
...@@ -159,7 +159,7 @@ def evaluate(args): ...@@ -159,7 +159,7 @@ def evaluate(args):
def parse_args(): def parse_args():
# parse args and config and redirect to train_sp # parse args and config
parser = argparse.ArgumentParser( parser = argparse.ArgumentParser(
description="Synthesize with acoustic model & vocoder") description="Synthesize with acoustic model & vocoder")
# acoustic model # acoustic model
...@@ -177,7 +177,7 @@ def parse_args(): ...@@ -177,7 +177,7 @@ def parse_args():
'--am_config', '--am_config',
type=str, type=str,
default=None, default=None,
help='Config of acoustic model. Use deault config when it is None.') help='Config of acoustic model.')
parser.add_argument( parser.add_argument(
'--am_ckpt', '--am_ckpt',
type=str, type=str,
...@@ -223,7 +223,7 @@ def parse_args(): ...@@ -223,7 +223,7 @@ def parse_args():
'--voc_config', '--voc_config',
type=str, type=str,
default=None, default=None,
help='Config of voc. Use deault config when it is None.') help='Config of voc.')
parser.add_argument( parser.add_argument(
'--voc_ckpt', type=str, default=None, help='Checkpoint file of voc.') '--voc_ckpt', type=str, default=None, help='Checkpoint file of voc.')
parser.add_argument( parser.add_argument(
......
...@@ -24,7 +24,6 @@ from paddle.static import InputSpec ...@@ -24,7 +24,6 @@ from paddle.static import InputSpec
from timer import timer from timer import timer
from yacs.config import CfgNode from yacs.config import CfgNode
from paddlespeech.s2t.utils.dynamic_import import dynamic_import
from paddlespeech.t2s.exps.syn_utils import denorm from paddlespeech.t2s.exps.syn_utils import denorm
from paddlespeech.t2s.exps.syn_utils import get_chunks from paddlespeech.t2s.exps.syn_utils import get_chunks
from paddlespeech.t2s.exps.syn_utils import get_frontend from paddlespeech.t2s.exps.syn_utils import get_frontend
...@@ -33,6 +32,7 @@ from paddlespeech.t2s.exps.syn_utils import get_voc_inference ...@@ -33,6 +32,7 @@ from paddlespeech.t2s.exps.syn_utils import get_voc_inference
from paddlespeech.t2s.exps.syn_utils import model_alias from paddlespeech.t2s.exps.syn_utils import model_alias
from paddlespeech.t2s.exps.syn_utils import voc_to_static from paddlespeech.t2s.exps.syn_utils import voc_to_static
from paddlespeech.t2s.utils import str2bool from paddlespeech.t2s.utils import str2bool
from paddlespeech.utils.dynamic_import import dynamic_import
def evaluate(args): def evaluate(args):
...@@ -201,7 +201,7 @@ def evaluate(args): ...@@ -201,7 +201,7 @@ def evaluate(args):
def parse_args(): def parse_args():
# parse args and config and redirect to train_sp # parse args and config
parser = argparse.ArgumentParser( parser = argparse.ArgumentParser(
description="Synthesize with acoustic model & vocoder") description="Synthesize with acoustic model & vocoder")
# acoustic model # acoustic model
...@@ -212,10 +212,7 @@ def parse_args(): ...@@ -212,10 +212,7 @@ def parse_args():
choices=['fastspeech2_csmsc'], choices=['fastspeech2_csmsc'],
help='Choose acoustic model type of tts task.') help='Choose acoustic model type of tts task.')
parser.add_argument( parser.add_argument(
'--am_config', '--am_config', type=str, default=None, help='Config of acoustic model.')
type=str,
default=None,
help='Config of acoustic model. Use deault config when it is None.')
parser.add_argument( parser.add_argument(
'--am_ckpt', '--am_ckpt',
type=str, type=str,
...@@ -245,10 +242,7 @@ def parse_args(): ...@@ -245,10 +242,7 @@ def parse_args():
], ],
help='Choose vocoder type of tts task.') help='Choose vocoder type of tts task.')
parser.add_argument( parser.add_argument(
'--voc_config', '--voc_config', type=str, default=None, help='Config of voc.')
type=str,
default=None,
help='Config of voc. Use deault config when it is None.')
parser.add_argument( parser.add_argument(
'--voc_ckpt', type=str, default=None, help='Checkpoint file of voc.') '--voc_ckpt', type=str, default=None, help='Checkpoint file of voc.')
parser.add_argument( parser.add_argument(
......
...@@ -125,9 +125,15 @@ def process_sentences(config, ...@@ -125,9 +125,15 @@ def process_sentences(config,
spk_emb_dir: Path=None): spk_emb_dir: Path=None):
if nprocs == 1: if nprocs == 1:
results = [] results = []
for fp in fps: for fp in tqdm.tqdm(fps, total=len(fps)):
record = process_sentence(config, fp, sentences, output_dir, record = process_sentence(
mel_extractor, cut_sil, spk_emb_dir) config=config,
fp=fp,
sentences=sentences,
output_dir=output_dir,
mel_extractor=mel_extractor,
cut_sil=cut_sil,
spk_emb_dir=spk_emb_dir)
if record: if record:
results.append(record) results.append(record)
else: else:
...@@ -299,30 +305,30 @@ def main(): ...@@ -299,30 +305,30 @@ def main():
# process for the 3 sections # process for the 3 sections
if train_wav_files: if train_wav_files:
process_sentences( process_sentences(
config, config=config,
train_wav_files, fps=train_wav_files,
sentences, sentences=sentences,
train_dump_dir, output_dir=train_dump_dir,
mel_extractor, mel_extractor=mel_extractor,
nprocs=args.num_cpu, nprocs=args.num_cpu,
cut_sil=args.cut_sil, cut_sil=args.cut_sil,
spk_emb_dir=spk_emb_dir) spk_emb_dir=spk_emb_dir)
if dev_wav_files: if dev_wav_files:
process_sentences( process_sentences(
config, config=config,
dev_wav_files, fps=dev_wav_files,
sentences, sentences=sentences,
dev_dump_dir, output_dir=dev_dump_dir,
mel_extractor, mel_extractor=mel_extractor,
cut_sil=args.cut_sil, cut_sil=args.cut_sil,
spk_emb_dir=spk_emb_dir) spk_emb_dir=spk_emb_dir)
if test_wav_files: if test_wav_files:
process_sentences( process_sentences(
config, config=config,
test_wav_files, fps=test_wav_files,
sentences, sentences=sentences,
test_dump_dir, output_dir=test_dump_dir,
mel_extractor, mel_extractor=mel_extractor,
nprocs=args.num_cpu, nprocs=args.num_cpu,
cut_sil=args.cut_sil, cut_sil=args.cut_sil,
spk_emb_dir=spk_emb_dir) spk_emb_dir=spk_emb_dir)
......
...@@ -125,11 +125,16 @@ def process_sentences(config, ...@@ -125,11 +125,16 @@ def process_sentences(config,
output_dir: Path, output_dir: Path,
mel_extractor=None, mel_extractor=None,
nprocs: int=1): nprocs: int=1):
if nprocs == 1: if nprocs == 1:
results = [] results = []
for fp in tqdm.tqdm(fps, total=len(fps)): for fp in tqdm.tqdm(fps, total=len(fps)):
record = process_sentence(config, fp, sentences, output_dir, record = process_sentence(
mel_extractor) config=config,
fp=fp,
sentences=sentences,
output_dir=output_dir,
mel_extractor=mel_extractor)
if record: if record:
results.append(record) results.append(record)
else: else:
...@@ -247,27 +252,27 @@ def main(): ...@@ -247,27 +252,27 @@ def main():
# process for the 3 sections # process for the 3 sections
if train_wav_files: if train_wav_files:
process_sentences( process_sentences(
config, config=config,
train_wav_files, fps=train_wav_files,
sentences, sentences=sentences,
train_dump_dir, output_dir=train_dump_dir,
mel_extractor, mel_extractor=mel_extractor,
nprocs=args.num_cpu) nprocs=args.num_cpu)
if dev_wav_files: if dev_wav_files:
process_sentences( process_sentences(
config, config=config,
dev_wav_files, fps=dev_wav_files,
sentences, sentences=sentences,
dev_dump_dir, output_dir=dev_dump_dir,
mel_extractor, mel_extractor=mel_extractor,
nprocs=args.num_cpu) nprocs=args.num_cpu)
if test_wav_files: if test_wav_files:
process_sentences( process_sentences(
config, config=config,
test_wav_files, fps=test_wav_files,
sentences, sentences=sentences,
test_dump_dir, output_dir=test_dump_dir,
mel_extractor, mel_extractor=mel_extractor,
nprocs=args.num_cpu) nprocs=args.num_cpu)
......
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Normalize feature files and dump them."""
import argparse
import logging
from operator import itemgetter
from pathlib import Path
import jsonlines
import numpy as np
from sklearn.preprocessing import StandardScaler
from tqdm import tqdm
from paddlespeech.t2s.datasets.data_table import DataTable
def main():
"""Run preprocessing process."""
parser = argparse.ArgumentParser(
description="Normalize dumped raw features (See detail in parallel_wavegan/bin/normalize.py)."
)
parser.add_argument(
"--metadata",
type=str,
required=True,
help="jsonlines metadata file listing the feature files to be normalized.")
parser.add_argument(
"--dumpdir",
type=str,
required=True,
help="directory to dump normalized feature files.")
parser.add_argument(
"--feats-stats",
type=str,
required=True,
help="speech statistics file.")
parser.add_argument(
"--skip-wav-copy",
default=False,
action="store_true",
help="whether to skip the copy of wav files.")
parser.add_argument(
"--phones-dict", type=str, default=None, help="phone vocabulary file.")
parser.add_argument(
"--speaker-dict", type=str, default=None, help="speaker id map file.")
parser.add_argument(
"--verbose",
type=int,
default=1,
help="logging level. higher is more logging. (default=1)")
args = parser.parse_args()
# set logger
if args.verbose > 1:
logging.basicConfig(
level=logging.DEBUG,
format="%(asctime)s (%(module)s:%(lineno)d) %(levelname)s: %(message)s"
)
elif args.verbose > 0:
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s (%(module)s:%(lineno)d) %(levelname)s: %(message)s"
)
else:
logging.basicConfig(
level=logging.WARNING,
format="%(asctime)s (%(module)s:%(lineno)d) %(levelname)s: %(message)s"
)
logging.warning('Skip DEBUG/INFO messages')
dumpdir = Path(args.dumpdir).expanduser()
# use absolute path
dumpdir = dumpdir.resolve()
dumpdir.mkdir(parents=True, exist_ok=True)
# get dataset
with jsonlines.open(args.metadata, 'r') as reader:
metadata = list(reader)
dataset = DataTable(
metadata,
converters={
"feats": np.load,
"wave": None if args.skip_wav_copy else np.load,
})
logging.info(f"The number of files = {len(dataset)}.")
# restore scaler
feats_scaler = StandardScaler()
feats_scaler.mean_ = np.load(args.feats_stats)[0]
feats_scaler.scale_ = np.load(args.feats_stats)[1]
feats_scaler.n_features_in_ = feats_scaler.mean_.shape[0]
vocab_phones = {}
with open(args.phones_dict, 'rt') as f:
phn_id = [line.strip().split() for line in f.readlines()]
for phn, id in phn_id:
vocab_phones[phn] = int(id)
vocab_speaker = {}
with open(args.speaker_dict, 'rt') as f:
spk_id = [line.strip().split() for line in f.readlines()]
for spk, id in spk_id:
vocab_speaker[spk] = int(id)
# process each file
output_metadata = []
for item in tqdm(dataset):
utt_id = item['utt_id']
feats = item['feats']
wave = item['wave']
# normalize
feats = feats_scaler.transform(feats)
feats_path = dumpdir / f"{utt_id}_feats.npy"
np.save(feats_path, feats.astype(np.float32), allow_pickle=False)
if not args.skip_wav_copy:
wav_path = dumpdir / f"{utt_id}_wave.npy"
np.save(wav_path, wave.astype(np.float32), allow_pickle=False)
else:
wav_path = wave
phone_ids = [vocab_phones[p] for p in item['phones']]
spk_id = vocab_speaker[item["speaker"]]
record = {
"utt_id": item['utt_id'],
"text": phone_ids,
"text_lengths": item['text_lengths'],
'feats': str(feats_path),
"feats_lengths": item['feats_lengths'],
"wave": str(wav_path),
"spk_id": spk_id,
}
# add spk_emb for voice cloning
if "spk_emb" in item:
record["spk_emb"] = str(item["spk_emb"])
output_metadata.append(record)
output_metadata.sort(key=itemgetter('utt_id'))
output_metadata_path = Path(args.dumpdir) / "metadata.jsonl"
with jsonlines.open(output_metadata_path, 'w') as writer:
for item in output_metadata:
writer.write(item)
logging.info(f"metadata dumped into {output_metadata_path}")
if __name__ == "__main__":
main()
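The script restores a `StandardScaler` from a stats file whose first row holds the per-dimension mean and whose second row holds the scale. With default settings, `transform` then amounts to `(x - mean) / scale`. A small NumPy sketch of that arithmetic follows; `normalize_feats` and the synthetic data are illustrative assumptions, not part of the repository.

```python
import numpy as np


def normalize_feats(feats, stats):
    """Apply per-dimension z-score normalization.

    stats[0] is the mean and stats[1] is the scale, matching how the
    script above indexes the loaded stats file.
    """
    mean, scale = stats[0], stats[1]
    return ((feats - mean) / scale).astype(np.float32)


# Synthetic features of shape (num_frames, n_feats), for illustration only.
rng = np.random.default_rng(0)
raw = rng.normal(loc=3.0, scale=2.0, size=(100, 4))

# Stats laid out as (2, n_feats): mean in row 0, scale in row 1.
stats = np.stack([raw.mean(axis=0), raw.std(axis=0)])
normed = normalize_feats(raw, stats)
```

In practice the stats would come from the training set only, so dev and test features are normalized with the same mean and scale rather than their own.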
This diff is collapsed.