Unverified · commit 5c72e8ce · authored by: H Hui Zhang · committed by: GitHub

Merge pull request #2253 from SmileGoat/add_pitch2

[audio] merge develop

......@@ -21,5 +21,6 @@ python:
version: 3.7
install:
- requirements: docs/requirements.txt
- method: setuptools
path: .
system_packages: true
\ No newline at end of file
include paddlespeech/t2s/exps/*.txt
include paddlespeech/t2s/frontend/*.yaml
\ No newline at end of file
([简体中文](./README_cn.md)|English)
<p align="center">
<img src="./docs/images/PaddleSpeech_logo.png" />
......@@ -24,14 +25,16 @@
| <a href="#documents"> Documents </a>
| <a href="#model-list"> Models List </a>
| <a href="https://aistudio.baidu.com/aistudio/education/group/info/25130"> AIStudio Courses </a>
| <a href="https://arxiv.org/abs/2205.12007"> Paper </a>
| <a href="https://arxiv.org/abs/2205.12007"> NAACL2022 Best Demo Award Paper </a>
| <a href="https://gitee.com/paddlepaddle/PaddleSpeech"> Gitee </a>
</h4>
</div>
------------------------------------------------------------------------------------
**PaddleSpeech** is an open-source toolkit on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.
**PaddleSpeech** won the [NAACL2022 Best Demo Award](https://2022.naacl.org/blog/best-demo-award/); please check out our paper on [arXiv](https://arxiv.org/abs/2205.12007).
##### Speech Recognition
......@@ -176,7 +179,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
## Installation
We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.7* and *paddlepaddle>=2.3.1*.
Up to now, **Linux** supports the CLI for all our tasks; **Mac OSX** and **Windows** only support the PaddleSpeech CLI for Audio Classification, Speech-to-Text, and Text-to-Speech. To install `PaddleSpeech`, please see [installation](./docs/source/install.md).
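A minimal sketch of the pip route, mirroring the commands in the Chinese quick-start below (the Baidu pip mirror and the `pytest-runner` pre-install come from that section; swap in a GPU `paddlepaddle` wheel if your machine needs one):

```shell
pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
pip install pytest-runner
pip install paddlespeech
```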
......@@ -494,6 +497,14 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
<a href = "./examples/aishell3/vc1">ge2e-fastspeech2-aishell3</a>
</td>
</tr>
<tr>
<td rowspan="3">End-to-End</td>
<td>VITS</td>
<td >CSMSC</td>
<td>
<a href = "./examples/csmsc/vits">VITS-csmsc</a>
</td>
</tr>
</tbody>
</table>
......@@ -688,6 +699,7 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P
## Acknowledgement
- Many thanks to [BarryKCL](https://github.com/BarryKCL) for improving the TTS Chinese frontend based on [G2PW](https://github.com/GitYCC/g2pW).
- Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) for years of attention, constructive advice and great help.
- Many thanks to [mymagicpower](https://github.com/mymagicpower) for the Java implementations of ASR for [short](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk) and [long](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk) audio files.
- Many thanks to [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) for developing Virtual Uploader(VUP)/Virtual YouTuber(VTuber) with PaddleSpeech TTS function.
......@@ -696,6 +708,8 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P
- Many thanks to [awmmmm](https://github.com/awmmmm) for contributing fastspeech2 aishell3 conformer pretrained model.
- Many thanks to [phecda-xu](https://github.com/phecda-xu)/[PaddleDubbing](https://github.com/phecda-xu/PaddleDubbing) for developing a dubbing tool with GUI based on PaddleSpeech TTS model.
- Many thanks to [jerryuhoo](https://github.com/jerryuhoo)/[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk) for developing a GUI tool based on PaddleSpeech TTS and code for making datasets from videos based on PaddleSpeech ASR.
- Many thanks to [vpegasus](https://github.com/vpegasus)/[xuesebot](https://github.com/vpegasus/xuesebot) for developing a RASA chatbot, which is able to speak and listen thanks to PaddleSpeech.
- Many thanks to [chenkui164](https://github.com/chenkui164)/[FastASR](https://github.com/chenkui164/FastASR) for the C++ inference implementation of PaddleSpeech ASR.
Besides, PaddleSpeech depends on a lot of open source repositories. See [references](./docs/source/reference.md) for more information.
......
(简体中文|[English](./README.md))
<p align="center">
<img src="./docs/images/PaddleSpeech_logo.png" />
......@@ -19,13 +20,14 @@
</p>
<div align="center">
<h4>
<a href="#快速开始"> 快速开始 </a>
<a href="#安装"> 安装 </a>
| <a href="#快速开始"> 快速开始 </a>
| <a href="#快速使用服务"> 快速使用服务 </a>
| <a href="#快速使用流式服务"> 快速使用流式服务 </a>
| <a href="#教程文档"> 教程文档 </a>
| <a href="#模型列表"> 模型列表 </a>
| <a href="https://aistudio.baidu.com/aistudio/education/group/info/25130"> AIStudio 课程 </a>
| <a href="https://arxiv.org/abs/2205.12007"> 论文 </a>
| <a href="https://arxiv.org/abs/2205.12007"> NAACL2022 论文 </a>
| <a href="https://gitee.com/paddlepaddle/PaddleSpeech"> Gitee
</h4>
</div>
......@@ -34,6 +36,11 @@
------------------------------------------------------------------------------------
**PaddleSpeech** is an open-source model library for the speech domain on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform, used for developing a variety of critical tasks in speech and audio. It contains a large number of cutting-edge and influential deep learning models; some typical application examples are shown below:

**PaddleSpeech** won the [NAACL2022 Best Demo Award](https://2022.naacl.org/blog/best-demo-award/); please see our paper on [arXiv](https://arxiv.org/abs/2205.12007).
### Demos
##### Speech Recognition
<div align = "center">
......@@ -150,7 +157,7 @@
This project provides easy-to-use, efficient, flexible, and scalable implementations, aiming to better support industrial applications and academic research. It covers training, inference, and testing modules, as well as the deployment process. Highlights include:
- 📦 **Ease of use**: low installation barrier; get started quickly with the [CLI](#quick-start).
- 🏆 **Align to SoTA**: fast, lightweight models that draw on the most cutting-edge technology.
- 🏆 **Streaming ASR and TTS systems**: industrial-grade, end-to-end streaming recognition and streaming synthesis systems.
- 💯 **Rule-based Chinese frontend**: our frontend includes text normalization and grapheme-to-phoneme (G2P) conversion. Moreover, we use custom linguistic rules to adapt to the Chinese context.
- **Support for mainstream industrial and academic features**:
  - 🛎️ Typical audio tasks: the toolkit provides implementations for audio classification, speech translation, automatic speech recognition, text-to-speech, speech synthesis, speaker verification, KWS, and more.
......@@ -159,6 +166,7 @@
### Recent Updates
- 👑 2022.05.13: PaddleSpeech released [PP-ASR](./docs/source/asr/PPASR_cn.md), a streaming speech recognition system; [PP-TTS](./docs/source/tts/PPTTS_cn.md), a streaming speech synthesis system; and [PP-VPR](docs/source/vpr/PPVPR_cn.md), a full-pipeline speaker verification system.
- 👏🏻 2022.05.06: PaddleSpeech Streaming Server is online! It covers speech recognition (with punctuation restoration and timestamps) and speech synthesis.
- 👏🏻 2022.05.06: PaddleSpeech Server is online! It covers audio classification, speech recognition, speech synthesis, speaker verification, and punctuation restoration.
......@@ -177,61 +185,195 @@
<img src="https://user-images.githubusercontent.com/23690325/169763015-cbd8e28d-602c-4723-810d-dbc6da49441e.jpg" width = "200" />
</div>
<a name="安装"></a>
## Installation

We strongly recommend that users install PaddleSpeech on **Linux**, with *python* version *3.7* or above.

So far, **Linux** supports four functions: audio classification, speech recognition, speech synthesis, and speech translation; **Mac OSX** and **Windows** do not support speech translation yet. For installation details, see the [installation documentation](./docs/source/install_cn.md).
### Dependencies
+ gcc >= 4.8.5
+ paddlepaddle >= 2.3.1
+ python >= 3.7
+ linux (recommended), mac, windows

PaddleSpeech depends on paddlepaddle. For installation, refer to the [paddlepaddle website](https://www.paddlepaddle.org.cn/) and choose the build that matches your machine. Below is an example for the CPU version; install other versions according to your own machine.
```shell
pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
There are two quick ways to install PaddleSpeech: via pip, or from source (recommended).

### pip installation
```shell
pip install pytest-runner
pip install paddlespeech
```
### Installation from source
```shell
git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech
pip install pytest-runner
pip install .
```
For more installation issues, such as conda environments, system libraries required by librosa, gcc problems, and kaldi installation, see this [installation documentation](docs/source/install_cn.md). If you run into problems during installation, you can leave a message and search for related issues at [#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150).
<a name="快速开始"></a>
## Quick Start

After installation, developers can get started quickly from the command line or with Python. In command-line mode, change `--input` to test with your own audio or text; 16k WAV audio is supported.

You can also try it out quickly on `aistudio` 👉🏻 [PaddleSpeech API Demo](https://aistudio.baidu.com/aistudio/projectdetail/4281335?shared=1)

Download the sample test audio:
```shell
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
```
### Speech Recognition
<details><summary>&emsp;(Click to expand) Open-source Chinese speech recognition</summary>

One-shot experience from the command line:
```shell
paddlespeech asr --lang zh --input zh.wav
```
One-shot prediction with the Python API:
```python
>>> from paddlespeech.cli.asr.infer import ASRExecutor
>>> asr = ASRExecutor()
>>> result = asr(audio_file="zh.wav")
>>> print(result)
我认为跑步最重要的就是给我带来了身体健康
```
</details>
### Speech Synthesis
<details><summary>&emsp;Open-source Chinese speech synthesis</summary>

Outputs 24k sample-rate WAV audio.

One-shot experience from the command line:
```shell
paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!" --output output.wav
```
One-shot prediction with the Python API:
```python
>>> from paddlespeech.cli.tts.infer import TTSExecutor
>>> tts = TTSExecutor()
>>> tts(text="今天天气十分不错。", output="output.wav")
```
- The web demo for speech synthesis is integrated into [Huggingface Spaces](https://huggingface.co/spaces). See: [TTS Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS)
</details>
### Audio Classification
<details><summary>&emsp;An open-domain audio classification tool for multiple scenarios</summary>

A sound classification model based on the 527 categories of the AudioSet dataset.

One-shot experience from the command line:
```shell
paddlespeech cls --input zh.wav
```
One-shot prediction with the Python API:
```python
>>> from paddlespeech.cli.cls.infer import CLSExecutor
>>> cls = CLSExecutor()
>>> result = cls(audio_file="zh.wav")
>>> print(result)
Speech 0.9027186632156372
```
</details>
### Voiceprint Extraction
<details><summary>&emsp;An industrial-grade voiceprint extraction tool</summary>

One-shot experience from the command line:
```shell
paddlespeech vector --task spk --input zh.wav
```
One-shot prediction with the Python API:
```python
>>> from paddlespeech.cli.vector import VectorExecutor
>>> vec = VectorExecutor()
>>> result = vec(audio_file="zh.wav")
>>> print(result)  # a 187-dimensional vector
[ -0.19083306 9.474295 -14.122263 -2.0916545 0.04848729
4.9295826 1.4780062 0.3733844 10.695862 3.2697146
-4.48199 -0.6617882 -9.170393 -11.1568775 -1.2358263 ...]
```
</details>
### Punctuation Restoration
<details><summary>&emsp;One-shot punctuation restoration for text; can be used together with ASR models</summary>

One-shot experience from the command line:
```shell
paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
```
One-shot prediction with the Python API:
```python
>>> from paddlespeech.cli.text.infer import TextExecutor
>>> text_punc = TextExecutor()
>>> result = text_punc(text="今天的天气真不错啊你下午有空吗我想约你一起去吃饭")
>>> print(result)
今天的天气真不错啊!你下午有空吗?我想约你一起去吃饭。
```
</details>
### Speech Translation
<details><summary>&emsp;An end-to-end English-to-Chinese speech translation tool</summary>

Uses pre-compiled kaldi-related tools; only supported on Ubuntu systems.

One-shot experience from the command line:
```shell
paddlespeech st --input en.wav
```
One-shot prediction with the Python API:
```python
>>> from paddlespeech.cli.st.infer import STExecutor
>>> st = STExecutor()
>>> result = st(audio_file="en.wav")
['我 在 这栋 建筑 的 古老 门上 敲门 。']
```
For more command-line usage, see [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos).

> Note: for training or fine-tuning, see the [speech recognition](./docs/source/asr/quick_start.md) and [speech synthesis](./docs/source/tts/quick_start.md) quick-start docs.
</details>
<a name="快速使用服务"></a>
## Quick Start Server

After installation, developers can launch speech recognition, speech synthesis, and audio classification services from the command line with a single command.

**Start the server**
```shell
......@@ -480,6 +622,15 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
<a href = "./examples/aishell3/vc1">ge2e-fastspeech2-aishell3</a>
</td>
</tr>
<tr>
<td rowspan="3">End-to-End</td>
<td>VITS</td>
<td >CSMSC</td>
<td>
<a href = "./examples/csmsc/vits">VITS-csmsc</a>
</td>
</tr>
</tbody>
</table>
......@@ -600,6 +751,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
The speech synthesis module was originally called [Parakeet](https://github.com/PaddlePaddle/Parakeet) and has now been merged into this repository. If you are interested in academic research on this task, see the [TTS research overview](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview). The [model introduction](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) is also a good guide to the speech synthesis pipeline.
## ⭐ Use Cases
- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): generates virtual-human voices with the PaddleSpeech TTS module.**
......@@ -681,6 +833,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
## Acknowledgements
- Many thanks to [BarryKCL](https://github.com/BarryKCL) for improving the TTS Chinese text frontend based on [G2PW](https://github.com/GitYCC/g2pW).
- Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) for years of attention and constructive advice, and for help on many issues.
- Many thanks to [mymagicpower](https://github.com/mymagicpower) for the Java implementations of PaddleSpeech ASR for [short](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk) and [long](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk) audio.
- Many thanks to [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) for building Virtual Uploader (VUP)/Virtual YouTuber (VTuber) virtual hosts with PaddleSpeech TTS.
......@@ -690,7 +843,8 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
- Many thanks to [phecda-xu](https://github.com/phecda-xu)/[PaddleDubbing](https://github.com/phecda-xu/PaddleDubbing) for building a dubbing tool with a GUI on top of PaddleSpeech TTS models.
- Many thanks to [jerryuhoo](https://github.com/jerryuhoo)/[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk) for a GUI based on PaddleSpeech TTS and code for building datasets from videos with PaddleSpeech ASR.
- Many thanks to [vpegasus](https://github.com/vpegasus)/[xuesebot](https://github.com/vpegasus/xuesebot) for a chatbot that can listen and speak, built with PaddleSpeech ASR and TTS.
- Many thanks to [chenkui164](https://github.com/chenkui164)/[FastASR](https://github.com/chenkui164/FastASR) for the C++ inference implementation of PaddleSpeech ASR.

Besides, PaddleSpeech depends on many open-source repositories. See [references](./docs/source/reference.md) for more information.
......
# [Aidatatang_200zh](http://www.openslr.org/62/)
# [Aidatatang_200zh](http://openslr.elda.org/62/)
Aidatatang_200zh is a free Chinese Mandarin speech corpus provided by Beijing DataTang Technology Co., Ltd under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License.
The contents and the corresponding descriptions of the corpus include:
......
# [Aishell1](http://www.openslr.org/33/)
# [Aishell1](http://openslr.elda.org/33/)
This Open Source Mandarin Speech Corpus, AISHELL-ASR0009-OS1, is 178 hours long. It is a part of AISHELL-ASR0009, whose utterances cover 11 domains, including smart home, autonomous driving, and industrial production. All recordings were made in a quiet indoor environment, using 3 different devices at the same time: a high-fidelity microphone (44.1kHz, 16-bit), an Android-system mobile phone (16kHz, 16-bit), and an iOS-system mobile phone (16kHz, 16-bit). The high-fidelity audio was re-sampled to 16kHz to build AISHELL-ASR0009-OS1. 400 speakers from different accent areas in China were invited to participate in the recording. The manual transcription accuracy is above 95%, achieved through professional speech annotation and strict quality inspection. The corpus is divided into training, development, and testing sets. (This database is free for academic research; it may not be used commercially without permission.)
......@@ -31,7 +31,7 @@ from utils.utility import unpack
DATA_HOME = os.path.expanduser('~/.cache/paddle/dataset/speech')
URL_ROOT = 'http://www.openslr.org/resources/33'
URL_ROOT = 'http://openslr.elda.org/resources/33'
# URL_ROOT = 'https://openslr.magicdatatech.com/resources/33'
DATA_URL = URL_ROOT + '/data_aishell.tgz'
MD5_DATA = '2f494334227864a8a8fec932999db9d8'
......
......@@ -31,7 +31,7 @@ import soundfile
from utils.utility import download
from utils.utility import unpack
URL_ROOT = "http://www.openslr.org/resources/12"
URL_ROOT = "http://openslr.elda.org/resources/12"
#URL_ROOT = "https://openslr.magicdatatech.com/resources/12"
URL_TEST_CLEAN = URL_ROOT + "/test-clean.tar.gz"
URL_TEST_OTHER = URL_ROOT + "/test-other.tar.gz"
......
# [MagicData](http://www.openslr.org/68/)
# [MagicData](http://openslr.elda.org/68/)
MAGICDATA Mandarin Chinese Read Speech Corpus was developed by MAGIC DATA Technology Co., Ltd. and freely published for non-commercial use.
The contents and the corresponding descriptions of the corpus include:
......
......@@ -30,7 +30,7 @@ import soundfile
from utils.utility import download
from utils.utility import unpack
URL_ROOT = "http://www.openslr.org/resources/31"
URL_ROOT = "http://openslr.elda.org/resources/31"
URL_TRAIN_CLEAN = URL_ROOT + "/train-clean-5.tar.gz"
URL_DEV_CLEAN = URL_ROOT + "/dev-clean-2.tar.gz"
......
......@@ -34,7 +34,7 @@ from utils.utility import unpack
DATA_HOME = os.path.expanduser('~/.cache/paddle/dataset/speech')
URL_ROOT = 'https://www.openslr.org/resources/17'
URL_ROOT = 'https://openslr.elda.org/resources/17'
DATA_URL = URL_ROOT + '/musan.tar.gz'
MD5_DATA = '0c472d4fc0c5141eca47ad1ffeb2a7df'
......
# [Primewords](http://www.openslr.org/47/)
# [Primewords](http://openslr.elda.org/47/)
This free Chinese Mandarin speech corpus set is released by Shanghai Primewords Information Technology Co., Ltd.
The corpus is recorded by smart mobile phones from 296 native Chinese speakers. The transcription accuracy is larger than 98%, at the confidence level of 95%. It is free for academic use.
......
......@@ -34,7 +34,7 @@ from utils.utility import unzip
DATA_HOME = os.path.expanduser('~/.cache/paddle/dataset/speech')
URL_ROOT = '--no-check-certificate http://www.openslr.org/resources/28'
URL_ROOT = '--no-check-certificate https://us.openslr.org/resources/28'
DATA_URL = URL_ROOT + '/rirs_noises.zip'
MD5_DATA = 'e6f48e257286e05de56413b4779d8ffb'
......
# [FreeST](http://www.openslr.org/38/)
# [FreeST](http://openslr.elda.org/38/)
# [THCHS30](http://www.openslr.org/18/)
# [THCHS30](http://openslr.elda.org/18/)
This is the *data part* of the `THCHS30 2015` acoustic data
& scripts dataset.
......
......@@ -32,7 +32,7 @@ from utils.utility import unpack
DATA_HOME = os.path.expanduser('~/.cache/paddle/dataset/speech')
URL_ROOT = 'http://www.openslr.org/resources/18'
URL_ROOT = 'http://openslr.elda.org/resources/18'
# URL_ROOT = 'https://openslr.magicdatatech.com/resources/18'
DATA_URL = URL_ROOT + '/data_thchs30.tgz'
TEST_NOISE_URL = URL_ROOT + '/test-noise.tgz'
......
......@@ -12,6 +12,7 @@ This directory contains many speech applications in multiple scenarios.
* speech recognition - recognize text of an audio file
* speech server - Server for Speech Task, e.g. ASR,TTS,CLS
* streaming asr server - receive an audio stream over websocket and recognize it into a transcript.
* streaming tts server - receive text over http or websocket and stream back synthesized audio data.
* speech translation - end to end speech translation
* story talker - book reader based on OCR and TTS
* style_fs2 - multi style control for FastSpeech2 model
......
......@@ -10,8 +10,9 @@
* metaverse - 2D augmented reality based on speech synthesis.
* punctuation restoration - usually a text post-processing task for speech recognition, adding punctuation to a piece of plain, unpunctuated text.
* speech recognition - recognize the text contained in a piece of audio.
* speech server - offline speech services, including ASR, TTS, CLS, etc.
* streaming asr server - recognize text from a streaming audio input.
* streaming tts server - stream synthesized audio for the text to be synthesized.
* speech translation - recognize the language in audio in real time and translate it into the target language.
* story talker - a talking storybook based on OCR and speech synthesis.
* personalized speech synthesis - personalized speech synthesis based on the FastSpeech2 model.
......
......@@ -2,7 +2,7 @@ diskcache==5.2.1
dtaidistance==2.3.1
fastapi
librosa==0.8.0
numpy==1.21.0
numpy==1.22.0
pydantic
pymilvus==2.0.1
pymysql
......
File mode changed from 100644 to 100755
([简体中文](./README_cn.md)|English)
# KWS (Keyword Spotting)
## Introduction
KWS (Keyword Spotting) is a technique for recognizing keywords in given speech audio.

This demo is an implementation of recognizing a keyword in a specific audio file. It can be done with a single command or a few lines of Python using `PaddleSpeech`.
## Usage
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one of three ways to install paddlespeech: easy, medium, or hard.
### 2. Prepare Input File
The input of this demo should be a WAV file (`.wav`), and the sample rate must be the same as the model's.
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/kws/hey_snips.wav https://paddlespeech.bj.bcebos.com/kws/non-keyword.wav
```
### 3. Usage
- Command Line(Recommended)
```bash
paddlespeech kws --input ./hey_snips.wav
paddlespeech kws --input ./non-keyword.wav
```
Usage:
```bash
paddlespeech kws --help
```
Arguments:
- `input`(required): Audio file to recognize.
- `threshold`: Score threshold for kws. Default: `0.8`.
- `model`: Model type of kws task. Default: `mdtc_heysnips`.
- `config`: Config of kws task. Use pretrained model when it is None. Default: `None`.
- `ckpt_path`: Model checkpoint. Use pretrained model when it is None. Default: `None`.
- `device`: Choose device to execute model inference. Default: default device of paddlepaddle in current environment.
- `verbose`: Show the log information.
Output:
```bash
# Input file: ./hey_snips.wav
Score: 1.000, Threshold: 0.8, Is keyword: True
# Input file: ./non-keyword.wav
Score: 0.000, Threshold: 0.8, Is keyword: False
```
- Python API
```python
import paddle
from paddlespeech.cli.kws import KWSExecutor
kws_executor = KWSExecutor()
result = kws_executor(
audio_file='./hey_snips.wav',
threshold=0.8,
model='mdtc_heysnips',
config=None,
ckpt_path=None,
device=paddle.get_device())
print('KWS Result: \n{}'.format(result))
```
Output:
```bash
KWS Result:
Score: 1.000, Threshold: 0.8, Is keyword: True
```
### 4. Pretrained Models
Here is a list of pretrained models released by PaddleSpeech that can be used by command and python API:
| Model | Language | Sample Rate |
| :--- | :---: | :---: |
| mdtc_heysnips | en | 16k |
(简体中文|[English](./README.md))
# Keyword Spotting
## Introduction
Keyword spotting detects whether a piece of speech contains a specific keyword.

This demo is an implementation of recognizing a specific keyword in a given audio file. It can be done with a single command or a few lines of Python using `PaddleSpeech`.
## Usage
### 1. Installation
See the [installation documentation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md).

You can choose one of three ways to install paddlespeech: easy, medium, or hard.
### 2. Prepare Input
The input of this demo should be a WAV file (`.wav`), and the sample rate must be the same as the model's.

Sample audio for this demo can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/kws/hey_snips.wav https://paddlespeech.bj.bcebos.com/kws/non-keyword.wav
```
### 3. Usage
- Command line (recommended)
```bash
paddlespeech kws --input ./hey_snips.wav
paddlespeech kws --input ./non-keyword.wav
```
Usage:
```bash
paddlespeech kws --help
```
Arguments:
- `input` (required): audio file in which to detect the keyword.
- `threshold`: score threshold for deciding whether the keyword is present. Default: `0.8`.
- `model`: model for the KWS task. Default: `mdtc_heysnips`.
- `config`: config file for the KWS task; if not set, the pretrained model's default config is used. Default: `None`.
- `ckpt_path`: model checkpoint; if not set, the pretrained model is downloaded and used. Default: `None`.
- `device`: device for inference. Default: paddlepaddle's default device on the current system.
- `verbose`: if set, show logger information.
Output:
```bash
# Input file: ./hey_snips.wav
Score: 1.000, Threshold: 0.8, Is keyword: True
# Input file: ./non-keyword.wav
Score: 0.000, Threshold: 0.8, Is keyword: False
```
- Python API
```python
import paddle
from paddlespeech.cli.kws import KWSExecutor
kws_executor = KWSExecutor()
result = kws_executor(
audio_file='./hey_snips.wav',
threshold=0.8,
model='mdtc_heysnips',
config=None,
ckpt_path=None,
device=paddle.get_device())
print('KWS Result: \n{}'.format(result))
```
Output:
```bash
KWS Result:
Score: 1.000, Threshold: 0.8, Is keyword: True
```
### 4. Pretrained Models
Here is a list of pretrained models released by PaddleSpeech that can be used via the command line and the Python API:

| Model | Language | Sample Rate |
| :--- | :---: | :---: |
| mdtc_heysnips | en | 16k |
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/kws/hey_snips.wav https://paddlespeech.bj.bcebos.com/kws/non-keyword.wav
# kws
paddlespeech kws --input ./hey_snips.wav
paddlespeech kws --input non-keyword.wav
File mode changed from 100644 to 100755
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
# asr
paddlespeech asr --input ./zh.wav
......@@ -8,3 +9,18 @@ paddlespeech asr --input ./zh.wav
# asr + punc
paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
# asr help
paddlespeech asr --help
# english asr
paddlespeech asr --lang en --model transformer_librispeech --input ./en.wav
# model stats
paddlespeech stats --task asr
# paddlespeech help
paddlespeech --help
File mode changed from 100644 to 100755
File mode changed from 100644 to 100755
......@@ -7,7 +7,7 @@ host: 0.0.0.0
port: 8090
# The task format in the engine_list is: <speech task>_<engine type>
# task choices = ['asr_python', 'asr_inference', 'tts_python', 'tts_inference', 'cls_python', 'cls_inference']
# task choices = ['asr_python', 'asr_inference', 'tts_python', 'tts_inference', 'cls_python', 'cls_inference', 'text_python', 'vector_python']
protocol: 'http'
engine_list: ['asr_python', 'tts_python', 'cls_python', 'text_python', 'vector_python']
......@@ -28,7 +28,6 @@ asr_python:
force_yes: True
device: # set 'gpu:id' or 'cpu'
################### speech task: asr; engine_type: inference #######################
asr_inference:
# model_type choices=['deepspeech2offline_aishell']
......@@ -50,10 +49,11 @@ asr_inference:
################################### TTS #########################################
################### speech task: tts; engine_type: python #######################
tts_python:
# am (acoustic model) choices=['speedyspeech_csmsc', 'fastspeech2_csmsc',
# 'fastspeech2_ljspeech', 'fastspeech2_aishell3',
# 'fastspeech2_vctk']
tts_python:
# am (acoustic model) choices=['speedyspeech_csmsc', 'fastspeech2_csmsc',
# 'fastspeech2_ljspeech', 'fastspeech2_aishell3',
# 'fastspeech2_vctk', 'fastspeech2_mix',
# 'tacotron2_csmsc', 'tacotron2_ljspeech']
am: 'fastspeech2_csmsc'
am_config:
am_ckpt:
......@@ -64,8 +64,10 @@ tts_python:
spk_id: 0
# voc (vocoder) choices=['pwgan_csmsc', 'pwgan_ljspeech', 'pwgan_aishell3',
# 'pwgan_vctk', 'mb_melgan_csmsc']
voc: 'pwgan_csmsc'
# 'pwgan_vctk', 'mb_melgan_csmsc', 'style_melgan_csmsc',
# 'hifigan_csmsc', 'hifigan_ljspeech', 'hifigan_aishell3',
# 'hifigan_vctk', 'wavernn_csmsc']
voc: 'mb_melgan_csmsc'
voc_config:
voc_ckpt:
voc_stat:
......@@ -94,7 +96,7 @@ tts_inference:
summary: True # False -> do not show predictor config
# voc (vocoder) choices=['pwgan_csmsc', 'mb_melgan_csmsc','hifigan_csmsc']
voc: 'pwgan_csmsc'
voc: 'mb_melgan_csmsc'
voc_model: # the pdmodel file of your vocoder static model (XX.pdmodel)
voc_params: # the pdiparams file of your vocoder static model (XX.pdipparams)
voc_sample_rate: 24000
......
#!/bin/bash
paddlespeech_server start --config_file ./conf/application.yaml
paddlespeech_server start --config_file ./conf/application.yaml &> server.log &
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/123456789.wav
# sid extract
paddlespeech_client vector --server_ip 127.0.0.1 --port 8090 --task spk --input ./85236145389.wav
# sid score
paddlespeech_client vector --server_ip 127.0.0.1 --port 8090 --task score --enroll ./85236145389.wav --test ./123456789.wav
#!/bin/bash
paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input 今天的天气真好啊你下午有空吗我想约你一起去吃饭
File mode changed from 100644 to 100755
*/.vscode/*
*.wav
*/resource/*
.Ds*
*.pyc
*.pcm
*.npy
*.diff
*.sqlite
*/static/*
*.pdparams
*.pdiparams*
*.pdmodel
*/source/*
*/PaddleSpeech/*
# API Documentation

After starting the service, you can refer to:
http://0.0.0.0:8010/docs
## ASR
### 【POST】/asr/offline
Description: upload a 16k, 16-bit WAV file; returns the recognition result from the offline speech recognition model.
Returns: JSON
Frontend APIs: ASR - end-to-end recognition, audio file recognition; voice commands - recording upload
Example:
```json
{
"code": 0,
"result": "你也喜欢这个天气吗",
"message": "ok"
}
```
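As an illustration, a minimal Python client for this endpoint (assumptions: the demo server runs on `localhost:8010`, and the multipart field is named `files`, matching the `files: List[UploadFile]` parameter of the FastAPI handler shown later on this page):

```python
import requests

# Upload a local 16k, 16-bit WAV file to the offline ASR endpoint.
with open("zh.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8010/asr/offline",
        files=[("files", ("zh.wav", f, "audio/wav"))],
    )
print(resp.json())  # e.g. {"code": 0, "result": "...", "message": "ok"}
```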
### 【POST】/asr/offlinefile
Description: upload a 16k, 16-bit WAV file; returns the offline ASR result plus the base64 of the WAV data.
Returns: JSON
Frontend API: audio file recognition (to play back the decoded base64, remember to add a WAV header first, 16k sample rate, int16; it can only be played after the header is added)
Example:
```json
{
"code": 0,
"result": {
"asr_result": "今天天气真好",
"wav_base64": "///+//3//f/8/////v/////////////////+/wAA//8AAAEAAQACAAIAAQABAP"
},
"message": "ok"
}
```
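A sketch of restoring playable audio from the returned base64, using only the Python standard library (assumption: `wav_base64` holds the field from the response above, i.e. raw 16k, int16, mono PCM as stated):

```python
import base64
import wave

pcm = base64.b64decode(wav_base64)  # raw 16k int16 PCM, no header yet
with wave.open("restored.wav", "wb") as wf:
    wf.setnchannels(1)      # mono
    wf.setsampwidth(2)      # 2 bytes per sample (int16)
    wf.setframerate(16000)  # 16k sample rate, as stated above
    wf.writeframes(pcm)     # the wave module writes the WAV header for us
```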
### 【POST】/asr/collectEnv
Description: sample environment noise by uploading a 16k, int16 WAV file to generate the energy threshold for the backend VAD; returns the threshold.
Frontend API: ASR - environment sampling
Returns: JSON
```json
{
"code": 0,
"result": 3624.93505859375,
"message": "采集环境噪音成功"
}
```
### 【GET】/asr/stopRecord
Description: a GET request to /asr/stopRecord makes the backend stop accepting the data uploaded over the WS protocol on offlineStream.
Frontend API: voice chat - pause recording (pause while fetching NLP results and playing TTS)
Returns: JSON
```JSON
{
"code": 0,
"result": null,
"message": "停止成功"
}
```
### 【GET】/asr/resumeRecord
Description: a GET request to /asr/resumeRecord makes the backend resume accepting the data uploaded over the WS protocol on offlineStream.
Frontend API: voice chat - resume recording (when TTS playback finishes, tell the backend to resume recording)
Returns: JSON
```JSON
{
"code": 0,
"result": null,
"message": "Online录音恢复"
}
```
### 【Websocket】/ws/asr/offlineStream
Description: continuously upload audio from the frontend to the backend over the WS protocol; the frontend captures 16k, int16 PCM fragments and keeps uploading them.
Frontend API: voice chat - start recording; microphone audio is continuously sent to the backend, which pushes back recognition results.
Returns: recognition results from the offline model, pushed by the backend over WS.
### 【Websocket】/ws/asr/onlineStream
Description: continuously upload audio from the frontend to the backend over the WS protocol; the frontend captures 16k, int16 PCM fragments and keeps uploading them.
Frontend API: ASR - streaming recognition; start recording, microphone audio is continuously sent to the backend, which pushes back recognition results.
Returns: recognition results from the online model, pushed by the backend over WS.
## NLP
### 【POST】/nlp/chat
Description: returns the reply of the chit-chat dialogue model.
Frontend API: voice chat - after getting the ASR result, fetch the chat reply from the backend.
Upload example:
```json
{
"chat": "天气非常棒"
}
```
Response example:
```json
{
"code": 0,
"result": "是的,我也挺喜欢的",
"message": "ok"
}
```
### 【POST】/nlp/ie
Description: returns the information extraction result.
Frontend API: voice commands - fetch the information extraction result from the backend.
Upload example:
```json
{
"chat": "今天我从马来西亚出发去香港花了五十万元"
}
```
Response example:
```json
{
"code": 0,
"result": [
{
"时间": [
{
"text": "今天",
"start": 0,
"end": 2,
"probability": 0.9817976247505698
}
],
"出发地": [
{
"text": "马来西亚",
"start": 4,
"end": 8,
"probability": 0.974892389414169
}
],
"目的地": [
{
"text": "马来西亚",
"start": 4,
"end": 8,
"probability": 0.7347504438136951
}
],
"费用": [
{
"text": "五十万元",
"start": 15,
"end": 19,
"probability": 0.9679076530644402
}
]
}
],
"message": "ok"
}
```
## TTS
### 【POST】/tts/offline
Description: get audio from the offline TTS model.
Frontend API: TTS - end-to-end synthesis
Upload example:
```json
{
"text": "天气非常棒"
}
```
Response example: base64 encoding of the corresponding audio
```json
{
"code": 0,
"result": "UklGRrzQAABXQVZFZm10IBAAAAABAAEAwF0AAIC7AAACABAAZGF0YZjQAAADAP7/BAADAAAA...",
"message": "ok"
}
```
### 【POST】/tts/online
Description: fetch synthesized audio as a stream.
Frontend API: streaming synthesis
Upload example:
```json
{
"text": "天气非常棒"
}
```
Response example:
binary PCM fragments, 16k, int16
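A minimal consumer sketch for this endpoint (assumptions: the demo server runs on `localhost:8010`; per the description above the stream is raw 16k, int16 PCM, so it is saved here as a headerless `.pcm` file):

```python
import requests

resp = requests.post(
    "http://localhost:8010/tts/online",
    json={"text": "天气非常棒"},
    stream=True,  # iterate over PCM fragments as they arrive
)
with open("output.pcm", "wb") as f:
    for chunk in resp.iter_content(chunk_size=4096):
        f.write(chunk)
```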
## VPR
### 【POST】/vpr/enroll
Description: voiceprint enrollment; upload spk_id (a non-empty string) and audio (a file) via a form.
Frontend API: voiceprint recognition - voiceprint enrollment
Upload example:
```text
curl -X 'POST' \
'http://0.0.0.0:8010/vpr/enroll' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'spk_id=啦啦啦啦' \
-F 'audio=@demo_16k.wav;type=audio/wav'
```
Response example:
```json
{
"status": true,
"msg": "Successfully enroll data!"
}
```
### 【POST】/vpr/recog
Description: voiceprint recognition; extracts the voiceprint of the uploaded file and compares it against enrolled data. Audio must be 16k, int16 WAV.
Frontend API: voiceprint recognition - upload audio, returns the recognition result
Upload example:
```shell
curl -X 'POST' \
'http://0.0.0.0:8010/vpr/recog' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'audio=@demo_16k.wav;type=audio/wav'
```
Response example:
```json
[
[
"啦啦啦啦",
[
"",
100
]
],
[
"test1",
[
"",
11.64
]
],
[
"test2",
[
"",
6.09
]
]
]
```
### 【POST】/vpr/del
Description: delete a user's data by spk_id.
Frontend API: voiceprint recognition - delete user data
Upload example:
```json
{
"spk_id":"啦啦啦啦"
}
```
Response example:
```json
{
"status": true,
"msg": "Successfully delete data!"
}
```
### 【GET】/vpr/list
Description: query the user list; no parameters; returns spk_id and vpr_id.
Frontend API: voiceprint recognition - get the voiceprint data list
Response example:
```json
[
[
"test1",
"test2"
],
[
9,
10
]
]
```
### 【GET】/vpr/data
Description: get the audio used for a user's voiceprint enrollment, by vpr_id.
Frontend API: voiceprint recognition - get the audio corresponding to a voiceprint
Request example:
```shell
curl -X 'GET' \
'http://0.0.0.0:8010/vpr/data?vprId=9' \
-H 'accept: application/json'
```
Response example:
the corresponding audio file
### 【GET】/vpr/database64
Description: get the enrollment audio for a voiceprint by vpr_id, converted to a 16k, int16 array and returned base64-encoded.
Frontend API: voiceprint recognition - get the audio corresponding to a voiceprint (note: add a WAV header before playback, 16k, int16; see how TTS playback adds the WAV header, and mind the sample rate)
Request example:
```shell
curl -X 'GET' \
'http://localhost:8010/vpr/database64?vprId=12' \
-H 'accept: application/json'
```
Response example:
```json
{
"code": 0,
"result":"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
"message": "ok"
}
```
\ No newline at end of file
# Paddle Speech Demo
PaddleSpeechDemo is a demo project built around the speech-interaction capabilities of PaddleSpeech. It is meant to help you get started with PaddleSpeech and build your own applications on top of it.

The intelligent speech interaction part uses PaddleSpeech, the dialogue and information extraction parts use PaddleNLP, and the web frontend is developed with Vue3.

Main features:
+ Voice chat: PaddleSpeech speech recognition + speech synthesis, with the dialogue part based on PaddleNLP's chit-chat feature
+ Voiceprint recognition: a showcase of PaddleSpeech's voiceprint recognition feature
+ Speech recognition: supports three modes: streaming recognition, end-to-end recognition, and audio file recognition
+ Speech synthesis: supports streaming synthesis and end-to-end synthesis
+ Voice commands: intelligent reimbursement of transportation expenses, based on PaddleSpeech speech recognition and PaddleNLP information extraction
Demo:
![效果](docs/效果展示.png)
## Installation
### Backend setup
```
# set up the environment
cd speech_server
pip install -r requirements.txt
# download the IE model, fine-tuned for locations (works better); if skipped, another version is used with worse results
cd source
mkdir model
cd model
wget https://bj.bcebos.com/paddlenlp/applications/speech-cmd-analysis/finetune/model_state.pdparams
```
### Frontend setup
The frontend depends on `node.js`, which must be installed in advance; make sure `npm` is available (`npm` tested with version `8.3.1`). It is recommended to download the stable `node.js` from the [official site](https://nodejs.org/en/).
```
# enter the frontend directory
cd web_client
# install `yarn`; skip if already installed
npm install -g yarn
# install frontend dependencies with yarn
yarn install
```
## Starting the Services
### Start the backend
```
cd speech_server
# port 8010 by default
python main.py --port 8010
```
### Start the frontend
```
cd web_client
yarn dev --port 8011
```
With the default configuration, the backend address configured in the frontend is localhost, so make sure the backend server and the browser opening the page are on the same machine. If they are not, see the FAQ below: "How do I change the backend address or port?"
## FAQ
#### Q: How do I install node.js?
A: See this [tutorial](https://www.runoob.com/nodejs/nodejs-install-setup.html) for installing node.js, and make sure npm is available.
#### Q: How do I change the backend address or port?
A: The backend address is configured in two files.

Edit the first file, `PaddleSpeechWebClient/vite.config.js`:
```
server: {
host: "0.0.0.0",
proxy: {
"/api": {
target: "http://localhost:8010", // 这里改成后端所在接口
changeOrigin: true,
rewrite: (path) => path.replace(/^\/api/, ""),
},
},
}
```
Edit the second file, `PaddleSpeechWebClient/src/api/API.js` (the websocket proxy configuration does not take effect, so it must be changed here):
```
// websocket (change these to the backend address)
CHAT_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/offlineStream', // ChatBot websocket API
ASR_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/onlineStream', // streaming ASR API
TTS_SOCKET_RECORD: 'ws://localhost:8010/ws/tts/online', // streaming TTS API
```
#### Q: The frontend cannot record when the backend is accessed by IP address
A: This is mainly a browser security-policy restriction; configure the browser accordingly and restart it. For how to change the setting, see [browser does not support getUserMedia when using js-audio-recorder](https://blog.csdn.net/YRY_LIKE_YOU/article/details/113745273).

Chrome settings address: chrome://flags/#unsafely-treat-insecure-origin-as-secure
## References

Recording in Vue: https://blog.csdn.net/qq_41619796/article/details/107865602#t1

Frontend streaming audio playback references:
https://github.com/AnthumChris/fetch-stream-audio
https://bm.enthuses.me/buffered.php?bref=6677
# This is the parameter configuration file for streaming tts server.
#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8092
# The task format in the engine_list is: <speech task>_<engine type>
# engine_list choices = ['tts_online', 'tts_online-onnx'], the inference speed of tts_online-onnx is faster than tts_online.
# protocol choices = ['websocket', 'http']
protocol: 'http'
engine_list: ['tts_online-onnx']
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### TTS #########################################
################### speech task: tts; engine_type: online #######################
tts_online:
# am (acoustic model) choices=['fastspeech2_csmsc', 'fastspeech2_cnndecoder_csmsc']
# fastspeech2_cnndecoder_csmsc support streaming am infer.
am: 'fastspeech2_csmsc'
am_config:
am_ckpt:
am_stat:
phones_dict:
tones_dict:
speaker_dict:
spk_id: 0
# voc (vocoder) choices=['mb_melgan_csmsc, hifigan_csmsc']
# Both mb_melgan_csmsc and hifigan_csmsc support streaming voc inference
voc: 'mb_melgan_csmsc'
voc_config:
voc_ckpt:
voc_stat:
# others
lang: 'zh'
device: 'cpu' # set 'gpu:id' or 'cpu'
    # am_block and am_pad are only used by the fastspeech2_cnndecoder_onnx model for streaming am inference;
    # when am_pad is set to 12, the streaming synthesized audio is identical to the non-streaming audio.
am_block: 72
am_pad: 12
    # voc_block and voc_pad are used by the voc model for streaming voc inference;
    # when the voc model is mb_melgan_csmsc, voc_pad set to 14 makes the streaming synthesized audio identical to the non-streaming audio; pad can be as low as 7 and the streaming audio still sounds normal;
    # when the voc model is hifigan_csmsc, voc_pad set to 19 makes the streaming synthesized audio identical to the non-streaming audio; with voc_pad set to 14, the streaming audio sounds normal.
voc_block: 36
voc_pad: 14
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### TTS #########################################
################### speech task: tts; engine_type: online-onnx #######################
tts_online-onnx:
# am (acoustic model) choices=['fastspeech2_csmsc_onnx', 'fastspeech2_cnndecoder_csmsc_onnx']
# fastspeech2_cnndecoder_csmsc_onnx support streaming am infer.
am: 'fastspeech2_cnndecoder_csmsc_onnx'
# am_ckpt is a list, if am is fastspeech2_cnndecoder_csmsc_onnx, am_ckpt = [encoder model, decoder model, postnet model];
# if am is fastspeech2_csmsc_onnx, am_ckpt = [ckpt model];
am_ckpt: # list
am_stat:
phones_dict:
tones_dict:
speaker_dict:
spk_id: 0
am_sample_rate: 24000
am_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
cpu_threads: 4
# voc (vocoder) choices=['mb_melgan_csmsc_onnx, hifigan_csmsc_onnx']
# Both mb_melgan_csmsc_onnx and hifigan_csmsc_onnx support streaming voc inference
voc: 'hifigan_csmsc_onnx'
voc_ckpt:
voc_sample_rate: 24000
voc_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
cpu_threads: 4
# others
lang: 'zh'
    # am_block and am_pad are only used by the fastspeech2_cnndecoder_onnx model for streaming am inference;
    # when am_pad is set to 12, the streaming synthesized audio is identical to the non-streaming audio.
am_block: 72
am_pad: 12
    # voc_block and voc_pad are used by the voc model for streaming voc inference;
    # when the voc model is mb_melgan_csmsc_onnx, voc_pad set to 14 makes the streaming synthesized audio identical to the non-streaming audio; pad can be as low as 7 and the streaming audio still sounds normal;
    # when the voc model is hifigan_csmsc_onnx, voc_pad set to 19 makes the streaming synthesized audio identical to the non-streaming audio; with voc_pad set to 14, the streaming audio sounds normal.
voc_block: 36
voc_pad: 14
# voc_upsample should be same as n_shift on voc config.
voc_upsample: 300
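As a worked example of the chunking comments above (illustrative arithmetic, not part of the config): `voc_upsample: 300` is the vocoder hop size, so one streaming step over `voc_block: 36` frames synthesizes 36 * 300 = 10800 samples, i.e. 0.45 s of audio at the 24000 Hz output rate:

```python
# Chunk arithmetic for the streaming TTS config above (illustration only).
voc_block, voc_upsample, sample_rate = 36, 300, 24000
samples_per_chunk = voc_block * voc_upsample  # 10800 samples per streaming step
print(samples_per_chunk / sample_rate)        # 0.45 seconds of audio per chunk
```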
# This is the parameter configuration file for PaddleSpeech Serving.
#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8090
# The task format in the engine_list is: <speech task>_<engine type>
# task choices = ['asr_online']
# protocol = ['websocket'] (only one can be selected).
# websocket only support online engine type.
protocol: 'websocket'
engine_list: ['asr_online']
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### ASR #########################################
################### speech task: asr; engine_type: online #######################
asr_online:
model_type: 'conformer_online_wenetspeech'
am_model: # the pdmodel file of am static model [optional]
am_params: # the pdiparams file of am static model [optional]
lang: 'zh'
sample_rate: 16000
cfg_path:
force_yes: True
device: 'cpu' # cpu or gpu:id
decode_method: "attention_rescoring"
continuous_decoding: True # enable continue decoding when endpoint detected
num_decoding_left_chunks: 16
am_predictor_conf:
device: # set 'gpu:id' or 'cpu'
switch_ir_optim: True
glog_info: False # True -> print glog
summary: True # False -> do not show predictor config
chunk_buffer_conf:
window_n: 7 # frame
shift_n: 4 # frame
window_ms: 25 # ms
shift_ms: 10 # ms
sample_rate: 16000
sample_width: 2
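    # A worked example of the buffer sizes above (illustrative comment, not a
    # config key): with sample_rate 16000 and sample_width 2, one 25 ms window
    # is 0.025 * 16000 * 2 = 800 bytes, and one 10 ms shift is 320 bytes.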
# todo:
# 1. start the service
# 2. receive recorded audio and return the recognition result
# 3. receive the ASR result and return the NLP dialogue reply
# 4. receive the NLP dialogue reply and return TTS audio
import base64
import yaml
import os
import json
import datetime
import librosa
import soundfile as sf
import numpy as np
import argparse
import uvicorn
import aiofiles
from typing import Optional, List
from pydantic import BaseModel
from fastapi import FastAPI, Header, File, UploadFile, Form, Cookie, WebSocket, WebSocketDisconnect
from fastapi.responses import StreamingResponse
from starlette.responses import FileResponse
from starlette.middleware.cors import CORSMiddleware
from starlette.requests import Request
from starlette.websockets import WebSocketState as WebSocketState
from src.AudioManeger import AudioMannger
from src.util import *
from src.robot import Robot
from src.WebsocketManeger import ConnectionManager
from src.SpeechBase.vpr import VPR
from paddlespeech.server.engine.asr.online.python.asr_engine import PaddleASRConnectionHanddler
from paddlespeech.server.utils.audio_process import float2pcm
# parse the CLI arguments
parser = argparse.ArgumentParser(
prog='PaddleSpeechDemo', add_help=True)
parser.add_argument(
"--port",
action="store",
type=int,
help="port of the app",
default=8010,
required=False)
args = parser.parse_args()
port = args.port
# config files
tts_config = "conf/tts_online_application.yaml"
asr_config = "conf/ws_conformer_wenetspeech_application_faster.yaml"
asr_init_path = "source/demo/demo.wav"
db_path = "source/db/vpr.sqlite"
ie_model_path = "source/model"
# path config
UPLOAD_PATH = "source/vpr"
WAV_PATH = "source/wav"
base_sources = [
UPLOAD_PATH, WAV_PATH
]
for path in base_sources:
os.makedirs(path, exist_ok=True)
# initialization
app = FastAPI()
chatbot = Robot(asr_config, tts_config, asr_init_path, ie_model_path=ie_model_path)
manager = ConnectionManager()
aumanager = AudioMannger(chatbot)
aumanager.init()
vpr = VPR(db_path, dim = 192, top_k = 5)
# service config: request models
class NlpBase(BaseModel):
chat: str
class TtsBase(BaseModel):
text: str
class Audios:
def __init__(self) -> None:
self.audios = b""
audios = Audios()
######################################################################
########################### ASR service #############################
#####################################################################
# receive a file and return the ASR result
# file upload
@app.post("/asr/offline")
async def speech2textOffline(files: List[UploadFile]):
    # only the first file is used
asr_res = ""
for file in files[:1]:
        # generate a timestamped file name
now_name = "asr_offline_" + datetime.datetime.strftime(datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav"
out_file_path = os.path.join(WAV_PATH, now_name)
async with aiofiles.open(out_file_path, 'wb') as out_file:
content = await file.read() # async read
await out_file.write(content) # async write
        # return the ASR result
asr_res = chatbot.speech2text(out_file_path)
return SuccessRequest(result=asr_res)
# else:
# return ErrorRequest(message="文件不是.wav格式")
return ErrorRequest(message="上传文件为空")
# receive a file and force-convert the wav to 16k, int16
@app.post("/asr/offlinefile")
async def speech2textOfflineFile(files: List[UploadFile]):
    # only the first file is used
asr_res = ""
for file in files[:1]:
        # generate a timestamped file name
now_name = "asr_offline_" + datetime.datetime.strftime(datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav"
out_file_path = os.path.join(WAV_PATH, now_name)
async with aiofiles.open(out_file_path, 'wb') as out_file:
content = await file.read() # async read
await out_file.write(content) # async write
        # convert the file to a 16k, 16-bit wav file
wav, sr = librosa.load(out_file_path, sr=16000)
wav = float2pcm(wav) # float32 to int16
wav_bytes = wav.tobytes() # to bytes
wav_base64 = base64.b64encode(wav_bytes).decode('utf8')
        # write the converted file back
now_name = now_name[:-4] + "_16k" + ".wav"
out_file_path = os.path.join(WAV_PATH, now_name)
sf.write(out_file_path,wav,16000)
        # return the ASR result
asr_res = chatbot.speech2text(out_file_path)
response_res = {
"asr_result": asr_res,
"wav_base64": wav_base64
}
return SuccessRequest(result=response_res)
return ErrorRequest(message="上传文件为空")
# streaming-receive test
@app.post("/asr/online1")
async def speech2textOnlineRecive(files: List[UploadFile]):
audio_bin = b''
for file in files:
content = await file.read()
audio_bin += content
audios.audios += audio_bin
print(f"audios长度变化: {len(audios.audios)}")
return SuccessRequest(message="接收成功")
# measure the environment noise level
@app.post("/asr/collectEnv")
async def collectEnv(files: List[UploadFile]):
for file in files[:1]:
content = await file.read() # async read
        # initialization: the first 44 bytes of a wav are the header
aumanager.compute_env_volume(content[44:])
vad_ = aumanager.vad_threshold
return SuccessRequest(result=vad_,message="采集环境噪音成功")
# stop recording
@app.get("/asr/stopRecord")
async def stopRecord():
audios.audios = b""
aumanager.stop()
print("Online录音暂停")
return SuccessRequest(message="停止成功")
# resume recording
@app.get("/asr/resumeRecord")
async def resumeRecord():
aumanager.resume()
print("Online录音恢复")
return SuccessRequest(message="Online录音恢复")
# ASR used by the chat feature
@app.websocket("/ws/asr/offlineStream")
async def websocket_endpoint(websocket: WebSocket):
await manager.connect(websocket)
try:
while True:
asr_res = None
            # this websocket only pushes results; it does not reply to every message
data = await websocket.receive_bytes()
if not aumanager.is_pause:
asr_res = aumanager.stream_asr(data)
else:
print("录音暂停")
if asr_res:
await manager.send_personal_message(asr_res, websocket)
aumanager.clear_asr()
except WebSocketDisconnect:
manager.disconnect(websocket)
# await manager.broadcast(f"用户-{user}-离开")
# print(f"用户-{user}-离开")
# ASR for online (streaming) recognition
@app.websocket('/ws/asr/onlineStream')
async def websocket_endpoint(websocket: WebSocket):
"""PaddleSpeech Online ASR Server api
Args:
websocket (WebSocket): the websocket instance
"""
    #1. the interface waits to accept the websocket protocol header,
    # and only after we receive the header do we establish the connection with the specific thread
await websocket.accept()
#2. if we accept the websocket headers, we will get the online asr engine instance
engine = chatbot.asr.engine
#3. each websocket connection, we will create an PaddleASRConnectionHanddler to process such audio
# and each connection has its own connection instance to process the request
# and only if client send the start signal, we create the PaddleASRConnectionHanddler instance
connection_handler = None
try:
        #4. we loop to process the audio package by package according to the protocol,
        # and only when the client sends the finished signal do we break the loop
while True:
# careful here, changed the source code from starlette.websockets
# 4.1 we wait for the client signal for the specific action
assert websocket.application_state == WebSocketState.CONNECTED
message = await websocket.receive()
websocket._raise_on_disconnect(message)
#4.2 text for the action command and bytes for pcm data
if "text" in message:
# we first parse the specific command
message = json.loads(message["text"])
if 'signal' not in message:
resp = {"status": "ok", "message": "no valid json data"}
await websocket.send_json(resp)
# start command, we create the PaddleASRConnectionHanddler instance to process the audio data
# end command, we process the all the last audio pcm and return the final result
# and we break the loop
if message['signal'] == 'start':
resp = {"status": "ok", "signal": "server_ready"}
                # do something at the beginning here
# create the instance to process the audio
# connection_handler = chatbot.asr.connection_handler
connection_handler = PaddleASRConnectionHanddler(engine)
await websocket.send_json(resp)
elif message['signal'] == 'end':
                # reset the engine for a new connection
# and we will destroy the connection
connection_handler.decode(is_finished=True)
connection_handler.rescoring()
asr_results = connection_handler.get_result()
connection_handler.reset()
resp = {
"status": "ok",
"signal": "finished",
'result': asr_results
}
await websocket.send_json(resp)
break
else:
resp = {"status": "ok", "message": "no valid json data"}
await websocket.send_json(resp)
elif "bytes" in message:
# bytes for the pcm data
message = message["bytes"]
print("###############")
print("len message: ", len(message))
print("###############")
# we extract the remained audio pcm
# and decode for the result in this package data
connection_handler.extract_feat(message)
connection_handler.decode(is_finished=False)
asr_results = connection_handler.get_result()
# return the current period result
# if the engine create the vad instance, this connection will have many period results
resp = {'result': asr_results}
print(resp)
await websocket.send_json(resp)
except WebSocketDisconnect:
pass
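# A minimal client sketch for the /ws/asr/onlineStream protocol implemented
# above (assumptions: this demo server runs on localhost:8010, zh.wav is a
# 16 kHz, int16, mono WAV, and the third-party `websockets` package is
# installed):
#
#   import asyncio, json, websockets
#
#   async def recognize(path="zh.wav"):
#       uri = "ws://localhost:8010/ws/asr/onlineStream"
#       async with websockets.connect(uri) as ws:
#           await ws.send(json.dumps({"signal": "start"}))
#           print(await ws.recv())            # {"status": "ok", "signal": "server_ready"}
#           with open(path, "rb") as f:
#               f.read(44)                    # skip the WAV header; the server expects raw PCM
#               while chunk := f.read(3200):  # ~100 ms of 16 kHz int16 audio
#                   await ws.send(chunk)
#                   print(await ws.recv())    # intermediate result
#           await ws.send(json.dumps({"signal": "end"}))
#           print(await ws.recv())            # final rescored result
#
#   asyncio.run(recognize())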
######################################################################
########################### NLP service #############################
#####################################################################
@app.post("/nlp/chat")
async def chatOffline(nlp_base:NlpBase):
chat = nlp_base.chat
if not chat:
return ErrorRequest(message="传入文本为空")
else:
res = chatbot.chat(chat)
return SuccessRequest(result=res)
@app.post("/nlp/ie")
async def ieOffline(nlp_base:NlpBase):
nlp_text = nlp_base.chat
if not nlp_text:
return ErrorRequest(message="传入文本为空")
else:
res = chatbot.ie(nlp_text)
return SuccessRequest(result=res)
######################################################################
########################### TTS service #############################
#####################################################################
@app.post("/tts/offline")
async def text2speechOffline(tts_base:TtsBase):
text = tts_base.text
if not text:
return ErrorRequest(message="文本为空")
else:
now_name = "tts_"+ datetime.datetime.strftime(datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav"
out_file_path = os.path.join(WAV_PATH, now_name)
        # save to a file, then transfer it as base64
chatbot.text2speech(text, outpath=out_file_path)
with open(out_file_path, "rb") as f:
data_bin = f.read()
base_str = base64.b64encode(data_bin)
return SuccessRequest(result=base_str)
# HTTP streaming TTS
@app.post("/tts/online")
async def stream_tts(request_body: TtsBase):
text = request_body.text
return StreamingResponse(chatbot.text2speechStreamBytes(text=text))
# websocket streaming TTS
@app.websocket("/ws/tts/online")
async def stream_ttsWS(websocket: WebSocket):
await manager.connect(websocket)
try:
while True:
text = await websocket.receive_text()
            # stream audio data over the websocket
if text:
for sub_wav in chatbot.text2speechStream(text=text):
# print("发送sub wav: ", len(sub_wav))
res = {
"wav": sub_wav,
"done": False
}
await websocket.send_json(res)
            # end of the stream
res = {
"wav": sub_wav,
"done": True
}
await websocket.send_json(res)
# manager.disconnect(websocket)
except WebSocketDisconnect:
manager.disconnect(websocket)
######################################################################
########################### VPR service #############################
#####################################################################
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"])
@app.post('/vpr/enroll')
async def vpr_enroll(table_name: str=None,
spk_id: str=Form(...),
audio: UploadFile=File(...)):
# Enroll the uploaded audio with spk-id into MySQL
try:
if not spk_id:
return {'status': False, 'msg': "spk_id can not be None"}
# Save the upload data to server.
content = await audio.read()
now_name = "vpr_enroll_" + datetime.datetime.strftime(datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav"
audio_path = os.path.join(UPLOAD_PATH, now_name)
with open(audio_path, "wb+") as f:
f.write(content)
vpr.vpr_enroll(username=spk_id, wav_path=audio_path)
return {'status': True, 'msg': "Successfully enroll data!"}
except Exception as e:
return {'status': False, 'msg': e}
@app.post('/vpr/recog')
async def vpr_recog(request: Request,
table_name: str=None,
audio: UploadFile=File(...)):
# Voice print recognition online
# try:
# Save the upload data to server.
content = await audio.read()
now_name = "vpr_query_" + datetime.datetime.strftime(datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav"
query_audio_path = os.path.join(UPLOAD_PATH, now_name)
with open(query_audio_path, "wb+") as f:
f.write(content)
spk_ids, paths, scores = vpr.do_search_vpr(query_audio_path)
res = dict(zip(spk_ids, zip(paths, scores)))
# Sort results by distance metric, closest distances first
res = sorted(res.items(), key=lambda item: item[1][1], reverse=True)
return res
# except Exception as e:
# return {'status': False, 'msg': e}, 400
@app.post('/vpr/del')
async def vpr_del(spk_id: dict=None):
# Delete a record by spk_id in MySQL
try:
spk_id = spk_id['spk_id']
if not spk_id:
return {'status': False, 'msg': "spk_id can not be None"}
vpr.vpr_del(username=spk_id)
return {'status': True, 'msg': "Successfully delete data!"}
except Exception as e:
return {'status': False, 'msg': e}, 400
@app.get('/vpr/list')
async def vpr_list():
# Get all records in MySQL
try:
spk_ids, vpr_ids = vpr.do_list()
return spk_ids, vpr_ids
except Exception as e:
return {'status': False, 'msg': e}, 400
@app.get('/vpr/database64')
async def vpr_database64(vprId: int):
# Get the audio file from path by spk_id in MySQL
try:
if not vprId:
return {'status': False, 'msg': "vpr_id can not be None"}
audio_path = vpr.do_get_wav(vprId)
        # return base64
        # convert the file to a 16k, 16-bit wav file
wav, sr = librosa.load(audio_path, sr=16000)
wav = float2pcm(wav) # float32 to int16
wav_bytes = wav.tobytes() # to bytes
wav_base64 = base64.b64encode(wav_bytes).decode('utf8')
return SuccessRequest(result=wav_base64)
except Exception as e:
return {'status': False, 'msg': e}, 400
@app.get('/vpr/data')
async def vpr_data(vprId: int):
# Get the audio file from path by spk_id in MySQL
try:
if not vprId:
return {'status': False, 'msg': "vpr_id can not be None"}
audio_path = vpr.do_get_wav(vprId)
return FileResponse(audio_path)
except Exception as e:
return {'status': False, 'msg': e}, 400
if __name__ == '__main__':
uvicorn.run(app=app, host='0.0.0.0', port=port)
aiofiles
fastapi
librosa
numpy
pydantic
scikit_learn
SoundFile
starlette
uvicorn
paddlepaddle
paddlespeech
paddlenlp
faiss-cpu
python-multipart
\ No newline at end of file
from queue import Queue
import numpy as np
import os
import wave
import random
import datetime
from .util import randName
class AudioMannger:
def __init__(self, robot, frame_length=160, frame=10, data_width=2, vad_default = 300):
        # binary pcm stream
self.audios = b''
self.asr_result = ""
        # the Speech core object
self.robot = robot
self.file_dir = "source"
os.makedirs(self.file_dir, exist_ok=True)
self.vad_deafult = vad_default
self.vad_threshold = vad_default
self.vad_threshold_path = os.path.join(self.file_dir, "vad_threshold.npy")
        # one frame per 10 ms
self.frame_length = frame_length
        # run VAD once every 10 frames
self.frame = frame
        # int16: two bytes per sample
self.data_width = data_width
# window
self.window_length = frame_length * frame * data_width
        # whether recording has started
self.on_asr = False
self.silence_cnt = 0
self.max_silence_cnt = 4
        self.is_pause = False  # recording paused / resumed
def init(self):
if os.path.exists(self.vad_threshold_path):
            # the saved average-loudness file exists
self.vad_threshold = np.load(self.vad_threshold_path)
def clear_audio(self):
        # clear the accumulated pcm fragments and the asr result
self.audios = b''
def clear_asr(self):
self.asr_result = ""
def compute_chunk_volume(self, start_index, pcm_bins):
        # compute the mean energy over one frame-length window
pcm_bin = pcm_bins[start_index: start_index + self.window_length]
        # convert to numpy
pcm_np = np.frombuffer(pcm_bin, np.int16)
        # normalize + compute loudness
x = pcm_np.astype(np.float32)
x = np.abs(x)
return np.mean(x)
def is_speech(self, start_index, pcm_bins):
        # bounds check
if start_index > len(pcm_bins):
return False
        # check whether the window starting here is a silent frame
energy = self.compute_chunk_volume(start_index=start_index, pcm_bins=pcm_bins)
# print(energy)
if energy > self.vad_threshold:
return True
else:
return False
def compute_env_volume(self, pcm_bins):
max_energy = 0
start = 0
while start < len(pcm_bins):
energy = self.compute_chunk_volume(start_index=start, pcm_bins=pcm_bins)
if energy > max_energy:
max_energy = energy
start += self.window_length
self.vad_threshold = max_energy + 100 if max_energy > self.vad_deafult else self.vad_deafult
        # save the threshold to a file
np.save(self.vad_threshold_path, self.vad_threshold)
print(f"vad 阈值大小: {self.vad_threshold}")
print(f"环境采样保存: {os.path.realpath(self.vad_threshold_path)}")
def stream_asr(self, pcm_bin):
        # first run endpoint detection on pcm_bin
start = 0
while start < len(pcm_bin):
if self.is_speech(start_index=start, pcm_bins=pcm_bin):
self.on_asr = True
self.silence_cnt = 0
print("录音中")
self.audios += pcm_bin[ start : start + self.window_length]
else:
if self.on_asr:
self.silence_cnt += 1
if self.silence_cnt > self.max_silence_cnt:
self.on_asr = False
self.silence_cnt = 0
                        # recording stopped
print("录音停止")
                        # save audios as wav and send them to ASR
if len(self.audios) > 2 * 16000:
file_path = os.path.join(self.file_dir, "asr_" + datetime.datetime.strftime(datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav")
self.save_audio(file_path=file_path)
self.asr_result = self.robot.speech2text(file_path)
self.clear_audio()
return self.asr_result
else:
                    # normal receive path
print("录音中 静音")
self.audios += pcm_bin[ start : start + self.window_length]
start += self.window_length
return ""
def save_audio(self, file_path):
print("保存音频")
        wf = wave.open(file_path, 'wb')  # open the output wav file for writing
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample (int16)
        wf.setframerate(16000)  # 16 kHz sample rate
        # write the buffered pcm data into the file
        wf.writeframes(self.audios)
        # close the file when done
wf.close()
def end(self):
# audios 保存为 wav, 送入 ASR
file_path = os.path.join(self.file_dir, "asr.wav")
self.save_audio(file_path=file_path)
return self.robot.speech2text(file_path)
def stop(self):
self.is_pause = True
self.audios = b''
def resume(self):
self.is_pause = False
\ No newline at end of file
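# A minimal usage sketch of the AudioMannger streaming loop above, assuming
# `robot` is the Robot instance defined later in this demo; "mic_dump.pcm" is
# a hypothetical 16 kHz / int16 raw capture file.
manager = AudioMannger(robot)
manager.init()  # load a previously saved VAD threshold if present

with open("mic_dump.pcm", "rb") as f:
    while True:
        chunk = f.read(manager.window_length)
        if not chunk:
            break
        text = manager.stream_asr(chunk)
        if text:  # non-empty once an utterance has ended
            print("ASR:", text)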
import numpy as np
from paddlespeech.server.engine.asr.online.python.asr_engine import ASREngine
from paddlespeech.server.engine.asr.online.python.asr_engine import PaddleASRConnectionHanddler
from paddlespeech.server.utils.config import get_config
def readWave(samples):
x_len = len(samples)
chunk_size = 85 * 16  # 85 ms per chunk at a 16 kHz sample rate (1360 samples)
if x_len % chunk_size != 0:
padding_len_x = chunk_size - x_len % chunk_size
else:
padding_len_x = 0
padding = np.zeros((padding_len_x), dtype=samples.dtype)
padded_x = np.concatenate([samples, padding], axis=0)
assert (x_len + padding_len_x) % chunk_size == 0
num_chunk = (x_len + padding_len_x) / chunk_size
num_chunk = int(num_chunk)
for i in range(0, num_chunk):
start = i * chunk_size
end = start + chunk_size
x_chunk = padded_x[start:end]
yield x_chunk
class ASR:
def __init__(self, config_path) -> None:
self.config = get_config(config_path)['asr_online']
self.engine = ASREngine()
self.engine.init(self.config)
self.connection_handler = PaddleASRConnectionHanddler(self.engine)
def offlineASR(self, samples, sample_rate=16000):
x_chunk, x_chunk_lens = self.engine.preprocess(samples=samples, sample_rate=sample_rate)
self.engine.run(x_chunk, x_chunk_lens)
result = self.engine.postprocess()
self.engine.reset()
return result
def onlineASR(self, samples:bytes=None, is_finished=False):
if not is_finished:
# streaming: intermediate chunk
self.connection_handler.extract_feat(samples)
self.connection_handler.decode(is_finished)
asr_results = self.connection_handler.get_result()
return asr_results
else:
# streaming: final chunk, rescore and reset
self.connection_handler.decode(is_finished=True)
self.connection_handler.rescoring()
asr_results = self.connection_handler.get_result()
self.connection_handler.reset()
return asr_results
\ No newline at end of file
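# A minimal sketch of chunked decoding with readWave() + onlineASR() above.
# The config path is a placeholder; the engine expects 16 kHz int16 input.
import soundfile

asr = ASR(config_path="conf/ws_conformer_application.yaml")  # placeholder path
samples, sr = soundfile.read("demo.wav", dtype="int16")
assert sr == 16000

partial = ""
for chunk in readWave(samples):
    # intermediate chunks: feature extraction + partial decoding
    partial = asr.onlineASR(chunk.tobytes(), is_finished=False)
# final pass: attention rescoring, then reset for the next utterance
final = asr.onlineASR(is_finished=True)
print(partial, "->", final)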
from paddlenlp import Taskflow
class NLP:
def __init__(self, ie_model_path=None):
schema = ["时间", "出发地", "目的地", "费用"]
if ie_model_path:
self.ie_model = Taskflow("information_extraction",
schema=schema, task_path=ie_model_path)
else:
self.ie_model = Taskflow("information_extraction",
schema=schema)
self.dialogue_model = Taskflow("dialogue")
def chat(self, text):
result = self.dialogue_model([text])
return result[0]
def ie(self, text):
result = self.ie_model(text)
return result
\ No newline at end of file
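# A minimal sketch of the two Taskflow pipelines above; the default models are
# downloaded on first use, and the schema targets Chinese travel utterances
# (time, origin, destination, cost).
nlp = NLP()
print(nlp.chat("今天天气怎么样"))  # chit-chat reply
print(nlp.ie("明天我从北京去上海,高铁票550块"))  # slots extracted per schema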
import base64
import sqlite3
import os
import numpy as np
def dict_factory(cursor, row):
d = {}
for idx, col in enumerate(cursor.description):
d[col[0]] = row[idx]
return d
class DataBase(object):
def __init__(self, db_path:str):
db_path = os.path.realpath(db_path)
if os.path.exists(db_path):
self.db_path = db_path
else:
db_path_dir = os.path.dirname(db_path)
os.makedirs(db_path_dir, exist_ok=True)
self.db_path = db_path
self.conn = sqlite3.connect(self.db_path)
self.conn.row_factory = dict_factory
self.cursor = self.conn.cursor()
self.init_database()
def init_database(self):
"""
Initialize the database; create the table if it does not exist.
"""
sql = """
CREATE TABLE IF NOT EXISTS vprtable (
`id` INTEGER PRIMARY KEY AUTOINCREMENT,
`username` TEXT NOT NULL,
`vector` TEXT NOT NULL,
`wavpath` TEXT NOT NULL
);
"""
self.cursor.execute(sql)
self.conn.commit()
def execute_base(self, sql, data_dict):
self.cursor.execute(sql, data_dict)
self.conn.commit()
def insert_one(self, username, vector_base64:str, wav_path):
if not os.path.exists(wav_path):
return None, "wav not exists"
else:
sql = f"""
insert into
vprtable (username, vector, wavpath)
values (?, ?, ?)
"""
try:
self.cursor.execute(sql, (username, vector_base64, wav_path))
self.conn.commit()
lastidx = self.cursor.lastrowid
return lastidx, "data insert success"
except Exception as e:
print(e)
return None, str(e)
def select_all(self):
sql = """
SELECT * from vprtable
"""
result = self.cursor.execute(sql).fetchall()
return result
def select_by_id(self, vpr_id):
sql = """
SELECT * from vprtable WHERE `id` = ?
"""
result = self.cursor.execute(sql, (vpr_id,)).fetchall()
return result
def select_by_username(self, username):
sql = """
SELECT * from vprtable WHERE `username` = ?
"""
result = self.cursor.execute(sql, (username,)).fetchall()
return result
def drop_by_username(self, username):
sql = """
DELETE from vprtable WHERE `username` = ?
"""
self.cursor.execute(sql, (username,))
self.conn.commit()
def drop_all(self):
sql = """
DELETE from vprtable
"""
self.cursor.execute(sql)
self.conn.commit()
def drop_table(self):
sql = """
DROP TABLE vprtable
"""
self.cursor.execute(sql)
self.conn.commit()
def encode_vector(self, vector:np.ndarray):
return base64.b64encode(vector).decode('utf8')
def decode_vector(self, vector_base64, dtype=np.float32):
b = base64.b64decode(vector_base64)
vc = np.frombuffer(b, dtype=dtype)
return vc
\ No newline at end of file
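# A quick sanity sketch of the base64 round-trip above; the database path is a
# throwaway, and the 192-dim shape only mirrors the ECAPA-TDNN embedding size
# used elsewhere in this demo.
import numpy as np

db = DataBase("/tmp/vpr_demo.db")
vec = np.random.rand(1, 192).astype(np.float32)
b64 = db.encode_vector(vec)
restored = db.decode_vector(b64).reshape(vec.shape)  # decode returns a flat array
assert np.allclose(vec, restored)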
# TTS inference engine, supporting both streaming and non-streaming synthesis
# kept deliberately minimal
# inference runs on onnxruntime
# 1. download the matching models
# 2. load the models
# 3. end-to-end inference
# 4. streaming inference
import base64
import math
import logging
import numpy as np
from paddlespeech.server.utils.onnx_infer import get_sess
from paddlespeech.t2s.frontend.zh_frontend import Frontend
from paddlespeech.server.utils.util import denorm, get_chunks
from paddlespeech.server.utils.audio_process import float2pcm
from paddlespeech.server.utils.config import get_config
from paddlespeech.server.engine.tts.online.onnx.tts_engine import TTSEngine
class TTS:
def __init__(self, config_path):
self.config = get_config(config_path)['tts_online-onnx']
self.config['voc_block'] = 36
self.engine = TTSEngine()
self.engine.init(self.config)
self.executor = self.engine.executor
#self.engine.warm_up()
# frontend initialization
self.frontend = Frontend(
phone_vocab_path=self.engine.executor.phones_dict,
tone_vocab_path=None)
def depadding(self, data, chunk_num, chunk_id, block, pad, upsample):
"""
Streaming inference removes the result of pad inference
"""
front_pad = min(chunk_id * block, pad)
# first chunk
if chunk_id == 0:
data = data[:block * upsample]
# last chunk
elif chunk_id == chunk_num - 1:
data = data[front_pad * upsample:]
# middle chunk
else:
data = data[front_pad * upsample:(front_pad + block) * upsample]
return data
def offlineTTS(self, text):
get_tone_ids = False
merge_sentences = False
input_ids = self.frontend.get_input_ids(
text,
merge_sentences=merge_sentences,
get_tone_ids=get_tone_ids)
phone_ids = input_ids["phone_ids"]
wav_list = []
for i in range(len(phone_ids)):
orig_hs = self.engine.executor.am_encoder_infer_sess.run(
None, input_feed={'text': phone_ids[i].numpy()}
)
hs = orig_hs[0]
am_decoder_output = self.engine.executor.am_decoder_sess.run(
None, input_feed={'xs': hs})
am_postnet_output = self.engine.executor.am_postnet_sess.run(
None,
input_feed={
'xs': np.transpose(am_decoder_output[0], (0, 2, 1))
})
am_output_data = am_decoder_output + np.transpose(
am_postnet_output[0], (0, 2, 1))
normalized_mel = am_output_data[0][0]
mel = denorm(normalized_mel, self.engine.executor.am_mu, self.engine.executor.am_std)
wav = self.engine.executor.voc_sess.run(
output_names=None, input_feed={'logmel': mel})[0]
wav_list.append(wav)
wavs = np.concatenate(wav_list)
return wavs
def streamTTS(self, text):
get_tone_ids = False
merge_sentences = False
# front
input_ids = self.frontend.get_input_ids(
text,
merge_sentences=merge_sentences,
get_tone_ids=get_tone_ids)
phone_ids = input_ids["phone_ids"]
for i in range(len(phone_ids)):
part_phone_ids = phone_ids[i].numpy()
voc_chunk_id = 0
# fastspeech2_csmsc
if self.config.am == "fastspeech2_csmsc_onnx":
# am
mel = self.executor.am_sess.run(
output_names=None, input_feed={'text': part_phone_ids})
mel = mel[0]
# voc streaming
mel_chunks = get_chunks(mel, self.config.voc_block, self.config.voc_pad, "voc")
voc_chunk_num = len(mel_chunks)
for i, mel_chunk in enumerate(mel_chunks):
sub_wav = self.executor.voc_sess.run(
output_names=None, input_feed={'logmel': mel_chunk})
sub_wav = self.depadding(sub_wav[0], voc_chunk_num, i,
self.config.voc_block, self.config.voc_pad,
self.config.voc_upsample)
yield self.after_process(sub_wav)
# fastspeech2_cnndecoder_csmsc
elif self.config.am == "fastspeech2_cnndecoder_csmsc_onnx":
# am
orig_hs = self.executor.am_encoder_infer_sess.run(
None, input_feed={'text': part_phone_ids})
orig_hs = orig_hs[0]
# streaming voc chunk info
mel_len = orig_hs.shape[1]
voc_chunk_num = math.ceil(mel_len / self.config.voc_block)
start = 0
end = min(self.config.voc_block + self.config.voc_pad, mel_len)
# streaming am
hss = get_chunks(orig_hs, self.config.am_block, self.config.am_pad, "am")
am_chunk_num = len(hss)
for i, hs in enumerate(hss):
am_decoder_output = self.executor.am_decoder_sess.run(
None, input_feed={'xs': hs})
am_postnet_output = self.executor.am_postnet_sess.run(
None,
input_feed={
'xs': np.transpose(am_decoder_output[0], (0, 2, 1))
})
am_output_data = am_decoder_output + np.transpose(
am_postnet_output[0], (0, 2, 1))
normalized_mel = am_output_data[0][0]
sub_mel = denorm(normalized_mel, self.executor.am_mu,
self.executor.am_std)
sub_mel = self.depadding(sub_mel, am_chunk_num, i,
self.config.am_block, self.config.am_pad, 1)
if i == 0:
mel_streaming = sub_mel
else:
mel_streaming = np.concatenate(
(mel_streaming, sub_mel), axis=0)
# streaming voc
# once the streaming AM has produced more mel frames than one vocoder chunk, run streaming vocoder inference
while (mel_streaming.shape[0] >= end and
voc_chunk_id < voc_chunk_num):
voc_chunk = mel_streaming[start:end, :]
sub_wav = self.executor.voc_sess.run(
output_names=None, input_feed={'logmel': voc_chunk})
sub_wav = self.depadding(
sub_wav[0], voc_chunk_num, voc_chunk_id,
self.config.voc_block, self.config.voc_pad, self.config.voc_upsample)
yield self.after_process(sub_wav)
voc_chunk_id += 1
start = max(
0, voc_chunk_id * self.config.voc_block - self.config.voc_pad)
end = min(
(voc_chunk_id + 1) * self.config.voc_block + self.config.voc_pad,
mel_len)
else:
logging.error(
"Streaming TTS only supports fastspeech2_csmsc and fastspeech2_cnndecoder_csmsc."
)
def streamTTSBytes(self, text):
for wav in self.engine.executor.infer(
text=text,
lang=self.engine.config.lang,
am=self.engine.config.am,
spk_id=0):
wav = float2pcm(wav) # float32 to int16
wav_bytes = wav.tobytes() # to bytes
yield wav_bytes
def after_process(self, wav):
# convert float32 audio to int16 PCM bytes, then base64 for transport
wav = float2pcm(wav)  # float32 to int16
wav_bytes = wav.tobytes()  # to bytes
wav_base64 = base64.b64encode(wav_bytes).decode('utf8')  # to base64
return wav_base64
def streamTTS_TVM(self, text):
# TVM-optimized path, not implemented yet
pass
\ No newline at end of file
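# The chunk/pad bookkeeping in TTS.depadding() is easiest to see with concrete
# numbers; block / pad / upsample below are illustrative, not the served config.
import numpy as np

block, pad, upsample = 36, 14, 300
chunk_num, chunk_id = 3, 1  # a middle chunk
padded = np.arange((pad + block + pad) * upsample)  # pad on both sides
out = TTS.depadding(None, padded, chunk_num, chunk_id, block, pad, upsample)
assert len(out) == block * upsample  # exactly one block of samples survives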
# the vpr demo uses neither MySQL nor Milvus; it is only for the docker demo
import logging
import faiss
import numpy as np
from .sql_helper import DataBase
from .vpr_encode import get_audio_embedding
class VPR:
def __init__(self, db_path, dim, top_k) -> None:
self.db_path = db_path
self.dim = dim
self.top_k = top_k
self.dtype = np.float32
self.vpr_idx = 0
# database init
self.db = DataBase(db_path)
# faiss init: inner-product index with explicit ids
index_ip = faiss.IndexFlatIP(dim)
self.index_ip = faiss.IndexIDMap(index_ip)
self.init()
def init(self):
# demo init: register the vectors stored in sqlite into faiss
sql_dbs = self.db.select_all()
if sql_dbs:
for sql_db in sql_dbs:
idx = sql_db['id']
vc_bs64 = sql_db['vector']
vc = self.db.decode_vector(vc_bs64)
if len(vc.shape) == 1:
vc = np.expand_dims(vc, axis=0)
# rebuild the index
self.index_ip.add_with_ids(vc, np.array((idx,)).astype('int64'))
logging.info("faiss index built")
def faiss_enroll(self, idx, vc):
self.index_ip.add_with_ids(vc, np.array((idx,)).astype('int64'))
def vpr_enroll(self, username, wav_path):
# enroll a voiceprint
emb = get_audio_embedding(wav_path)
if emb is None:
return None
emb = np.expand_dims(emb, axis=0)
emb_bs64 = self.db.encode_vector(emb)
last_idx, mess = self.db.insert_one(username, emb_bs64, wav_path)
if last_idx:
# register in faiss as well
self.faiss_enroll(last_idx, emb)
return last_idx
def vpr_recog(self, wav_path):
# recognize a voiceprint
emb_search = get_audio_embedding(wav_path)
if emb_search is not None:
emb_search = np.expand_dims(emb_search, axis=0)
D, I = self.index_ip.search(emb_search, self.top_k)
D = D.tolist()[0]
I = I.tolist()[0]
return [(round(D[i] * 100, 2), I[i]) for i in range(len(D)) if I[i] != -1]
else:
logging.error("voiceprint recognition failed")
return None
def do_search_vpr(self, wav_path):
spk_ids, paths, scores = [], [], []
recog_result = self.vpr_recog(wav_path)
for score, idx in recog_result or []:
username = self.db.select_by_id(idx)[0]['username']
if username not in spk_ids:
spk_ids.append(username)
scores.append(score)
paths.append("")
return spk_ids, paths, scores
def vpr_del(self, username):
# delete a user's voiceprints by username:
# look up the matching ids, then drop the vectors and rows
res = self.db.select_by_username(username)
for r in res:
idx = r['id']
self.index_ip.remove_ids(np.array((idx,)).astype('int64'))
self.db.drop_by_username(username)
def vpr_list(self):
# list all enrolled records
return self.db.select_all()
def do_list(self):
spk_ids, vpr_ids = [], []
for res in self.db.select_all():
spk_ids.append(res['username'])
vpr_ids.append(res['id'])
return spk_ids, vpr_ids
def do_get_wav(self, vpr_idx):
res = self.db.select_by_id(vpr_idx)
return res[0]['wavpath']
def vpr_data(self, idx):
# fetch the record for a given id
res = self.db.select_by_id(idx)
return res
def vpr_droptable(self):
# drop the table
self.db.drop_table()
# clear faiss
self.index_ip.reset()
from paddlespeech.cli.vector import VectorExecutor
import numpy as np
import logging
vector_executor = VectorExecutor()
def get_audio_embedding(path):
"""
Use the PaddleSpeech vector executor to generate an embedding for an audio file.
"""
try:
embedding = vector_executor(
audio_file=path, model='ecapatdnn_voxceleb12')
embedding = embedding / np.linalg.norm(embedding)
return embedding
except Exception as e:
logging.error(f"Error with embedding: {e}")
return None
\ No newline at end of file
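# A hedged end-to-end sketch of the VPR flow above: enroll, then search.
# dim=192 mirrors the ecapatdnn_voxceleb12 embedding size; all wav paths are
# hypothetical and must exist on disk for insert_one() to accept them.
vpr = VPR(db_path="/tmp/vpr_demo.db", dim=192, top_k=5)
vpr.vpr_enroll("alice", "alice_enroll.wav")
vpr.vpr_enroll("bob", "bob_enroll.wav")

spk_ids, _, scores = vpr.do_search_vpr("query.wav")
for name, score in zip(spk_ids, scores):
    print(name, score)  # inner-product similarity scaled to 0-100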
from typing import List
from fastapi import WebSocket
class ConnectionManager:
def __init__(self):
# active websocket connections
self.active_connections: List[WebSocket] = []
async def connect(self, ws: WebSocket):
# accept the handshake
await ws.accept()
# keep the connection object
self.active_connections.append(ws)
def disconnect(self, ws: WebSocket):
# remove the connection object on close
self.active_connections.remove(ws)
@staticmethod
async def send_personal_message(message: str, ws: WebSocket):
# send a message to a single client
await ws.send_text(message)
async def broadcast(self, message: str):
# broadcast to all clients
for connection in self.active_connections:
await connection.send_text(message)
manager = ConnectionManager()
\ No newline at end of file
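# A minimal sketch of wiring the shared `manager` into a FastAPI websocket
# route; the route path is illustrative.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws/demo")
async def ws_demo(websocket: WebSocket):
    await manager.connect(websocket)
    try:
        while True:
            msg = await websocket.receive_text()
            await manager.broadcast(msg)  # echo to every connected client
    except WebSocketDisconnect:
        manager.disconnect(websocket)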
import os
import soundfile as sf
from paddlespeech.cli.asr.infer import ASRExecutor
from src.SpeechBase.asr import ASR
from src.SpeechBase.tts import TTS
from src.SpeechBase.nlp import NLP
class Robot:
def __init__(self, asr_config, tts_config,asr_init_path,
ie_model_path=None) -> None:
self.nlp = NLP(ie_model_path=ie_model_path)
self.asr = ASR(config_path=asr_config)
self.tts = TTS(config_path=tts_config)
self.tts_sample_rate = 24000
self.asr_sample_rate = 16000
# streaming recognition is weaker than the end-to-end model, so the two models are kept separate
self.asr_model = ASRExecutor()
self.asr_name = "conformer_wenetspeech"
self.warm_up_asrmodel(asr_init_path)
def warm_up_asrmodel(self, asr_init_path):
if not os.path.exists(asr_init_path):
path_dir = os.path.dirname(asr_init_path)
if not os.path.exists(path_dir):
os.makedirs(path_dir, exist_ok=True)
# synthesize a warm-up clip with TTS (24 kHz)
text = "生成初始音频"
self.text2speech(text, asr_init_path)
# initialize the asr model
self.asr_model(asr_init_path, model=self.asr_name, lang='zh',
sample_rate=16000, force_yes=True)
def speech2text(self, audio_file):
self.asr_model.preprocess(self.asr_name, audio_file)
self.asr_model.infer(self.asr_name)
res = self.asr_model.postprocess()
return res
def text2speech(self, text, outpath):
wav = self.tts.offlineTTS(text)
sf.write(
outpath, wav, samplerate=self.tts_sample_rate)
res = wav
return res
def text2speechStream(self, text):
for sub_wav_base64 in self.tts.streamTTS(text=text):
yield sub_wav_base64
def text2speechStreamBytes(self, text):
for wav_bytes in self.tts.streamTTSBytes(text=text):
yield wav_bytes
def chat(self, text):
result = self.nlp.chat(text)
return result
def ie(self, text):
result = self.nlp.ie(text)
return result
\ No newline at end of file
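# A hedged sketch of driving the Robot facade end to end; the config and wav
# paths are placeholders for whatever the demo ships with.
robot = Robot(asr_config="conf/ws_conformer_application.yaml",
              tts_config="conf/tts_online_application.yaml",
              asr_init_path="source/wav/init.wav")
print(robot.speech2text("question.wav"))  # offline ASR
robot.text2speech("欢迎使用语音演示", "reply.wav")  # offline TTS at 24 kHz
print(robot.ie("明天从北京到上海的高铁票多少钱"))  # information extraction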
import random
def randName(n=5):
return "".join(random.sample('zyxwvutsrqponmlkjihgfedcba',n))
def SuccessRequest(result=None, message="ok"):
return {
"code": 0,
"result": result,
"message": message
}
def ErrorRequest(result=None, message="error"):
return {
"code": -1,
"result": result,
"message": message
}
\ No newline at end of file
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*
node_modules
dist
dist-ssr
*.local
# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<link rel="icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>飞桨PaddleSpeech</title>
</head>
<body>
<div id="app"></div>
<script type="module" src="/src/main.js"></script>
</body>
</html>
{
"name": "paddlespeechwebclient",
"private": true,
"version": "0.0.0",
"scripts": {
"dev": "vite",
"build": "vite build",
"preview": "vite preview"
},
"dependencies": {
"ant-design-vue": "^2.2.8",
"axios": "^0.26.1",
"element-plus": "^2.1.9",
"js-audio-recorder": "0.5.7",
"lamejs": "^1.2.1",
"less": "^4.1.2",
"vue": "^3.2.25"
},
"devDependencies": {
"@vitejs/plugin-vue": "^2.3.0",
"vite": "^2.9.0"
}
}
<script setup>
import Experience from './components/Experience.vue'
import Header from './components/Content/Header/Header.vue'
</script>
<template>
<div class="app">
<Header></Header>
<Experience></Experience>
</div>
</template>
<style lang="less">
.app {
background: url("assets/image/在线体验-背景@2x.png") no-repeat;
}
</style>
export const apiURL = {
ASR_OFFLINE : '/api/asr/offline', // offline speech recognition result
ASR_COLLECT_ENV : '/api/asr/collectEnv', // sample environment noise
ASR_STOP_RECORD : '/api/asr/stopRecord', // pause recording on the backend
ASR_RESUME_RECORD : '/api/asr/resumeRecord',// resume recording on the backend
NLP_CHAT : '/api/nlp/chat', // NLP chit-chat endpoint
NLP_IE : '/api/nlp/ie', // information extraction endpoint
TTS_OFFLINE : '/api/tts/offline', // fetch TTS audio
VPR_RECOG : '/api/vpr/recog', // voiceprint recognition, returns similarity scores
VPR_ENROLL : '/api/vpr/enroll', // voiceprint enrollment endpoint
VPR_LIST : '/api/vpr/list', // list enrolled voiceprints
VPR_DEL : '/api/vpr/del', // delete a user's voiceprint
VPR_DATA : '/api/vpr/database64?vprId=', // fetch enrolled audio in base64
// websocket
CHAT_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/offlineStream', // ChatBot websocket endpoint
ASR_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/onlineStream', // streaming ASR endpoint
TTS_SOCKET_RECORD: 'ws://localhost:8010/ws/tts/online', // streaming TTS endpoint
}
import axios from 'axios'
import {apiURL} from "./API.js"
// upload an audio file and get the recognition result
export async function asrOffline(params){
const result = await axios.post(
apiURL.ASR_OFFLINE, params
)
return result
}
// upload an environment-noise sample
export async function asrCollentEnv(params){
const result = await axios.post(
apiURL.ASR_COLLECT_ENV, params
)
return result
}
// pause recording
export async function asrStopRecord(){
const result = await axios.get(apiURL.ASR_STOP_RECORD);
return result
}
// resume recording
export async function asrResumeRecord(){
const result = await axios.get(apiURL.ASR_RESUME_RECORD);
return result
}
\ No newline at end of file
import axios from 'axios'
import {apiURL} from "./API.js"
// get the chit-chat reply
export async function nlpChat(text){
const result = await axios.post(apiURL.NLP_CHAT, { chat : text});
return result
}
// get the information-extraction result
export async function nlpIE(text){
const result = await axios.post(apiURL.NLP_IE, { chat : text});
return result
}
import axios from 'axios'
import {apiURL} from "./API.js"
export async function ttsOffline(text){
const result = await axios.post(apiURL.TTS_OFFLINE, { text : text});
return result
}
import axios from 'axios'
import {apiURL} from "./API.js"
// enroll a voiceprint
export async function vprEnroll(params){
const result = await axios.post(apiURL.VPR_ENROLL, params);
return result
}
// recognize a voiceprint
export async function vprRecog(params){
const result = await axios.post(apiURL.VPR_RECOG, params);
return result
}
// delete a voiceprint
export async function vprDel(params){
const result = await axios.post(apiURL.VPR_DEL, params);
return result
}
// list enrolled voiceprints
export async function vprList(){
const result = await axios.get(apiURL.VPR_LIST);
return result
}
// fetch the enrolled audio
export async function vprData(params){
const result = await axios.get(apiURL.VPR_DATA+params);
return result
}
<svg xmlns="http://www.w3.org/2000/svg" width="50" height="50" viewBox="0 0 50 50">
<g fill="none" fill-rule="evenodd">
<rect width="50" height="50" opacity="0"/>
<path fill="#FFF" fill-rule="nonzero" d="M10.5625,26.375 L10.5625,37.375 L39.4375,37.375 L39.4375,26.375 L42.1875,26.375 L42.1875,40.125 L7.8125,40.125 L7.8125,26.375 L10.5625,26.375 Z M24.9193012,9.30543065 L32.8422855,17.1477673 L30.9077145,19.1022327 L26.3745,14.6154306 L26.375,29.125 L23.625,29.125 L23.6245,14.5224306 L19.1022838,19.0922338 L17.1477162,17.1577662 L24.9193012,9.30543065 Z"/>
</g>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="50" height="50" viewBox="0 0 50 50">
<g fill="#FFF" fill-rule="evenodd">
<rect width="50" height="50" opacity="0"/>
<path d="M18.625,5.7 C19.2739346,5.7 19.8,6.22606542 19.8,6.875 L19.8,42.125 C19.8,42.7739346 19.2739346,43.3 18.625,43.3 C17.9760654,43.3 17.45,42.7739346 17.45,42.125 L17.45,6.875 C17.45,6.22606542 17.9760654,5.7 18.625,5.7 Z M30.375,10.4 C31.0239346,10.4 31.55,10.9260654 31.55,11.575 L31.55,37.425 C31.55,38.0739346 31.0239346,38.6 30.375,38.6 C29.7260654,38.6 29.2,38.0739346 29.2,37.425 L29.2,11.575 C29.2,10.9260654 29.7260654,10.4 30.375,10.4 Z M6.875,15.1 C7.52393458,15.1 8.05,15.6260654 8.05,16.275 L8.05,32.725 C8.05,33.3739346 7.52393458,33.9 6.875,33.9 C6.22606542,33.9 5.7,33.3739346 5.7,32.725 L5.7,16.275 C5.7,15.6260654 6.22606542,15.1 6.875,15.1 Z M42.125,17.45 C42.7739346,17.45 43.3,17.9760654 43.3,18.625 L43.3,30.375 C43.3,31.0239346 42.7739346,31.55 42.125,31.55 C41.4760654,31.55 40.95,31.0239346 40.95,30.375 L40.95,18.625 C40.95,17.9760654 41.4760654,17.45 42.125,17.45 Z"/>
</g>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="50" height="50" viewBox="0 0 50 50">
<g fill="#FFF" fill-rule="evenodd">
<rect width="50" height="50" fill="none"/>
<path fill-rule="nonzero" d="M41.4485655,21.2539772 C42.1315264,21.2850061 42.6598822,21.8638177 42.6289611,22.5468326 C42.6247768,22.6388278 42.6185963,22.7404533 42.6102079,22.8512273 L42.5782082,23.2105123 L42.5782082,23.2105123 L42.5316934,23.6217955 L42.5316934,23.6217955 L42.4693948,24.0821848 L42.4693948,24.0821848 L42.3900439,24.5887883 L42.3900439,24.5887883 L42.292372,25.1387138 C42.2744962,25.2338175 42.2558041,25.3306058 42.2362693,25.4290185 C41.8143833,27.555069 41.1316382,29.6828464 40.1241953,31.6800323 C37.4291788,37.0229123 32.9261483,40.3971985 26.4086979,40.8900674 L25.9987324,40.9171116 L25.9987324,45.4234882 L36.4808016,45.4234882 C37.1644101,45.4234882 37.7186683,45.9777464 37.7186683,46.661355 C37.7186683,47.3023391 37.2315273,47.8294468 36.6073586,47.8928314 L36.4808016,47.8992217 L13.1797237,47.8992217 C12.4960073,47.8992217 11.941857,47.3450714 11.941857,46.661355 C11.941857,46.020472 12.4289031,45.4932758 13.0531489,45.4298797 L13.1797237,45.4234882 L23.5229989,45.4234882 L23.5229989,40.9094487 C16.8529053,40.4933909 12.2580826,37.0999016 9.52429608,31.6800323 C8.5167992,29.6828464 7.83410805,27.5550691 7.41222208,25.4290185 L7.30490754,24.8585165 L7.30490754,24.8585165 L7.21653999,24.3298905 L7.21653999,24.3298905 L7.1458579,23.8460326 L7.1458579,23.8460326 L7.09159974,23.4098348 L7.09159974,23.4098348 L7.052504,23.0241892 L7.052504,23.0241892 L7.02730915,22.6919879 C7.02419833,22.6412354 7.02161415,22.5928302 7.01953033,22.5468326 C6.98839343,21.8638177 7.5168571,21.2850061 8.19987204,21.2539772 C8.84009734,21.2248875 9.38883251,21.6875394 9.4804906,22.3081826 L9.52089639,22.8033886 L9.52089639,22.8033886 L9.55106194,23.0957484 L9.55106194,23.0957484 C9.61279606,23.6520033 9.70707015,24.274849 9.84046771,24.9470712 C10.2215574,26.8673593 10.837172,28.7858665 11.7346375,30.5650942 C14.2485231,35.5489392 18.4280434,38.4929132 24.8242187,38.5130415 C31.2204481,38.4929132 35.3998065,35.5489392 37.9138,30.5650942 C38.8112655,28.7858665 39.4267722,26.8673053 39.8078618,24.9470712 C39.9413133,24.274849 40.0356414,23.6520033 40.0973756,23.0957484 L40.1383001,22.683441 L40.1383001,22.683441 L40.15571,22.4343189 L40.15571,22.4343189 C40.1868469,21.7514119 40.7656585,21.2229482 41.4485655,21.2539772 Z M24.7277861,1.03431811 C30.2652292,1.03431811 34.7717072,5.45158401 34.9203849,10.9435284 L34.924173,11.2236897 L34.924173,24.2016207 C34.924173,29.8291412 30.3475899,34.3909923 24.7277861,34.3909923 C19.1903431,34.3909923 14.6838651,29.9738829 14.5351873,24.4817898 L14.5313993,24.2016206 L14.5313993,11.2236897 C14.5313993,5.59627708 19.1078745,1.03431811 24.7277861,1.03431811 Z M24.7278401,3.51005152 C20.5523235,3.51005152 17.1406309,6.83661824 17.0109562,10.9790926 L17.0071327,11.2236897 L17.0071327,24.2016206 C17.0071327,28.4575531 20.4658637,31.9152588 24.7278401,31.9152588 C28.9033567,31.9152588 32.3150493,28.5887959 32.444724,24.4462237 L32.4485475,24.2016206 L32.4485475,11.2236897 C32.4485475,6.96786511 28.9898165,3.51005152 24.7278401,3.51005152 Z"/>
</g>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 20 20">
<g fill="#FFF" fill-rule="evenodd">
<rect width="20" height="20" opacity="0"/>
<path fill-rule="nonzero" d="M17.2545788,8.38095607 C17.5371833,8.39379564 17.7558133,8.63330387 17.7430184,8.91593074 L17.7371151,9.01650414 L17.7371151,9.01650414 L17.7143664,9.26243626 L17.7143664,9.26243626 L17.675151,9.56380287 L17.675151,9.56380287 L17.6172162,9.91546885 C17.6058754,9.97798618 17.5936607,10.0423853 17.5805252,10.1085594 C17.4059517,10.9883044 17.1234365,11.868764 16.7065636,12.6951858 C15.608809,14.8714882 13.7861076,16.2584571 11.1569912,16.495803 L10.8615444,16.5174255 L10.8615444,18.3821331 L15.1989524,18.3821331 C15.4818249,18.3821331 15.7111731,18.6114813 15.7111731,18.8943538 C15.7111731,19.1458357 15.5299597,19.3549563 15.2910197,19.3983228 L15.1989524,19.4065745 L5.55712706,19.4065745 C5.2742099,19.4065745 5.04490634,19.1772709 5.04490634,18.8943538 C5.04490634,18.6429116 5.22608446,18.4337601 5.46504803,18.3903863 L5.55712706,18.3821331 L9.83710301,18.382133 L9.83710301,16.5142546 C7.07706426,16.3420928 5.1757583,14.9378903 4.04453631,12.6951858 C3.62764105,11.868764 3.34514816,10.9883044 3.17057465,10.1085594 L3.13388183,9.91546885 L3.13388183,9.91546885 L3.07593716,9.56380287 L3.07593716,9.56380287 L3.03671385,9.26243626 L3.03671385,9.26243626 L3.01397193,9.01650414 C3.01143062,8.98042028 3.00948271,8.94686015 3.00808152,8.91593074 C2.99519728,8.63330387 3.2138719,8.39379564 3.49649877,8.38095607 C3.77908098,8.36811648 4.01858921,8.58679112 4.03142877,8.86937333 L4.04579166,9.04965974 L4.04579166,9.04965974 L4.05561184,9.14306831 C4.08115699,9.37324275 4.12016696,9.63097201 4.17536595,9.90913293 C4.33305822,10.7037349 4.5877953,11.4975999 4.95916033,12.2338321 C5.99938887,14.2961128 7.72884553,15.5143089 10.3755388,15.5226379 C13.0222544,15.5143089 14.7516441,14.2961128 15.7919173,12.2338321 C16.1632823,11.4975999 16.4179747,10.7037126 16.575667,9.90913293 C16.6124812,9.7236923 16.6421003,9.54733242 16.6653248,9.38216386 L16.7052821,9.04965974 L16.7052821,9.04965974 L16.7196041,8.86937333 L16.7196041,8.86937333 C16.7324884,8.58679115 16.9719966,8.3681165 17.2545788,8.38095607 Z M10.3356356,0.0142005962 C12.595216,0.0142005962 14.4399401,1.79169133 14.5496666,4.02028091 L14.5548302,4.23049229 L14.5548302,9.60067063 C14.5548302,11.9292998 12.6610717,13.8169623 10.3356356,13.8169623 C8.07605526,13.8169623 6.23133121,12.0395346 6.12160467,9.81088771 L6.11644109,9.60067061 L6.11644109,4.23049229 C6.11644109,1.90190776 8.01015495,0.0142005962 10.3356356,0.0142005962 Z M10.335658,1.03864201 C8.63472709,1.03864201 7.24010749,2.37267291 7.14594933,4.04955911 L7.1408825,4.23049229 L7.1408825,9.60067061 C7.1408825,11.3617461 8.57208154,12.7925209 10.335658,12.7925209 C12.0365888,12.7925209 13.4312084,11.4585316 13.5253666,9.78160809 L13.5304334,9.60067061 L13.5304334,4.23049229 C13.5304334,2.46946142 12.0992344,1.03864201 10.335658,1.03864201 Z"/>
</g>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 16 16">
<path fill="#F33E3E" d="M4.0976,1.3362 C4.4618,1.1488 4.8948,1.4234 4.8948,1.833 C4.8948,2.0362 4.7852,2.2266 4.6046,2.3194 C2.5386,3.3816 1.1214,5.5338 1.1214,8.0122 C1.1214,11.677 4.2184,14.632 7.9326,14.398 C11.1952,14.1922 13.816,11.4788 13.9156,8.2112 C13.9936,5.6502 12.5572,3.4112 10.4372,2.3204 C10.256,2.2272 10.1452,2.0376 10.1452,1.8338 C10.1452,1.422 10.5814,1.1504 10.9476,1.3392 C13.366,2.5862 15.024,5.109 15.024,8.0124 C15.024,12.3292 11.3596,15.8064 6.978,15.497 C3.3116,15.238 0.3328,12.2886 0.0406,8.6244 C-0.2116,5.4644 1.5076,2.6692 4.0976,1.3362 Z M7.52,0.004 C7.8252,0.004 8.0726,0.2514 8.0726,0.5566 L8.0726,6.3544 C8.0726,6.6596 7.8252,6.907 7.52,6.907 C7.2148,6.907 6.9674,6.6596 6.9674,6.3544 L6.9674,0.5566 C6.9674,0.2514 7.2148,0.004 7.52,0.004 Z"/>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="116" height="116" viewBox="0 0 116 116">
<g fill="none" fill-rule="evenodd">
<circle cx="58" cy="58" r="58" fill="#2932E1"/>
<path fill="#FFF" fill-rule="nonzero" d="M74.4485655,54.2539772 C75.1315264,54.2850061 75.6598822,54.8638177 75.6289611,55.5468326 C75.6247768,55.6388278 75.6185963,55.7404533 75.6102079,55.8512273 L75.5782082,56.2105123 L75.5782082,56.2105123 L75.5316934,56.6217955 L75.5316934,56.6217955 L75.4693948,57.0821848 L75.4693948,57.0821848 L75.3900439,57.5887883 L75.3900439,57.5887883 L75.292372,58.1387138 C75.2744962,58.2338175 75.2558041,58.3306058 75.2362693,58.4290185 C74.8143833,60.555069 74.1316382,62.6828464 73.1241953,64.6800323 C70.4291788,70.0229123 65.9261483,73.3971985 59.4086979,73.8900674 L58.9987324,73.9171116 L58.9987324,78.4234882 L69.4808016,78.4234882 C70.1644101,78.4234882 70.7186683,78.9777464 70.7186683,79.661355 C70.7186683,80.3023391 70.2315273,80.8294468 69.6073586,80.8928314 L69.4808016,80.8992217 L46.1797237,80.8992217 C45.4960073,80.8992217 44.941857,80.3450714 44.941857,79.661355 C44.941857,79.020472 45.4289031,78.4932758 46.0531489,78.4298797 L46.1797237,78.4234882 L56.5229989,78.4234882 L56.5229989,73.9094487 C49.8529053,73.4933909 45.2580826,70.0999016 42.5242961,64.6800323 C41.5167992,62.6828464 40.834108,60.5550691 40.4122221,58.4290185 L40.3049075,57.8585165 L40.3049075,57.8585165 L40.21654,57.3298905 L40.21654,57.3298905 L40.1458579,56.8460326 L40.1458579,56.8460326 L40.0915997,56.4098348 L40.0915997,56.4098348 L40.052504,56.0241892 L40.052504,56.0241892 L40.0273091,55.6919879 C40.0241983,55.6412354 40.0216142,55.5928302 40.0195303,55.5468326 C39.9883934,54.8638177 40.5168571,54.2850061 41.199872,54.2539772 C41.8400973,54.2248875 42.3888325,54.6875394 42.4804906,55.3081826 L42.5208964,55.8033886 L42.5208964,55.8033886 L42.5510619,56.0957484 L42.5510619,56.0957484 C42.6127961,56.6520033 42.7070702,57.274849 42.8404677,57.9470712 C43.2215574,59.8673593 43.837172,61.7858665 44.7346375,63.5650942 C47.2485231,68.5489392 51.4280434,71.4929132 57.8242187,71.5130415 C64.2204481,71.4929132 68.3998065,68.5489392 70.9138,63.5650942 C71.8112655,61.7858665 72.4267722,59.8673053 72.8078618,57.9470712 C72.9413133,57.274849 73.0356414,56.6520033 73.0973756,56.0957484 L73.1383001,55.683441 L73.1383001,55.683441 L73.15571,55.4343189 L73.15571,55.4343189 C73.1868469,54.7514119 73.7656585,54.2229482 74.4485655,54.2539772 Z M57.7277861,34.0343181 C63.2652292,34.0343181 67.7717072,38.451584 67.9203849,43.9435284 L67.924173,44.2236897 L67.924173,57.2016207 C67.924173,62.8291412 63.3475899,67.3909923 57.7277861,67.3909923 C52.1903431,67.3909923 47.6838651,62.9738829 47.5351873,57.4817898 L47.5313993,57.2016206 L47.5313993,44.2236897 C47.5313993,38.5962771 52.1078745,34.0343181 57.7277861,34.0343181 Z M57.7278401,36.5100515 C53.5523235,36.5100515 50.1406309,39.8366182 50.0109562,43.9790926 L50.0071327,44.2236897 L50.0071327,57.2016206 C50.0071327,61.4575531 53.4658637,64.9152588 57.7278401,64.9152588 C61.9033567,64.9152588 65.3150493,61.5887959 65.444724,57.4462237 L65.4485475,57.2016206 L65.4485475,44.2236897 C65.4485475,39.9678651 61.9898165,36.5100515 57.7278401,36.5100515 Z"/>
</g>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="116" height="116" viewBox="0 0 116 116">
<g fill="none" fill-rule="evenodd">
<circle cx="58" cy="58" r="58" fill="#7278F5"/>
<path fill="#FFF" fill-rule="nonzero" d="M74.4485655,54.2539772 C75.1315264,54.2850061 75.6598822,54.8638177 75.6289611,55.5468326 C75.6247768,55.6388278 75.6185963,55.7404533 75.6102079,55.8512273 L75.5782082,56.2105123 L75.5782082,56.2105123 L75.5316934,56.6217955 L75.5316934,56.6217955 L75.4693948,57.0821848 L75.4693948,57.0821848 L75.3900439,57.5887883 L75.3900439,57.5887883 L75.292372,58.1387138 C75.2744962,58.2338175 75.2558041,58.3306058 75.2362693,58.4290185 C74.8143833,60.555069 74.1316382,62.6828464 73.1241953,64.6800323 C70.4291788,70.0229123 65.9261483,73.3971985 59.4086979,73.8900674 L58.9987324,73.9171116 L58.9987324,78.4234882 L69.4808016,78.4234882 C70.1644101,78.4234882 70.7186683,78.9777464 70.7186683,79.661355 C70.7186683,80.3023391 70.2315273,80.8294468 69.6073586,80.8928314 L69.4808016,80.8992217 L46.1797237,80.8992217 C45.4960073,80.8992217 44.941857,80.3450714 44.941857,79.661355 C44.941857,79.020472 45.4289031,78.4932758 46.0531489,78.4298797 L46.1797237,78.4234882 L56.5229989,78.4234882 L56.5229989,73.9094487 C49.8529053,73.4933909 45.2580826,70.0999016 42.5242961,64.6800323 C41.5167992,62.6828464 40.834108,60.5550691 40.4122221,58.4290185 L40.3049075,57.8585165 L40.3049075,57.8585165 L40.21654,57.3298905 L40.21654,57.3298905 L40.1458579,56.8460326 L40.1458579,56.8460326 L40.0915997,56.4098348 L40.0915997,56.4098348 L40.052504,56.0241892 L40.052504,56.0241892 L40.0273091,55.6919879 C40.0241983,55.6412354 40.0216142,55.5928302 40.0195303,55.5468326 C39.9883934,54.8638177 40.5168571,54.2850061 41.199872,54.2539772 C41.8400973,54.2248875 42.3888325,54.6875394 42.4804906,55.3081826 L42.5208964,55.8033886 L42.5208964,55.8033886 L42.5510619,56.0957484 L42.5510619,56.0957484 C42.6127961,56.6520033 42.7070702,57.274849 42.8404677,57.9470712 C43.2215574,59.8673593 43.837172,61.7858665 44.7346375,63.5650942 C47.2485231,68.5489392 51.4280434,71.4929132 57.8242187,71.5130415 C64.2204481,71.4929132 68.3998065,68.5489392 70.9138,63.5650942 C71.8112655,61.7858665 72.4267722,59.8673053 72.8078618,57.9470712 C72.9413133,57.274849 73.0356414,56.6520033 73.0973756,56.0957484 L73.1383001,55.683441 L73.1383001,55.683441 L73.15571,55.4343189 L73.15571,55.4343189 C73.1868469,54.7514119 73.7656585,54.2229482 74.4485655,54.2539772 Z M57.7277861,34.0343181 C63.2652292,34.0343181 67.7717072,38.451584 67.9203849,43.9435284 L67.924173,44.2236897 L67.924173,57.2016207 C67.924173,62.8291412 63.3475899,67.3909923 57.7277861,67.3909923 C52.1903431,67.3909923 47.6838651,62.9738829 47.5351873,57.4817898 L47.5313993,57.2016206 L47.5313993,44.2236897 C47.5313993,38.5962771 52.1078745,34.0343181 57.7277861,34.0343181 Z M57.7278401,36.5100515 C53.5523235,36.5100515 50.1406309,39.8366182 50.0109562,43.9790926 L50.0071327,44.2236897 L50.0071327,57.2016206 C50.0071327,61.4575531 53.4658637,64.9152588 57.7278401,64.9152588 C61.9033567,64.9152588 65.3150493,61.5887959 65.444724,57.4462237 L65.4485475,57.2016206 L65.4485475,44.2236897 C65.4485475,39.9678651 61.9898165,36.5100515 57.7278401,36.5100515 Z"/>
</g>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="10" height="12" viewBox="0 0 10 12">
<polygon fill="#FFF" fill-rule="evenodd" points="29 16 39 21.765 29 28" transform="translate(-29 -16)"/>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="10" height="12" viewBox="0 0 10 12">
<path fill="#FFF" fill-rule="evenodd" d="M31,17 L31,29 L29,29 L29,17 L31,17 Z M39,17 L39,29 L37,29 L37,17 L39,17 Z" transform="translate(-29 -17)"/>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="16" height="16" viewBox="0 0 16 16">
<defs>
<rect id="ic_更换示例-a" width="16" height="16" x="0" y="0"/>
</defs>
<g fill="none" fill-rule="evenodd" transform="matrix(-1 0 0 1 16 0)">
<mask id="ic_更换示例-b" fill="#fff">
<use xlink:href="#ic_更换示例-a"/>
</mask>
<path fill="#2932E1" fill-rule="nonzero" d="M6.35459401,0.717547671 L7.1160073,1.36581444 L5.76391165,2.95149486 C8.45440978,1.82595599 11.6186236,2.72687193 13.331374,5.17293307 C15.3274726,8.02365719 14.6537425,11.9415081 11.8236048,13.9231918 C8.99346706,15.9048756 5.08146225,15.1979908 3.08536373,12.3472667 C2.43380077,11.4167384 2.05175569,10.3497586 1.95954347,9.24373118 L1.95954347,9.24373118 L1.91800137,8.74545992 L2.9145439,8.66237572 L2.956086,9.16064698 C3.03368894,10.0914452 3.35506892,10.9889989 3.90451578,11.7736903 C5.58491905,14.1735549 8.873856,14.7678536 11.2500283,13.1040398 C13.6262007,11.440226 14.1926253,8.14637409 12.512222,5.74650951 C11.0401872,3.64422594 8.29699921,2.89825126 6.0091042,3.93534448 L6.11200137,3.89054767 L7.69316988,4.63120811 L7.26907888,5.53682768 L3.68173666,3.85691748 L6.35459401,0.717547671 Z" mask="url(#ic_更换示例-b)"/>
</g>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 20 20">
<g fill="#FFF" fill-rule="evenodd">
<rect width="20" height="20" opacity="0"/>
<path d="M7.5,2 C7.77614237,2 8,2.22385763 8,2.5 L8,17.5 C8,17.7761424 7.77614237,18 7.5,18 C7.22385763,18 7,17.7761424 7,17.5 L7,2.5 C7,2.22385763 7.22385763,2 7.5,2 Z M12.5,4 C12.7761424,4 13,4.22385763 13,4.5 L13,15.5 C13,15.7761424 12.7761424,16 12.5,16 C12.2238576,16 12,15.7761424 12,15.5 L12,4.5 C12,4.22385763 12.2238576,4 12.5,4 Z M2.5,6 C2.77614237,6 3,6.22385763 3,6.5 L3,13.5 C3,13.7761424 2.77614237,14 2.5,14 C2.22385763,14 2,13.7761424 2,13.5 L2,6.5 C2,6.22385763 2.22385763,6 2.5,6 Z M17.5,7 C17.7761424,7 18,7.22385763 18,7.5 L18,12.5 C18,12.7761424 17.7761424,13 17.5,13 C17.2238576,13 17,12.7761424 17,12.5 L17,7.5 C17,7.22385763 17.2238576,7 17.5,7 Z"/>
</g>
</svg>
<?xml version="1.0" encoding="UTF-8"?>
<svg width="20px" height="20px" viewBox="0 0 20 20" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<title>icon_录制声音(小语音)</title>
<g id="页面-1" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
<g id="02-声纹识别-补充状态" transform="translate(-98.000000, -216.000000)" fill="#FFFFFF">
<g id="编组-6备份" transform="translate(77.000000, 204.000000)">
<g id="icon_录制声音(小语音)" transform="translate(21.000000, 12.000000)">
<rect id="矩形" opacity="0" x="0" y="0" width="20" height="20"></rect>
<path d="M17.2545788,8.38095607 C17.5371833,8.39379564 17.7558133,8.63330387 17.7430184,8.91593074 L17.7371151,9.01650414 L17.7371151,9.01650414 L17.7143664,9.26243626 L17.7143664,9.26243626 L17.675151,9.56380287 L17.675151,9.56380287 L17.6172162,9.91546885 C17.6058754,9.97798618 17.5936607,10.0423853 17.5805252,10.1085594 C17.4059517,10.9883044 17.1234365,11.868764 16.7065636,12.6951858 C15.608809,14.8714882 13.7861076,16.2584571 11.1569912,16.495803 L10.8615444,16.5174255 L10.8615444,18.3821331 L15.1989524,18.3821331 C15.4818249,18.3821331 15.7111731,18.6114813 15.7111731,18.8943538 C15.7111731,19.1458357 15.5299597,19.3549563 15.2910197,19.3983228 L15.1989524,19.4065745 L5.55712706,19.4065745 C5.2742099,19.4065745 5.04490634,19.1772709 5.04490634,18.8943538 C5.04490634,18.6429116 5.22608446,18.4337601 5.46504803,18.3903863 L5.55712706,18.3821331 L9.83710301,18.382133 L9.83710301,16.5142546 C7.07706426,16.3420928 5.1757583,14.9378903 4.04453631,12.6951858 C3.62764105,11.868764 3.34514816,10.9883044 3.17057465,10.1085594 L3.13388183,9.91546885 L3.13388183,9.91546885 L3.07593716,9.56380287 L3.07593716,9.56380287 L3.03671385,9.26243626 L3.03671385,9.26243626 L3.01397193,9.01650414 C3.01143062,8.98042028 3.00948271,8.94686015 3.00808152,8.91593074 C2.99519728,8.63330387 3.2138719,8.39379564 3.49649877,8.38095607 C3.77908098,8.36811648 4.01858921,8.58679112 4.03142877,8.86937333 L4.04579166,9.04965974 L4.04579166,9.04965974 L4.05561184,9.14306831 C4.08115699,9.37324275 4.12016696,9.63097201 4.17536595,9.90913293 C4.33305822,10.7037349 4.5877953,11.4975999 4.95916033,12.2338321 C5.99938887,14.2961128 7.72884553,15.5143089 10.3755388,15.5226379 C13.0222544,15.5143089 14.7516441,14.2961128 15.7919173,12.2338321 C16.1632823,11.4975999 16.4179747,10.7037126 16.575667,9.90913293 C16.6124812,9.7236923 16.6421003,9.54733242 16.6653248,9.38216386 L16.7052821,9.04965974 L16.7052821,9.04965974 L16.7196041,8.86937333 L16.7196041,8.86937333 C16.7324884,8.58679115 16.9719966,8.3681165 17.2545788,8.38095607 Z M10.3356356,0.0142005962 C12.595216,0.0142005962 14.4399401,1.79169133 14.5496666,4.02028091 L14.5548302,4.23049229 L14.5548302,9.60067063 C14.5548302,11.9292998 12.6610717,13.8169623 10.3356356,13.8169623 C8.07605526,13.8169623 6.23133121,12.0395346 6.12160467,9.81088771 L6.11644109,9.60067061 L6.11644109,4.23049229 C6.11644109,1.90190776 8.01015495,0.0142005962 10.3356356,0.0142005962 Z M10.335658,1.03864201 C8.63472709,1.03864201 7.24010749,2.37267291 7.14594933,4.04955911 L7.1408825,4.23049229 L7.1408825,9.60067061 C7.1408825,11.3617461 8.57208154,12.7925209 10.335658,12.7925209 C12.0365888,12.7925209 13.4312084,11.4585316 13.5253666,9.78160809 L13.5304334,9.60067061 L13.5304334,4.23049229 C13.5304334,2.46946142 12.0992344,1.03864201 10.335658,1.03864201 Z" id="形状" fill-rule="nonzero"></path>
</g>
</g>
</g>
</g>
</svg>
\ No newline at end of file
<template>
<div className="speech_header">
<div className="speech_header_title">
飞桨-PaddleSpeech
</div>
<div className="speech_header_describe">
PaddleSpeech 是基于飞桨 PaddlePaddle 的语音方向的开源模型库,用于语音和音频中的各种关键任务的开发,欢迎大家Star收藏鼓励
</div>
<div className="speech_header_link_box">
<a href="https://github.com/PaddlePaddle/PaddleSpeech" className="speech_header_link" target='_blank' rel='noreferrer' key={index}>
前往Github
</a>
</div>
</div>
</template>
<script>
export default {
name:"Header"
}
</script>
<style lang="less" scoped>
@import "./style.less";
</style>
\ No newline at end of file
.speech_header {
width: 1200px;
margin: 0 auto;
padding-top: 50px;
// background: url("../../../assets/image/在线体验-背景@2x.png") no-repeat;
box-sizing: border-box;
&::after {
content: "";
display: block;
clear: both;
visibility: hidden;
}
;
// background: pink;
.speech_header_title {
height: 57px;
font-family: PingFangSC-Medium;
font-size: 38px;
color: #000000;
letter-spacing: 0;
line-height: 57px;
font-weight: 500;
margin-bottom: 15px;
}
;
.speech_header_describe {
height: 26px;
font-family: PingFangSC-Regular;
font-size: 16px;
color: #575757;
line-height: 26px;
font-weight: 400;
margin-bottom: 24px;
}
;
.speech_header_link_box {
height: 40px;
margin-bottom: 40px;
display: flex;
align-items: center;
};
.speech_header_link {
display: block;
background: #2932E1;
width: 120px;
height: 40px;
line-height: 40px;
border-radius: 20px;
font-family: PingFangSC-Medium;
font-size: 14px;
color: #FFFFFF;
text-align: center;
font-weight: 500;
margin-right: 20px;
// margin-bottom: 40px;
&:hover {
opacity: 0.9;
}
;
}
;
.speech_header_divider {
width: 1200px;
height: 1px;
background: #D1D1D1;
margin-bottom: 40px;
}
;
.speech_header_content_wrapper {
width: 1200px;
margin: 0 auto;
// background: pink;
margin-bottom: 20px;
display: flex;
justify-content: space-between;
flex-wrap: wrap;
.speech_header_module {
width: 384px;
background: #FFFFFF;
border: 1px solid rgba(224, 224, 224, 1);
box-shadow: 4px 8px 12px 0px rgba(0, 0, 0, 0.05);
border-radius: 16px;
padding: 30px 34px 0px 34px;
box-sizing: border-box;
display: flex;
margin-bottom: 40px;
.speech_header_background_img {
width: 46px;
height: 46px;
background-size: 46px 46px;
background-repeat: no-repeat;
background-position: center;
margin-right: 20px;
}
;
.speech_header_content {
padding-top: 4px;
margin-bottom: 32px;
.speech_header_module_title {
height: 26px;
font-family: PingFangSC-Medium;
font-size: 20px;
color: #000000;
letter-spacing: 0;
line-height: 26px;
font-weight: 500;
margin-bottom: 10px;
}
;
.speech_header_module_introduce {
font-family: PingFangSC-Regular;
font-size: 16px;
color: #666666;
letter-spacing: 0;
font-weight: 400;
}
;
}
;
}
;
}
;
}
;
<script setup>
import ChatT from './SubMenu/ChatBot/ChatT.vue'
import ASRT from './SubMenu/ASR/ASRT.vue'
import TTST from './SubMenu/TTS/TTST.vue'
import VPRT from './SubMenu/VPR/VPRT.vue'
import IET from './SubMenu/IE/IET.vue'
</script>
<template>
<div className="experience">
<div className="experience_wrapper">
<div className="experience_title">
功能体验
</div>
<div className="experience_describe">
体验前,请允许浏览器获取麦克风权限
</div>
<div className="experience_content" >
<el-tabs
className="experience_tabs"
type="border-card"
>
<el-tab-pane label="语音聊天" key="1">
<ChatT></ChatT>
</el-tab-pane>
<el-tab-pane label="声纹识别" key="2">
<VPRT></VPRT>
</el-tab-pane>
<el-tab-pane label="语音识别" key="3">
<ASRT></ASRT>
</el-tab-pane>
<el-tab-pane label="语音合成" key="4">
<TTST></TTST>
</el-tab-pane>
<el-tab-pane label="语音指令" key="5">
<IET></IET>
</el-tab-pane>
</el-tabs>
</div>
</div>
</div>
</template>
<style lang="less">
@import "./style.less";
</style>
\ No newline at end of file
<template>
<div class="asrbox">
<h5> ASR 体验</h5>
<div class="home" style="margin:1vw;">
<el-button :type="recoType" @click="startRecorderChunk()" style="margin:1vw;">{{ recoText }} (流式)</el-button>
<el-button :type="recoType" @click="startRecorder()" style="margin:1vw;">{{ recoText }} (端到端)</el-button>
</div>
<a> asr_stream: {{ streamAsrResult }}</a>
<br>
<a> asr_offline: {{ asrResultOffline }} </a>
</div>
</template>
<script>
import Recorder from 'js-audio-recorder'
const recorder_chunk = new Recorder({
sampleBits: 16, // sample bits: 8 or 16, default 16
sampleRate: 16000, // sample rate: 11025, 16000, 22050, 24000, 44100 or 48000 (the browser default is 48000 in Chrome)
numChannels: 1, // channels: 1 or 2, default 1
compiling: true
})
const recorder = new Recorder({
sampleBits: 16, // sample bits: 8 or 16, default 16
sampleRate: 16000, // sample rate: 11025, 16000, 22050, 24000, 44100 or 48000 (the browser default is 48000 in Chrome)
numChannels: 1, // channels: 1 or 2, default 1
compiling: true
})
export default {
name: "ASR",
data(){
return {
streamAsrResult: '',
recoType: "primary",
recoText: "开始录音",
playType: "success",
asrResultOffline: '',
onReco: false,
ws:'',
}
},
mounted (){
// initialize the websocket
this.ws = new WebSocket("ws://localhost:8010/ws/asr/onlineStream")
// message handling logic
var _that = this
this.ws.addEventListener('message', function (event) {
var temp = JSON.parse(event.data);
// console.log('ws message', event.data)
if(temp.result && (temp.result != _that.streamAsrResult)){
_that.streamAsrResult = temp.result
_that.$nextTick(()=>{})
console.log('result updated')
}
})
},
},
methods: {
startRecorder () {
if(!this.onReco){
recorder.clear()
recorder.start().then(() => {
}, (error) => {
console.log("recording failed");
})
this.onReco = true
this.recoType = "danger"
this.recoText = "结束录音"
this.$nextTick(()=>{
})
} else {
// stop recording
recorder.stop()
this.onReco = false
this.recoType = "primary"
this.recoText = "开始录音"
this.$nextTick(()=>{})
// export the audio as wav, then upload it to the server
const wavs = recorder.getWAVBlob()
this.uploadFile(wavs, "/api/asr/offline")
}
},
},
startRecorderChunk() {
if(!this.onReco){
// tell the backend to start streaming
var start = JSON.stringify({name:"test.wav", "nbest":5, signal:"start"})
this.ws.send(start)
recorder_chunk.start().then(() => {
setInterval(() => {
// keep recording
let newData = recorder_chunk.getNextData();
if (!newData.length) {
return;
}
// push the chunk over the streaming websocket
this.uploadChunk(newData)
}, 500)
}, (error) => {
console.log("recording failed");
})
this.onReco = true
this.recoType = "danger"
this.recoText = "结束录音"
this.$nextTick(()=>{
})
} else {
// stop recording
recorder_chunk.stop()
// tell the backend to stop streaming
// var end = JSON.stringify({name:"test.wav", "nbest":5, signal:"end"})
// this.ws.send(end)
this.onReco = false
this.recoType = "primary"
this.recoText = "开始录音"
this.$nextTick(()=>{})
recorder_chunk.clear()
}
},
uploadChunk(chunkDatas){
chunkDatas.forEach((chunkData) => {
this.ws.send(chunkData)
})
},
async uploadFile(file, post_url){
const formData = new FormData()
formData.append('files', file)
const result = await this.$http.post(post_url, formData);
if (result.data.code === 0) {
this.asrResultOffline = result.data.result
this.$nextTick(()=>{})
this.$message.success(result.data.message);
} else {
this.$message.error(result.data.message);
}
},
},
}
</script>
<style lang='less' scoped>
.asrbox {
border: 4px solid #F00;
// position: fixed;
top:40%;
width: 100%;
height: 20%;
overflow: auto;
}
</style>
\ No newline at end of file
<script setup>
import AudioFileIdentification from "./AudioFile/AudioFileIdentification.vue"
import RealTime from "./RealTime/RealTime.vue"
import EndToEndIdentification from "./EndToEnd/EndToEndIdentification.vue";
</script>
<template>
<div class="speech_recognition">
<div class="speech_recognition_tabs">
<div class="frame"></div>
<el-tabs class="speech_recognition_mytabs" type="border-card">
<el-tab-pane label="实时语音识别" key="1">
<RealTime />
</el-tab-pane>
<el-tab-pane label="端到端识别" key="2">
<EndToEndIdentification />
</el-tab-pane>
<el-tab-pane label="音频文件识别" key="3">
<AudioFileIdentification />
</el-tab-pane>
</el-tabs>
</div>
</div>
</template>
<script>
export default {
}
</script>
<style lang="less" scoped>
@import "./style.less";
</style>
\ No newline at end of file
<template>
<div class="audioFileIdentification">
<div v-if="uploadStatus === 0" class="public_recognition_speech">
<!-- before upload -->
<el-upload
:multiple="false"
:accept="'.wav'"
:limit="1"
:auto-upload="false"
:on-change="handleChange"
:show-file-list="false"
>
<div class="upload_img">
<div class="upload_img_back"></div>
</div>
</el-upload>
<div class="speech_text">
上传文件
</div>
<div class="speech_text_prompt">
支持50秒内的.wav文件
</div>
</div>
<!-- uploading -->
<div v-else-if="uploadStatus === 1" class="on_the_cross_speech">
<div class="on_the_upload_img">
<div class="on_the_upload_img_back"></div>
</div>
<div class="on_the_speech_text">
<span class="on_the_speech_loading"> <Spin indicator={antIcon} /></span> 上传中
</div>
</div>
<div v-else>
<!-- start recognition -->
<div v-if="recognitionStatus === 0" class="public_recognition_speech_start">
<div class="public_recognition_speech_content">
<div
class="public_recognition_speech_title"
>
{{ filename }}
</div>
<div
class="public_recognition_speech_again"
@click="uploadAgain()"
>重新上传</div>
<div
class="public_recognition_speech_play"
@click="paly()"
>播放</div>
</div>
<div class="speech_promp"
@click="beginToIdentify()">
开始识别
</div>
</div>
<!-- recognizing -->
<div v-else-if="recognitionStatus === 1" class="public_recognition_speech_identify">
<div class="public_recognition_speech_identify_box">
<div
class="public_recognition_speech_identify_back_img"
>
<a-spin />
</div>
<div
class="public_recognition__identify_the_promp"
>识别中</div>
</div>
</div>
<!-- recognize again -->
<div v-else class="public_recognition_speech_identify_ahain">
<div class="public_recognition_speech_identify_box_btn">
<div
class="public_recognition__identify_the_btn"
@click="toIdentifyThe()"
>重新识别</div>
</div>
</div>
</div>
<!-- arrow -->
<div class="public_recognition_point_to">
</div>
<!-- recognition result -->
<div class="public_recognition_result">
<div>识别结果</div>
<div>{{ asrResult }}</div>
</div>
</div>
</template>
<script>
import { asrOffline } from '../../../../api/ApiASR'
let audioCtx = new AudioContext({
latencyHint: 'interactive',
sampleRate: 24000,
});
export default {
name:"",
data(){
return {
uploadStatus : 0,
recognitionStatus : 0,
asrResult : "",
indicator : "",
filename: "",
upfile: ""
}
},
methods:{
// handle file selection
handleChange(file, fileList){
this.uploadStatus = 2
this.filename = file.name
this.upfile = file
console.log(file)
},
readFile(file) {
return new Promise((resolve, reject) => {
const fileReader = new FileReader();
fileReader.onload = function () {
resolve(fileReader);
};
fileReader.onerror = function (err) {
reject(err);
};
fileReader.readAsDataURL(file);
});
},
// re-upload
uploadAgain(){
this.uploadStatus = 0
this.upfile = ""
this.filename = ""
this.asrResult = ""
},
// play decoded audio data
playAudioData(wav_buffer){
audioCtx.decodeAudioData(wav_buffer, buffer => {
let source = audioCtx.createBufferSource();
source.buffer = buffer
source.connect(audioCtx.destination);
source.start();
}, function (e) {
});
},
// play the locally selected audio file
async paly(){
if(this.upfile){
let fileRes = ""
let fileString = ""
fileRes = await this.readFile(this.upfile.raw);
fileString = fileRes.result;
const audioBase64type = (fileString.match(/data:[^;]*;base64,/))?.[0] ?? '';
const isBase64 = !!fileString.match(/data:[^;]*;base64,/);
const uploadBase64 = fileString.substr(audioBase64type.length);
// decode the base64 payload into binary
let typedArray = this.base64ToUint8Array(isBase64 ? uploadBase64 : undefined)
this.playAudioData(typedArray.buffer)
}
},
base64ToUint8Array(base64String){
const padding = '='.repeat((4 - base64String.length % 4) % 4);
const base64 = (base64String + padding)
.replace(/-/g, '+')
.replace(/_/g, '/');
const rawData = window.atob(base64);
const outputArray = new Uint8Array(rawData.length);
for (let i = 0; i < rawData.length; ++i) {
outputArray[i] = rawData.charCodeAt(i);
}
return outputArray;
},
// start recognition
async beginToIdentify(){
// recognizing
this.recognitionStatus = 1
const formData = new FormData();
formData.append('files', this.upfile.raw);
const result = await asrOffline(formData)
// allow re-recognition afterwards
this.recognitionStatus = 2
console.log(result);
if (result.data.code === 0) {
this.$message.success("识别成功")
// show the recognized text
this.asrResult = result.data.result
} else {
this.$message.error("识别失败")
};
},
},
// reset and identify again
toIdentifyThe(){
this.uploadStatus = 0
this.recognitionStatus = 0
this.asrResult = ""
}
}
}
</script>
<style lang="less" scoped>
@import "./style.less";
</style>
\ No newline at end of file
.audioFileIdentification {
width: 1106px;
height: 270px;
// background-color: pink;
padding-top: 40px;
box-sizing: border-box;
display: flex;
// upload entry
.public_recognition_speech {
width: 295px;
height: 230px;
padding-top: 32px;
box-sizing: border-box;
// upload button
.upload_img {
width: 116px;
height: 116px;
background: #2932E1;
border-radius: 50%;
margin-left: 98px;
cursor: pointer;
margin-bottom: 20px;
display: flex;
justify-content: center;
align-items: center;
.upload_img_back {
width: 34.38px;
height: 30.82px;
background: #2932E1;
background: url("../../../../assets/image/ic_大-上传文件.svg");
background-repeat: no-repeat;
background-position: center;
background-size: 34.38px 30.82px;
cursor: pointer;
}
&:hover {
opacity: 0.9;
};
};
.speech_text {
height: 22px;
font-family: PingFangSC-Medium;
font-size: 16px;
color: #000000;
font-weight: 500;
margin-left: 124px;
margin-bottom: 10px;
};
.speech_text_prompt {
height: 20px;
font-family: PingFangSC-Regular;
font-size: 14px;
color: #999999;
font-weight: 400;
margin-left: 84px;
};
};
// uploading
.on_the_cross_speech {
width: 295px;
height: 230px;
padding-top: 32px;
box-sizing: border-box;
.on_the_upload_img {
width: 116px;
height: 116px;
background: #7278F5;
border-radius: 50%;
margin-left: 98px;
cursor: pointer;
margin-bottom: 20px;
display: flex;
justify-content: center;
align-items: center;
.on_the_upload_img_back {
width: 34.38px;
height: 30.82px;
background: #7278F5;
background: url("../../../../assets/image/ic_大-上传文件.svg");
background-repeat: no-repeat;
background-position: center;
background-size: 34.38px 30.82px;
cursor: pointer;
};
};
.on_the_speech_text {
height: 22px;
font-family: PingFangSC-Medium;
font-size: 16px;
color: #000000;
font-weight: 500;
margin-left: 124px;
margin-bottom: 10px;
display: flex;
// justify-content: center;
align-items: center;
.on_the_speech_loading {
display: inline-block;
width: 16px;
height: 16px;
background: #7278F5;
// background: url("../../../../assets/image/ic_开始聊天.svg");
// background-repeat: no-repeat;
// background-position: center;
// background-size: 16px 16px;
margin-right: 8px;
};
};
};
// start recognition
.public_recognition_speech_start {
width: 295px;
height: 230px;
padding-top: 32px;
box-sizing: border-box;
position: relative;
.public_recognition_speech_content {
width: 100%;
position: absolute;
top: 40px;
left: 50%;
transform: translateX(-50%);
display: flex;
justify-content: center;
align-items: center;
.public_recognition_speech_title {
height: 22px;
font-family: PingFangSC-Regular;
font-size: 16px;
color: #000000;
font-weight: 400;
};
.public_recognition_speech_again {
height: 22px;
font-family: PingFangSC-Regular;
font-size: 16px;
color: #2932E1;
font-weight: 400;
margin-left: 30px;
cursor: pointer;
};
.public_recognition_speech_play {
height: 22px;
font-family: PingFangSC-Regular;
font-size: 16px;
color: #2932E1;
font-weight: 400;
margin-left: 20px;
cursor: pointer;
};
};
.speech_promp {
position: absolute;
top: 112px;
left: 50%;
transform: translateX(-50%);
width: 142px;
height: 44px;
background: #2932E1;
border-radius: 22px;
font-family: PingFangSC-Medium;
font-size: 14px;
color: #FFFFFF;
text-align: center;
line-height: 44px;
font-weight: 500;
cursor: pointer;
};
};
  // Recognizing
.public_recognition_speech_identify {
width: 295px;
height: 230px;
padding-top: 32px;
box-sizing: border-box;
position: relative;
.public_recognition_speech_identify_box {
width: 143px;
height: 44px;
background: #7278F5;
border-radius: 22px;
position: absolute;
top: 50%;
left: 50%;
transform: translate(-50%,-50%);
display: flex;
justify-content: center;
align-items: center;
cursor: pointer;
.public_recognition_speech_identify_back_img {
width: 16px;
height: 16px;
// background: #7278F5;
// background: url("../../../../assets/image/ic_开始聊天.svg");
// background-repeat: no-repeat;
// background-position: center;
// background-size: 16px 16px;
};
.public_recognition__identify_the_promp {
height: 20px;
font-family: PingFangSC-Medium;
font-size: 14px;
color: #FFFFFF;
font-weight: 500;
margin-left: 12px;
};
};
};
  // Re-recognize
.public_recognition_speech_identify_ahain {
width: 295px;
height: 230px;
padding-top: 32px;
box-sizing: border-box;
position: relative;
cursor: pointer;
.public_recognition_speech_identify_box_btn {
width: 143px;
height: 44px;
background: #2932E1;
border-radius: 22px;
position: absolute;
top: 50%;
left: 50%;
transform: translate(-50%,-50%);
display: flex;
justify-content: center;
align-items: center;
cursor: pointer;
.public_recognition__identify_the_btn {
height: 20px;
font-family: PingFangSC-Medium;
font-size: 14px;
color: #FFFFFF;
font-weight: 500;
};
};
};
  // Pointer arrow
.public_recognition_point_to {
width: 47px;
height: 67px;
background: url("../../../../assets/image/步骤-箭头切图@2x.png") no-repeat;
background-position: center;
background-size: 47px 67px;
margin-top: 91px;
margin-right: 67px;
};
  // Recognition result
.public_recognition_result {
width: 680px;
height: 230px;
background: #FAFAFA;
padding: 40px 50px 0px 50px;
div {
&:nth-of-type(1) {
height: 26px;
font-family: PingFangSC-Medium;
font-size: 16px;
color: #666666;
line-height: 26px;
font-weight: 500;
margin-bottom: 20px;
};
&:nth-of-type(2) {
height: 26px;
font-family: PingFangSC-Medium;
font-size: 16px;
color: #666666;
line-height: 26px;
font-weight: 500;
};
};
};
};
\ No newline at end of file
<template>
<div class="endToEndIdentification">
<div class="public_recognition_speech">
<div v-if="onReco">
        <!-- Stop recording -->
<div @click="endRecorder()" class="endToEndIdentification_end_recorder_img">
<div class='endToEndIdentification_end_recorder_img_back'></div>
</div>
</div>
<div v-else>
<div @click="startRecorder()" class="endToEndIdentification_start_recorder_img"></div>
</div>
<div class="endToEndIdentification_prompt" >
<div v-if="onReco">
结束识别
</div>
<div v-else>
开始识别
</div>
</div>
<div class="speech_text_prompt">
停止录音后得到识别结果
</div>
</div>
<div class="public_recognition_point_to"></div>
<div class="public_recognition_result">
<div>识别结果</div>
<div> {{asrResult}} </div>
</div>
</div>
</template>
<script>
import Recorder from 'js-audio-recorder'
import { asrOffline } from '../../../../api/ApiASR'
const recorder = new Recorder({
  sampleBits: 16, // bit depth: 8 or 16, default 16
  sampleRate: 16000, // sample rate: 11025/16000/22050/24000/44100/48000; otherwise the browser default applies (48000 in Chrome here)
  numChannels: 1, // channels: 1 or 2, default 1
compiling: true
})
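// NOTE: with js-audio-recorder, `compiling: true` keeps PCM chunks available via
// recorder.getNextData() for streaming use. This component only calls
// recorder.getWAVBlob() after recording ends, so the flag is likely unnecessary here.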
export default {
data () {
return {
onReco: false,
asrResult: "",
}
},
methods: {
    // Start recording
startRecorder(){
this.onReco = true
recorder.clear()
recorder.start()
},
    // Stop recording
    endRecorder(){
      recorder.stop()
      this.onReco = false
      // Export the recording as a WAV blob and upload it to the server
      const wavs = recorder.getWAVBlob()
      this.uploadFile(wavs)
    },
},
    // Upload the recorded file
async uploadFile(file){
const formData = new FormData()
formData.append('files', file)
const result = await asrOffline(formData)
if (result.data.code === 0) {
this.asrResult = result.data.result
this.$message.success(result.data.message);
} else {
this.$message.error(result.data.message);
}
},
}
}
</script>
<style lang="less" scoped>
@import "./style.less";
</style>
\ No newline at end of file
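For reference, a minimal sketch of what the `asrOffline` helper is assumed to do:
POST the multipart form to the offline-ASR endpoint and return the JSON body with a
`{ code, result, message }` payload. The endpoint path below is hypothetical; the
real one is defined in `api/ApiASR`.

// asrOfflineSketch.js: illustrative only, not the project's actual implementation
export async function asrOfflineSketch (formData) {
  const resp = await fetch('/api/asr/offline', {  // hypothetical endpoint path
    method: 'POST',
    body: formData                                // browser sets the multipart boundary
  })
  // Mirror the `{ data }` wrapper the components read from
  return { data: await resp.json() }
}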
.endToEndIdentification {
width: 1106px;
height: 270px;
padding-top: 40px;
box-sizing: border-box;
display: flex;
  // Start recognition
.public_recognition_speech {
width: 295px;
height: 230px;
padding-top: 32px;
box-sizing: border-box;
.endToEndIdentification_start_recorder_img {
width: 116px;
height: 116px;
background: #2932E1;
background: url("../../../../assets/image/ic_开始聊天.svg");
background-repeat: no-repeat;
background-position: center;
background-size: 116px 116px;
margin-left: 98px;
cursor: pointer;
margin-bottom: 20px;
&:hover {
background: url("../../../../assets/image/ic_开始聊天_hover.svg");
};
};
.endToEndIdentification_end_recorder_img {
width: 116px;
height: 116px;
background: #2932E1;
border-radius: 50%;
display: flex;
justify-content: center;
align-items: center;
margin-left: 98px;
margin-bottom: 20px;
cursor: pointer;
.endToEndIdentification_end_recorder_img_back {
width: 50px;
height: 50px;
background: url("../../../../assets/image/ic_大-声音波浪.svg");
background-repeat: no-repeat;
background-position: center;
background-size: 50px 50px;
&:hover {
opacity: 0.9;
};
};
};
.endToEndIdentification_prompt {
height: 22px;
font-family: PingFangSC-Medium;
font-size: 16px;
color: #000000;
font-weight: 500;
margin-left: 124px;
margin-bottom: 10px;
};
.speech_text_prompt {
height: 20px;
font-family: PingFangSC-Regular;
font-size: 14px;
color: #999999;
font-weight: 400;
margin-left: 90px;
};
};
  // Pointer arrow
.public_recognition_point_to {
width: 47px;
height: 67px;
background: url("../../../../assets/image/步骤-箭头切图@2x.png") no-repeat;
background-position: center;
background-size: 47px 67px;
margin-top: 91px;
margin-right: 67px;
};
  // Recognition result
.public_recognition_result {
width: 680px;
height: 230px;
background: #FAFAFA;
padding: 40px 50px 0px 50px;
div {
&:nth-of-type(1) {
height: 26px;
font-family: PingFangSC-Medium;
font-size: 16px;
color: #666666;
line-height: 26px;
font-weight: 500;
margin-bottom: 20px;
};
&:nth-of-type(2) {
height: 26px;
font-family: PingFangSC-Medium;
font-size: 16px;
color: #666666;
line-height: 26px;
font-weight: 500;
};
};
};
};
\ No newline at end of file
<template>
<div class="realTime">
<div class="public_recognition_speech">
<div v-if="onReco">
        <!-- Stop recording -->
<div @click="endRecorder()" class="endToEndIdentification_end_recorder_img">
<div class='endToEndIdentification_end_recorder_img_back'></div>
</div>
</div>
<div v-else>
<div @click="startRecorder()" class="endToEndIdentification_start_recorder_img"></div>
</div>
<div class="endToEndIdentification_prompt" >
<div v-if="onReco">
结束识别
</div>
<div v-else>
开始识别
</div>
</div>
<div class="speech_text_prompt">
实时得到识别结果
</div>
</div>
<div class="public_recognition_point_to"></div>
<div class="public_recognition_result">
<div>识别结果</div>
<div> {{asrResult}} </div>
</div>
</div>
</template>
<script>
import Recorder from 'js-audio-recorder'
import { apiURL } from '../../../../api/API'
const recorder = new Recorder({
  sampleBits: 16, // bit depth: 8 or 16, default 16
  sampleRate: 16000, // sample rate: 11025/16000/22050/24000/44100/48000; otherwise the browser default applies (48000 in Chrome here)
  numChannels: 1, // channels: 1 or 2, default 1
compiling: true
})
export default {
data () {
    return {
      onReco: false,
      asrResult: "",
      wsUrl: "",
      ws: null,
      timer: null
    }
},
  mounted () {
    this.wsUrl = apiURL.ASR_SOCKET_RECORD
    this.ws = new WebSocket(this.wsUrl)
    // readyState === CONNECTING right after construction only means the handshake
    // has started, so report success once the socket actually opens
    this.ws.addEventListener('open', () => {
      this.$message.success("实时识别 Websocket 连接成功")
    })
    this.ws.addEventListener('message', (event) => {
      const temp = JSON.parse(event.data)
      // Only update when the server pushes a new partial result
      if (temp.result && temp.result !== this.asrResult) {
        this.asrResult = temp.result
      }
    })
},
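  // NOTE (assumed): the streaming server pushes JSON frames like
  //   { "result": "partial transcript so far" }
  // where each frame carries the full current hypothesis rather than a delta,
  // which is why the handler overwrites asrResult wholesale.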
methods: {
    // Start recording
    startRecorder(){
      // Refuse to start unless the websocket is actually open
      if(this.ws.readyState != this.ws.OPEN){
        this.$message.error("websocket 链接失败,请检查链接地址是否正确")
        return
      }
      this.onReco = true
      // Tell the backend that a new utterance is starting
      var start = JSON.stringify({name:"test.wav", "nbest":5, signal:"start"})
      this.ws.send(start)
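      // Assumed wire protocol, inferred from this client code:
      //   1. client sends a JSON text frame {name, nbest, signal: "start"}
      //   2. client streams raw PCM chunks as binary frames (see uploadChunk)
      //   3. server pushes back JSON frames carrying the running "result" transcript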
      recorder.start().then(() => {
        // Poll every 300 ms and forward any newly captured PCM chunks
        this.timer = setInterval(() => {
          let newData = recorder.getNextData();
          if (!newData.length) {
            return;
          }
          this.uploadChunk(newData)
        }, 300)
      }, (error) => {
        console.error("录音出错", error);
      })
    },
    // Stop recording and stop streaming; the backend may additionally expect an
    // explicit end-of-utterance frame (e.g. signal:"end") depending on its protocol
    endRecorder(){
      clearInterval(this.timer)
      recorder.stop()
      this.onReco = false
      recorder.clear()
    },
    // Stream upload: send each PCM chunk as a binary websocket frame
    uploadChunk(chunkDatas){
chunkDatas.forEach((chunkData) => {
this.ws.send(chunkData)
})
},
},
}
</script>
<style lang="less" scoped>
@import "./style.less";
</style>
\ No newline at end of file
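Taken together, the realTime component implements the client loop below. A condensed,
framework-free sketch for reference (the closing signal:"end" frame is an assumption
inferred from the start frame, not confirmed by this diff):

// streamingAsrSketch.js: illustrative streaming-ASR client, not project code
import Recorder from 'js-audio-recorder'

export function streamAsr (wsUrl, onResult) {
  // Same recorder settings the components use: 16-bit mono PCM at 16 kHz,
  // with compiling enabled so getNextData() yields chunks while recording
  const recorder = new Recorder({ sampleBits: 16, sampleRate: 16000, numChannels: 1, compiling: true })
  const ws = new WebSocket(wsUrl)
  let timer = null
  ws.onmessage = (e) => {
    const msg = JSON.parse(e.data)
    if (msg.result) onResult(msg.result)   // running best hypothesis
  }
  ws.onopen = () => {
    // Announce the utterance, then pump PCM chunks every 300 ms
    ws.send(JSON.stringify({ name: 'test.wav', nbest: 5, signal: 'start' }))
    recorder.start().then(() => {
      timer = setInterval(() => {
        recorder.getNextData().forEach((chunk) => ws.send(chunk))
      }, 300)
    })
  }
  // Caller invokes the returned function to end the session
  return function stop () {
    clearInterval(timer)
    recorder.stop()
    ws.send(JSON.stringify({ name: 'test.wav', nbest: 5, signal: 'end' }))  // assumed
    recorder.clear()
  }
}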