Unverified commit 62fa7e0f, authored by vpegasus, committed by GitHub

Merge branch 'PaddlePaddle:develop' into develop

......@@ -159,6 +159,7 @@
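### Recent Updates
- 👑 2022.05.13: PaddleSpeech releases [PP-ASR](./docs/source/asr/PPASR_cn.md), a streaming speech recognition system; [PP-TTS](./docs/source/tts/PPTTS_cn.md), a streaming speech synthesis system; and [PP-VPR](docs/source/vpr/PPVPR_cn.md), a full-pipeline speaker (voiceprint) recognition system.
- 👏🏻 2022.05.06: PaddleSpeech Streaming Server is now available, covering speech recognition (with punctuation restoration and timestamps) and speech synthesis.
- 👏🏻 2022.05.06: PaddleSpeech Server is now available, covering audio classification, speech recognition, speech synthesis, speaker recognition, and punctuation restoration.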
......
*/.vscode/*
*.wav
*/resource/*
.Ds*
*.pyc
*.pcm
*.npy
*.diff
*.sqlite
*/static/*
*.pdparams
*.pdiparams*
*.pdmodel
*/source/*
*/PaddleSpeech/*
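# Paddle Speech Demo
PaddleSpeechDemo is a demo project built around the voice interaction features of PaddleSpeech. It is meant to help you get started with PaddleSpeech and build your own applications on top of it.
The voice interaction part uses PaddleSpeech, the dialogue and information extraction parts use PaddleNLP, and the web front end is built with Vue3.
Main features:
+ Voice chat: PaddleSpeech speech recognition plus speech synthesis, with the dialogue part based on the chit-chat feature of PaddleNLP
+ Speaker recognition: a showcase of the PaddleSpeech speaker (voiceprint) recognition feature
+ Speech recognition: supports three modes, [real-time recognition], [end-to-end recognition], and [audio file recognition]
+ Speech synthesis: supports two modes, [streaming synthesis] and [end-to-end synthesis]
+ Voice commands: combines PaddleSpeech speech recognition with PaddleNLP information extraction to automate travel expense claims
Demo:
![Demo screenshot](docs/效果展示.png)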
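## Installation
### Backend environment
```
# Install the dependencies
cd speech_server
pip install -r requirements.txt
```
### Front-end environment
The front end depends on node.js, which must be installed beforehand; make sure npm is available. The setup was tested with npm 8.3.1, and we recommend downloading the stable release of node.js from the [official site](https://nodejs.org/en/).
```
# Enter the front-end directory
cd web_client
# Install yarn (skip if it is already installed)
npm install -g yarn
# Install the front-end dependencies with yarn
yarn install
```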
## Starting the services
### Start the backend service
```
cd speech_server
# Port 8010 is the default
python main.py --port 8010
```
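Once the backend is running, calling one of its HTTP routes is a quick way to check it. The sketch below only builds a POST request for the `/nlp/chat` route that `main.py` in this commit defines; that route takes a JSON body with a single `chat` field. The helper name `build_chat_request` and the base URL are assumptions made for illustration, and the shape of the response depends on `SuccessRequest` in `src/util.py`, which is not shown here.

```python
import json
import urllib.request


def build_chat_request(base_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /nlp/chat route."""
    # The route expects a JSON object with one field named "chat".
    body = json.dumps({"chat": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/nlp/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, pass the returned object to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_chat_request("http://localhost:8010", "你好"))`, and decode the JSON body of the reply.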
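### Start the front-end service
```
cd web_client
yarn dev --port 8011
```
With the default configuration, the front end points at a backend on localhost, so the backend server and the browser that opens the page must run on the same machine. If they do not, see the FAQ entry below on changing the backend address when the backend runs on another machine or port.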
## Running with Docker
### Backend Docker
The backend uses the [official PaddlePaddle Docker image](https://www.paddlepaddle.org.cn); the CPU version is shown here.
```
# Fetch the PaddleSpeech project
cd PaddleSpeechServer
git clone https://github.com/PaddlePaddle/PaddleSpeech.git
# Pull the image
docker pull registry.baidubce.com/paddlepaddle/paddle:2.3.0
# Start the container
docker run --name paddle -it -p 8010:8010 -v $PWD:/paddle registry.baidubce.com/paddlepaddle/paddle:2.3.0 /bin/bash
# Inside the container
cd /paddle
# Install the dependencies
pip install -r requirements.txt
# Start the service
python main.py --port 8010
```
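### Front-end Docker
The front end can use the [official node Docker image](https://hub.docker.com/_/node) directly.
```shell
docker pull node
```
Install the dependencies inside the container:
```shell
cd PaddleSpeechWebClient
# Map the external port 8011
docker run -it -p 8011:8011 -v $PWD:/paddle node:latest /bin/bash
# Inside the container
cd /paddle
# Install the dependencies
yarn install
# Start the front end
yarn dev --port 8011
```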
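## FAQ
#### Q: How do I install node.js?
A: See the [runoob tutorial](https://www.runoob.com/nodejs/nodejs-install-setup.html) for installing node.js, and make sure npm is available.
#### Q: How do I change the configuration if the backend runs on another machine or port?
A: The backend address is configured in two separate files.
First, edit `PaddleSpeechWebClient/vite.config.js`:
```javascript
server: {
    host: "0.0.0.0",
    proxy: {
        "/api": {
            target: "http://localhost:8010",  // change this to the backend's address
            changeOrigin: true,
            rewrite: (path) => path.replace(/^\/api/, ""),
        },
    },
}
```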
Then edit the second file, `PaddleSpeechWebClient/src/api/API.js` (the WebSocket proxy configuration did not work, so the addresses have to be changed in this file):
```javascript
// websocket (change these to the backend's address)
CHAT_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/offlineStream', // ChatBot websocket endpoint
ASR_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/onlineStream',  // Stream ASR endpoint
TTS_SOCKET_RECORD: 'ws://localhost:8010/ws/tts/online', // Stream TTS endpoint
```
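The `/ws/asr/onlineStream` endpoint follows a small protocol that is visible in `main.py` in this commit: the client sends a JSON text frame `{"signal": "start"}`, then binary frames of raw PCM, then `{"signal": "end"}`. The server answers the start frame with `"signal": "server_ready"`, answers each binary frame with a partial `result`, and answers the end frame with `"signal": "finished"` and the final result. The sketch below only produces the client-side frame sequence. The function name `asr_stream_frames` and the default chunk size (1360 samples of 16-bit audio, matching the chunking in `src/SpeechBase/asr.py`) are assumptions for illustration.

```python
import json
from typing import Iterator, Union


def asr_stream_frames(pcm: bytes,
                      chunk_bytes: int = 2720) -> Iterator[Union[str, bytes]]:
    """Yield the frames a client sends to /ws/asr/onlineStream, in order."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    # Text frame: ask the server to create a connection handler.
    yield json.dumps({"signal": "start"})
    # Binary frames: raw 16 kHz, 16-bit PCM, sent chunk by chunk.
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
    # Text frame: ask for the final result; the server then leaves its loop.
    yield json.dumps({"signal": "end"})
```

Any WebSocket client can replay this sequence by sending `str` items as text frames and `bytes` items as binary frames, reading one JSON reply after each frame.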
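#### Q: The backend is accessed by IP address and the front end cannot record audio
A: This is caused by the browser's security policy: the browser has to be reconfigured and then restarted. For how to change the setting, see [使用js-audio-recorder报浏览器不支持getUserMedia](https://blog.csdn.net/YRY_LIKE_YOU/article/details/113745273)
Chrome setting: chrome://flags/#unsafely-treat-insecure-origin-as-secure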
## References
Recording audio in Vue: https://blog.csdn.net/qq_41619796/article/details/107865602#t1
Streaming audio playback in the front end:
https://github.com/AnthumChris/fetch-stream-audio
https://bm.enthuses.me/buffered.php?bref=6677
# This is the parameter configuration file for streaming tts server.
#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8092
# The task format in the engine_list is: <speech task>_<engine type>
# engine_list choices = ['tts_online', 'tts_online-onnx'], the inference speed of tts_online-onnx is faster than tts_online.
# protocol choices = ['websocket', 'http']
protocol: 'http'
engine_list: ['tts_online-onnx']
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### TTS #########################################
################### speech task: tts; engine_type: online #######################
tts_online:
    # am (acoustic model) choices=['fastspeech2_csmsc', 'fastspeech2_cnndecoder_csmsc']
    # fastspeech2_cnndecoder_csmsc supports streaming am inference.
    am: 'fastspeech2_csmsc'
    am_config:
    am_ckpt:
    am_stat:
    phones_dict:
    tones_dict:
    speaker_dict:
    spk_id: 0

    # voc (vocoder) choices=['mb_melgan_csmsc', 'hifigan_csmsc']
    # Both mb_melgan_csmsc and hifigan_csmsc support streaming voc inference
    voc: 'mb_melgan_csmsc'
    voc_config:
    voc_ckpt:
    voc_stat:

    # others
    lang: 'zh'
    device: 'cpu'  # set 'gpu:id' or 'cpu'
    # am_block and am_pad are only used by the fastspeech2_cnndecoder_onnx model for streaming am inference;
    # with am_pad set to 12, streaming synthesis produces the same audio as non-streaming synthesis
    am_block: 72
    am_pad: 12
    # voc_block and voc_pad are used by the voc model for streaming voc inference;
    # with mb_melgan_csmsc, voc_pad 14 gives the same audio as non-streaming synthesis; voc_pad can go as low as 7 and the streamed audio still sounds normal
    # with hifigan_csmsc, voc_pad 19 gives the same audio as non-streaming synthesis; with voc_pad 14 the streamed audio still sounds normal
    voc_block: 36
    voc_pad: 14
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### TTS #########################################
################### speech task: tts; engine_type: online-onnx #######################
tts_online-onnx:
    # am (acoustic model) choices=['fastspeech2_csmsc_onnx', 'fastspeech2_cnndecoder_csmsc_onnx']
    # fastspeech2_cnndecoder_csmsc_onnx supports streaming am inference.
    am: 'fastspeech2_cnndecoder_csmsc_onnx'
    # am_ckpt is a list; if am is fastspeech2_cnndecoder_csmsc_onnx, am_ckpt = [encoder model, decoder model, postnet model];
    # if am is fastspeech2_csmsc_onnx, am_ckpt = [ckpt model];
    am_ckpt:  # list
    am_stat:
    phones_dict:
    tones_dict:
    speaker_dict:
    spk_id: 0
    am_sample_rate: 24000
    am_sess_conf:
        device: "cpu"  # set 'gpu:id' or 'cpu'
        use_trt: False
        cpu_threads: 4

    # voc (vocoder) choices=['mb_melgan_csmsc_onnx', 'hifigan_csmsc_onnx']
    # Both mb_melgan_csmsc_onnx and hifigan_csmsc_onnx support streaming voc inference
    voc: 'hifigan_csmsc_onnx'
    voc_ckpt:
    voc_sample_rate: 24000
    voc_sess_conf:
        device: "cpu"  # set 'gpu:id' or 'cpu'
        use_trt: False
        cpu_threads: 4

    # others
    lang: 'zh'
    # am_block and am_pad are only used by the fastspeech2_cnndecoder_onnx model for streaming am inference;
    # with am_pad set to 12, streaming synthesis produces the same audio as non-streaming synthesis
    am_block: 72
    am_pad: 12
    # voc_block and voc_pad are used by the voc model for streaming voc inference;
    # with mb_melgan_csmsc_onnx, voc_pad 14 gives the same audio as non-streaming synthesis; voc_pad can go as low as 7 and the streamed audio still sounds normal
    # with hifigan_csmsc_onnx, voc_pad 19 gives the same audio as non-streaming synthesis; with voc_pad 14 the streamed audio still sounds normal
    voc_block: 36
    voc_pad: 14
    # voc_upsample should be the same as n_shift in the voc config.
    voc_upsample: 300
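To make `voc_block` and `voc_pad` concrete: in streaming mode the mel-spectrogram is processed in blocks of `voc_block` frames, and each block is extended by up to `voc_pad` frames of context on either side so that block boundaries do not produce audible seams. The sketch below illustrates that windowing only. It is not the PaddleSpeech chunking code, and the function name `padded_chunks` is hypothetical.

```python
from typing import List, Tuple


def padded_chunks(n_frames: int, block: int, pad: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges: `block` frames plus `pad` context per side."""
    if block <= 0 or pad < 0:
        raise ValueError("block must be positive and pad non-negative")
    ranges = []
    for begin in range(0, n_frames, block):
        start = max(0, begin - pad)               # left context, clipped at 0
        end = min(n_frames, begin + block + pad)  # right context, clipped at the end
        ranges.append((start, end))
    return ranges


print(padded_chunks(100, block=36, pad=14))  # [(0, 50), (22, 86), (58, 100)]
```

After the vocoder has run on a padded range, the samples that came from the context frames are cut away again. With `voc_upsample: 300`, each frame presumably corresponds to 300 output samples, so 14 context frames would correspond to 4200 samples on each padded side.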
......@@ -7,8 +7,8 @@ host: 0.0.0.0
port: 8090
# The task format in the engine_list is: <speech task>_<engine type>
# task choices = ['asr_online', 'tts_online']
# protocol = ['websocket', 'http'] (only one can be selected).
# task choices = ['asr_online']
# protocol = ['websocket'] (only one can be selected).
# websocket only support online engine type.
protocol: 'websocket'
engine_list: ['asr_online']
......@@ -21,17 +21,18 @@ engine_list: ['asr_online']
################################### ASR #########################################
################### speech task: asr; engine_type: online #######################
asr_online:
model_type: 'deepspeech2online_aishell'
model_type: 'conformer_online_wenetspeech'
am_model: # the pdmodel file of am static model [optional]
am_params: # the pdiparams file of am static model [optional]
lang: 'zh'
sample_rate: 16000
cfg_path:
decode_method:
num_decoding_left_chunks:
force_yes: True
device: # cpu or gpu:id
device: 'cpu' # cpu or gpu:id
decode_method: "attention_rescoring"
continuous_decoding: True # enable continue decoding when endpoint detected
num_decoding_left_chunks: 16
am_predictor_conf:
device: # set 'gpu:id' or 'cpu'
switch_ir_optim: True
......@@ -39,11 +40,9 @@ asr_online:
summary: True # False -> do not show predictor config
chunk_buffer_conf:
frame_duration_ms: 80
shift_ms: 40
sample_rate: 16000
sample_width: 2
window_n: 7 # frame
shift_n: 4 # frame
window_ms: 20 # ms
window_ms: 25 # ms
shift_ms: 10 # ms
sample_rate: 16000
sample_width: 2
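The `chunk_buffer_conf` values translate into sample and byte counts with simple arithmetic. The sketch below takes `window_ms: 25`, `shift_ms: 10`, `sample_rate: 16000` and `sample_width: 2` from the lines above and assumes plain sliding-window framing with no padding; the feature extractor that the engine actually uses may count frames slightly differently. The function name `frame_layout` is hypothetical.

```python
from typing import Dict


def frame_layout(n_samples: int, sample_rate: int = 16000, window_ms: int = 25,
                 shift_ms: int = 10, sample_width: int = 2) -> Dict[str, int]:
    """Convert millisecond settings into sample, byte, and frame counts."""
    window = sample_rate * window_ms // 1000   # samples per analysis window
    shift = sample_rate * shift_ms // 1000     # samples between window starts
    # A frame needs a full window; each further frame needs `shift` more samples.
    n_frames = 0 if n_samples < window else 1 + (n_samples - window) // shift
    return {
        "window_samples": window,
        "shift_samples": shift,
        "window_bytes": window * sample_width,
        "shift_bytes": shift * sample_width,
        "n_frames": n_frames,
    }


# One second of 16 kHz audio: 400-sample windows every 160 samples, 98 frames.
print(frame_layout(16000))
```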
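# todo:
# 1. Start the service
# 2. Receive recorded audio and return the recognition result
# 3. Receive the ASR result and return the NLP dialogue result
# 4. Receive the NLP dialogue result and return the TTS audio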
import base64
import yaml
import os
import json
import datetime
import librosa
import soundfile as sf
import numpy as np
import argparse
import uvicorn
import aiofiles
from typing import Optional, List
from pydantic import BaseModel
from fastapi import FastAPI, Header, File, UploadFile, Form, Cookie, WebSocket, WebSocketDisconnect
from fastapi.responses import StreamingResponse
from starlette.responses import FileResponse
from starlette.middleware.cors import CORSMiddleware
from starlette.requests import Request
from starlette.websockets import WebSocketState as WebSocketState
from src.AudioManeger import AudioMannger
from src.util import *
from src.robot import Robot
from src.WebsocketManeger import ConnectionManager
from src.SpeechBase.vpr import VPR
from paddlespeech.server.engine.asr.online.asr_engine import PaddleASRConnectionHanddler
from paddlespeech.server.utils.audio_process import float2pcm
# Parse command-line arguments
parser = argparse.ArgumentParser(
    prog='PaddleSpeechDemo', add_help=True)
parser.add_argument(
    "--port",
    action="store",
    type=int,
    help="port of the app",
    default=8010,
    required=False)
args = parser.parse_args()
port = args.port

# Configuration files
tts_config = "conf/tts_online_application.yaml"
asr_config = "conf/ws_conformer_wenetspeech_application_faster.yaml"
asr_init_path = "source/demo/demo.wav"
db_path = "source/db/vpr.sqlite"
ie_model_path = "source/model"

# Paths
UPLOAD_PATH = "source/vpr"
WAV_PATH = "source/wav"

base_sources = [
    UPLOAD_PATH, WAV_PATH
]
for path in base_sources:
    os.makedirs(path, exist_ok=True)

# Initialization
app = FastAPI()
chatbot = Robot(asr_config, tts_config, asr_init_path, ie_model_path=ie_model_path)
manager = ConnectionManager()
aumanager = AudioMannger(chatbot)
aumanager.init()
vpr = VPR(db_path, dim = 192, top_k = 5)

# Request models
class NlpBase(BaseModel):
    chat: str

class TtsBase(BaseModel):
    text: str

class Audios:
    def __init__(self) -> None:
        self.audios = b""

audios = Audios()
######################################################################
########################### ASR service ##############################
######################################################################

# Receive an uploaded file and return the ASR result
@app.post("/asr/offline")
async def speech2textOffline(files: List[UploadFile]):
    # Only the first file is used
    asr_res = ""
    for file in files[:1]:
        # Build a timestamped file name
        now_name = "asr_offline_" + datetime.datetime.strftime(datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav"
        out_file_path = os.path.join(WAV_PATH, now_name)
        async with aiofiles.open(out_file_path, 'wb') as out_file:
            content = await file.read()  # async read
            await out_file.write(content)  # async write

        # Return the ASR result
        asr_res = chatbot.speech2text(out_file_path)
        return SuccessRequest(result=asr_res)
        # else:
        #     return ErrorRequest(message="文件不是.wav格式")
    return ErrorRequest(message="上传文件为空")
# Receive an uploaded file and force-convert the wav to 16 kHz, int16
@app.post("/asr/offlinefile")
async def speech2textOfflineFile(files: List[UploadFile]):
    # Only the first file is used
    asr_res = ""
    for file in files[:1]:
        # Build a timestamped file name
        now_name = "asr_offline_" + datetime.datetime.strftime(datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav"
        out_file_path = os.path.join(WAV_PATH, now_name)
        async with aiofiles.open(out_file_path, 'wb') as out_file:
            content = await file.read()  # async read
            await out_file.write(content)  # async write

        # Convert the file to a 16 kHz, 16-bit wav
        wav, sr = librosa.load(out_file_path, sr=16000)
        wav = float2pcm(wav)  # float32 to int16
        wav_bytes = wav.tobytes()  # to bytes
        wav_base64 = base64.b64encode(wav_bytes).decode('utf8')

        # Write the converted audio to a new file
        now_name = now_name[:-4] + "_16k" + ".wav"
        out_file_path = os.path.join(WAV_PATH, now_name)
        sf.write(out_file_path, wav, 16000)

        # Return the ASR result
        asr_res = chatbot.speech2text(out_file_path)
        response_res = {
            "asr_result": asr_res,
            "wav_base64": wav_base64
        }
        return SuccessRequest(result=response_res)

    return ErrorRequest(message="上传文件为空")
# Test route for streaming upload
@app.post("/asr/online1")
async def speech2textOnlineRecive(files: List[UploadFile]):
    audio_bin = b''
    for file in files:
        content = await file.read()
        audio_bin += content
    audios.audios += audio_bin
    print(f"audios长度变化: {len(audios.audios)}")
    return SuccessRequest(message="接收成功")

# Measure the ambient noise level
@app.post("/asr/collectEnv")
async def collectEnv(files: List[UploadFile]):
    for file in files[:1]:
        content = await file.read()  # async read
        # Initialize; the first 44 bytes of a wav file are the header
        aumanager.compute_env_volume(content[44:])
        vad_ = aumanager.vad_threshold
        return SuccessRequest(result=vad_, message="采集环境噪音成功")

# Stop recording
@app.get("/asr/stopRecord")
async def stopRecord():
    audios.audios = b""
    aumanager.stop()
    print("Online录音暂停")
    return SuccessRequest(message="停止成功")

# Resume recording
@app.get("/asr/resumeRecord")
async def resumeRecord():
    aumanager.resume()
    print("Online录音恢复")
    return SuccessRequest(message="Online录音恢复")
# ASR used by the chat feature
@app.websocket("/ws/asr/offlineStream")
async def websocket_endpoint(websocket: WebSocket):
    await manager.connect(websocket)
    try:
        while True:
            asr_res = None
            # Receive audio bytes; a result is pushed back only when one is ready
            data = await websocket.receive_bytes()
            if not aumanager.is_pause:
                asr_res = aumanager.stream_asr(data)
            else:
                print("录音暂停")
            if asr_res:
                await manager.send_personal_message(asr_res, websocket)
                aumanager.clear_asr()

    except WebSocketDisconnect:
        manager.disconnect(websocket)
        # await manager.broadcast(f"用户-{user}-离开")
        # print(f"用户-{user}-离开")
# ASR for online (streaming) recognition
@app.websocket('/ws/asr/onlineStream')
async def websocket_endpoint(websocket: WebSocket):
    """PaddleSpeech Online ASR Server api

    Args:
        websocket (WebSocket): the websocket instance
    """

    # 1. wait for the websocket protocol header;
    #    the connection is only established once the header has been received
    await websocket.accept()

    # 2. once the websocket header is accepted, get the online asr engine instance
    engine = chatbot.asr.engine

    # 3. each websocket connection gets its own PaddleASRConnectionHanddler to process its audio;
    #    the instance is only created once the client sends the start signal
    connection_handler = None

    try:
        # 4. loop over the audio, package by package, following the protocol;
        #    the loop only ends when the client sends the finished signal
        while True:
            # careful here, changed the source code from starlette.websockets
            # 4.1 wait for the client signal for the specific action
            assert websocket.application_state == WebSocketState.CONNECTED
            message = await websocket.receive()
            websocket._raise_on_disconnect(message)

            # 4.2 text frames carry action commands, bytes frames carry pcm data
            if "text" in message:
                # first parse the specific command
                message = json.loads(message["text"])
                if 'signal' not in message:
                    resp = {"status": "ok", "message": "no valid json data"}
                    await websocket.send_json(resp)
                    # no signal to dispatch on, so wait for the next frame
                    continue

                # start command: create the PaddleASRConnectionHanddler instance to process the audio data
                # end command: process all the remaining pcm audio, return the final result,
                # and break the loop
                if message['signal'] == 'start':
                    resp = {"status": "ok", "signal": "server_ready"}
                    # do something at the beginning here
                    # create the instance to process the audio
                    # connection_handler = chatbot.asr.connection_handler
                    connection_handler = PaddleASRConnectionHanddler(engine)
                    await websocket.send_json(resp)
                elif message['signal'] == 'end':
                    # reset the single engine for a new connection
                    # and destroy the connection
                    connection_handler.decode(is_finished=True)
                    connection_handler.rescoring()
                    asr_results = connection_handler.get_result()
                    connection_handler.reset()

                    resp = {
                        "status": "ok",
                        "signal": "finished",
                        'result': asr_results
                    }
                    await websocket.send_json(resp)
                    break
                else:
                    resp = {"status": "ok", "message": "no valid json data"}
                    await websocket.send_json(resp)
            elif "bytes" in message:
                # bytes frames carry the pcm data
                message = message["bytes"]
                print("###############")
                print("len message: ", len(message))
                print("###############")

                # extract the remaining pcm audio
                # and decode the result for this package
                connection_handler.extract_feat(message)
                connection_handler.decode(is_finished=False)
                asr_results = connection_handler.get_result()

                # return the result for the current period;
                # if the engine creates a vad instance, one connection can have many period results
                resp = {'result': asr_results}
                print(resp)
                await websocket.send_json(resp)
    except WebSocketDisconnect:
        pass
######################################################################
########################### NLP service ##############################
######################################################################

@app.post("/nlp/chat")
async def chatOffline(nlp_base:NlpBase):
    chat = nlp_base.chat
    if not chat:
        return ErrorRequest(message="传入文本为空")
    else:
        res = chatbot.chat(chat)
        return SuccessRequest(result=res)

@app.post("/nlp/ie")
async def ieOffline(nlp_base:NlpBase):
    nlp_text = nlp_base.chat
    if not nlp_text:
        return ErrorRequest(message="传入文本为空")
    else:
        res = chatbot.ie(nlp_text)
        return SuccessRequest(result=res)
######################################################################
########################### TTS service ##############################
######################################################################

@app.post("/tts/offline")
async def text2speechOffline(tts_base:TtsBase):
    text = tts_base.text
    if not text:
        return ErrorRequest(message="文本为空")
    else:
        now_name = "tts_"+ datetime.datetime.strftime(datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav"
        out_file_path = os.path.join(WAV_PATH, now_name)
        # Save the audio to a file, then send it base64-encoded
        chatbot.text2speech(text, outpath=out_file_path)
        with open(out_file_path, "rb") as f:
            data_bin = f.read()
        base_str = base64.b64encode(data_bin)
        return SuccessRequest(result=base_str)

# Streaming TTS over http
@app.post("/tts/online")
async def stream_tts(request_body: TtsBase):
    text = request_body.text
    return StreamingResponse(chatbot.text2speechStreamBytes(text=text))

# Streaming TTS over websocket
@app.websocket("/ws/tts/online")
async def stream_ttsWS(websocket: WebSocket):
    await manager.connect(websocket)
    try:
        while True:
            text = await websocket.receive_text()
            # Stream the synthesized audio back over the websocket
            if text:
                # Default value so the final message still works if nothing is yielded
                sub_wav = ""
                for sub_wav in chatbot.text2speechStream(text=text):
                    # print("发送sub wav: ", len(sub_wav))
                    res = {
                        "wav": sub_wav,
                        "done": False
                    }
                    await websocket.send_json(res)

                # End of the stream
                res = {
                    "wav": sub_wav,
                    "done": True
                }
                await websocket.send_json(res)
                # manager.disconnect(websocket)

    except WebSocketDisconnect:
        manager.disconnect(websocket)
######################################################################
########################### VPR service ##############################
######################################################################

# FastAPI does not treat a returned (body, status) tuple as a status code,
# so error responses below are built explicitly.
from fastapi.responses import JSONResponse

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"])

@app.post('/vpr/enroll')
async def vpr_enroll(table_name: str=None,
                     spk_id: str=Form(...),
                     audio: UploadFile=File(...)):
    # Enroll the uploaded audio with spk_id into the SQLite database
    try:
        if not spk_id:
            return {'status': False, 'msg': "spk_id can not be None"}
        # Save the uploaded data on the server.
        content = await audio.read()
        now_name = "vpr_enroll_" + datetime.datetime.strftime(datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav"
        audio_path = os.path.join(UPLOAD_PATH, now_name)

        with open(audio_path, "wb+") as f:
            f.write(content)
        vpr.vpr_enroll(username=spk_id, wav_path=audio_path)
        return {'status': True, 'msg': "Successfully enroll data!"}
    except Exception as e:
        # The exception object itself is not JSON serializable
        return {'status': False, 'msg': str(e)}

@app.post('/vpr/recog')
async def vpr_recog(request: Request,
                    table_name: str=None,
                    audio: UploadFile=File(...)):
    # Voice print recognition online
    # try:
    # Save the uploaded data on the server.
    content = await audio.read()
    now_name = "vpr_query_" + datetime.datetime.strftime(datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav"
    query_audio_path = os.path.join(UPLOAD_PATH, now_name)
    with open(query_audio_path, "wb+") as f:
        f.write(content)
    spk_ids, paths, scores = vpr.do_search_vpr(query_audio_path)

    res = dict(zip(spk_ids, zip(paths, scores)))
    # Sort results by score, best matches first
    res = sorted(res.items(), key=lambda item: item[1][1], reverse=True)
    return res
    # except Exception as e:
    #     return {'status': False, 'msg': e}, 400

@app.post('/vpr/del')
async def vpr_del(spk_id: dict=None):
    # Delete a record by spk_id from the SQLite database
    try:
        spk_id = spk_id['spk_id']
        if not spk_id:
            return {'status': False, 'msg': "spk_id can not be None"}
        vpr.vpr_del(username=spk_id)
        return {'status': True, 'msg': "Successfully delete data!"}
    except Exception as e:
        return JSONResponse(status_code=400, content={'status': False, 'msg': str(e)})

@app.get('/vpr/list')
async def vpr_list():
    # Get all records from the SQLite database
    try:
        spk_ids, vpr_ids = vpr.do_list()
        return spk_ids, vpr_ids
    except Exception as e:
        return JSONResponse(status_code=400, content={'status': False, 'msg': str(e)})

@app.get('/vpr/database64')
async def vpr_database64(vprId: int):
    # Get the audio file by vpr_id and return it base64-encoded
    try:
        if not vprId:
            return {'status': False, 'msg': "vpr_id can not be None"}
        audio_path = vpr.do_get_wav(vprId)
        # Convert the file to 16 kHz, 16-bit PCM and return it as base64
        wav, sr = librosa.load(audio_path, sr=16000)
        wav = float2pcm(wav)  # float32 to int16
        wav_bytes = wav.tobytes()  # to bytes
        wav_base64 = base64.b64encode(wav_bytes).decode('utf8')
        return SuccessRequest(result=wav_base64)
    except Exception as e:
        return JSONResponse(status_code=400, content={'status': False, 'msg': str(e)})

@app.get('/vpr/data')
async def vpr_data(vprId: int):
    # Get the audio file by vpr_id and return the file itself
    try:
        if not vprId:
            return {'status': False, 'msg': "vpr_id can not be None"}
        audio_path = vpr.do_get_wav(vprId)
        return FileResponse(audio_path)
    except Exception as e:
        return JSONResponse(status_code=400, content={'status': False, 'msg': str(e)})

if __name__ == '__main__':
    uvicorn.run(app=app, host='0.0.0.0', port=port)
aiofiles
fastapi
librosa
numpy
pydantic
scikit_learn
SoundFile
starlette
uvicorn
paddlepaddle
paddlespeech
paddlenlp
faiss-cpu
python-multipart
\ No newline at end of file
from queue import Queue
import numpy as np
import os
import wave
import random
import datetime
from .util import randName


class AudioMannger:
    def __init__(self, robot, frame_length=160, frame=10, data_width=2, vad_default = 300):
        # Raw binary pcm stream
        self.audios = b''
        self.asr_result = ""
        # The core speech object
        self.robot = robot

        self.file_dir = "source"
        os.makedirs(self.file_dir, exist_ok=True)
        self.vad_deafult = vad_default
        self.vad_threshold = vad_default
        self.vad_threshold_path = os.path.join(self.file_dir, "vad_threshold.npy")

        # One frame is 10 ms (160 samples at 16 kHz)
        self.frame_length = frame_length
        # Run the vad check once every 10 frames
        self.frame = frame
        # int16, two bytes per sample
        self.data_width = data_width
        # Window size in bytes
        self.window_length = frame_length * frame * data_width

        # Whether recording has started
        self.on_asr = False
        self.silence_cnt = 0
        self.max_silence_cnt = 4
        self.is_pause = False  # pause and resume recording

    def init(self):
        if os.path.exists(self.vad_threshold_path):
            # A saved loudness threshold exists
            self.vad_threshold = np.load(self.vad_threshold_path)

    def clear_audio(self):
        # Clear the accumulated pcm buffer
        self.audios = b''

    def clear_asr(self):
        self.asr_result = ""

    def compute_chunk_volume(self, start_index, pcm_bins):
        # Compute the mean energy over one window
        pcm_bin = pcm_bins[start_index: start_index + self.window_length]
        # Convert to numpy
        pcm_np = np.frombuffer(pcm_bin, np.int16)
        # Convert to float and compute the loudness (mean absolute amplitude)
        x = pcm_np.astype(np.float32)
        x = np.abs(x)
        return np.mean(x)

    def is_speech(self, start_index, pcm_bins):
        # Check whether start_index is past the end of the data
        if start_index > len(pcm_bins):
            return False
        # Check whether the window starting here is speech or silence
        energy = self.compute_chunk_volume(start_index=start_index, pcm_bins=pcm_bins)
        # print(energy)
        if energy > self.vad_threshold:
            return True
        else:
            return False

    def compute_env_volume(self, pcm_bins):
        max_energy = 0
        start = 0
        while start < len(pcm_bins):
            energy = self.compute_chunk_volume(start_index=start, pcm_bins=pcm_bins)
            if energy > max_energy:
                max_energy = energy
            start += self.window_length
        self.vad_threshold = max_energy + 100 if max_energy > self.vad_deafult else self.vad_deafult

        # Save the threshold to a file
        np.save(self.vad_threshold_path, self.vad_threshold)
        print(f"vad 阈值大小: {self.vad_threshold}")
        print(f"环境采样保存: {os.path.realpath(self.vad_threshold_path)}")
    def stream_asr(self, pcm_bin):
        # First run endpoint detection on pcm_bin
        start = 0
        while start < len(pcm_bin):
            if self.is_speech(start_index=start, pcm_bins=pcm_bin):
                self.on_asr = True
                self.silence_cnt = 0
                print("录音中")
                self.audios += pcm_bin[start : start + self.window_length]
            else:
                if self.on_asr:
                    self.silence_cnt += 1
                    if self.silence_cnt > self.max_silence_cnt:
                        self.on_asr = False
                        self.silence_cnt = 0
                        # Recording stopped
                        print("录音停止")
                        # Save audios as a wav file and send it to ASR
                        if len(self.audios) > 2 * 16000:
                            file_path = os.path.join(self.file_dir, "asr_" + datetime.datetime.strftime(datetime.datetime.now(), '%Y%m%d%H%M%S') + randName() + ".wav")
                            self.save_audio(file_path=file_path)
                            self.asr_result = self.robot.speech2text(file_path)
                        self.clear_audio()
                        return self.asr_result
                    else:
                        # Still receiving: a short silence inside an utterance
                        print("录音中 静音")
                        self.audios += pcm_bin[start : start + self.window_length]
            start += self.window_length
        return ""
    def save_audio(self, file_path):
        print("保存音频")
        wf = wave.open(file_path, 'wb')  # create the audio file at file_path
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample (16-bit)
        wf.setframerate(16000)  # 16 kHz sample rate
        # Write the data into the audio file
        wf.writeframes(self.audios)
        # Close the file once writing is done
        wf.close()

    def end(self):
        # Save audios as a wav file and send it to ASR
        file_path = os.path.join(self.file_dir, "asr.wav")
        self.save_audio(file_path=file_path)
        return self.robot.speech2text(file_path)

    def stop(self):
        self.is_pause = True
        self.audios = b''

    def resume(self):
        self.is_pause = False


if __name__ == '__main__':
    from robot import Robot

    chatbot = Robot()
    chatbot.init()
    audio_manger = AudioMannger(chatbot)

    file_list = [
        "source/20220418145230qbenc.pcm",
    ]

    for file in file_list:
        with open(file, "rb") as f:
            pcm_bin = f.read()
            print(len(pcm_bin))
            asr_ = audio_manger.stream_asr(pcm_bin=pcm_bin)
            print(asr_)

    print(audio_manger.end())

    print(chatbot.speech2text("source/20220418145230zrxia.wav"))
\ No newline at end of file
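The endpoint detection in `AudioMannger` is a plain energy threshold: the mean absolute amplitude of each window is compared with `vad_threshold`. The sketch below shows the same idea using only the standard library. It is an illustration, not the class above: the function names are hypothetical, and the defaults (a threshold of 300 and 3200-byte windows, i.e. 160 samples × 10 frames × 2 bytes) are taken from the constructor defaults.

```python
import struct
from typing import List


def chunk_volume(pcm: bytes) -> float:
    """Mean absolute amplitude of little-endian 16-bit PCM bytes."""
    usable = len(pcm) - len(pcm) % 2  # ignore a trailing odd byte
    if usable == 0:
        return 0.0
    total = sum(abs(sample) for (sample,) in struct.iter_unpack("<h", pcm[:usable]))
    return total / (usable // 2)


def speech_flags(pcm: bytes, threshold: float = 300.0,
                 window_bytes: int = 3200) -> List[bool]:
    """Flag each window as speech (True) or silence (False)."""
    return [
        chunk_volume(pcm[start:start + window_bytes]) > threshold
        for start in range(0, len(pcm), window_bytes)
    ]


quiet = struct.pack("<1600h", *([10] * 1600))    # one window of low-level noise
loud = struct.pack("<1600h", *([1000] * 1600))   # one window of loud signal
print(speech_flags(quiet + loud))                # [False, True]
```

A fixed threshold is sensitive to the recording environment, which is presumably why the demo measures ambient noise through `/asr/collectEnv` and stores the resulting threshold.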
import numpy as np
import paddle
import librosa
import soundfile
from paddlespeech.server.engine.asr.online.asr_engine import ASREngine
from paddlespeech.server.engine.asr.online.asr_engine import PaddleASRConnectionHanddler
from paddlespeech.server.utils.config import get_config


def readWave(samples):
    x_len = len(samples)

    chunk_size = 85 * 16  # 1360 samples, i.e. 85 ms at a 16 kHz sample rate
    if x_len % chunk_size != 0:
        padding_len_x = chunk_size - x_len % chunk_size
    else:
        padding_len_x = 0

    # Zero-pad so the length is a multiple of chunk_size
    padding = np.zeros((padding_len_x), dtype=samples.dtype)
    padded_x = np.concatenate([samples, padding], axis=0)

    assert (x_len + padding_len_x) % chunk_size == 0
    num_chunk = (x_len + padding_len_x) // chunk_size
    for i in range(0, num_chunk):
        start = i * chunk_size
        end = start + chunk_size
        x_chunk = padded_x[start:end]
        yield x_chunk


class ASR:
    def __init__(self, config_path, ) -> None:
        self.config = get_config(config_path)['asr_online']
        self.engine = ASREngine()
        self.engine.init(self.config)
        self.connection_handler = PaddleASRConnectionHanddler(self.engine)

    def offlineASR(self, samples, sample_rate=16000):
        x_chunk, x_chunk_lens = self.engine.preprocess(samples=samples, sample_rate=sample_rate)
        self.engine.run(x_chunk, x_chunk_lens)
        result = self.engine.postprocess()
        self.engine.reset()
        return result

    def onlineASR(self, samples:bytes=None, is_finished=False):
        if not is_finished:
            # Streaming in progress
            self.connection_handler.extract_feat(samples)
            self.connection_handler.decode(is_finished)
            asr_results = self.connection_handler.get_result()
            return asr_results
        else:
            # End of the stream
            self.connection_handler.decode(is_finished=True)
            self.connection_handler.rescoring()
            asr_results = self.connection_handler.get_result()
            self.connection_handler.reset()
            return asr_results


if __name__ == '__main__':
    config_path = r"../../PaddleSpeech/paddlespeech/server/conf/ws_conformer_application.yaml"

    wav_path = r"../../source/demo/demo_16k.wav"
    samples, sample_rate = soundfile.read(wav_path, dtype='int16')

    asr = ASR(config_path=config_path)
    end_result = asr.offlineASR(samples=samples, sample_rate=sample_rate)
    print("端到端识别结果:", end_result)

    for sub_wav in readWave(samples=samples):
        # print(sub_wav)
        message = sub_wav.tobytes()
        offline_result = asr.onlineASR(message, is_finished=False)
        print("流式识别结果: ", offline_result)
    offline_result = asr.onlineASR(is_finished=True)
    print("流式识别结果: ", offline_result)
\ No newline at end of file
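`readWave` above cuts the signal into fixed-size chunks and zero-pads the last one so that every chunk has the same length. Note that `85 * 16` is 1360 samples, which is 85 ms at 16 kHz. The sketch below does the same on raw PCM bytes without numpy; the function name `split_pcm` and its parameters are hypothetical and chosen only for this illustration.

```python
from typing import Iterator


def split_pcm(pcm: bytes, chunk_ms: int = 85, sample_rate: int = 16000,
              sample_width: int = 2) -> Iterator[bytes]:
    """Yield fixed-size PCM chunks, zero-padding the last one."""
    # 16000 Hz * 85 ms = 1360 samples; times 2 bytes per sample = 2720 bytes.
    chunk_bytes = sample_rate * chunk_ms // 1000 * sample_width
    if chunk_bytes <= 0:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        # Pad the final chunk with silence so all chunks have equal length.
        yield chunk.ljust(chunk_bytes, b"\x00")
```

Each yielded chunk can be passed to `onlineASR(chunk, is_finished=False)`, followed by one final `onlineASR(is_finished=True)` call, mirroring the loop in the `__main__` block above.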
from paddlenlp import Taskflow


class NLP:
    def __init__(self, ie_model_path=None):
        schema = ["时间", "出发地", "目的地", "费用"]
        if ie_model_path:
            self.ie_model = Taskflow("information_extraction",
                                     schema=schema, task_path=ie_model_path)
        else:
            self.ie_model = Taskflow("information_extraction",
                                     schema=schema)

        self.dialogue_model = Taskflow("dialogue")

    def chat(self, text):
        result = self.dialogue_model([text])
        return result[0]

    def ie(self, text):
        result = self.ie_model(text)
        return result


if __name__ == '__main__':
    ie_model_path = "../../source/model/"
    nlp = NLP(ie_model_path=ie_model_path)
    text = "今天早上我从大牛坊去百度科技园花了七百块钱"
    print(nlp.ie(text))
\ No newline at end of file
import base64
import sqlite3
import os
import numpy as np


def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d


class DataBase(object):
    def __init__(self, db_path:str):
        db_path = os.path.realpath(db_path)

        if os.path.exists(db_path):
            self.db_path = db_path
        else:
            db_path_dir = os.path.dirname(db_path)
            os.makedirs(db_path_dir, exist_ok=True)
            self.db_path = db_path

        self.conn = sqlite3.connect(self.db_path)
        self.conn.row_factory = dict_factory
        self.cursor = self.conn.cursor()
        self.init_database()

    def init_database(self):
        """
        Initialize the database; create the table if it does not exist
        """
        sql = """
            CREATE TABLE IF NOT EXISTS vprtable (
                `id` INTEGER PRIMARY KEY AUTOINCREMENT,
                `username` TEXT NOT NULL,
                `vector` TEXT NOT NULL,
                `wavpath` TEXT NOT NULL
            );
        """
        self.cursor.execute(sql)
        self.conn.commit()

    def execute_base(self, sql, data_dict):
        self.cursor.execute(sql, data_dict)
        self.conn.commit()
    def insert_one(self, username, vector_base64:str, wav_path):
        if not os.path.exists(wav_path):
            return None, "wav not exists"
        else:
            sql = """
                insert into
                vprtable (username, vector, wavpath)
                values (?, ?, ?)
            """
            try:
                self.cursor.execute(sql, (username, vector_base64, wav_path))
                self.conn.commit()
                lastidx = self.cursor.lastrowid
                return lastidx, "data insert success"
            except Exception as e:
                print(e)
                return None, e

    def select_all(self):
        sql = """
            SELECT * from vprtable
        """
        result = self.cursor.execute(sql).fetchall()
        return result

    def select_by_id(self, vpr_id):
        # Use a placeholder instead of formatting the value into the SQL string
        sql = """
            SELECT * from vprtable WHERE `id` = ?
        """
        result = self.cursor.execute(sql, (vpr_id,)).fetchall()
        return result

    def select_by_username(self, username):
        sql = """
            SELECT * from vprtable WHERE `username` = ?
        """
        result = self.cursor.execute(sql, (username,)).fetchall()
        return result

    def drop_by_username(self, username):
        sql = """
            DELETE from vprtable WHERE `username` = ?
        """
        self.cursor.execute(sql, (username,))
        self.conn.commit()

    def drop_all(self):
        sql = """
            DELETE from vprtable
        """
        self.cursor.execute(sql)
        self.conn.commit()

    def drop_table(self):
        sql = """
            DROP TABLE vprtable
        """
        self.cursor.execute(sql)
        self.conn.commit()

    def encode_vector(self, vector:np.ndarray):
        return base64.b64encode(vector).decode('utf8')

    def decode_vector(self, vector_base64, dtype=np.float32):
        b = base64.b64decode(vector_base64)
        vc = np.frombuffer(b, dtype=dtype)
        return vc
if __name__ == '__main__':
db_path = "../../source/db/vpr.sqlite"
db = DataBase(db_path)
# 准备数据
import numpy as np
vector = np.random.randn((192)).astype(np.float32).tobytes()
vector_base64 = base64.b64encode(vector).decode('utf8')
username = "sss"
wav_path = r"../../source/demo/demo_16k.wav"
# 插入数据
db.insert_one(username, vector_base64, wav_path)
# 查询数据
res_all = db.select_all()
print("res_all: ", res_all)
s_id = res_all[0]['id']
res_id = db.select_by_id(s_id)
print("res_id: ", res_id)
res_uername = db.select_by_username(username)
print("res_username: ", res_uername)
# base64还原
b = base64.b64decode(res_uername[0]['vector'])
vc = np.frombuffer(b, dtype=np.float32)
print(vc)
# 删除数据
db.drop_by_username(username)
res_all = db.select_all()
print("删除后 res_all: ", res_all)
db.drop_all()
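The helper above stores each float32 embedding as base64 text in SQLite. The following is a minimal, standard-library-only sketch of that round trip. It assumes an in-memory database and a made-up three-value vector, and uses `array` as a stand-in for NumPy; the table layout simply mirrors `vprtable`.

```python
import base64
import sqlite3
from array import array


def encode_vector(values):
    """Pack floats as float32 bytes, then base64-encode them to text."""
    return base64.b64encode(array("f", values).tobytes()).decode("utf8")


def decode_vector(text):
    """Reverse of encode_vector: base64 text back to a list of floats."""
    out = array("f")
    out.frombytes(base64.b64decode(text))
    return list(out)


# In-memory database so the sketch leaves no file behind (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vprtable ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "username TEXT NOT NULL, vector TEXT NOT NULL, wavpath TEXT NOT NULL)")

# Values exactly representable in float32, so the round trip is lossless.
original = [0.5, -1.25, 2.0]
conn.execute(
    "INSERT INTO vprtable (username, vector, wavpath) VALUES (?, ?, ?)",
    ("demo_user", encode_vector(original), "demo.wav"))

# The "?" placeholder keeps the username out of the SQL text itself.
row = conn.execute(
    "SELECT vector FROM vprtable WHERE username = ?",
    ("demo_user", )).fetchone()
restored = decode_vector(row[0])
print(restored)
```

Storing the vector as text keeps the schema simple at the cost of roughly a third more bytes than a raw BLOB column would need.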
# TTS inference engine; supports streaming and non-streaming synthesis
# Simplified usage
# Inference runs on onnxruntime
# 1. Download the corresponding model
# 2. Load the model
# 3. End-to-end inference
# 4. Streaming inference
import base64

import numpy as np
from paddlespeech.server.utils.onnx_infer import get_sess
from paddlespeech.t2s.frontend.zh_frontend import Frontend
from paddlespeech.server.utils.util import denorm, get_chunks
from paddlespeech.server.utils.audio_process import float2pcm
from paddlespeech.server.utils.config import get_config
from paddlespeech.server.engine.tts.online.onnx.tts_engine import TTSEngine


class TTS:
    def __init__(self, config_path):
        self.config = get_config(config_path)['tts_online-onnx']
        self.config['voc_block'] = 36
        self.engine = TTSEngine()
        self.engine.init(self.config)
        self.engine.warm_up()

        # Initialize the text frontend
        self.frontend = Frontend(
            phone_vocab_path=self.engine.executor.phones_dict,
            tone_vocab_path=None)

    def depadding(self, data, chunk_num, chunk_id, block, pad, upsample):
        """
        Streaming inference removes the result of pad inference
        """
        front_pad = min(chunk_id * block, pad)
        # first chunk
        if chunk_id == 0:
            data = data[:block * upsample]
        # last chunk
        elif chunk_id == chunk_num - 1:
            data = data[front_pad * upsample:]
        # middle chunk
        else:
            data = data[front_pad * upsample:(front_pad + block) * upsample]

        return data

    def offlineTTS(self, text):
        get_tone_ids = False
        merge_sentences = False

        input_ids = self.frontend.get_input_ids(
            text,
            merge_sentences=merge_sentences,
            get_tone_ids=get_tone_ids)
        phone_ids = input_ids["phone_ids"]
        wav_list = []
        for i in range(len(phone_ids)):
            orig_hs = self.engine.executor.am_encoder_infer_sess.run(
                None, input_feed={'text': phone_ids[i].numpy()})
            hs = orig_hs[0]
            am_decoder_output = self.engine.executor.am_decoder_sess.run(
                None, input_feed={'xs': hs})
            am_postnet_output = self.engine.executor.am_postnet_sess.run(
                None,
                input_feed={
                    'xs': np.transpose(am_decoder_output[0], (0, 2, 1))
                })
            am_output_data = am_decoder_output + np.transpose(
                am_postnet_output[0], (0, 2, 1))
            normalized_mel = am_output_data[0][0]
            mel = denorm(normalized_mel, self.engine.executor.am_mu,
                         self.engine.executor.am_std)
            wav = self.engine.executor.voc_sess.run(
                output_names=None, input_feed={'logmel': mel})[0]
            wav_list.append(wav)
        wavs = np.concatenate(wav_list)
        return wavs

    def streamTTS(self, text):
        for sub_wav_base64 in self.engine.run(sentence=text):
            yield sub_wav_base64

    def streamTTSBytes(self, text):
        for wav in self.engine.executor.infer(
                text=text,
                lang=self.engine.config.lang,
                am=self.engine.config.am,
                spk_id=0):
            wav = float2pcm(wav)  # float32 to int16
            wav_bytes = wav.tobytes()  # to bytes
            yield wav_bytes

    def after_process(self, wav):
        # for tvm
        wav = float2pcm(wav)  # float32 to int16
        wav_bytes = wav.tobytes()  # to bytes
        wav_base64 = base64.b64encode(wav_bytes).decode('utf8')  # to base64
        return wav_base64

    def streamTTS_TVM(self, text):
        # Optimize with TVM (not implemented yet)
        pass


if __name__ == '__main__':
    text = "啊哈哈哈哈哈哈啊哈哈哈哈哈哈啊哈哈哈哈哈哈啊哈哈哈哈哈哈啊哈哈哈哈哈哈"
    config_path = "../../PaddleSpeech/demos/streaming_tts_server/conf/tts_online_application.yaml"
    tts = TTS(config_path)

    for sub_wav in tts.streamTTS(text):
        print("sub_wav_base64: ", len(sub_wav))

    end_wav = tts.offlineTTS(text)
    print(end_wav)
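The `depadding` method above trims the context frames that streaming inference adds around each chunk. The sketch below replays that slicing rule on plain Python lists to show that the trimmed chunks join back into the full sequence. The chunking helper `get_padded_chunks` and the repeat-based `upsample` are hypothetical stand-ins written for this illustration, not the PaddleSpeech implementations.

```python
import math


def get_padded_chunks(frames, block, pad):
    """Hypothetical chunker: each chunk covers `block` frames plus up to
    `pad` frames of context on each side."""
    chunk_num = math.ceil(len(frames) / block)
    chunks = []
    for i in range(chunk_num):
        start = max(0, i * block - pad)
        end = min(len(frames), (i + 1) * block + pad)
        chunks.append(frames[start:end])
    return chunks


def upsample(frames, factor):
    """Stand-in for a vocoder: repeat every frame `factor` times."""
    return [f for f in frames for _ in range(factor)]


def depadding(data, chunk_num, chunk_id, block, pad, factor):
    """Same slicing rule as TTS.depadding, applied to a plain list."""
    front_pad = min(chunk_id * block, pad)
    if chunk_id == 0:
        return data[:block * factor]
    if chunk_id == chunk_num - 1:
        return data[front_pad * factor:]
    return data[front_pad * factor:(front_pad + block) * factor]


frames = list(range(10))
block, pad, factor = 4, 2, 3
chunks = get_padded_chunks(frames, block, pad)

stream = []
for chunk_id, chunk in enumerate(chunks):
    audio = upsample(chunk, factor)
    stream.extend(depadding(audio, len(chunks), chunk_id, block, pad, factor))

# The streamed output should match upsampling the whole input at once.
print(stream == upsample(frames, factor))
```

The padding exists only to give the model context at chunk borders; once the output is produced, exactly `block * factor` samples per chunk are kept, except for the last chunk, which keeps whatever remains.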
# VPR demo: does not use MySQL or Milvus; intended only for the Docker demo
import logging

import faiss
import numpy as np

from .sql_helper import DataBase
from .vpr_encode import get_audio_embedding


class VPR:
    def __init__(self, db_path, dim, top_k) -> None:
        # Initialization
        self.db_path = db_path
        self.dim = dim
        self.top_k = top_k
        self.dtype = np.float32
        self.vpr_idx = 0

        # Initialize the database
        self.db = DataBase(db_path)

        # Initialize faiss
        index_ip = faiss.IndexFlatIP(dim)
        self.index_ip = faiss.IndexIDMap(index_ip)
        self.init()

    def init(self):
        # Demo initialization: register the vectors stored in the
        # database (SQLite here) into faiss
        sql_dbs = self.db.select_all()
        if sql_dbs:
            for sql_db in sql_dbs:
                idx = sql_db['id']
                vc_bs64 = sql_db['vector']
                vc = self.db.decode_vector(vc_bs64)
                if len(vc.shape) == 1:
                    vc = np.expand_dims(vc, axis=0)
                # Build the index
                self.index_ip.add_with_ids(vc, np.array(
                    (idx, )).astype('int64'))
        logging.info("faiss index built")

    def faiss_enroll(self, idx, vc):
        self.index_ip.add_with_ids(vc, np.array((idx, )).astype('int64'))

    def vpr_enroll(self, username, wav_path):
        # Enroll a voiceprint
        emb = get_audio_embedding(wav_path)
        # Check for failure before touching the embedding.
        if emb is None:
            logging.error("enroll failed: no embedding")
            return None
        emb = np.expand_dims(emb, axis=0)
        emb_bs64 = self.db.encode_vector(emb)
        last_idx, mess = self.db.insert_one(username, emb_bs64, wav_path)
        if last_idx:
            # Register in faiss
            self.faiss_enroll(last_idx, emb)
        return last_idx

    def vpr_recog(self, wav_path):
        # Recognize a voiceprint
        emb_search = get_audio_embedding(wav_path)

        if emb_search is not None:
            emb_search = np.expand_dims(emb_search, axis=0)
            D, I = self.index_ip.search(emb_search, self.top_k)
            D = D.tolist()[0]
            I = I.tolist()[0]
            return [(round(D[i] * 100, 2), I[i]) for i in range(len(D))
                    if I[i] != -1]
        else:
            logging.error("recognition failed")
            return None

    def do_search_vpr(self, wav_path):
        spk_ids, paths, scores = [], [], []
        # vpr_recog returns None on failure; treat that as no matches.
        recog_result = self.vpr_recog(wav_path) or []
        for score, idx in recog_result:
            username = self.db.select_by_id(idx)[0]['username']
            if username not in spk_ids:
                spk_ids.append(username)
                scores.append(score)
                paths.append("")
        return spk_ids, paths, scores

    def vpr_del(self, username):
        # Delete a voiceprint by username:
        # look up the user's IDs and remove the matching vectors
        res = self.db.select_by_username(username)
        for r in res:
            idx = r['id']
            self.index_ip.remove_ids(np.array((idx, )).astype('int64'))

        self.db.drop_by_username(username)

    def vpr_list(self):
        # Get the list of enrolled data
        return self.db.select_all()

    def do_list(self):
        spk_ids, vpr_ids = [], []
        for res in self.db.select_all():
            spk_ids.append(res['username'])
            vpr_ids.append(res['id'])
        return spk_ids, vpr_ids

    def do_get_wav(self, vpr_idx):
        res = self.db.select_by_id(vpr_idx)
        return res[0]['wavpath']

    def vpr_data(self, idx):
        # Get the data for the given ID
        res = self.db.select_by_id(idx)
        return res

    def vpr_droptable(self):
        # Drop the table
        self.db.drop_table()
        # Clear faiss
        self.index_ip.reset()


if __name__ == '__main__':
    db_path = "../../source/db/vpr.sqlite"
    dim = 192
    top_k = 5
    vpr = VPR(db_path, dim, top_k)

    # Prepare test data
    username = "sss"
    wav_path = r"../../source/demo/demo_16k.wav"

    # Enroll a voiceprint
    vpr.vpr_enroll(username, wav_path)

    # Get the data
    print(vpr.vpr_list())

    # Recognize the voiceprint
    recolist = vpr.vpr_recog(wav_path)
    print(recolist)

    # Get data by id
    idx = recolist[0][1]
    print(vpr.vpr_data(idx))

    # Delete the voiceprint
    vpr.vpr_del(username)
    vpr.vpr_droptable()
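`faiss.IndexFlatIP` performs an exact inner-product search, and because the embeddings are divided by their L2 norm before they are stored, that inner product equals cosine similarity. The sketch below reproduces the idea in pure Python with a brute-force search. The three-dimensional vectors and the `search_top_k` helper are illustrative assumptions, not part of the demo.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def search_top_k(index, query, top_k):
    """Brute-force inner-product search over {id: unit_vector}.

    Returns (score, id) pairs, best first, with the score multiplied
    by 100 and rounded to two decimals, as vpr_recog does.
    """
    scored = []
    for vec_id, vec in index.items():
        ip = sum(a * b for a, b in zip(vec, query))
        scored.append((round(ip * 100, 2), vec_id))
    scored.sort(reverse=True)
    return scored[:top_k]


# Toy "enrolled" voiceprints (made-up values).
index = {
    1: l2_normalize([1.0, 0.0, 0.0]),
    2: l2_normalize([1.0, 1.0, 0.0]),
    3: l2_normalize([0.0, 0.0, 1.0]),
}

# The query points the same way as vector 1, only with a different length.
query = l2_normalize([2.0, 0.0, 0.0])
results = search_top_k(index, query, top_k=2)
print(results)
```

Since normalization removes vector length, only direction matters: the query matches vector 1 perfectly even though its raw magnitude differs.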
from paddlespeech.cli import VectorExecutor
import numpy as np
import logging

vector_executor = VectorExecutor()


def get_audio_embedding(path):
    """
    Use vpr_inference to generate embedding of audio
    """
    try:
        embedding = vector_executor(
            audio_file=path, model='ecapatdnn_voxceleb12')
        embedding = embedding / np.linalg.norm(embedding)
        return embedding
    except Exception as e:
        logging.error(f"Error with embedding:{e}")
        return None


if __name__ == '__main__':
    audio_path = r"../../source/demo/demo_16k.wav"
    emb = get_audio_embedding(audio_path)
    print(emb.shape)
    print(emb.dtype)
    print(type(emb))
from typing import List

from fastapi import WebSocket


class ConnectionManager:
    def __init__(self):
        # Holds the active WebSocket connections
        self.active_connections: List[WebSocket] = []

    async def connect(self, ws: WebSocket):
        # Wait for the connection to be accepted
        await ws.accept()
        # Store the WebSocket connection
        self.active_connections.append(ws)

    def disconnect(self, ws: WebSocket):
        # Remove the WebSocket object on close
        self.active_connections.remove(ws)

    @staticmethod
    async def send_personal_message(message: str, ws: WebSocket):
        # Send a message to a single client
        await ws.send_text(message)

    async def broadcast(self, message: str):
        # Broadcast a message to every client
        for connection in self.active_connections:
            await connection.send_text(message)


manager = ConnectionManager()
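The connection manager can be exercised without a running server. The following sketch, which uses only `asyncio`, swaps the FastAPI `WebSocket` for a hypothetical `FakeWebSocket` that records what it is sent, and walks through connect, broadcast, and disconnect. The manager is re-declared in reduced form so the block runs on its own.

```python
import asyncio
from typing import List


class FakeWebSocket:
    """Hypothetical stand-in exposing only the methods the manager calls."""

    def __init__(self, name):
        self.name = name
        self.accepted = False
        self.sent = []

    async def accept(self):
        self.accepted = True

    async def send_text(self, message):
        self.sent.append(message)


class ConnectionManager:
    """Reduced copy of the manager, typed against the fake socket."""

    def __init__(self):
        self.active_connections: List[FakeWebSocket] = []

    async def connect(self, ws):
        await ws.accept()
        self.active_connections.append(ws)

    def disconnect(self, ws):
        self.active_connections.remove(ws)

    async def broadcast(self, message):
        for connection in self.active_connections:
            await connection.send_text(message)


async def demo():
    manager = ConnectionManager()
    alice, bob = FakeWebSocket("alice"), FakeWebSocket("bob")
    await manager.connect(alice)
    await manager.connect(bob)
    await manager.broadcast("hello")
    # After bob disconnects, only alice receives later broadcasts.
    manager.disconnect(bob)
    await manager.broadcast("bye")
    return alice.sent, bob.sent


alice_sent, bob_sent = asyncio.run(demo())
print(alice_sent, bob_sent)
```

Note that `disconnect` uses `list.remove`, so calling it twice for the same socket raises `ValueError`; a caller would need to guard against that.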
from paddlespeech.cli.asr.infer import ASRExecutor
import soundfile as sf
import os
import librosa

from src.SpeechBase.asr import ASR
from src.SpeechBase.tts import TTS
from src.SpeechBase.nlp import NLP


class Robot:
    def __init__(self, asr_config, tts_config, asr_init_path,
                 ie_model_path=None) -> None:
        self.nlp = NLP(ie_model_path=ie_model_path)
        self.asr = ASR(config_path=asr_config)
        self.tts = TTS(config_path=tts_config)
        self.tts_sample_rate = 24000
        self.asr_sample_rate = 16000

        # Streaming recognition is less accurate than the end-to-end model,
        # so the streaming and end-to-end models are kept separate here
        self.asr_model = ASRExecutor()
        self.asr_name = "conformer_wenetspeech"
        self.warm_up_asrmodel(asr_init_path)

    def warm_up_asrmodel(self, asr_init_path):
        if not os.path.exists(asr_init_path):
            path_dir = os.path.dirname(asr_init_path)
            if not os.path.exists(path_dir):
                os.makedirs(path_dir, exist_ok=True)

            # Generate the warm-up audio with TTS; sample rate 24000
            text = "生成初始音频"
            self.text2speech(text, asr_init_path)

        # Initialize the ASR model
        self.asr_model(asr_init_path, model=self.asr_name, lang='zh',
                       sample_rate=16000)

    def speech2text(self, audio_file):
        self.asr_model.preprocess(self.asr_name, audio_file)
        self.asr_model.infer(self.asr_name)
        res = self.asr_model.postprocess()
        return res

    def text2speech(self, text, outpath):
        wav = self.tts.offlineTTS(text)
        sf.write(
            outpath, wav, samplerate=self.tts_sample_rate)
        res = wav
        return res

    def text2speechStream(self, text):
        for sub_wav_base64 in self.tts.streamTTS(text=text):
            yield sub_wav_base64

    def text2speechStreamBytes(self, text):
        for wav_bytes in self.tts.streamTTSBytes(text=text):
            yield wav_bytes

    def chat(self, text):
        result = self.nlp.chat(text)
        return result

    def ie(self, text):
        result = self.nlp.ie(text)
        return result


if __name__ == '__main__':
    tts_config = "../PaddleSpeech/demos/streaming_tts_server/conf/tts_online_application.yaml"
    asr_config = "../PaddleSpeech/demos/streaming_asr_server/conf/ws_conformer_application.yaml"
    demo_wav = "../source/demo/demo_16k.wav"
    ie_model_path = "../source/model"
    tts_wav = "../source/demo/tts.wav"
    text = "今天天气真不错"
    ie_text = "今天晚上我从大牛坊出发去三里屯花了六十五块钱"

    robot = Robot(asr_config, tts_config, asr_init_path=demo_wav)
    res = robot.speech2text(demo_wav)
    print(res)

    res = robot.chat(text)
    print(res)

    print("tts offline")
    robot.text2speech(res, tts_wav)

    print("ie test")
    res = robot.ie(ie_text)
    print(res)
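`text2speechStreamBytes` yields raw int16 PCM bytes chunk by chunk, so a consumer has to add a container itself to get a playable file. The sketch below writes such a stream into a WAV with the standard `wave` module. The tone generator `fake_pcm_stream` is a hypothetical replacement for the TTS engine, and mono output at the 24000 Hz TTS sample rate is an assumption.

```python
import io
import math
import struct
import wave


def fake_pcm_stream(sample_rate, seconds, chunk_samples):
    """Hypothetical generator: yields int16 PCM chunks of a 440 Hz tone."""
    total = int(sample_rate * seconds)
    for start in range(0, total, chunk_samples):
        end = min(start + chunk_samples, total)
        samples = [
            int(0.3 * 32767 * math.sin(2 * math.pi * 440 * n / sample_rate))
            for n in range(start, end)
        ]
        # Little-endian signed 16-bit, the usual WAV sample layout.
        yield struct.pack("<%dh" % len(samples), *samples)


def write_stream_to_wav(chunks, fileobj, sample_rate=24000):
    """Append each chunk to a mono, 16-bit WAV as it arrives."""
    with wave.open(fileobj, "wb") as wf:
        wf.setnchannels(1)  # mono (assumption)
        wf.setsampwidth(2)  # 2 bytes per sample = int16
        wf.setframerate(sample_rate)
        for chunk in chunks:
            wf.writeframes(chunk)


buffer = io.BytesIO()
write_stream_to_wav(fake_pcm_stream(24000, 0.5, 4000), buffer)

# Read the header back to confirm the rate and the number of frames.
buffer.seek(0)
with wave.open(buffer, "rb") as wf:
    info = (wf.getframerate(), wf.getnframes())
print(info)
```

Passing a real path instead of the `BytesIO` buffer would produce a file on disk; the in-memory buffer just keeps the sketch free of side effects.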
import random


def randName(n=5):
    return "".join(random.sample('zyxwvutsrqponmlkjihgfedcba', n))


def SuccessRequest(result=None, message="ok"):
    return {
        "code": 0,
        "result": result,
        "message": message
    }


def ErrorRequest(result=None, message="error"):
    return {
        "code": -1,
        "result": result,
        "message": message
    }
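The two helpers above define a small response envelope: `code` is 0 on success and -1 on failure. As a rough illustration of how a handler might use it, the sketch below wraps an arbitrary function call in that envelope. The `safe_call` wrapper and the snake_case helper names are hypothetical and not part of the demo.

```python
def success_response(result=None, message="ok"):
    """Same shape as SuccessRequest."""
    return {"code": 0, "result": result, "message": message}


def error_response(result=None, message="error"):
    """Same shape as ErrorRequest."""
    return {"code": -1, "result": result, "message": message}


def safe_call(func, *args):
    """Hypothetical wrapper: run `func` and wrap the outcome in the envelope."""
    try:
        return success_response(result=func(*args))
    except Exception as exc:
        # Surface the exception text to the client instead of crashing.
        return error_response(message=str(exc))


ok = safe_call(lambda a, b: a / b, 6, 3)
failed = safe_call(lambda a, b: a / b, 1, 0)
print(ok)
print(failed)
```

A client then only needs to check `code` before reading `result`, regardless of which endpoint produced the response.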
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*
node_modules
dist
dist-ssr
*.local
# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<link rel="icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>飞桨PaddleSpeech</title>
</head>
<body>
<div id="app"></div>
<script type="module" src="/src/main.js"></script>
</body>
</html>
{
"name": "paddlespeechwebclient",
"private": true,
"version": "0.0.0",
"scripts": {
"dev": "vite",
"build": "vite build",
"preview": "vite preview"
},
"dependencies": {
"ant-design-vue": "^2.2.8",
"axios": "^0.26.1",
"element-plus": "^2.1.9",
"js-audio-recorder": "0.5.7",
"lamejs": "^1.2.1",
"less": "^4.1.2",
"vue": "^3.2.25"
},
"devDependencies": {
"@vitejs/plugin-vue": "^2.3.0",
"vite": "^2.9.0"
}
}
<script setup>
import Experience from './components/Experience.vue'
import Header from './components/Content/Header/Header.vue'
</script>
<template>
<div class="app">
<Header></Header>
<Experience></Experience>
</div>
</template>
<style lang="less">
.app {
    background: url("assets/image/在线体验-背景@2x.png") no-repeat;
}
</style>
export const apiURL = {
    ASR_OFFLINE : '/api/asr/offline', // Get the offline speech recognition result
    ASR_COLLECT_ENV : '/api/asr/collectEnv', // Collect ambient noise
    ASR_STOP_RECORD : '/api/asr/stopRecord', // Pause recording on the backend
    ASR_RESUME_RECORD : '/api/asr/resumeRecord', // Resume recording on the backend
    NLP_CHAT : '/api/nlp/chat', // NLP chit-chat endpoint
    NLP_IE : '/api/nlp/ie', // Information extraction endpoint
    TTS_OFFLINE : '/api/tts/offline', // Get TTS audio
    VPR_RECOG : '/api/vpr/recog', // Voiceprint recognition endpoint; returns similarity scores
    VPR_ENROLL : '/api/vpr/enroll', // Voiceprint enrollment endpoint
    VPR_LIST : '/api/vpr/list', // List the enrolled voiceprints
    VPR_DEL : '/api/vpr/del', // Delete a user's voiceprint
    VPR_DATA : '/api/vpr/database64?vprId=', // Get enrolled voiceprint data in base64 format
    // websocket
    CHAT_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/offlineStream', // ChatBot websocket endpoint
    ASR_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/onlineStream', // Streaming ASR endpoint
    TTS_SOCKET_RECORD: 'ws://localhost:8010/ws/tts/online', // Streaming TTS endpoint
}
import axios from 'axios'
import {apiURL} from "./API.js"

// Upload an audio file and get the recognition result
export async function asrOffline(params){
    const result = await axios.post(
        apiURL.ASR_OFFLINE, params
    )
    return result
}

// Upload the ambient-noise recording
export async function asrCollentEnv(params){
    const result = await axios.post(
        apiURL.ASR_COLLECT_ENV, params
    )
    return result
}

// Pause recording
export async function asrStopRecord(){
    const result = await axios.get(apiURL.ASR_STOP_RECORD);
    return result
}

// Resume recording
export async function asrResumeRecord(){
    const result = await axios.get(apiURL.ASR_RESUME_RECORD);
    return result
}
import axios from 'axios'
import {apiURL} from "./API.js"
// Get the chit-chat reply
export async function nlpChat(text){
    const result = await axios.post(apiURL.NLP_CHAT, { chat : text});
    return result
}

// Get the information extraction result
export async function nlpIE(text){
    const result = await axios.post(apiURL.NLP_IE, { chat : text});
    return result
}
import axios from 'axios'
import {apiURL} from "./API.js"
export async function ttsOffline(text){
const result = await axios.post(apiURL.TTS_OFFLINE, { text : text});
return result
}
import axios from 'axios'
import {apiURL} from "./API.js"
// Enroll a voiceprint
export async function vprEnroll(params){
    const result = await axios.post(apiURL.VPR_ENROLL, params);
    return result
}

// Recognize a voiceprint
export async function vprRecog(params){
    const result = await axios.post(apiURL.VPR_RECOG, params);
    return result
}

// Delete a voiceprint
export async function vprDel(params){
    const result = await axios.post(apiURL.VPR_DEL, params);
    return result
}

// Get the list of voiceprints
export async function vprList(){
    const result = await axios.get(apiURL.VPR_LIST);
    return result
}

// Get the audio for a voiceprint
export async function vprData(params){
    const result = await axios.get(apiURL.VPR_DATA+params);
    return result
}
<svg xmlns="http://www.w3.org/2000/svg" width="50" height="50" viewBox="0 0 50 50">
<g fill="none" fill-rule="evenodd">
<rect width="50" height="50" opacity="0"/>
<path fill="#FFF" fill-rule="nonzero" d="M10.5625,26.375 L10.5625,37.375 L39.4375,37.375 L39.4375,26.375 L42.1875,26.375 L42.1875,40.125 L7.8125,40.125 L7.8125,26.375 L10.5625,26.375 Z M24.9193012,9.30543065 L32.8422855,17.1477673 L30.9077145,19.1022327 L26.3745,14.6154306 L26.375,29.125 L23.625,29.125 L23.6245,14.5224306 L19.1022838,19.0922338 L17.1477162,17.1577662 L24.9193012,9.30543065 Z"/>
</g>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="50" height="50" viewBox="0 0 50 50">
<g fill="#FFF" fill-rule="evenodd">
<rect width="50" height="50" opacity="0"/>
<path d="M18.625,5.7 C19.2739346,5.7 19.8,6.22606542 19.8,6.875 L19.8,42.125 C19.8,42.7739346 19.2739346,43.3 18.625,43.3 C17.9760654,43.3 17.45,42.7739346 17.45,42.125 L17.45,6.875 C17.45,6.22606542 17.9760654,5.7 18.625,5.7 Z M30.375,10.4 C31.0239346,10.4 31.55,10.9260654 31.55,11.575 L31.55,37.425 C31.55,38.0739346 31.0239346,38.6 30.375,38.6 C29.7260654,38.6 29.2,38.0739346 29.2,37.425 L29.2,11.575 C29.2,10.9260654 29.7260654,10.4 30.375,10.4 Z M6.875,15.1 C7.52393458,15.1 8.05,15.6260654 8.05,16.275 L8.05,32.725 C8.05,33.3739346 7.52393458,33.9 6.875,33.9 C6.22606542,33.9 5.7,33.3739346 5.7,32.725 L5.7,16.275 C5.7,15.6260654 6.22606542,15.1 6.875,15.1 Z M42.125,17.45 C42.7739346,17.45 43.3,17.9760654 43.3,18.625 L43.3,30.375 C43.3,31.0239346 42.7739346,31.55 42.125,31.55 C41.4760654,31.55 40.95,31.0239346 40.95,30.375 L40.95,18.625 C40.95,17.9760654 41.4760654,17.45 42.125,17.45 Z"/>
</g>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="50" height="50" viewBox="0 0 50 50">
<g fill="#FFF" fill-rule="evenodd">
<rect width="50" height="50" fill="none"/>
<path fill-rule="nonzero" d="M41.4485655,21.2539772 C42.1315264,21.2850061 42.6598822,21.8638177 42.6289611,22.5468326 C42.6247768,22.6388278 42.6185963,22.7404533 42.6102079,22.8512273 L42.5782082,23.2105123 L42.5782082,23.2105123 L42.5316934,23.6217955 L42.5316934,23.6217955 L42.4693948,24.0821848 L42.4693948,24.0821848 L42.3900439,24.5887883 L42.3900439,24.5887883 L42.292372,25.1387138 C42.2744962,25.2338175 42.2558041,25.3306058 42.2362693,25.4290185 C41.8143833,27.555069 41.1316382,29.6828464 40.1241953,31.6800323 C37.4291788,37.0229123 32.9261483,40.3971985 26.4086979,40.8900674 L25.9987324,40.9171116 L25.9987324,45.4234882 L36.4808016,45.4234882 C37.1644101,45.4234882 37.7186683,45.9777464 37.7186683,46.661355 C37.7186683,47.3023391 37.2315273,47.8294468 36.6073586,47.8928314 L36.4808016,47.8992217 L13.1797237,47.8992217 C12.4960073,47.8992217 11.941857,47.3450714 11.941857,46.661355 C11.941857,46.020472 12.4289031,45.4932758 13.0531489,45.4298797 L13.1797237,45.4234882 L23.5229989,45.4234882 L23.5229989,40.9094487 C16.8529053,40.4933909 12.2580826,37.0999016 9.52429608,31.6800323 C8.5167992,29.6828464 7.83410805,27.5550691 7.41222208,25.4290185 L7.30490754,24.8585165 L7.30490754,24.8585165 L7.21653999,24.3298905 L7.21653999,24.3298905 L7.1458579,23.8460326 L7.1458579,23.8460326 L7.09159974,23.4098348 L7.09159974,23.4098348 L7.052504,23.0241892 L7.052504,23.0241892 L7.02730915,22.6919879 C7.02419833,22.6412354 7.02161415,22.5928302 7.01953033,22.5468326 C6.98839343,21.8638177 7.5168571,21.2850061 8.19987204,21.2539772 C8.84009734,21.2248875 9.38883251,21.6875394 9.4804906,22.3081826 L9.52089639,22.8033886 L9.52089639,22.8033886 L9.55106194,23.0957484 L9.55106194,23.0957484 C9.61279606,23.6520033 9.70707015,24.274849 9.84046771,24.9470712 C10.2215574,26.8673593 10.837172,28.7858665 11.7346375,30.5650942 C14.2485231,35.5489392 18.4280434,38.4929132 24.8242187,38.5130415 C31.2204481,38.4929132 35.3998065,35.5489392 37.9138,30.5650942 C38.8112655,28.7858665 
39.4267722,26.8673053 39.8078618,24.9470712 C39.9413133,24.274849 40.0356414,23.6520033 40.0973756,23.0957484 L40.1383001,22.683441 L40.1383001,22.683441 L40.15571,22.4343189 L40.15571,22.4343189 C40.1868469,21.7514119 40.7656585,21.2229482 41.4485655,21.2539772 Z M24.7277861,1.03431811 C30.2652292,1.03431811 34.7717072,5.45158401 34.9203849,10.9435284 L34.924173,11.2236897 L34.924173,24.2016207 C34.924173,29.8291412 30.3475899,34.3909923 24.7277861,34.3909923 C19.1903431,34.3909923 14.6838651,29.9738829 14.5351873,24.4817898 L14.5313993,24.2016206 L14.5313993,11.2236897 C14.5313993,5.59627708 19.1078745,1.03431811 24.7277861,1.03431811 Z M24.7278401,3.51005152 C20.5523235,3.51005152 17.1406309,6.83661824 17.0109562,10.9790926 L17.0071327,11.2236897 L17.0071327,24.2016206 C17.0071327,28.4575531 20.4658637,31.9152588 24.7278401,31.9152588 C28.9033567,31.9152588 32.3150493,28.5887959 32.444724,24.4462237 L32.4485475,24.2016206 L32.4485475,11.2236897 C32.4485475,6.96786511 28.9898165,3.51005152 24.7278401,3.51005152 Z"/>
</g>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 20 20">
<g fill="#FFF" fill-rule="evenodd">
<rect width="20" height="20" opacity="0"/>
<path fill-rule="nonzero" d="M17.2545788,8.38095607 C17.5371833,8.39379564 17.7558133,8.63330387 17.7430184,8.91593074 L17.7371151,9.01650414 L17.7371151,9.01650414 L17.7143664,9.26243626 L17.7143664,9.26243626 L17.675151,9.56380287 L17.675151,9.56380287 L17.6172162,9.91546885 C17.6058754,9.97798618 17.5936607,10.0423853 17.5805252,10.1085594 C17.4059517,10.9883044 17.1234365,11.868764 16.7065636,12.6951858 C15.608809,14.8714882 13.7861076,16.2584571 11.1569912,16.495803 L10.8615444,16.5174255 L10.8615444,18.3821331 L15.1989524,18.3821331 C15.4818249,18.3821331 15.7111731,18.6114813 15.7111731,18.8943538 C15.7111731,19.1458357 15.5299597,19.3549563 15.2910197,19.3983228 L15.1989524,19.4065745 L5.55712706,19.4065745 C5.2742099,19.4065745 5.04490634,19.1772709 5.04490634,18.8943538 C5.04490634,18.6429116 5.22608446,18.4337601 5.46504803,18.3903863 L5.55712706,18.3821331 L9.83710301,18.382133 L9.83710301,16.5142546 C7.07706426,16.3420928 5.1757583,14.9378903 4.04453631,12.6951858 C3.62764105,11.868764 3.34514816,10.9883044 3.17057465,10.1085594 L3.13388183,9.91546885 L3.13388183,9.91546885 L3.07593716,9.56380287 L3.07593716,9.56380287 L3.03671385,9.26243626 L3.03671385,9.26243626 L3.01397193,9.01650414 C3.01143062,8.98042028 3.00948271,8.94686015 3.00808152,8.91593074 C2.99519728,8.63330387 3.2138719,8.39379564 3.49649877,8.38095607 C3.77908098,8.36811648 4.01858921,8.58679112 4.03142877,8.86937333 L4.04579166,9.04965974 L4.04579166,9.04965974 L4.05561184,9.14306831 C4.08115699,9.37324275 4.12016696,9.63097201 4.17536595,9.90913293 C4.33305822,10.7037349 4.5877953,11.4975999 4.95916033,12.2338321 C5.99938887,14.2961128 7.72884553,15.5143089 10.3755388,15.5226379 C13.0222544,15.5143089 14.7516441,14.2961128 15.7919173,12.2338321 C16.1632823,11.4975999 16.4179747,10.7037126 16.575667,9.90913293 C16.6124812,9.7236923 16.6421003,9.54733242 16.6653248,9.38216386 L16.7052821,9.04965974 L16.7052821,9.04965974 L16.7196041,8.86937333 L16.7196041,8.86937333 
C16.7324884,8.58679115 16.9719966,8.3681165 17.2545788,8.38095607 Z M10.3356356,0.0142005962 C12.595216,0.0142005962 14.4399401,1.79169133 14.5496666,4.02028091 L14.5548302,4.23049229 L14.5548302,9.60067063 C14.5548302,11.9292998 12.6610717,13.8169623 10.3356356,13.8169623 C8.07605526,13.8169623 6.23133121,12.0395346 6.12160467,9.81088771 L6.11644109,9.60067061 L6.11644109,4.23049229 C6.11644109,1.90190776 8.01015495,0.0142005962 10.3356356,0.0142005962 Z M10.335658,1.03864201 C8.63472709,1.03864201 7.24010749,2.37267291 7.14594933,4.04955911 L7.1408825,4.23049229 L7.1408825,9.60067061 C7.1408825,11.3617461 8.57208154,12.7925209 10.335658,12.7925209 C12.0365888,12.7925209 13.4312084,11.4585316 13.5253666,9.78160809 L13.5304334,9.60067061 L13.5304334,4.23049229 C13.5304334,2.46946142 12.0992344,1.03864201 10.335658,1.03864201 Z"/>
</g>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 16 16">
<path fill="#F33E3E" d="M4.0976,1.3362 C4.4618,1.1488 4.8948,1.4234 4.8948,1.833 C4.8948,2.0362 4.7852,2.2266 4.6046,2.3194 C2.5386,3.3816 1.1214,5.5338 1.1214,8.0122 C1.1214,11.677 4.2184,14.632 7.9326,14.398 C11.1952,14.1922 13.816,11.4788 13.9156,8.2112 C13.9936,5.6502 12.5572,3.4112 10.4372,2.3204 C10.256,2.2272 10.1452,2.0376 10.1452,1.8338 C10.1452,1.422 10.5814,1.1504 10.9476,1.3392 C13.366,2.5862 15.024,5.109 15.024,8.0124 C15.024,12.3292 11.3596,15.8064 6.978,15.497 C3.3116,15.238 0.3328,12.2886 0.0406,8.6244 C-0.2116,5.4644 1.5076,2.6692 4.0976,1.3362 Z M7.52,0.004 C7.8252,0.004 8.0726,0.2514 8.0726,0.5566 L8.0726,6.3544 C8.0726,6.6596 7.8252,6.907 7.52,6.907 C7.2148,6.907 6.9674,6.6596 6.9674,6.3544 L6.9674,0.5566 C6.9674,0.2514 7.2148,0.004 7.52,0.004 Z"/>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="116" height="116" viewBox="0 0 116 116">
<g fill="none" fill-rule="evenodd">
<circle cx="58" cy="58" r="58" fill="#2932E1"/>
<path fill="#FFF" fill-rule="nonzero" d="M74.4485655,54.2539772 C75.1315264,54.2850061 75.6598822,54.8638177 75.6289611,55.5468326 C75.6247768,55.6388278 75.6185963,55.7404533 75.6102079,55.8512273 L75.5782082,56.2105123 L75.5782082,56.2105123 L75.5316934,56.6217955 L75.5316934,56.6217955 L75.4693948,57.0821848 L75.4693948,57.0821848 L75.3900439,57.5887883 L75.3900439,57.5887883 L75.292372,58.1387138 C75.2744962,58.2338175 75.2558041,58.3306058 75.2362693,58.4290185 C74.8143833,60.555069 74.1316382,62.6828464 73.1241953,64.6800323 C70.4291788,70.0229123 65.9261483,73.3971985 59.4086979,73.8900674 L58.9987324,73.9171116 L58.9987324,78.4234882 L69.4808016,78.4234882 C70.1644101,78.4234882 70.7186683,78.9777464 70.7186683,79.661355 C70.7186683,80.3023391 70.2315273,80.8294468 69.6073586,80.8928314 L69.4808016,80.8992217 L46.1797237,80.8992217 C45.4960073,80.8992217 44.941857,80.3450714 44.941857,79.661355 C44.941857,79.020472 45.4289031,78.4932758 46.0531489,78.4298797 L46.1797237,78.4234882 L56.5229989,78.4234882 L56.5229989,73.9094487 C49.8529053,73.4933909 45.2580826,70.0999016 42.5242961,64.6800323 C41.5167992,62.6828464 40.834108,60.5550691 40.4122221,58.4290185 L40.3049075,57.8585165 L40.3049075,57.8585165 L40.21654,57.3298905 L40.21654,57.3298905 L40.1458579,56.8460326 L40.1458579,56.8460326 L40.0915997,56.4098348 L40.0915997,56.4098348 L40.052504,56.0241892 L40.052504,56.0241892 L40.0273091,55.6919879 C40.0241983,55.6412354 40.0216142,55.5928302 40.0195303,55.5468326 C39.9883934,54.8638177 40.5168571,54.2850061 41.199872,54.2539772 C41.8400973,54.2248875 42.3888325,54.6875394 42.4804906,55.3081826 L42.5208964,55.8033886 L42.5208964,55.8033886 L42.5510619,56.0957484 L42.5510619,56.0957484 C42.6127961,56.6520033 42.7070702,57.274849 42.8404677,57.9470712 C43.2215574,59.8673593 43.837172,61.7858665 44.7346375,63.5650942 C47.2485231,68.5489392 51.4280434,71.4929132 57.8242187,71.5130415 C64.2204481,71.4929132 68.3998065,68.5489392 70.9138,63.5650942 
C71.8112655,61.7858665 72.4267722,59.8673053 72.8078618,57.9470712 C72.9413133,57.274849 73.0356414,56.6520033 73.0973756,56.0957484 L73.1383001,55.683441 L73.1383001,55.683441 L73.15571,55.4343189 L73.15571,55.4343189 C73.1868469,54.7514119 73.7656585,54.2229482 74.4485655,54.2539772 Z M57.7277861,34.0343181 C63.2652292,34.0343181 67.7717072,38.451584 67.9203849,43.9435284 L67.924173,44.2236897 L67.924173,57.2016207 C67.924173,62.8291412 63.3475899,67.3909923 57.7277861,67.3909923 C52.1903431,67.3909923 47.6838651,62.9738829 47.5351873,57.4817898 L47.5313993,57.2016206 L47.5313993,44.2236897 C47.5313993,38.5962771 52.1078745,34.0343181 57.7277861,34.0343181 Z M57.7278401,36.5100515 C53.5523235,36.5100515 50.1406309,39.8366182 50.0109562,43.9790926 L50.0071327,44.2236897 L50.0071327,57.2016206 C50.0071327,61.4575531 53.4658637,64.9152588 57.7278401,64.9152588 C61.9033567,64.9152588 65.3150493,61.5887959 65.444724,57.4462237 L65.4485475,57.2016206 L65.4485475,44.2236897 C65.4485475,39.9678651 61.9898165,36.5100515 57.7278401,36.5100515 Z"/>
</g>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="116" height="116" viewBox="0 0 116 116">
<g fill="none" fill-rule="evenodd">
<circle cx="58" cy="58" r="58" fill="#7278F5"/>
<path fill="#FFF" fill-rule="nonzero" d="M74.4485655,54.2539772 C75.1315264,54.2850061 75.6598822,54.8638177 75.6289611,55.5468326 C75.6247768,55.6388278 75.6185963,55.7404533 75.6102079,55.8512273 L75.5782082,56.2105123 L75.5782082,56.2105123 L75.5316934,56.6217955 L75.5316934,56.6217955 L75.4693948,57.0821848 L75.4693948,57.0821848 L75.3900439,57.5887883 L75.3900439,57.5887883 L75.292372,58.1387138 C75.2744962,58.2338175 75.2558041,58.3306058 75.2362693,58.4290185 C74.8143833,60.555069 74.1316382,62.6828464 73.1241953,64.6800323 C70.4291788,70.0229123 65.9261483,73.3971985 59.4086979,73.8900674 L58.9987324,73.9171116 L58.9987324,78.4234882 L69.4808016,78.4234882 C70.1644101,78.4234882 70.7186683,78.9777464 70.7186683,79.661355 C70.7186683,80.3023391 70.2315273,80.8294468 69.6073586,80.8928314 L69.4808016,80.8992217 L46.1797237,80.8992217 C45.4960073,80.8992217 44.941857,80.3450714 44.941857,79.661355 C44.941857,79.020472 45.4289031,78.4932758 46.0531489,78.4298797 L46.1797237,78.4234882 L56.5229989,78.4234882 L56.5229989,73.9094487 C49.8529053,73.4933909 45.2580826,70.0999016 42.5242961,64.6800323 C41.5167992,62.6828464 40.834108,60.5550691 40.4122221,58.4290185 L40.3049075,57.8585165 L40.3049075,57.8585165 L40.21654,57.3298905 L40.21654,57.3298905 L40.1458579,56.8460326 L40.1458579,56.8460326 L40.0915997,56.4098348 L40.0915997,56.4098348 L40.052504,56.0241892 L40.052504,56.0241892 L40.0273091,55.6919879 C40.0241983,55.6412354 40.0216142,55.5928302 40.0195303,55.5468326 C39.9883934,54.8638177 40.5168571,54.2850061 41.199872,54.2539772 C41.8400973,54.2248875 42.3888325,54.6875394 42.4804906,55.3081826 L42.5208964,55.8033886 L42.5208964,55.8033886 L42.5510619,56.0957484 L42.5510619,56.0957484 C42.6127961,56.6520033 42.7070702,57.274849 42.8404677,57.9470712 C43.2215574,59.8673593 43.837172,61.7858665 44.7346375,63.5650942 C47.2485231,68.5489392 51.4280434,71.4929132 57.8242187,71.5130415 C64.2204481,71.4929132 68.3998065,68.5489392 70.9138,63.5650942 
C71.8112655,61.7858665 72.4267722,59.8673053 72.8078618,57.9470712 C72.9413133,57.274849 73.0356414,56.6520033 73.0973756,56.0957484 L73.1383001,55.683441 L73.1383001,55.683441 L73.15571,55.4343189 L73.15571,55.4343189 C73.1868469,54.7514119 73.7656585,54.2229482 74.4485655,54.2539772 Z M57.7277861,34.0343181 C63.2652292,34.0343181 67.7717072,38.451584 67.9203849,43.9435284 L67.924173,44.2236897 L67.924173,57.2016207 C67.924173,62.8291412 63.3475899,67.3909923 57.7277861,67.3909923 C52.1903431,67.3909923 47.6838651,62.9738829 47.5351873,57.4817898 L47.5313993,57.2016206 L47.5313993,44.2236897 C47.5313993,38.5962771 52.1078745,34.0343181 57.7277861,34.0343181 Z M57.7278401,36.5100515 C53.5523235,36.5100515 50.1406309,39.8366182 50.0109562,43.9790926 L50.0071327,44.2236897 L50.0071327,57.2016206 C50.0071327,61.4575531 53.4658637,64.9152588 57.7278401,64.9152588 C61.9033567,64.9152588 65.3150493,61.5887959 65.444724,57.4462237 L65.4485475,57.2016206 L65.4485475,44.2236897 C65.4485475,39.9678651 61.9898165,36.5100515 57.7278401,36.5100515 Z"/>
</g>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="10" height="12" viewBox="0 0 10 12">
<polygon fill="#FFF" fill-rule="evenodd" points="29 16 39 21.765 29 28" transform="translate(-29 -16)"/>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="10" height="12" viewBox="0 0 10 12">
<path fill="#FFF" fill-rule="evenodd" d="M31,17 L31,29 L29,29 L29,17 L31,17 Z M39,17 L39,29 L37,29 L37,17 L39,17 Z" transform="translate(-29 -17)"/>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="16" height="16" viewBox="0 0 16 16">
<defs>
<rect id="ic_更换示例-a" width="16" height="16" x="0" y="0"/>
</defs>
<g fill="none" fill-rule="evenodd" transform="matrix(-1 0 0 1 16 0)">
<mask id="ic_更换示例-b" fill="#fff">
<use xlink:href="#ic_更换示例-a"/>
</mask>
<path fill="#2932E1" fill-rule="nonzero" d="M6.35459401,0.717547671 L7.1160073,1.36581444 L5.76391165,2.95149486 C8.45440978,1.82595599 11.6186236,2.72687193 13.331374,5.17293307 C15.3274726,8.02365719 14.6537425,11.9415081 11.8236048,13.9231918 C8.99346706,15.9048756 5.08146225,15.1979908 3.08536373,12.3472667 C2.43380077,11.4167384 2.05175569,10.3497586 1.95954347,9.24373118 L1.95954347,9.24373118 L1.91800137,8.74545992 L2.9145439,8.66237572 L2.956086,9.16064698 C3.03368894,10.0914452 3.35506892,10.9889989 3.90451578,11.7736903 C5.58491905,14.1735549 8.873856,14.7678536 11.2500283,13.1040398 C13.6262007,11.440226 14.1926253,8.14637409 12.512222,5.74650951 C11.0401872,3.64422594 8.29699921,2.89825126 6.0091042,3.93534448 L6.11200137,3.89054767 L7.69316988,4.63120811 L7.26907888,5.53682768 L3.68173666,3.85691748 L6.35459401,0.717547671 Z" mask="url(#ic_更换示例-b)"/>
</g>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 20 20">
<g fill="#FFF" fill-rule="evenodd">
<rect width="20" height="20" opacity="0"/>
<path d="M7.5,2 C7.77614237,2 8,2.22385763 8,2.5 L8,17.5 C8,17.7761424 7.77614237,18 7.5,18 C7.22385763,18 7,17.7761424 7,17.5 L7,2.5 C7,2.22385763 7.22385763,2 7.5,2 Z M12.5,4 C12.7761424,4 13,4.22385763 13,4.5 L13,15.5 C13,15.7761424 12.7761424,16 12.5,16 C12.2238576,16 12,15.7761424 12,15.5 L12,4.5 C12,4.22385763 12.2238576,4 12.5,4 Z M2.5,6 C2.77614237,6 3,6.22385763 3,6.5 L3,13.5 C3,13.7761424 2.77614237,14 2.5,14 C2.22385763,14 2,13.7761424 2,13.5 L2,6.5 C2,6.22385763 2.22385763,6 2.5,6 Z M17.5,7 C17.7761424,7 18,7.22385763 18,7.5 L18,12.5 C18,12.7761424 17.7761424,13 17.5,13 C17.2238576,13 17,12.7761424 17,12.5 L17,7.5 C17,7.22385763 17.2238576,7 17.5,7 Z"/>
</g>
</svg>
<?xml version="1.0" encoding="UTF-8"?>
<svg width="20px" height="20px" viewBox="0 0 20 20" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<title>icon_录制声音(小语音)</title>
<g id="页面-1" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
<g id="02-声纹识别-补充状态" transform="translate(-98.000000, -216.000000)" fill="#FFFFFF">
<g id="编组-6备份" transform="translate(77.000000, 204.000000)">
<g id="icon_录制声音(小语音)" transform="translate(21.000000, 12.000000)">
<rect id="矩形" opacity="0" x="0" y="0" width="20" height="20"></rect>
<path d="M17.2545788,8.38095607 C17.5371833,8.39379564 17.7558133,8.63330387 17.7430184,8.91593074 L17.7371151,9.01650414 L17.7371151,9.01650414 L17.7143664,9.26243626 L17.7143664,9.26243626 L17.675151,9.56380287 L17.675151,9.56380287 L17.6172162,9.91546885 C17.6058754,9.97798618 17.5936607,10.0423853 17.5805252,10.1085594 C17.4059517,10.9883044 17.1234365,11.868764 16.7065636,12.6951858 C15.608809,14.8714882 13.7861076,16.2584571 11.1569912,16.495803 L10.8615444,16.5174255 L10.8615444,18.3821331 L15.1989524,18.3821331 C15.4818249,18.3821331 15.7111731,18.6114813 15.7111731,18.8943538 C15.7111731,19.1458357 15.5299597,19.3549563 15.2910197,19.3983228 L15.1989524,19.4065745 L5.55712706,19.4065745 C5.2742099,19.4065745 5.04490634,19.1772709 5.04490634,18.8943538 C5.04490634,18.6429116 5.22608446,18.4337601 5.46504803,18.3903863 L5.55712706,18.3821331 L9.83710301,18.382133 L9.83710301,16.5142546 C7.07706426,16.3420928 5.1757583,14.9378903 4.04453631,12.6951858 C3.62764105,11.868764 3.34514816,10.9883044 3.17057465,10.1085594 L3.13388183,9.91546885 L3.13388183,9.91546885 L3.07593716,9.56380287 L3.07593716,9.56380287 L3.03671385,9.26243626 L3.03671385,9.26243626 L3.01397193,9.01650414 C3.01143062,8.98042028 3.00948271,8.94686015 3.00808152,8.91593074 C2.99519728,8.63330387 3.2138719,8.39379564 3.49649877,8.38095607 C3.77908098,8.36811648 4.01858921,8.58679112 4.03142877,8.86937333 L4.04579166,9.04965974 L4.04579166,9.04965974 L4.05561184,9.14306831 C4.08115699,9.37324275 4.12016696,9.63097201 4.17536595,9.90913293 C4.33305822,10.7037349 4.5877953,11.4975999 4.95916033,12.2338321 C5.99938887,14.2961128 7.72884553,15.5143089 10.3755388,15.5226379 C13.0222544,15.5143089 14.7516441,14.2961128 15.7919173,12.2338321 C16.1632823,11.4975999 16.4179747,10.7037126 16.575667,9.90913293 C16.6124812,9.7236923 16.6421003,9.54733242 16.6653248,9.38216386 L16.7052821,9.04965974 L16.7052821,9.04965974 L16.7196041,8.86937333 L16.7196041,8.86937333 C16.7324884,8.58679115 
16.9719966,8.3681165 17.2545788,8.38095607 Z M10.3356356,0.0142005962 C12.595216,0.0142005962 14.4399401,1.79169133 14.5496666,4.02028091 L14.5548302,4.23049229 L14.5548302,9.60067063 C14.5548302,11.9292998 12.6610717,13.8169623 10.3356356,13.8169623 C8.07605526,13.8169623 6.23133121,12.0395346 6.12160467,9.81088771 L6.11644109,9.60067061 L6.11644109,4.23049229 C6.11644109,1.90190776 8.01015495,0.0142005962 10.3356356,0.0142005962 Z M10.335658,1.03864201 C8.63472709,1.03864201 7.24010749,2.37267291 7.14594933,4.04955911 L7.1408825,4.23049229 L7.1408825,9.60067061 C7.1408825,11.3617461 8.57208154,12.7925209 10.335658,12.7925209 C12.0365888,12.7925209 13.4312084,11.4585316 13.5253666,9.78160809 L13.5304334,9.60067061 L13.5304334,4.23049229 C13.5304334,2.46946142 12.0992344,1.03864201 10.335658,1.03864201 Z" id="形状" fill-rule="nonzero"></path>
</g>
</g>
</g>
</g>
</svg>
\ No newline at end of file
<template>
<div className="speech_header">
<div className="speech_header_title">
飞桨-PaddleSpeech
</div>
<div className="speech_header_describe">
PaddleSpeech 是基于飞桨 PaddlePaddle 的语音方向的开源模型库,用于语音和音频中的各种关键任务的开发,欢迎大家Star收藏鼓励
</div>
<div className="speech_header_link_box">
<a href="https://github.com/PaddlePaddle/PaddleSpeech" className="speech_header_link" target='_blank' rel='noreferrer'>
前往Github
</a>
</div>
</div>
</template>
<script>
export default {
name:"Header"
}
</script>
<style lang="less" scoped>
@import "./style.less";
</style>
\ No newline at end of file
.speech_header {
width: 1200px;
margin: 0 auto;
padding-top: 50px;
// background: url("../../../assets/image/在线体验-背景@2x.png") no-repeat;
box-sizing: border-box;
&::after {
content: "";
display: block;
clear: both;
visibility: hidden;
}
;
// background: pink;
.speech_header_title {
height: 57px;
font-family: PingFangSC-Medium;
font-size: 38px;
color: #000000;
letter-spacing: 0;
line-height: 57px;
font-weight: 500;
margin-bottom: 15px;
}
;
.speech_header_describe {
height: 26px;
font-family: PingFangSC-Regular;
font-size: 16px;
color: #575757;
line-height: 26px;
font-weight: 400;
margin-bottom: 24px;
}
;
.speech_header_link_box {
height: 40px;
margin-bottom: 40px;
display: flex;
align-items: center;
};
.speech_header_link {
display: block;
background: #2932E1;
width: 120px;
height: 40px;
line-height: 40px;
border-radius: 20px;
font-family: PingFangSC-Medium;
font-size: 14px;
color: #FFFFFF;
text-align: center;
font-weight: 500;
margin-right: 20px;
// margin-bottom: 40px;
&:hover {
opacity: 0.9;
}
;
}
;
.speech_header_divider {
width: 1200px;
height: 1px;
background: #D1D1D1;
margin-bottom: 40px;
}
;
.speech_header_content_wrapper {
width: 1200px;
margin: 0 auto;
// background: pink;
margin-bottom: 20px;
display: flex;
justify-content: space-between;
flex-wrap: wrap;
.speech_header_module {
width: 384px;
background: #FFFFFF;
border: 1px solid rgba(224, 224, 224, 1);
box-shadow: 4px 8px 12px 0px rgba(0, 0, 0, 0.05);
border-radius: 16px;
padding: 30px 34px 0px 34px;
box-sizing: border-box;
display: flex;
margin-bottom: 40px;
.speech_header_background_img {
width: 46px;
height: 46px;
background-size: 46px 46px;
background-repeat: no-repeat;
background-position: center;
margin-right: 20px;
}
;
.speech_header_content {
padding-top: 4px;
margin-bottom: 32px;
.speech_header_module_title {
height: 26px;
font-family: PingFangSC-Medium;
font-size: 20px;
color: #000000;
letter-spacing: 0;
line-height: 26px;
font-weight: 500;
margin-bottom: 10px;
}
;
.speech_header_module_introduce {
font-family: PingFangSC-Regular;
font-size: 16px;
color: #666666;
letter-spacing: 0;
font-weight: 400;
}
;
}
;
}
;
}
;
}
;
<script setup>
import ChatT from './SubMenu/ChatBot/ChatT.vue'
import ASRT from './SubMenu/ASR/ASRT.vue'
import TTST from './SubMenu/TTS/TTST.vue'
import VPRT from './SubMenu/VPR/VPRT.vue'
import IET from './SubMenu/IE/IET.vue'
</script>
<template>
<div className="experience">
<div className="experience_wrapper">
<div className="experience_title">
功能体验
</div>
<div className="experience_describe">
体验前,请允许浏览器获取麦克风权限
</div>
<div className="experience_content" >
<el-tabs
className="experience_tabs"
type="border-card"
>
<el-tab-pane label="语音聊天" key="1">
<ChatT></ChatT>
</el-tab-pane>
<el-tab-pane label="声纹识别" key="2">
<VPRT></VPRT>
</el-tab-pane>
<el-tab-pane label="语音识别" key="3">
<ASRT></ASRT>
</el-tab-pane>
<el-tab-pane label="语音合成" key="4">
<TTST></TTST>
</el-tab-pane>
<el-tab-pane label="语音指令" key="5">
<IET></IET>
</el-tab-pane>
</el-tabs>
</div>
</div>
</div>
</template>
<style lang="less">
@import "./style.less";
</style>
\ No newline at end of file
<template>
<div class="asrbox">
<h5> ASR 体验</h5>
<div class="home" style="margin:1vw;">
<el-button :type="recoType" @click="startRecorderChunk()" style="margin:1vw;">{{ recoText }} (流式)</el-button>
<el-button :type="recoType" @click="startRecorder()" style="margin:1vw;">{{ recoText }} (端到端)</el-button>
</div>
<a> asr_stream: {{ streamAsrResult }}</a>
<br>
<a> asr_offline: {{ asrResultOffline }} </a>
</div>
</template>
<script>
import Recorder from 'js-audio-recorder'
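const recorder_chunk = new Recorder({
sampleBits: 16, // Bits per sample: 8 or 16 are supported, default is 16
sampleRate: 16000, // Sample rate: 11025, 16000, 22050, 24000, 44100 or 48000 are supported; defaults to the browser's rate (48000 in the author's Chrome)
numChannels: 1, // Number of channels: 1 or 2 are supported, default is 1
compiling: true
})
const recorder = new Recorder({
sampleBits: 16, // Bits per sample: 8 or 16 are supported, default is 16
sampleRate: 16000, // Sample rate: 11025, 16000, 22050, 24000, 44100 or 48000 are supported; defaults to the browser's rate (48000 in the author's Chrome)
numChannels: 1, // Number of channels: 1 or 2 are supported, default is 1
compiling: true
})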
export default {
name: "ASR",
data(){
return {
streamAsrResult: '',
recoType: "primary",
recoText: "开始录音",
playType: "success",
asrResultOffline: '',
onReco: false,
ws:'',
}
},
mounted (){
// 初始化ws
this.ws = new WebSocket("ws://localhost:8010/ws/asr/onlineStream")
// 定义消息处理逻辑
var _that = this
this.ws.addEventListener('message', function (event) {
var temp = JSON.parse(event.data);
// console.log('ws message', event.data)
if(temp.result && (temp.result != _that.streamAsrResult)){
_that.streamAsrResult = temp.result
_that.$nextTick(()=>{})
console.log('更新了')
}
})
},
methods: {
startRecorder () {
if(!this.onReco){
recorder.clear()
recorder.start().then(() => {
}, (error) => {
console.log("录音出错");
})
this.onReco = true
this.recoType = "danger"
this.recoText = "结束录音"
this.$nextTick(()=>{
})
} else {
// 结束录音
recorder.stop()
this.onReco = false
this.recoType = "primary"
this.recoText = "开始录音"
this.$nextTick(()=>{})
// 音频导出成wav,然后上传到服务器
const wavs = recorder.getWAVBlob()
this.uploadFile(wavs, "/api/asr/offline")
}
},
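startRecorderChunk() {
if(!this.onReco){
// Tell the backend that streaming is about to start
var start = JSON.stringify({name:"test.wav", "nbest":5, signal:"start"})
this.ws.send(start)
recorder_chunk.start().then(() => {
// Keep the timer id so that polling can be stopped together with the recording
this.chunkTimer = setInterval(() => {
// Fetch the audio recorded since the previous poll
let newData = recorder_chunk.getNextData();
if (!newData.length) {
return;
}
// Upload the new chunks over the streaming WebSocket
this.uploadChunk(newData)
}, 500)
}, (error) => {
console.log("Recording failed", error);
})
this.onReco = true
this.recoType = "danger"
this.recoText = "结束录音"
this.$nextTick(()=>{
})
} else {
// Stop polling, otherwise a new timer piles up on every recording
clearInterval(this.chunkTimer)
this.chunkTimer = null
// Stop recording
recorder_chunk.stop()
// Tell the backend that recording has stopped
// var end = JSON.stringify({name:"test.wav", "nbest":5, signal:"end"})
// this.ws.send(end)
this.onReco = false
this.recoType = "primary"
this.recoText = "开始录音"
this.$nextTick(()=>{})
recorder_chunk.clear()
}
},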
uploadChunk(chunkDatas){
chunkDatas.forEach((chunkData) => {
this.ws.send(chunkData)
})
},
async uploadFile(file, post_url){
const formData = new FormData()
formData.append('files', file)
const result = await this.$http.post(post_url, formData);
if (result.data.code === 0) {
this.asrResultOffline = result.data.result
this.$nextTick(()=>{})
this.$message.success(result.data.message);
} else {
this.$message.error(result.data.message);
}
},
},
}
</script>
<style lang='less' scoped>
.asrbox {
border: 4px solid #F00;
// position: fixed;
top:40%;
width: 100%;
height: 20%;
overflow: auto;
}
</style>
\ No newline at end of file
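The streaming path in the component above sends a JSON `start` signal and then raw audio chunks over the WebSocket. The following is a minimal sketch of that frame ordering, not the project's implementation. `buildStreamFrames` is a hypothetical helper, and appending an `end` signal is an assumption: the component only contains that message as commented-out code, so the server may not require it.

```javascript
// Hypothetical helper: returns the frames of one streaming session in send order.
// Assumption: the session is closed with an "end" signal that mirrors the
// "start" signal (the component above has this message commented out).
function buildStreamFrames(chunks, name = "test.wav", nbest = 5) {
  const start = JSON.stringify({ name, nbest, signal: "start" });
  const end = JSON.stringify({ name, nbest, signal: "end" });
  // Skip empty chunks, as the polling loop does when no new audio is available.
  const audio = chunks.filter((chunk) => chunk.byteLength > 0);
  return [start, ...audio, end];
}

const frames = buildStreamFrames([
  new Uint8Array([1, 2]),
  new Uint8Array(0),
  new Uint8Array([3]),
]);
console.log(frames.length); // 4: start, two non-empty chunks, end
```

Each frame could then be passed to `ws.send` in order; text frames carry the control signals and binary frames carry the audio.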
<script setup>
import AudioFileIdentification from "./AudioFile/AudioFileIdentification.vue"
import RealTime from "./RealTime/RealTime.vue"
import EndToEndIdentification from "./EndToEnd/EndToEndIdentification.vue";
</script>
<template>
<div class="speech_recognition">
<div class="speech_recognition_tabs">
<div class="frame"></div>
<el-tabs class="speech_recognition_mytabs" type="border-card">
<el-tab-pane label="实时语音识别" key="1">
<RealTime />
</el-tab-pane>
<el-tab-pane label="端到端识别" key="2">
<EndToEndIdentification />
</el-tab-pane>
<el-tab-pane label="音频文件识别" key="3">
<AudioFileIdentification />
</el-tab-pane>
</el-tabs>
</div>
</div>
</template>
<script>
export default {
}
</script>
<style lang="less" scoped>
@import "./style.less";
</style>
\ No newline at end of file
<template>
<div class="audioFileIdentification">
<div v-if="uploadStatus === 0" class="public_recognition_speech">
<!-- 上传前 -->
<el-upload
:multiple="false"
:accept="'.wav'"
:limit="1"
:auto-upload="false"
:on-change="handleChange"
:show-file-list="false"
>
<div class="upload_img">
<div class="upload_img_back"></div>
</div>
</el-upload>
<div class="speech_text">
上传文件
</div>
<div class="speech_text_prompt">
支持50秒内的.wav文件
</div>
</div>
<!-- 上传中 -->
<div v-else-if="uploadStatus === 1" class="on_the_cross_speech">
<div class="on_the_upload_img">
<div class="on_the_upload_img_back"></div>
</div>
<div class="on_the_speech_text">
<span class="on_the_speech_loading"> <a-spin /></span> 上传中
</div>
</div>
<div v-else>
<!-- Start recognition -->
<div v-if="recognitionStatus === 0" class="public_recognition_speech_start">
<div class="public_recognition_speech_content">
<div
class="public_recognition_speech_title"
>
{{ filename }}
</div>
<div
class="public_recognition_speech_again"
@click="uploadAgain()"
>重新上传</div>
<div
class="public_recognition_speech_play"
@click="paly()"
>播放</div>
</div>
<div class="speech_promp"
@click="beginToIdentify()">
开始识别
</div>
</div>
<!-- Recognition in progress -->
<div v-else-if="recognitionStatus === 1" class="public_recognition_speech_identify">
<div class="public_recognition_speech_identify_box">
<div
class="public_recognition_speech_identify_back_img"
>
<a-spin />
</div>
<div
class="public_recognition__identify_the_promp"
>识别中</div>
</div>
</div>
<!-- Recognize again -->
<div v-else class="public_recognition_speech_identify_ahain">
<div class="public_recognition_speech_identify_box_btn">
<div
class="public_recognition__identify_the_btn"
@click="toIdentifyThe()"
>重新识别</div>
</div>
</div>
</div>
<!-- Arrow pointing to the result -->
<div class="public_recognition_point_to">
</div>
<!-- Recognition result -->
<div class="public_recognition_result">
<div>识别结果</div>
<div>{{ asrResult }}</div>
</div>
</div>
</template>
<script>
import { asrOffline } from '../../../../api/ApiASR'
let audioCtx = new AudioContext({
latencyHint: 'interactive',
sampleRate: 24000,
});
export default {
name:"",
data(){
return {
uploadStatus : 0,
recognitionStatus : 0,
asrResult : "",
indicator : "",
filename: "",
upfile: ""
}
},
methods:{
// 上传文件切换
handleChange(file, fileList){
this.uploadStatus = 2
this.filename = file.name
this.upfile = file
console.log(file)
// debugger
// var result = Buffer.from(file);
},
readFile(file) {
return new Promise((resolve, reject) => {
const fileReader = new FileReader();
fileReader.onload = function () {
resolve(fileReader);
};
fileReader.onerror = function (err) {
reject(err);
};
fileReader.readAsDataURL(file);
});
},
// 重新上传
uploadAgain(){
this.uploadStatus = 0
this.upfile = ""
this.filename = ""
this.asrResult = ""
},
// 播放音频
playAudioData(wav_buffer){
audioCtx.decodeAudioData(wav_buffer, buffer => {
let source = audioCtx.createBufferSource();
source.buffer = buffer
source.connect(audioCtx.destination);
source.start();
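}, function (e) {
// Report decoding failures instead of silently ignoring them
console.error("Failed to decode audio data", e);
});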
},
// Play the local audio file
async paly(){
if(this.upfile){
const fileRes = await this.readFile(this.upfile.raw);
const fileString = fileRes.result;
// Find the "data:<mime>;base64," prefix of the data URL
const audioBase64type = (fileString.match(/data:[^;]*;base64,/))?.[0] ?? '';
if (!audioBase64type) {
// Not a base64 data URL, so there is nothing to decode
return
}
const uploadBase64 = fileString.slice(audioBase64type.length);
// Convert the base64 payload to binary data
let typedArray = this.base64ToUint8Array(uploadBase64)
this.playAudioData(typedArray.buffer)
}
},
base64ToUint8Array(base64String){
const padding = '='.repeat((4 - base64String.length % 4) % 4);
const base64 = (base64String + padding)
.replace(/-/g, '+')
.replace(/_/g, '/');
const rawData = window.atob(base64);
const outputArray = new Uint8Array(rawData.length);
for (let i = 0; i < rawData.length; ++i) {
outputArray[i] = rawData.charCodeAt(i);
}
return outputArray;
},
// 开始识别
async beginToIdentify(){
// 识别中
this.recognitionStatus = 1
const formData = new FormData();
formData.append('files', this.upfile.raw);
const result = await asrOffline(formData)
// 重新识别
this.recognitionStatus = 2
console.log(result);
// debugger
if (result.data.code === 0) {
this.$message.success("识别成功")
// Store the recognized text
this.asrResult = result.data.result
} else {
this.$message.error("识别失败")
}
},
// 重新识别
toIdentifyThe(){
// this.uploadAgain()
this.uploadStatus = 0
this.recognitionStatus = 0
this.asrResult = ""
}
}
}
</script>
<style lang="less" scoped>
@import "./style.less";
</style>
\ No newline at end of file
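The playback path in the component above reads the uploaded file as a data URL, strips the `data:<mime>;base64,` prefix and decodes the remainder into bytes. A minimal standalone sketch of that conversion follows. `dataUrlToBytes` is a hypothetical name, and the sketch assumes a runtime with a global `atob` (browsers, or Node.js 16 and later).

```javascript
// Hypothetical helper: decodes a base64 data URL into a Uint8Array.
// Returns null when the input is not a base64 data URL.
function dataUrlToBytes(dataUrl) {
  const match = dataUrl.match(/^data:[^;]*;base64,/);
  if (!match) {
    return null;
  }
  // atob yields a binary string with one character per byte.
  const binary = atob(dataUrl.slice(match[0].length));
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i += 1) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// "UklGRg==" is the base64 encoding of the ASCII bytes "RIFF",
// the first four bytes of a WAV file.
const bytes = dataUrlToBytes("data:audio/wav;base64,UklGRg==");
console.log(Array.from(bytes)); // [82, 73, 70, 70]
```

The resulting `bytes.buffer` is an `ArrayBuffer` of the kind `decodeAudioData` accepts. Returning `null` for unexpected input lets the caller skip playback instead of failing inside the decoder.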
.audioFileIdentification {
width: 1106px;
height: 270px;
// background-color: pink;
padding-top: 40px;
box-sizing: border-box;
display: flex;
// 开始上传
.public_recognition_speech {
width: 295px;
height: 230px;
padding-top: 32px;
box-sizing: border-box;
// 开始上传
.upload_img {
width: 116px;
height: 116px;
background: #2932E1;
border-radius: 50%;
margin-left: 98px;
cursor: pointer;
margin-bottom: 20px;
display: flex;
justify-content: center;
align-items: center;
.upload_img_back {
width: 34.38px;
height: 30.82px;
background: #2932E1;
background: url("../../../../assets/image/ic_大-上传文件.svg");
background-repeat: no-repeat;
background-position: center;
background-size: 34.38px 30.82px;
cursor: pointer;
}
&:hover {
opacity: 0.9;
};
};
.speech_text {
height: 22px;
font-family: PingFangSC-Medium;
font-size: 16px;
color: #000000;
font-weight: 500;
margin-left: 124px;
margin-bottom: 10px;
};
.speech_text_prompt {
height: 20px;
font-family: PingFangSC-Regular;
font-size: 14px;
color: #999999;
font-weight: 400;
margin-left: 84px;
};
};
// 上传中
.on_the_cross_speech {
width: 295px;
height: 230px;
padding-top: 32px;
box-sizing: border-box;
.on_the_upload_img {
width: 116px;
height: 116px;
background: #7278F5;
border-radius: 50%;
margin-left: 98px;
cursor: pointer;
margin-bottom: 20px;
display: flex;
justify-content: center;
align-items: center;
.on_the_upload_img_back {
width: 34.38px;
height: 30.82px;
background: #7278F5;
background: url("../../../../assets/image/ic_大-上传文件.svg");
background-repeat: no-repeat;
background-position: center;
background-size: 34.38px 30.82px;
cursor: pointer;
};
};
.on_the_speech_text {
height: 22px;
font-family: PingFangSC-Medium;
font-size: 16px;
color: #000000;
font-weight: 500;
margin-left: 124px;
margin-bottom: 10px;
display: flex;
// justify-content: center;
align-items: center;
.on_the_speech_loading {
display: inline-block;
width: 16px;
height: 16px;
background: #7278F5;
// background: url("../../../../assets/image/ic_开始聊天.svg");
// background-repeat: no-repeat;
// background-position: center;
// background-size: 16px 16px;
margin-right: 8px;
};
};
};
//开始识别
.public_recognition_speech_start {
width: 295px;
height: 230px;
padding-top: 32px;
box-sizing: border-box;
position: relative;
.public_recognition_speech_content {
width: 100%;
position: absolute;
top: 40px;
left: 50%;
transform: translateX(-50%);
display: flex;
justify-content: center;
align-items: center;
.public_recognition_speech_title {
height: 22px;
font-family: PingFangSC-Regular;
font-size: 16px;
color: #000000;
font-weight: 400;
};
.public_recognition_speech_again {
height: 22px;
font-family: PingFangSC-Regular;
font-size: 16px;
color: #2932E1;
font-weight: 400;
margin-left: 30px;
cursor: pointer;
};
.public_recognition_speech_play {
height: 22px;
font-family: PingFangSC-Regular;
font-size: 16px;
color: #2932E1;
font-weight: 400;
margin-left: 20px;
cursor: pointer;
};
};
.speech_promp {
position: absolute;
top: 112px;
left: 50%;
transform: translateX(-50%);
width: 142px;
height: 44px;
background: #2932E1;
border-radius: 22px;
font-family: PingFangSC-Medium;
font-size: 14px;
color: #FFFFFF;
text-align: center;
line-height: 44px;
font-weight: 500;
cursor: pointer;
};
};
// 识别中
.public_recognition_speech_identify {
width: 295px;
height: 230px;
padding-top: 32px;
box-sizing: border-box;
position: relative;
.public_recognition_speech_identify_box {
width: 143px;
height: 44px;
background: #7278F5;
border-radius: 22px;
position: absolute;
top: 50%;
left: 50%;
transform: translate(-50%,-50%);
display: flex;
justify-content: center;
align-items: center;
cursor: pointer;
.public_recognition_speech_identify_back_img {
width: 16px;
height: 16px;
// background: #7278F5;
// background: url("../../../../assets/image/ic_开始聊天.svg");
// background-repeat: no-repeat;
// background-position: center;
// background-size: 16px 16px;
};
.public_recognition__identify_the_promp {
height: 20px;
font-family: PingFangSC-Medium;
font-size: 14px;
color: #FFFFFF;
font-weight: 500;
margin-left: 12px;
};
};
};
// 重新识别
.public_recognition_speech_identify_ahain {
width: 295px;
height: 230px;
padding-top: 32px;
box-sizing: border-box;
position: relative;
cursor: pointer;
.public_recognition_speech_identify_box_btn {
width: 143px;
height: 44px;
background: #2932E1;
border-radius: 22px;
position: absolute;
top: 50%;
left: 50%;
transform: translate(-50%,-50%);
display: flex;
justify-content: center;
align-items: center;
cursor: pointer;
.public_recognition__identify_the_btn {
height: 20px;
font-family: PingFangSC-Medium;
font-size: 14px;
color: #FFFFFF;
font-weight: 500;
};
};
};
// 指向
.public_recognition_point_to {
width: 47px;
height: 67px;
background: url("../../../../assets/image/步骤-箭头切图@2x.png") no-repeat;
background-position: center;
background-size: 47px 67px;
margin-top: 91px;
margin-right: 67px;
};
// 识别结果
.public_recognition_result {
width: 680px;
height: 230px;
background: #FAFAFA;
padding: 40px 50px 0px 50px;
div {
&:nth-of-type(1) {
height: 26px;
font-family: PingFangSC-Medium;
font-size: 16px;
color: #666666;
line-height: 26px;
font-weight: 500;
margin-bottom: 20px;
};
&:nth-of-type(2) {
height: 26px;
font-family: PingFangSC-Medium;
font-size: 16px;
color: #666666;
line-height: 26px;
font-weight: 500;
};
};
};
};
\ No newline at end of file
<template>
<div class="endToEndIdentification">
<div class="public_recognition_speech">
<div v-if="onReco">
<!-- 结束录音 -->
<div @click="endRecorder()" class="endToEndIdentification_end_recorder_img">
<div class='endToEndIdentification_end_recorder_img_back'></div>
</div>
</div>
<div v-else>
<div @click="startRecorder()" class="endToEndIdentification_start_recorder_img"></div>
</div>
<div class="endToEndIdentification_prompt" >
<div v-if="onReco">
结束识别
</div>
<div v-else>
开始识别
</div>
</div>
<div class="speech_text_prompt">
停止录音后得到识别结果
</div>
</div>
<div class="public_recognition_point_to"></div>
<div class="public_recognition_result">
<div>识别结果</div>
<div> {{asrResult}} </div>
</div>
</div>
</template>
<script>
import Recorder from 'js-audio-recorder'
import { asrOffline } from '../../../../api/ApiASR'
const recorder = new Recorder({
sampleBits: 16, // 采样位数,支持 8 或 16,默认是16
sampleRate: 16000, // 采样率,支持 11025、16000、22050、24000、44100、48000,根据浏览器默认值,我的chrome是48000
numChannels: 1, // 声道,支持 1 或 2, 默认是1
compiling: true
})
export default {
data () {
return {
onReco: false,
asrResult: "",
}
},
methods: {
// 开始录音
startRecorder(){
this.onReco = true
recorder.clear()
recorder.start()
},
// 停止录音
endRecorder(){
recorder.stop()
this.onReco = false
// this.$nextTick(()=>{})
// 音频导出成wav,然后上传到服务器
const wavs = recorder.getWAVBlob()
this.uploadFile(wavs)
},
// 上传文件
async uploadFile(file){
const formData = new FormData()
formData.append('files', file)
const result = await asrOffline(formData)
if (result.data.code === 0) {
this.asrResult = result.data.result
// this.$nextTick(()=>{})
this.$message.success(result.data.message);
} else {
this.$message.error(result.data.message);
}
},
}
}
</script>
<style lang="less" scoped>
@import "./style.less";
</style>
\ No newline at end of file
import { createApp } from 'vue'
import ElementPlus from 'element-plus'
import 'element-plus/dist/index.css'
import Antd from 'ant-design-vue';
import 'ant-design-vue/dist/antd.css';
import App from './App.vue'
import axios from 'axios'
const app = createApp(App)
app.config.globalProperties.$http = axios
app.use(ElementPlus).use(Antd)
app.mount('#app')
import { defineConfig } from 'vite'
import vue from '@vitejs/plugin-vue'
// https://vitejs.dev/config/
export default defineConfig({
plugins: [vue()],
css: {
preprocessorOptions: {
css: {
charset: false
}
}
},
build: {
assetsInlineLimit: 2048 // 2 KB; the option expects a number of bytes
},
server: {
host: "0.0.0.0",
proxy: {
"/api": {
target: "http://localhost:8010",
changeOrigin: true,
rewrite: (path) => path.replace(/^\/api/, ""),
},
},
},
})
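The `server.proxy` entry above forwards requests whose path starts with `/api` to the backend and removes that prefix on the way. The sketch below illustrates the mapping only. `resolveProxied` is a hypothetical function written for this example and is not part of Vite; the target URL is copied from the configuration above.

```javascript
// Same rewrite rule as in the configuration above.
const rewrite = (path) => path.replace(/^\/api/, "");

// Hypothetical illustration of the final backend URL for a frontend path.
// Returns null for paths that the "/api" proxy rule does not match.
function resolveProxied(path, target = "http://localhost:8010") {
  return path.startsWith("/api") ? target + rewrite(path) : null;
}

console.log(resolveProxied("/api/asr/offline")); // http://localhost:8010/asr/offline
console.log(resolveProxied("/index.html")); // null
```

With this mapping, the frontend can post to `/api/asr/offline` on the dev server while the backend only needs to expose `/asr/offline`.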
#!/usr/bin/python
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -11,9 +12,9 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#!/usr/bin/python
# -*- coding: UTF-8 -*-
# script for calc RTF: grep -rn RTF log.txt | awk '{print $NF}' | awk -F "=" '{sum += $NF} END {print "all time",sum, "audio num", NR, "RTF", sum/NR}'
# calc avg RTF(NOT Accurate): grep -rn RTF log.txt | awk '{print $NF}' | awk -F "=" '{sum += $NF} END {print "all time",sum, "audio num", NR, "RTF", sum/NR}'
# python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --punc.server_ip 127.0.0.1 --punc.port 8190 --wavfile ./zh.wav
# python3 websocket_client.py --server_ip 127.0.0.1 --port 8290 --wavfile ./zh.wav
import argparse
import asyncio
import codecs
......