未验证 提交 ff7dbcc2 编写于 作者: Honei_X's avatar Honei_X 提交者: GitHub

Merge branch 'develop' into v0.3

......@@ -16,7 +16,7 @@ You can choose one way from meduim and hard to install paddlespeech.
### 2. Prepare config File
The configuration file can be found in `conf/tts_online_application.yaml`.
- `protocol` indicates the network protocol used by the streaming TTS service. Currently, both http and websocket are supported.
- `protocol` indicates the network protocol used by the streaming TTS service. Currently, both **http and websocket** are supported.
- `engine_list` indicates the speech engine that will be included in the service to be started, in the format of `<speech task>_<engine type>`.
- This demo mainly introduces the streaming speech synthesis service, so the speech task should be set to `tts`.
- the engine type supports two forms: **online** and **online-onnx**. `online` indicates an engine that uses python for dynamic graph inference; `online-onnx` indicates an engine that uses onnxruntime for inference. The inference speed of online-onnx is faster.
......@@ -31,12 +31,12 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- Inference speed: mb_melgan > hifigan; Audio quality: mb_melgan < hifigan
### 3. Server Usage
### 3. Streaming speech synthesis server and client using http protocol
#### 3.1 Server Usage
- Command Line (Recommended)
Start the service (the configuration file uses http by default):
```bash
# start the service
paddlespeech_server start --config_file ./conf/tts_online_application.yaml
```
......@@ -76,7 +76,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
log_file="./log/paddlespeech.log")
```
Output:
Output:
```bash
[2022-04-24 21:00:16,934] [ INFO] - The first response time of the 0 warm up: 1.268730878829956 s
[2022-04-24 21:00:17,046] [ INFO] - The first response time of the 1 warm up: 0.11168622970581055 s
......@@ -94,17 +94,15 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
```
### 4. Streaming TTS client Usage
#### 3.2 Streaming TTS client Usage
- Command Line (Recommended)
```bash
# Access http streaming TTS service
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
Access http streaming TTS service:
# Access websocket streaming TTS service
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```bash
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
Usage:
```bash
......@@ -122,7 +120,6 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- `sample_rate`: Sampling rate, choices: [0, 8000, 16000], the default is the same as the model. Default: 0
- `output`: Output wave filepath. Default: None, which means not to save the audio to the local.
- `play`: Whether to play audio, play while synthesizing, default value: False, which means not playing. **Playing audio needs to rely on the pyaudio library**.
Output:
```bash
......@@ -165,8 +162,144 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
[2022-04-24 21:11:16,802] [ INFO] - 音频时长:3.825 s
[2022-04-24 21:11:16,802] [ INFO] - RTF: 0.7846773683635238
[2022-04-24 21:11:16,837] [ INFO] - 音频保存至:./output.wav
```
### 4. Streaming speech synthesis server and client using websocket protocol
#### 4.1 Server Usage
- Command Line (Recommended)
First modify the configuration file `conf/tts_online_application.yaml`, **set `protocol` to `websocket`**.
Start the service:
```bash
paddlespeech_server start --config_file ./conf/tts_online_application.yaml
```
Usage:
```bash
paddlespeech_server start --help
```
Arguments:
- `config_file`: yaml file of the app, defalut: ./conf/tts_online_application.yaml
- `log_file`: log file. Default: ./log/paddlespeech.log
Output:
```bash
[2022-04-27 10:18:09,107] [ INFO] - The first response time of the 0 warm up: 1.1551103591918945 s
[2022-04-27 10:18:09,219] [ INFO] - The first response time of the 1 warm up: 0.11204338073730469 s
[2022-04-27 10:18:09,324] [ INFO] - The first response time of the 2 warm up: 0.1051797866821289 s
[2022-04-27 10:18:09,325] [ INFO] - **********************************************************************
INFO: Started server process [17600]
[2022-04-27 10:18:09] [INFO] [server.py:75] Started server process [17600]
INFO: Waiting for application startup.
[2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
server_executor = ServerExecutor()
server_executor(
config_file="./conf/tts_online_application.yaml",
log_file="./log/paddlespeech.log")
```
Output:
```bash
[2022-04-27 10:20:16,660] [ INFO] - The first response time of the 0 warm up: 1.0945196151733398 s
[2022-04-27 10:20:16,773] [ INFO] - The first response time of the 1 warm up: 0.11222052574157715 s
[2022-04-27 10:20:16,878] [ INFO] - The first response time of the 2 warm up: 0.10494542121887207 s
[2022-04-27 10:20:16,878] [ INFO] - **********************************************************************
INFO: Started server process [23466]
[2022-04-27 10:20:16] [INFO] [server.py:75] Started server process [23466]
INFO: Waiting for application startup.
[2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
#### 4.2 Streaming TTS client Usage
- Command Line (Recommended)
Access websocket streaming TTS service:
```bash
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
Usage:
```bash
paddlespeech_client tts_online --help
```
Arguments:
- `server_ip`: erver ip. Default: 127.0.0.1
- `port`: server port. Default: 8092
- `protocol`: Service protocol, choices: [http, websocket], default: http.
- `input`: (required): Input text to generate.
- `spk_id`: Speaker id for multi-speaker text to speech. Default: 0
- `speed`: Audio speed, the value should be set between 0 and 3. Default: 1.0
- `volume`: Audio volume, the value should be set between 0 and 3. Default: 1.0
- `sample_rate`: Sampling rate, choices: [0, 8000, 16000], the default is the same as the model. Default: 0
- `output`: Output wave filepath. Default: None, which means not to save the audio to the local.
- `play`: Whether to play audio, play while synthesizing, default value: False, which means not playing. **Playing audio needs to rely on the pyaudio library**.
Output:
```bash
[2022-04-27 10:21:04,262] [ INFO] - tts websocket client start
[2022-04-27 10:21:04,496] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-27 10:21:04,496] [ INFO] - 首包响应:0.2124948501586914 s
[2022-04-27 10:21:07,483] [ INFO] - 尾包响应:3.199106454849243 s
[2022-04-27 10:21:07,484] [ INFO] - 音频时长:3.825 s
[2022-04-27 10:21:07,484] [ INFO] - RTF: 0.8363677006141812
[2022-04-27 10:21:07,516] [ INFO] - 音频保存至:output.wav
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import TTSOnlineClientExecutor
import json
executor = TTSOnlineClientExecutor()
executor(
input="您好,欢迎使用百度飞桨语音合成服务。",
server_ip="127.0.0.1",
port=8092,
protocol="websocket",
spk_id=0,
speed=1.0,
volume=1.0,
sample_rate=0,
output="./output.wav",
play=False)
```
Output:
```bash
[2022-04-27 10:22:48,852] [ INFO] - tts websocket client start
[2022-04-27 10:22:49,080] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-27 10:22:49,080] [ INFO] - 首包响应:0.21017956733703613 s
[2022-04-27 10:22:52,100] [ INFO] - 尾包响应:3.2304444313049316 s
[2022-04-27 10:22:52,101] [ INFO] - 音频时长:3.825 s
[2022-04-27 10:22:52,101] [ INFO] - RTF: 0.8445606356352762
[2022-04-27 10:22:52,134] [ INFO] - 音频保存至:./output.wav
```
([简体中文](./README_cn.md)|English)
(简体中文|[English](./README.md))
# 流式语音合成服务
......@@ -16,11 +16,11 @@
### 2. 准备配置文件
配置文件可参见 `conf/tts_online_application.yaml`
- `protocol`表示该流式TTS服务使用的网络协议,目前支持 http 和 websocket 两种。
- `protocol`表示该流式TTS服务使用的网络协议,目前支持 **http 和 websocket** 两种。
- `engine_list`表示即将启动的服务将会包含的语音引擎,格式为 <语音任务>_<引擎类型>
- 该demo主要介绍流式语音合成服务,因此语音任务应设置为tts。
- 目前引擎类型支持两种形式:**online** 表示使用python进行动态图推理的引擎;**online-onnx** 表示使用onnxruntime进行推理的引擎。其中,online-onnx的推理速度更快。
- 流式TTS引擎的AM模型支持:fastspeech2 以及fastspeech2_cnndecoder; Voc 模型支持:hifigan, mb_melgan
- 流式TTS引擎的AM模型支持:**fastspeech2 以及fastspeech2_cnndecoder**; Voc 模型支持:**hifigan, mb_melgan**
- 流式am推理中,每次会对一个chunk的数据进行推理以达到流式的效果。其中`am_block`表示chunk中的有效帧数,`am_pad` 表示一个chunk中am_block前后各加的帧数。am_pad的存在用于消除流式推理产生的误差,避免由流式推理对合成音频质量的影响。
- fastspeech2不支持流式am推理,因此am_pad与am_block对它无效
- fastspeech2_cnndecoder 支持流式推理,当am_pad=12时,流式推理合成音频与非流式合成音频一致
......@@ -30,11 +30,12 @@
- 当voc模型为hifigan,当voc_pad=20时,流式推理合成音频与非流式合成音频一致;当voc_pad=14时,合成音频听感上没有异常。
- 推理速度:mb_melgan > hifigan; 音频质量:mb_melgan < hifigan
### 3. 服务端使用方法
### 3. 使用http协议的流式语音合成服务端及客户端使用方法
#### 3.1 服务端使用方法
- 命令行 (推荐使用)
启动服务(配置文件默认使用http):
```bash
# 启动服务
paddlespeech_server start --config_file ./conf/tts_online_application.yaml
```
......@@ -44,7 +45,7 @@
paddlespeech_server start --help
```
参数:
- `config_file`: 服务的配置文件,默认: ./conf/application.yaml
- `config_file`: 服务的配置文件,默认: ./conf/tts_online_application.yaml
- `log_file`: log 文件. 默认:./log/paddlespeech.log
输出:
......@@ -92,17 +93,15 @@
```
### 4. 流式TTS 客户端使用方法
#### 3.2 客户端使用方法
- 命令行 (推荐使用)
```bash
# 访问 http 流式TTS服务
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
访问 http 流式TTS服务:
# 访问 websocket 流式TTS服务
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```bash
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
使用帮助:
```bash
......@@ -163,8 +162,143 @@
[2022-04-24 21:11:16,802] [ INFO] - 音频时长:3.825 s
[2022-04-24 21:11:16,802] [ INFO] - RTF: 0.7846773683635238
[2022-04-24 21:11:16,837] [ INFO] - 音频保存至:./output.wav
```
### 4. 使用websocket协议的流式语音合成服务端及客户端使用方法
#### 4.1 服务端使用方法
- 命令行 (推荐使用)
首先修改配置文件 `conf/tts_online_application.yaml`**将 `protocol` 设置为 `websocket`**
启动服务:
```bash
paddlespeech_server start --config_file ./conf/tts_online_application.yaml
```
使用方法:
```bash
paddlespeech_server start --help
```
参数:
- `config_file`: 服务的配置文件,默认: ./conf/tts_online_application.yaml
- `log_file`: log 文件. 默认:./log/paddlespeech.log
输出:
```bash
[2022-04-27 10:18:09,107] [ INFO] - The first response time of the 0 warm up: 1.1551103591918945 s
[2022-04-27 10:18:09,219] [ INFO] - The first response time of the 1 warm up: 0.11204338073730469 s
[2022-04-27 10:18:09,324] [ INFO] - The first response time of the 2 warm up: 0.1051797866821289 s
[2022-04-27 10:18:09,325] [ INFO] - **********************************************************************
INFO: Started server process [17600]
[2022-04-27 10:18:09] [INFO] [server.py:75] Started server process [17600]
INFO: Waiting for application startup.
[2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
server_executor = ServerExecutor()
server_executor(
config_file="./conf/tts_online_application.yaml",
log_file="./log/paddlespeech.log")
```
输出:
```bash
[2022-04-27 10:20:16,660] [ INFO] - The first response time of the 0 warm up: 1.0945196151733398 s
[2022-04-27 10:20:16,773] [ INFO] - The first response time of the 1 warm up: 0.11222052574157715 s
[2022-04-27 10:20:16,878] [ INFO] - The first response time of the 2 warm up: 0.10494542121887207 s
[2022-04-27 10:20:16,878] [ INFO] - **********************************************************************
INFO: Started server process [23466]
[2022-04-27 10:20:16] [INFO] [server.py:75] Started server process [23466]
INFO: Waiting for application startup.
[2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
[2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
```
#### 4.2 客户端使用方法
- 命令行 (推荐使用)
访问 websocket 流式TTS服务:
```bash
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
使用帮助:
```bash
paddlespeech_client tts_online --help
```
参数:
- `server_ip`: 服务端ip地址,默认: 127.0.0.1。
- `port`: 服务端口,默认: 8092。
- `protocol`: 服务协议,可选 [http, websocket], 默认: http。
- `input`: (必须输入): 待合成的文本。
- `spk_id`: 说话人 id,用于多说话人语音合成,默认值: 0。
- `speed`: 音频速度,该值应设置在 0 到 3 之间。 默认值:1.0
- `volume`: 音频音量,该值应设置在 0 到 3 之间。 默认值: 1.0
- `sample_rate`: 采样率,可选 [0, 8000, 16000],默认值:0,表示与模型采样率相同
- `output`: 输出音频的路径, 默认值:None,表示不保存音频到本地。
- `play`: 是否播放音频,边合成边播放, 默认值:False,表示不播放。**播放音频需要依赖pyaudio库**。
输出:
```bash
[2022-04-27 10:21:04,262] [ INFO] - tts websocket client start
[2022-04-27 10:21:04,496] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-27 10:21:04,496] [ INFO] - 首包响应:0.2124948501586914 s
[2022-04-27 10:21:07,483] [ INFO] - 尾包响应:3.199106454849243 s
[2022-04-27 10:21:07,484] [ INFO] - 音频时长:3.825 s
[2022-04-27 10:21:07,484] [ INFO] - RTF: 0.8363677006141812
[2022-04-27 10:21:07,516] [ INFO] - 音频保存至:output.wav
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import TTSOnlineClientExecutor
import json
executor = TTSOnlineClientExecutor()
executor(
input="您好,欢迎使用百度飞桨语音合成服务。",
server_ip="127.0.0.1",
port=8092,
protocol="websocket",
spk_id=0,
speed=1.0,
volume=1.0,
sample_rate=0,
output="./output.wav",
play=False)
```
输出:
```bash
[2022-04-27 10:22:48,852] [ INFO] - tts websocket client start
[2022-04-27 10:22:49,080] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
[2022-04-27 10:22:49,080] [ INFO] - 首包响应:0.21017956733703613 s
[2022-04-27 10:22:52,100] [ INFO] - 尾包响应:3.2304444313049316 s
[2022-04-27 10:22:52,101] [ INFO] - 音频时长:3.825 s
[2022-04-27 10:22:52,101] [ INFO] - RTF: 0.8445606356352762
[2022-04-27 10:22:52,134] [ INFO] - 音频保存至:./output.wav
```
......@@ -6,7 +6,7 @@
### Speech Recognition Model
Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech | Example Link
:-------------:| :------------:| :-----: | -----: | :-----: |:-----:| :-----: | :-----: | :-----:
[Ds2 Online Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_fbank161_ckpt_0.2.0.model.tar.gz) | Aishell Dataset | Char-based | 479 MB | 2 Conv + 5 LSTM layers with only forward direction | 0.0718 |-| 151 h | [D2 Online Aishell ASR0](../../examples/aishell/asr0)
[Ds2 Online Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_fbank161_ckpt_0.2.1.model.tar.gz) | Aishell Dataset | Char-based | 491 MB | 2 Conv + 5 LSTM layers with only forward direction | 0.0666 |-| 151 h | [D2 Online Aishell ASR0](../../examples/aishell/asr0)
[Ds2 Offline Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz)| Aishell Dataset | Char-based | 306 MB | 2 Conv + 3 bidirectional GRU layers| 0.064 |-| 151 h | [Ds2 Offline Aishell ASR0](../../examples/aishell/asr0)
[Conformer Online Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_chunk_conformer_aishell_ckpt_0.2.0.model.tar.gz) | Aishell Dataset | Char-based | 189 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring| 0.0544 |-| 151 h | [Conformer Online Aishell ASR1](../../examples/aishell/asr1)
[Conformer Offline Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_conformer_aishell_ckpt_0.1.2.model.tar.gz) | Aishell Dataset | Char-based | 189 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0464 |-| 151 h | [Conformer Offline Aishell ASR1](../../examples/aishell/asr1)
......
......@@ -4,6 +4,7 @@
| Model | Number of Params | Release | Config | Test set | Valid Loss | CER |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSpeech2 | 45.18M | r0.2.0 | conf/deepspeech2_online.yaml + U2 Data pipline and spec aug + fbank161 | test | 6.876979827880859 | 0.0666 |
| DeepSpeech2 | 45.18M | r0.2.0 | conf/deepspeech2_online.yaml + spec aug + fbank161 | test | 7.679287910461426 | 0.0718 |
| DeepSpeech2 | 45.18M | r0.2.0 | conf/deepspeech2_online.yaml + spec aug | test | 7.708217620849609| 0.078 |
| DeepSpeech2 | 45.18M | v2.2.0 | conf/deepspeech2_online.yaml + spec aug | test | 7.994938373565674 | 0.080 |
......
# Speaker Diarization on AMI corpus
* sd0 - speaker diarization by AHC,SC base on x-vectors
* sd0 - speaker diarization by AHC,SC base on embeddings
......@@ -7,7 +7,23 @@
The script performs diarization using x-vectors(TDNN,ECAPA-TDNN) on the AMI mix-headset data. We demonstrate the use of different clustering methods: AHC, spectral.
## How to Run
### prepare annotations and audios
Download AMI corpus, You need around 10GB of free space to get whole data
The signals are too large to package in this way, so you need to use the chooser to indicate which ones you wish to download
```bash
## download annotations
wget http://groups.inf.ed.ac.uk/ami/AMICorpusAnnotations/ami_public_manual_1.6.2.zip && unzip ami_public_manual_1.6.2.zip
```
then please follow https://groups.inf.ed.ac.uk/ami/download/ to download the Signals:
1) Select one or more AMI meetings: the IDs please follow ./ami_split.py
2) Select media streams: Just select Headset mix
### start running
Use the following command to run diarization on AMI corpus.
`bash ./run.sh`
```bash
./run.sh --data_folder ./amicorpus --manual_annot_folder ./ami_public_manual_1.6.2
```
## Results (DER) coming soon! :)
......@@ -17,18 +17,6 @@ device=gpu
. ${MAIN_ROOT}/utils/parse_options.sh || exit 1;
if [ $stage -le 0 ]; then
# Prepare data
# Download AMI corpus, You need around 10GB of free space to get whole data
# The signals are too large to package in this way,
# so you need to use the chooser to indicate which ones you wish to download
echo "Please follow https://groups.inf.ed.ac.uk/ami/download/ to download the data."
echo "Annotations: AMI manual annotations v1.6.2 "
echo "Signals: "
echo "1) Select one or more AMI meetings: the IDs please follow ./ami_split.py"
echo "2) Select media streams: Just select Headset mix"
fi
if [ $stage -le 1 ]; then
# Download the pretrained model
wget https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_1.tar.gz
......
......@@ -27,6 +27,16 @@ pretrained_models = {
'ckpt_path':
'exp/conformer/checkpoints/wenetspeech',
},
"conformer_online_multicn-zh-16k": {
'url':
'https://paddlespeech.bj.bcebos.com/s2t/multi_cn/asr1/asr1_chunk_conformer_multi_cn_ckpt_0.2.0.model.tar.gz',
'md5':
'7989b3248c898070904cf042fd656003',
'cfg_path':
'model.yaml',
'ckpt_path':
'exp/chunk_conformer/checkpoints/multi_cn',
},
"conformer_aishell-zh-16k": {
'url':
'https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_conformer_aishell_ckpt_0.1.2.model.tar.gz',
......@@ -57,6 +67,20 @@ pretrained_models = {
'ckpt_path':
'exp/transformer/checkpoints/avg_10',
},
"deepspeech2online_wenetspeech-zh-16k": {
'url':
'https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr0/WIP_asr0_deepspeech2_online_wenetspeech_ckpt_1.0.0a.model.tar.gz',
'md5':
'b3ef6fcae8c0058c3c53375341ccb209',
'cfg_path':
'model.yaml',
'ckpt_path':
'exp/deepspeech2_online/checkpoints/avg_3',
'lm_url':
'https://deepspeech.bj.bcebos.com/zh_lm/zh_giga.no_cna_cmn.prune01244.klm',
'lm_md5':
'29e02312deb2e59b3c8686c7966d4fe3'
},
"deepspeech2offline_aishell-zh-16k": {
'url':
'https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz',
......@@ -73,9 +97,9 @@ pretrained_models = {
},
"deepspeech2online_aishell-zh-16k": {
'url':
'https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_fbank161_ckpt_0.2.0.model.tar.gz',
'https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_fbank161_ckpt_0.2.1.model.tar.gz',
'md5':
'd314960e83cc10dcfa6b04269f3054d4',
'98b87b171b7240b7cae6e07d8d0bc9be',
'cfg_path':
'model.yaml',
'ckpt_path':
......
......@@ -43,9 +43,9 @@ __all__ = ['ASREngine']
pretrained_models = {
"deepspeech2online_aishell-zh-16k": {
'url':
'https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_fbank161_ckpt_0.2.0.model.tar.gz',
'https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_fbank161_ckpt_0.2.1.model.tar.gz',
'md5':
'd314960e83cc10dcfa6b04269f3054d4',
'98b87b171b7240b7cae6e07d8d0bc9be',
'cfg_path':
'model.yaml',
'ckpt_path':
......
......@@ -14,14 +14,16 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
paddlespeech asr --input ./zh.wav
paddlespeech asr --model conformer_aishell --input ./zh.wav
paddlespeech asr --model conformer_online_aishell --input ./zh.wav
paddlespeech asr --model conformer_online_multicn --input ./zh.wav
paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav
paddlespeech asr --model deepspeech2offline_aishell --input ./zh.wav
paddlespeech asr --model deepspeech2online_wenetspeech --input ./zh.wav
paddlespeech asr --model deepspeech2online_aishell --input ./zh.wav
paddlespeech asr --model deepspeech2offline_librispeech --lang en --input ./en.wav
# long audio restriction
{
wget -c wget https://paddlespeech.bj.bcebos.com/datasets/single_wav/zh/test_long_audio_01.wav
wget -c https://paddlespeech.bj.bcebos.com/datasets/single_wav/zh/test_long_audio_01.wav
paddlespeech asr --input test_long_audio_01.wav
if [ $? -ne 255 ]; then
echo -e "\e[1;31mTime restriction not passed\e[0m"
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册