Unverified commit 5c8136b0, authored by KP, committed by GitHub

Add speech models. (#1678)

Parent 8f403bac
......@@ -192,7 +192,7 @@
- <img src="../../imgs/Install_Related/mac/output_img.png" alt="output image" width="600" align="center"/>
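## Step 6: Explore PaddlePaddle pretrained models
- Congratulations! You have now completed the PaddleHub installation and getting-started example on Windows. Go on and explore more deep learning models. [Explore more models on the PaddlePaddle website](https://www.paddlepaddle.org.cn/hublist)
- Congratulations! You have now completed the PaddleHub installation and getting-started example on macOS. Go on and explore more deep learning models. [Explore more models on the PaddlePaddle website](https://www.paddlepaddle.org.cn/hublist)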
......
# deepspeech2_aishell
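|Model name|deepspeech2_aishell|
| :--- | :---: |
|Category|Speech - speech recognition|
|Network|DeepSpeech2|
|Dataset|AISHELL-1|
|Fine-tuning supported|No|
|Model size|306MB|
|Last updated|2021-10-20|
|Metric|Mandarin CER 0.065|
## 1. Basic model information
### Model introduction
DeepSpeech2 is an end-to-end speech recognition model for English and Mandarin, proposed by Baidu in 2015. deepspeech2_aishell uses the DeepSpeech2 offline model architecture, which consists mainly of 2 convolutional layers and 3 GRU layers. It was pretrained on the open-source Mandarin speech dataset [AISHELL-1](http://www.aishelltech.com/kysjcp) and reaches a CER of 0.065 on the AISHELL-1 test set.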
<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/DeepSpeech/Hub/docs/images/ds2offlineModel.png" hspace='10'/> <br />
</p>
For more details, see [Deep Speech 2: End-to-End Speech Recognition in English and Mandarin](https://arxiv.org/abs/1512.02595)
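The CER quoted above is a character error rate. As a rough illustration of how such a figure is commonly computed, here is a minimal sketch that assumes CER is the character-level Levenshtein distance divided by the reference length. The function names `edit_distance` and `cer` are hypothetical and are not part of PaddleHub or DeepSpeech.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted character out of six reference characters.
print(cer("今天天气很好", "今天天汽很好"))
```

Under this definition, a CER of 0.065 would correspond to roughly 6.5 character edits per 100 reference characters.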
## 2. Installation
- ### 1. System dependencies
- libsndfile, swig >= 3.0
- Linux
```shell
$ sudo apt-get install libsndfile swig
# or
$ sudo yum install libsndfile swig
```
- macOS
```shell
$ brew install libsndfile swig
```
- ### 2. Environment dependencies
- swig_decoder:
```shell
git clone https://github.com/PaddlePaddle/DeepSpeech.git && cd DeepSpeech && git reset --hard b53171694e7b87abe7ea96870b2f4d8e0e2b1485 && cd deepspeech/decoders/ctcdecoder/swig && sh setup.sh
```
- paddlepaddle >= 2.1.0
- paddlehub >= 2.1.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 3. Installation
- ```shell
$ hub install deepspeech2_aishell
```
- If you run into problems during installation, see: [Windows installation from scratch](../../../../docs/docs_ch/get_start/windows_quickstart.md) | [Linux installation from scratch](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation from scratch](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 3. Prediction with the model API
- ### 1. Prediction code example
```python
import paddlehub as hub
# Mandarin speech audio in WAV format with a 16 kHz sample rate
wav_file = '/PATH/TO/AUDIO'
model = hub.Module(
    name='deepspeech2_aishell',
    version='1.0.0')
text = model.speech_recognize(wav_file)
print(text)
```
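- ### 2. API
- ```python
def check_audio(audio_file)
```
- Checks that the input audio can be read and that its sample rate is 16000 Hz.
- **Parameters**
- `audio_file`: path to a local audio file (*.wav), e.g. `/path/to/input.wav`
- ```python
def speech_recognize(
    audio_file,
    device='cpu',
)
```
- Transcribes the input audio to text.
- **Parameters**
- `audio_file`: path to a local audio file (*.wav), e.g. `/path/to/input.wav`
- `device`: device used for prediction, `cpu` by default; set it to `gpu` to run prediction on a GPU.
- **Returns**
- `text`: str, the recognized text for the input audio.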
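`check_audio` rejects input whose sample rate is not 16000 Hz. To screen a file before passing it to the module, something like the following sketch could be used. It relies on Python's standard `wave` module rather than the `soundfile` library the module itself uses, and the helper name `is_16k_wav` is hypothetical.

```python
import wave


def is_16k_wav(path, expected_rate=16000):
    """Return True if `path` is a readable PCM WAV file at `expected_rate` Hz."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getframerate() == expected_rate
    except (wave.Error, EOFError):
        # Not a valid PCM WAV file (the stdlib reader only handles PCM).
        return False
```

A file that fails this check would need to be resampled to 16 kHz before recognition.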
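## 4. Service deployment
- PaddleHub Serving can deploy an online speech recognition service.
- ### Step 1: Start PaddleHub Serving
- ```shell
$ hub serving start -m deepspeech2_aishell
```
- This deploys a speech recognition service API; the default port is 8866.
- **NOTE:** To run prediction on a GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the few lines of code below send a prediction request and retrieve the result.
- ```python
import requests
import json
# Path of the audio file to recognize; make sure the machine running the service can access it
file = '/path/to/input.wav'
# Arguments are passed to the prediction method by key; in this example the key is "audio_file"
data = {"audio_file": file}
# Send a POST request with a JSON content type; change the IP address in the URL to that of the serving machine
url = "http://127.0.0.1:8866/predict/deepspeech2_aishell"
# Set the POST request headers to application/json
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```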
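When the service runs on another host or serves a different module, the pieces of the request can be assembled by a small helper. This is only a sketch: the helper name `build_predict_request` is hypothetical, and the URL layout and the `audio_file` key are taken from the example above.

```python
import json


def build_predict_request(audio_path, module="deepspeech2_aishell",
                          host="127.0.0.1", port=8866):
    """Return (url, headers, body) for a PaddleHub Serving prediction request."""
    url = "http://{}:{}/predict/{}".format(host, port, module)
    headers = {"Content-Type": "application/json"}
    body = json.dumps({"audio_file": audio_path})
    return url, headers, body


url, headers, body = build_predict_request("/path/to/input.wav")
print(url)
# With the service running, the request could then be sent with:
#   import requests
#   r = requests.post(url=url, headers=headers, data=body, timeout=60)
#   r.raise_for_status()
#   print(r.json())
```

Note that the audio path is resolved on the serving machine, not on the client.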
## 5. Release history
* 1.0.0
Initial release
```shell
$ hub install deepspeech2_aishell
```
# https://yaml.org/type/float.html
data:
  train_manifest: data/manifest.train
  dev_manifest: data/manifest.dev
  test_manifest: data/manifest.test
  min_input_len: 0.0
  max_input_len: 27.0 # second
  min_output_len: 0.0
  max_output_len: .inf
  min_output_input_ratio: 0.00
  max_output_input_ratio: .inf
collator:
  batch_size: 64 # one gpu
  mean_std_filepath: data/mean_std.json
  unit_type: char
  vocab_filepath: data/vocab.txt
  augmentation_config: conf/augmentation.json
  random_seed: 0
  spm_model_prefix:
  spectrum_type: linear
  feat_dim:
  delta_delta: False
  stride_ms: 10.0
  window_ms: 20.0
  n_fft: None
  max_freq: None
  target_sample_rate: 16000
  use_dB_normalization: True
  target_dB: -20
  dither: 1.0
  keep_transcription_text: False
  sortagrad: True
  shuffle_method: batch_shuffle
  num_workers: 2
model:
  num_conv_layers: 2
  num_rnn_layers: 3
  rnn_layer_size: 1024
  use_gru: True
  share_rnn_weights: False
  blank_id: 0
  ctc_grad_norm_type: instance
training:
  n_epoch: 80
  accum_grad: 1
  lr: 2e-3
  lr_decay: 0.83
  weight_decay: 1e-06
  global_grad_clip: 3.0
  log_interval: 100
  checkpoint:
    kbest_n: 50
    latest_n: 5
decoding:
  batch_size: 128
  error_rate_type: cer
  decoding_method: ctc_beam_search
  lang_model_path: data/lm/zh_giga.no_cna_cmn.prune01244.klm
  alpha: 1.9
  beta: 5.0
  beam_size: 300
  cutoff_prob: 0.99
  cutoff_top_n: 40
  num_proc_bsearch: 10
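The collator settings `target_sample_rate: 16000`, `window_ms: 20.0` and `stride_ms: 10.0` fix the framing of the audio. The arithmetic can be sketched as follows, assuming no padding and an FFT size equal to the window length (the config leaves `n_fft` unset, so that is an assumption). The helper names are hypothetical.

```python
def frame_params(sample_rate=16000, window_ms=20.0, stride_ms=10.0):
    """Convert window and stride durations in milliseconds to sample counts."""
    window = int(sample_rate * window_ms / 1000)
    stride = int(sample_rate * stride_ms / 1000)
    return window, stride


def num_frames(duration_s, sample_rate=16000, window_ms=20.0, stride_ms=10.0):
    """Number of full windows that fit in the signal, assuming no padding."""
    window, stride = frame_params(sample_rate, window_ms, stride_ms)
    n_samples = int(duration_s * sample_rate)
    if n_samples < window:
        return 0
    return (n_samples - window) // stride + 1


window, stride = frame_params()
print(window, stride)     # 320 160
print(window // 2 + 1)    # 161 bins for a real FFT over `window` samples
print(num_frames(27.0))   # 2699 frames for the 27 s max_input_len
```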
{"mean_stat": [-13505966.65209869, -12778154.889588555, -13487728.30750011, -12897344.94123812, -12472281.490772562, -12631566.475106332, -13391790.349327326, -14045382.570026815, -14159320.465516506, -14273422.438486755, -14639805.161347123, -15145380.07768254, -15612893.133258691, -15938542.05012206, -16115293.502621327, -16188225.698757892, -16317206.280373082, -16500598.476283036, -16671564.297937019, -16804599.860397574, -16916423.142814968, -17011785.59439087, -17075067.62262626, -17154580.16740178, -17257812.961825978, -17355683.228599995, -17441455.258318607, -17473199.925130684, -17488835.5763828, -17491232.15414511, -17485000.29006962, -17499471.646940477, -17551398.97122984, -17641732.10682403, -17757209.077974595, -17843801.500521667, -17935647.58641936, -18020362.347413756, -18117633.806080323, -18232427.58935143, -18316024.35215119, -18378789.145393644, -18421147.25807373, -18445805.18294822, -18460946.27810118, -18467914.04034822, -18469404.319909714, -18469606.974339806, -18470754.294192698, -18458320.91921723, -18441354.111811973, -18428332.216321833, -18422281.413955193, -18433421.585668042, -18460521.025954794, -18494800.856363494, -18539532.288011573, -18583823.79899225, -18614474.56256926, -18646872.180154275, -18661137.85367877, -18673590.719379324, -18702967.62040798, -18736434.748098046, -18777912.13098326, -18794675.486509323, -18837225.856196072, -18874872.796128694, -18927340.44407057, -18994929.076545004, -19060701.164406348, -19118006.18996682, -19175792.05766062, -19230755.996405277, -19270174.594219487, -19334788.35904946, -19401456.988906194, -19484580.095938426, -19582040.4715673, -19696598.86662636, -19810401.513227757, -19931755.37941177, -20021867.47620737, -20082298.984455004, -20114708.336475413, -20143802.72793865, -20146821.988139726, -20165613.317683898, -20189938.602584295, -20220059.08673595, -20242848.528134122, -20250859.979931064, -20267382.93048284, -20267964.544716164, -20261372.89563879, -20252878.74023849, 
-20247550.771284755, -20231778.31093504, -20231376.103159923, -20236926.52293088, -20248068.41488535, -20255076.901920393, -20262924.167151034, -20263926.583205637, -20263790.273742784, -20268560.080967404, -20268997.150654405, -20269810.816284582, -20267771.864327505, -20256472.703380838, -20241790.559690386, -20241865.794732895, -20244924.716114976, -20249736.631184842, -20257257.816903576, -20268027.212145977, -20277399.95533857, -20281840.8112546, -20270512.52002465, -20255938.63066214, -20242421.685443826, -20241986.654626504, -20237836.034444932, -20231458.31132546, -20218092.819713395, -20204994.19634715, -20198880.142133974, -20197376.49014031, -20198117.60450857, -20197443.473929476, -20191142.03632657, -20174428.452719454, -20159204.32090646, -20137981.294740904, -20124944.79897834, -20112774.604521394, -20109389.248600915, -20115248.61302806, -20117743.853294585, -20123076.93515528, -20132224.95454374, -20147099.26793121, -20169581.367630124, -20190957.518733896, -20215197.057997894, -20242033.589256056, -20282032.217160087, -20316778.653784916, -20360354.215504933, -20425089.908502825, -20534553.0465662, -20737928.349233944, -21091705.14104186, -21646013.197923105, -22403182.076235127, -23313516.63322832, -24244679.879594248, -25027534.00417361, -25502455.708560493, -25665136.744125813, -26602318.88405537], "var_stat": [209924783.1093623, 185218712.4577822, 209991180.89829063, 196198511.40798286, 186098265.7827955, 191905798.58923203, 214281935.29191792, 235042114.51049897, 240179456.24597096, 244657890.3963041, 256099586.32657292, 271849135.9872555, 287174069.13527167, 298171137.28863454, 304112589.91933817, 306553976.2206335, 310813670.30674237, 316958840.3099824, 322651440.3639528, 327213725.196089, 331252123.26114285, 334856188.3081607, 337217897.6545214, 340385427.82557064, 344400488.5633641, 348086880.08086526, 351349070.53148264, 352648076.18415344, 353409462.33704513, 353598061.4967693, 353405322.74993587, 353917215.6834277, 355784796.898883, 
359222461.3224974, 363671441.7428676, 366908651.69908494, 370304677.0615045, 373477194.79721, 377174088.9808273, 381531608.6574547, 384703574.426059, 387104126.9474883, 388723211.11308575, 389687817.27351815, 390351031.4418706, 390659006.3690262, 390704649.89417714, 390702370.1919126, 390731862.59274197, 390216004.4126628, 389516083.054853, 389017745.636457, 388788872.1127645, 389269311.2239042, 390401819.5968815, 391842612.97859454, 393708801.05223197, 395569598.4694, 396868892.67152405, 398210915.02133286, 398743299.4753882, 399330344.88417244, 400565940.1325846, 401901693.4656316, 403513855.43933284, 404103248.96526104, 405986814.274556, 407507145.4104169, 409598353.6517908, 412453848.0248063, 415138273.0558441, 417479272.96907294, 419785633.3276395, 422003065.1681787, 423610264.8868346, 426260552.96545905, 428973536.3620236, 432368654.40899384, 436359561.5468266, 441119512.777527, 445884989.25794005, 451037422.65838546, 454872292.24179226, 457497136.8780015, 458904066.0675219, 460155836.4432799, 460272943.80738074, 461087498.6828549, 462144907.7850926, 463483598.81228757, 464530694.44478536, 464971538.85301507, 465771535.6019992, 465936698.93801653, 465741012.7287712, 465448625.0011534, 465296363.8603534, 464718299.2207512, 464720391.25778216, 465016640.5248736, 465564374.0248998, 465982788.8695927, 466425068.01245564, 466595649.90489674, 466707658.8296169, 467015570.78026086, 467099213.08769494, 467201640.15951264, 467163862.3709329, 466727597.56313753, 466174871.71213347, 466255498.45248336, 466439062.65458614, 466693130.99620277, 467068587.1422199, 467536070.1402474, 467955819.1549621, 468187227.1069643, 467742976.2778335, 467159585.250493, 466592359.52916145, 466583195.8099961, 466424348.9572719, 466155323.6074322, 465569620.1801811, 465021642.5158305, 464757658.6383867, 464713882.60103834, 464724239.2941314, 464679163.728191, 464407007.8705965, 463660736.0136739, 463001339.2385198, 462077058.47595775, 461505071.67199403, 460946277.95973784, 
460816158.9197017, 461123589.268546, 461232998.1572812, 461445601.0442877, 461803238.28569543, 462436966.22005004, 463391404.7434971, 464299608.85523456, 465319405.3931429, 466432961.70208246, 468168080.3331244, 469640808.6809098, 471501539.22440934, 474301795.1694898, 479155711.93441755, 488314271.10405815, 504537056.23994666, 530509400.5201074, 566892036.4437443, 611792826.0442055, 658913502.9004005, 699716882.9169292, 725237302.8248898, 734259159.9571886, 789267050.8287783], "frame_num": 899422}
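The JSON above stores accumulated statistics rather than a ready-made mean and standard deviation. A minimal sketch of the conversion follows, assuming `mean_stat` holds per-dimension sums of the features, `var_stat` holds per-dimension sums of their squares, and `frame_num` is the number of frames accumulated. The function name `cmvn_from_stats` is hypothetical.

```python
import math


def cmvn_from_stats(mean_stat, var_stat, frame_num, eps=1e-20):
    """Turn accumulated sums into per-dimension mean and standard deviation."""
    means, stds = [], []
    for total, total_sq in zip(mean_stat, var_stat):
        mean = total / frame_num
        # E[x^2] - E[x]^2, floored to avoid a zero or negative variance.
        var = max(total_sq / frame_num - mean * mean, eps)
        means.append(mean)
        stds.append(math.sqrt(var))
    return means, stds


# Toy statistics for 2 feature dimensions over 4 frames:
# dim 0 values 1, 2, 3, 4 -> sum 10, sum of squares 30
# dim 1 values 2, 2, 2, 2 -> sum 8,  sum of squares 16
means, stds = cmvn_from_stats([10.0, 8.0], [30.0, 16.0], 4)
print(means, stds)
```

Features would then be normalized per dimension as `(x - mean) / std`.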
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Evaluation for DeepSpeech2 model."""
import os
import sys
from pathlib import Path
import paddle
from deepspeech.frontend.featurizer.text_featurizer import TextFeaturizer
from deepspeech.io.collator import SpeechCollator
from deepspeech.models.ds2 import DeepSpeech2Model
from deepspeech.utils import mp_tools
from deepspeech.utils.utility import UpdateConfig


class DeepSpeech2Tester:
    def __init__(self, config):
        self.config = config
        self.collate_fn_test = SpeechCollator.from_config(config)
        self._text_featurizer = TextFeaturizer(unit_type=config.collator.unit_type, vocab_filepath=None)

    def compute_result_transcripts(self, audio, audio_len, vocab_list, cfg):
        result_transcripts = self.model.decode(
            audio,
            audio_len,
            vocab_list,
            decoding_method=cfg.decoding_method,
            lang_model_path=cfg.lang_model_path,
            beam_alpha=cfg.alpha,
            beam_beta=cfg.beta,
            beam_size=cfg.beam_size,
            cutoff_prob=cfg.cutoff_prob,
            cutoff_top_n=cfg.cutoff_top_n,
            num_processes=cfg.num_proc_bsearch)
        # replace the '<space>' with ' '
        result_transcripts = [self._text_featurizer.detokenize(sentence) for sentence in result_transcripts]
        return result_transcripts

    @mp_tools.rank_zero_only
    @paddle.no_grad()
    def test(self, audio_file):
        self.model.eval()
        cfg = self.config
        collate_fn_test = self.collate_fn_test
        audio, _ = collate_fn_test.process_utterance(audio_file=audio_file, transcript=" ")
        audio_len = audio.shape[0]
        audio = paddle.to_tensor(audio, dtype='float32')
        audio_len = paddle.to_tensor(audio_len)
        audio = paddle.unsqueeze(audio, axis=0)
        vocab_list = collate_fn_test.vocab_list
        result_transcripts = self.compute_result_transcripts(audio, audio_len, vocab_list, cfg.decoding)
        return result_transcripts

    def setup_model(self):
        config = self.config.clone()
        with UpdateConfig(config):
            config.model.feat_size = self.collate_fn_test.feature_size
            config.model.dict_size = self.collate_fn_test.vocab_size
        model = DeepSpeech2Model.from_config(config.model)
        self.model = model

    def resume(self, checkpoint):
        """Resume from the checkpoint at checkpoints in the output
        directory or load a specified checkpoint.
        """
        model_dict = paddle.load(checkpoint)
        self.model.set_state_dict(model_dict)
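`compute_result_transcripts` delegates to a CTC beam search with a language model. For intuition only, the simpler greedy (best-path) CTC rule is sketched below. This is a swapped-in illustration, not the decoder the tester calls, and `ctc_greedy_decode` is a hypothetical name.

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse repeated labels, then drop blanks (CTC best-path rule)."""
    decoded = []
    prev = None
    for idx in frame_ids:
        if idx != prev and idx != blank_id:
            decoded.append(idx)
        prev = idx
    return decoded


# Per-frame argmax label ids; 0 is the blank, matching `blank_id: 0` in the config.
print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
```

The blank between the two runs of `3` is what lets the same label appear twice in a row in the output.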
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from pathlib import Path
import sys
import numpy as np
from paddlehub.env import MODULE_HOME
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddle.utils.download import get_path_from_url

try:
    import swig_decoders
except ModuleNotFoundError as e:
    logger.error(e)
    logger.info('The module requires additional dependencies: swig_decoders. '
                'Please install via:\n\'git clone https://github.com/PaddlePaddle/DeepSpeech.git '
                '&& cd DeepSpeech && git reset --hard b53171694e7b87abe7ea96870b2f4d8e0e2b1485 '
                '&& cd deepspeech/decoders/ctcdecoder/swig && sh setup.sh\'')
    sys.exit(1)

import paddle
import soundfile as sf

# TODO: Remove system path when deepspeech can be installed via pip.
sys.path.append(os.path.join(MODULE_HOME, 'deepspeech2_aishell'))
from deepspeech.exps.deepspeech2.config import get_cfg_defaults
from deepspeech.utils.utility import UpdateConfig
from .deepspeech_tester import DeepSpeech2Tester

LM_URL = 'https://deepspeech.bj.bcebos.com/zh_lm/zh_giga.no_cna_cmn.prune01244.klm'
LM_MD5 = '29e02312deb2e59b3c8686c7966d4fe3'


@moduleinfo(name="deepspeech2_aishell", version="1.0.0", summary="", author="Baidu", author_email="", type="audio/asr")
class DeepSpeech2(paddle.nn.Layer):
    def __init__(self):
        super(DeepSpeech2, self).__init__()
        # resource
        res_dir = os.path.join(MODULE_HOME, 'deepspeech2_aishell', 'assets')
        conf_file = os.path.join(res_dir, 'conf/deepspeech2.yaml')
        checkpoint = os.path.join(res_dir, 'checkpoints/avg_1.pdparams')
        # Download the LM separately because of its large size.
        lm_path = os.path.join(res_dir, 'data', 'lm')
        lm_file = os.path.join(lm_path, LM_URL.split('/')[-1])
        if not os.path.isfile(lm_file):
            logger.info(f'Downloading lm from {LM_URL}.')
            get_path_from_url(url=LM_URL, root_dir=lm_path, md5sum=LM_MD5)
        # config
        self.model_type = 'offline'
        self.config = get_cfg_defaults(self.model_type)
        self.config.merge_from_file(conf_file)
        # TODO: Remove path updating snippet.
        with UpdateConfig(self.config):
            self.config.collator.mean_std_filepath = os.path.join(res_dir, self.config.collator.mean_std_filepath)
            self.config.collator.vocab_filepath = os.path.join(res_dir, self.config.collator.vocab_filepath)
            self.config.collator.augmentation_config = os.path.join(res_dir, self.config.collator.augmentation_config)
            self.config.decoding.lang_model_path = os.path.join(res_dir, self.config.decoding.lang_model_path)
        # model
        self.tester = DeepSpeech2Tester(self.config)
        self.tester.setup_model()
        self.tester.resume(checkpoint)

    @staticmethod
    def check_audio(audio_file):
        sig, sample_rate = sf.read(audio_file)
        assert sample_rate == 16000, 'Expected sample rate of input audio is 16000, but got {}'.format(sample_rate)

    @serving
    def speech_recognize(self, audio_file, device='cpu'):
        assert os.path.isfile(audio_file), 'File does not exist: {}'.format(audio_file)
        self.check_audio(audio_file)
        paddle.set_device(device)
        return self.tester.test(audio_file)[0]
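The constructor passes `LM_MD5` to `get_path_from_url` so that the downloaded language model can be verified. What such a check might look like in isolation is sketched below using only `hashlib`. How `get_path_from_url` performs the check internally is not shown in this commit, and the helper names `md5_of_file` and `matches_md5` are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_md5):
    """True if the file's MD5 digest equals `expected_md5` (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()
```

Reading in chunks keeps memory use flat, which matters for a large language model file.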
# system level: libsnd swig
loguru
yacs
jsonlines
scipy==1.2.1
sentencepiece
resampy==0.2.2
SoundFile==0.9.0.post1
soxbindings
kaldiio
typeguard
editdistance
# deepspeech2_librispeech
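|Model name|deepspeech2_librispeech|
| :--- | :---: |
|Category|Speech - speech recognition|
|Network|DeepSpeech2|
|Dataset|LibriSpeech|
|Fine-tuning supported|No|
|Model size|518MB|
|Last updated|2021-10-20|
|Metric|English WER 0.072|
## 1. Basic model information
### Model introduction
DeepSpeech2 is an end-to-end speech recognition model for English and Mandarin, proposed by Baidu in 2015. deepspeech2_librispeech uses the DeepSpeech2 offline model architecture, which consists mainly of 2 convolutional layers and 3 GRU layers. It was pretrained on the open-source English speech dataset [LibriSpeech ASR corpus](http://www.openslr.org/12/) and reaches a WER of 0.072 on the LibriSpeech test set.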
<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/DeepSpeech/Hub/docs/images/ds2offlineModel.png" hspace='10'/> <br />
</p>
For more details, see [Deep Speech 2: End-to-End Speech Recognition in English and Mandarin](https://arxiv.org/abs/1512.02595)
## 2. Installation
- ### 1. System dependencies
- libsndfile, swig >= 3.0
- Linux
```shell
$ sudo apt-get install libsndfile swig
# or
$ sudo yum install libsndfile swig
```
- macOS
```shell
$ brew install libsndfile swig
```
- ### 2. Environment dependencies
- swig_decoder:
```shell
git clone https://github.com/PaddlePaddle/DeepSpeech.git && cd DeepSpeech && git reset --hard b53171694e7b87abe7ea96870b2f4d8e0e2b1485 && cd deepspeech/decoders/ctcdecoder/swig && sh setup.sh
```
- paddlepaddle >= 2.1.0
- paddlehub >= 2.1.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 3. Installation
- ```shell
$ hub install deepspeech2_librispeech
```
- If you run into problems during installation, see: [Windows installation from scratch](../../../../docs/docs_ch/get_start/windows_quickstart.md) | [Linux installation from scratch](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation from scratch](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 3. Prediction with the model API
- ### 1. Prediction code example
```python
import paddlehub as hub
# English speech audio in WAV format with a 16 kHz sample rate
wav_file = '/PATH/TO/AUDIO'
model = hub.Module(
    name='deepspeech2_librispeech',
    version='1.0.0')
text = model.speech_recognize(wav_file)
print(text)
```
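- ### 2. API
- ```python
def check_audio(audio_file)
```
- Checks that the input audio can be read and that its sample rate is 16000 Hz.
- **Parameters**
- `audio_file`: path to a local audio file (*.wav), e.g. `/path/to/input.wav`
- ```python
def speech_recognize(
    audio_file,
    device='cpu',
)
```
- Transcribes the input audio to text.
- **Parameters**
- `audio_file`: path to a local audio file (*.wav), e.g. `/path/to/input.wav`
- `device`: device used for prediction, `cpu` by default; set it to `gpu` to run prediction on a GPU.
- **Returns**
- `text`: str, the recognized text for the input audio.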
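## 4. Service deployment
- PaddleHub Serving can deploy an online speech recognition service.
- ### Step 1: Start PaddleHub Serving
- ```shell
$ hub serving start -m deepspeech2_librispeech
```
- This deploys a speech recognition service API; the default port is 8866.
- **NOTE:** To run prediction on a GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the few lines of code below send a prediction request and retrieve the result.
- ```python
import requests
import json
# Path of the audio file to recognize; make sure the machine running the service can access it
file = '/path/to/input.wav'
# Arguments are passed to the prediction method by key; in this example the key is "audio_file"
data = {"audio_file": file}
# Send a POST request with a JSON content type; change the IP address in the URL to that of the serving machine
url = "http://127.0.0.1:8866/predict/deepspeech2_librispeech"
# Set the POST request headers to application/json
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```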
## 5. Release history
* 1.0.0
Initial release
```shell
$ hub install deepspeech2_librispeech
```
# https://yaml.org/type/float.html
data:
  train_manifest: data/manifest.train
  dev_manifest: data/manifest.dev-clean
  test_manifest: data/manifest.test-clean
  min_input_len: 0.0
  max_input_len: 30.0 # second
  min_output_len: 0.0
  max_output_len: .inf
  min_output_input_ratio: 0.00
  max_output_input_ratio: .inf
collator:
  batch_size: 20
  mean_std_filepath: data/mean_std.json
  unit_type: char
  vocab_filepath: data/vocab.txt
  augmentation_config: conf/augmentation.json
  random_seed: 0
  spm_model_prefix:
  spectrum_type: linear
  target_sample_rate: 16000
  max_freq: None
  n_fft: None
  stride_ms: 10.0
  window_ms: 20.0
  delta_delta: False
  dither: 1.0
  use_dB_normalization: True
  target_dB: -20
  keep_transcription_text: False
  sortagrad: True
  shuffle_method: batch_shuffle
  num_workers: 2
model:
  num_conv_layers: 2
  num_rnn_layers: 3
  rnn_layer_size: 2048
  use_gru: False
  share_rnn_weights: True
  blank_id: 0
  ctc_grad_norm_type: instance
training:
  n_epoch: 50
  accum_grad: 1
  lr: 1e-3
  lr_decay: 0.83
  weight_decay: 1e-06
  global_grad_clip: 5.0
  log_interval: 100
  checkpoint:
    kbest_n: 50
    latest_n: 5
decoding:
  batch_size: 128
  error_rate_type: wer
  decoding_method: ctc_beam_search
  lang_model_path: data/lm/common_crawl_00.prune01111.trie.klm
  alpha: 1.9
  beta: 0.3
  beam_size: 500
  cutoff_prob: 1.0
  cutoff_top_n: 40
  num_proc_bsearch: 8
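The `alpha` and `beta` values in the `decoding` section weight the language model and the word count during beam search. A minimal sketch of how candidates might be ranked follows, assuming the scoring form given in the Deep Speech 2 paper (log CTC probability plus `alpha` times log LM probability plus `beta` times word count). The actual decoder applies this incrementally inside the beam and may differ in detail; the function name and the candidate scores below are hypothetical.

```python
def fused_score(ctc_logprob, lm_logprob, word_count, alpha=1.9, beta=0.3):
    """Score = log P_ctc + alpha * log P_lm + beta * word_count."""
    return ctc_logprob + alpha * lm_logprob + beta * word_count


# Hypothetical candidates: (text, CTC log-probability, LM log-probability)
candidates = [
    ("i scream", -4.0, -9.0),
    ("ice cream", -4.2, -6.0),
]
best = max(candidates, key=lambda c: fused_score(c[1], c[2], len(c[0].split())))
print(best[0])  # ice cream
```

Here the acoustically slightly weaker candidate wins because the language model prefers it. A larger `beta` favors hypotheses with more words, offsetting the language model's tendency to penalize length.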
loguru
yacs
jsonlines
scipy==1.2.1
sentencepiece
resampy==0.2.2
SoundFile==0.9.0.post1
soxbindings
kaldiio
typeguard
editdistance
{"mean_stat": [533749178.75492024, 537379151.9412827, 553560684.251823, 587164297.7995199, 631868827.5506272, 662598279.7375823, 684377628.7270963, 695391900.076011, 692470493.5234187, 679434068.1698124, 666124153.9164762, 656323498.7897255, 665750586.0282139, 678693518.7836165, 681921713.5434498, 679622373.0941861, 669891550.4909347, 656595089.7941492, 653838531.0994304, 637678601.7858486, 628412248.7348012, 644835299.462052, 638840698.1892803, 646181879.4332589, 639724189.2981818, 642757470.3933163, 637471382.8647255, 642368839.4687729, 643414999.4559816, 647384269.1630985, 649348352.9727564, 649293860.0141628, 650234047.7200857, 654485430.6703687, 660474314.9996675, 667417041.2224753, 673157601.3226709, 675674470.304284, 675124085.6890339, 668017589.4583111, 670061307.6169846, 662625614.6886193, 663144526.4351237, 662504003.7634674, 666413530.1149732, 672263295.5639057, 678483738.2530766, 685387098.3034457, 692570857.529439, 699066050.4399202, 700784878.5879861, 701201520.50868, 702666292.305144, 705443439.2278953, 706070270.9023902, 705988909.8337733, 702843339.0362502, 699318566.4701376, 696089900.3030818, 687559674.541517, 675279201.9502573, 663676352.2301354, 662963751.7464145, 664300133.8414352, 666095384.4212626, 671682092.7777623, 676652386.6696675, 680097668.2490273, 683810023.0071762, 688701544.3655603, 692082724.9923568, 695788849.6782106, 701085780.0070009, 706389529.7959046, 711492753.1344281, 717637923.73355, 719691678.2081754, 715810733.4964175, 696362890.4862831, 604649423.9932467], "var_stat": [5413314850.92017, 5559847287.933615, 6150990253.613769, 6921242242.585692, 7999776708.347419, 8789877370.390867, 9405801233.462742, 9768050110.323652, 9759783206.942099, 9430647265.679018, 9090547056.72849, 8873147345.425886, 9155912918.518642, 9542539953.84679, 9653547618.806402, 9593434792.936714, 9316633026.420147, 8959273999.588833, 8863548125.445953, 8450615911.730164, 8211598033.615433, 8587083872.162145, 8432613574.987708, 8583943640.722399, 
8401731458.393406, 8439359231.367369, 8293779802.711447, 8401506934.147289, 8427506949.839874, 8525176341.071184, 8577080109.482346, 8575106681.347283, 8594987363.896849, 8701703698.13697, 8854967559.695303, 9029484499.828356, 9168774993.437275, 9221457044.693224, 9194525496.858181, 8997085233.031223, 9024585998.805922, 8819398159.92156, 8807895653.788486, 8777245867.886335, 8869681168.825321, 9017397167.041729, 9173402827.38027, 9345595113.30765, 9530638054.282673, 9701241750.610865, 9749002220.142677, 9762753891.356327, 9802020174.527405, 9874432300.977995, 9883303068.689241, 9873499335.610315, 9780680890.924107, 9672603363.913414, 9569436761.47915, 9321842521.985804, 8968140697.297707, 8646348638.918655, 8616965457.523136, 8648620220.395298, 8702086138.675117, 8859213220.99842, 8999405313.087536, 9105949447.399998, 9220413227.016796, 9358601578.269663, 9451405873.00428, 9552727080.824707, 9695443509.54488, 9836687193.669691, 9970962418.410656, 10135881535.317768, 10189390919.400673, 10070483257.345238, 9532953296.22076, 7261219636.045063], "frame_num": 54068199}
loguru
yacs
jsonlines
scipy==1.2.1
sentencepiece
resampy==0.2.2
SoundFile==0.9.0.post1
soxbindings
kaldiio
typeguard
editdistance
textgrid
{"mean_stat": [3419817384.9589553, 3554070049.1888413, 3818511309.9166613, 4066044518.3850017, 4291564631.2871633, 4447813845.146345, 4533096457.680424, 4535743891.989957, 4529762966.952207, 4506798370.255702, 4563810141.721841, 4621582319.277632, 4717208210.814803, 4782916961.295261, 4800534153.252695, 4816978042.979026, 4813370098.242317, 4783029495.131413, 4797780594.144404, 4697681126.278327, 4615891408.325888, 4660549391.6024275, 4576180438.146472, 4609080513.250168, 4575296489.058092, 4602504837.872262, 4568039825.650208, 4596829549.204861, 4590634987.343898, 4604371982.549804, 4623782318.317643, 4643582410.8842745, 4681460771.788484, 4759470876.31175, 4808639788.683043, 4828470941.416027, 4868984035.113543, 4906503986.801533, 4945995579.443381, 4936645225.986488, 4975902400.919519, 4960230208.656678, 4986734786.199859, 4983472199.8246765, 5002204376.162232, 5030432036.352981, 5060386169.086892, 5093482058.577236, 5118330657.308789, 5137270836.326198, 5140137363.319094, 5144296534.330122, 5158812605.654329, 5166263515.51458, 5156261604.282723, 5155820011.532965, 5154511256.8968, 5152063882.193671, 5153425524.412178, 5149000486.683038, 5154587156.35868, 5134412165.07972, 5092874838.792056, 5062281231.5140915, 5029059442.072953, 4996045017.917702, 4962203662.170533, 4928110046.282831, 4900476581.092096, 4881407033.533021, 4859626116.955097, 4851430742.3865795, 4850317443.454599, 4848197040.155383, 4837178106.464577, 4818448202.7298765, 4803345264.527405, 4765785994.104498, 4735296707.352132, 4699957946.40757], "var_stat": [39487786239.20539, 42865198005.60155, 49718916704.468704, 55953639455.490585, 62156293826.00315, 66738657819.12445, 69416921986.47835, 69657873431.17258, 69240303799.53061, 68286972351.43054, 69718367152.18843, 71405427710.7103, 74174200331.87572, 76047347951.43869, 76478048614.40665, 76810929560.19212, 76540466184.85634, 75538479521.34026, 75775624554.07217, 72775991318.16557, 70350402972.93352, 71358602366.48341, 68872845697.9878, 
69552396791.49916, 68471390455.59991, 69022047288.07498, 67982260910.11236, 68656154716.71916, 68461419064.9241, 68795285460.65717, 69270474608.52791, 69754495937.76433, 70596044579.14969, 72207936275.97945, 73629619360.65047, 74746445259.57487, 75925168496.81197, 76973508692.04265, 78074337163.3413, 77765963787.96971, 78839167623.49733, 78328768943.2287, 79016127287.03778, 78922638306.99306, 79489768324.9408, 80354861037.44005, 81311991408.12526, 82368205917.26112, 83134782296.1741, 83667769421.23245, 83673751953.46239, 83806087685.62842, 84193971202.07523, 84424752763.34825, 84092846117.64104, 84039114093.08766, 83982515225.7085, 83909645482.75613, 83947278563.15077, 83800767707.19617, 83851106027.8772, 83089292432.37892, 82056425825.3622, 81138570746.92316, 80131843258.75557, 79130160837.19037, 78092166878.71533, 77104785522.79205, 76308548392.10454, 75709445890.58063, 75084778641.6033, 74795849006.19067, 74725807683.832, 74645651838.2169, 74300193368.39339, 73696619147.86806, 73212785808.97992, 72240491743.0697, 71420246227.32545, 70457076435.4593], "frame_num": 345484372}
loguru
yacs
jsonlines
scipy==1.2.1
sentencepiece
resampy==0.2.2
SoundFile==0.9.0.post1
soxbindings
kaldiio
typeguard
editdistance
textgrid
git+https://github.com/PaddlePaddle/Parakeet@8040cb0#egg=paddle-parakeet
git+https://github.com/PaddlePaddle/Parakeet@8040cb0#egg=paddle-parakeet