PaddlePaddle
DeepSpeech
Commit 943d4ac1 (unverified)
Authored Mar 29, 2022 by Hui Zhang
Committed Mar 29, 2022 by GitHub
Merge pull request #1612 from Jackwaterveg/update
[ASR] Replace kaidi_fbank with paddleaudio
Parents: 0cc97857, f47146af
Showing 3 changed files with 75 additions and 4 deletions (+75, -4)
examples/aishell/asr1/conf/preprocess.yaml      +0  -4
paddlespeech/s2t/transform/spectrogram.py       +74 -0
paddlespeech/s2t/transform/transformation.py    +1  -0
examples/aishell/asr1/conf/preprocess.yaml
@@ -23,7 +23,3 @@ process:
     n_mask: 2
     inplace: true
     replace_with_zero: false
paddlespeech/s2t/transform/spectrogram.py
@@ -14,8 +14,11 @@
 # Modified from espnet(https://github.com/espnet/espnet)
 import librosa
 import numpy as np
+import paddle
 from python_speech_features import logfbank
 
+import paddleaudio.compliance.kaldi as kaldi
+
 
 def stft(x,
          n_fft,
@@ -309,6 +312,77 @@ class IStft():
 
 
 class LogMelSpectrogramKaldi():
+    def __init__(
+            self,
+            fs=16000,
+            n_mels=80,
+            n_shift=160,  # unit:sample, 10ms
+            win_length=400,  # unit:sample, 25ms
+            energy_floor=0.0,
+            dither=0.1):
+        """
+        The Kaldi implementation of LogMelSpectrogram
+        Args:
+            fs (int): sample rate of the audio
+            n_mels (int): number of mel filter banks
+            n_shift (int): number of points in a frame shift
+            win_length (int): number of points in a frame windows
+            energy_floor (float): Floor on energy in Spectrogram computation (absolute)
+            dither (float): Dithering constant
+        Returns:
+            LogMelSpectrogramKaldi
+        """
+        self.fs = fs
+        self.n_mels = n_mels
+        num_point_ms = fs / 1000
+        self.n_frame_length = win_length / num_point_ms
+        self.n_frame_shift = n_shift / num_point_ms
+        self.energy_floor = energy_floor
+        self.dither = dither
+
+    def __repr__(self):
+        return (
+            "{name}(fs={fs}, n_mels={n_mels}, "
+            "n_frame_shift={n_frame_shift}, n_frame_length={n_frame_length}, "
+            "dither={dither}))".format(
+                name=self.__class__.__name__,
+                fs=self.fs,
+                n_mels=self.n_mels,
+                n_frame_shift=self.n_frame_shift,
+                n_frame_length=self.n_frame_length,
+                dither=self.dither, ))
+
+    def __call__(self, x, train):
+        """
+        Args:
+            x (np.ndarray): shape (Ti,)
+            train (bool): True, train mode.
+        Raises:
+            ValueError: not support (Ti, C)
+        Returns:
+            np.ndarray: (T, D)
+        """
+        dither = self.dither if train else 0.0
+        if x.ndim != 1:
+            raise ValueError("Not support x: [Time, Channel]")
+        waveform = paddle.to_tensor(np.expand_dims(x, 0), dtype=paddle.float32)
+        mat = kaldi.fbank(
+            waveform,
+            n_mels=self.n_mels,
+            frame_length=self.n_frame_length,
+            frame_shift=self.n_frame_shift,
+            dither=dither,
+            energy_floor=self.energy_floor,
+            sr=self.fs)
+        mat = np.squeeze(mat.numpy())
+        return mat
+
+
+class LogMelSpectrogramKaldi_decay():
     def __init__(
             self,
             fs=16000,
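Reviewer note: the new `LogMelSpectrogramKaldi.__init__` takes `n_shift` and `win_length` in samples but stores them in milliseconds before passing them to `kaldi.fbank`, and `__call__` only applies dither when `train` is true. The sketch below reproduces just that arithmetic in plain Python so the defaults can be checked without installing paddle or paddleaudio. The helper names `samples_to_ms` and `effective_dither` are hypothetical and do not exist in the commit.

```python
def samples_to_ms(n_samples: int, fs: int) -> float:
    """Convert a length in samples to milliseconds, following __init__."""
    num_point_ms = fs / 1000  # samples per millisecond, as in the commit
    return n_samples / num_point_ms


def effective_dither(dither: float, train: bool) -> float:
    """Follow __call__: dithering is used in train mode only."""
    return dither if train else 0.0


# Defaults from the diff: fs=16000, n_shift=160, win_length=400, dither=0.1
frame_shift_ms = samples_to_ms(160, 16000)
frame_length_ms = samples_to_ms(400, 16000)
print(frame_shift_ms, frame_length_ms)  # 10.0 25.0
print(effective_dither(0.1, train=False))  # 0.0
```

With the default arguments this matches the inline comments in the signature (10 ms shift, 25 ms window). Because the conversion uses true division, the stored values are floats even when the inputs are integers.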
paddlespeech/s2t/transform/transformation.py
@@ -31,6 +31,7 @@ import_alias = dict(
     freq_mask="paddlespeech.s2t.transform.spec_augment:FreqMask",
     spec_augment="paddlespeech.s2t.transform.spec_augment:SpecAugment",
     speed_perturbation="paddlespeech.s2t.transform.perturb:SpeedPerturbation",
+    speed_perturbation_sox="paddlespeech.s2t.transform.perturb:SpeedPerturbationSox",
     volume_perturbation="paddlespeech.s2t.transform.perturb:VolumePerturbation",
     noise_injection="paddlespeech.s2t.transform.perturb:NoiseInjection",
     bandpass_perturbation="paddlespeech.s2t.transform.perturb:BandpassPerturbation",
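Reviewer note: each `import_alias` entry maps a short transform name to a `"module.path:ClassName"` string. The diff does not show how these strings are resolved, so the following is only a minimal sketch of one common way to do it with the standard library. `resolve_alias` and `demo_alias` are hypothetical names, and the demo table points at stdlib classes so the snippet runs without PaddleSpeech installed.

```python
import importlib


def resolve_alias(name, aliases):
    """Look up `name` in an alias table and import the class it points to.

    Entries use the "module.path:ClassName" form seen in import_alias.
    A name missing from the table is treated as a full import path itself
    (an assumption made for this sketch).
    """
    import_path = aliases.get(name, name)
    if ":" not in import_path:
        raise ValueError(
            "expected 'module.path:ClassName', got {!r}".format(import_path))
    module_name, attr_name = import_path.split(":", 1)
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in table with stdlib targets instead of paddlespeech modules.
demo_alias = dict(
    ordered="collections:OrderedDict",
    fraction="fractions:Fraction", )

cls = resolve_alias("ordered", demo_alias)
print(cls.__name__)  # OrderedDict
```

Under this scheme, adding `speed_perturbation_sox` to the table is enough to make the new name usable from a config file, provided `SpeedPerturbationSox` exists in `paddlespeech.s2t.transform.perturb`.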