PaddlePaddle / DeepSpeech
Commit a8448714 (unverified)
Authored by Hui Zhang on Jun 15, 2021; committed by GitHub on Jun 15, 2021

Merge pull request #669 from iclementine/dsp

add kaldi-style frame and stft

Parents: 02537195, abccbb5c
Showing 1 changed file with 146 additions and 0 deletions

third_party/paddle_audio/frontend.py (new file, 0 → 100644) @ a8448714
from typing import Tuple

import numpy as np
import paddle
from paddle import Tensor
from paddle import nn
from paddle.nn import functional as F


def frame(x: Tensor,
          num_samples: Tensor,
          win_length: int,
          hop_length: int,
          clip: bool=True) -> Tuple[Tensor, Tensor]:
    """Extract frames from audio.

    Parameters
    ----------
    x : Tensor
        Shape (N, T), batched waveform.
    num_samples : Tensor
        Shape (N, ), number of samples of each waveform.
    win_length : int
        Window length.
    hop_length : int
        Number of samples shifted between adjacent frames.
    clip : bool, optional
        Whether to clip audio that does not fit into the last frame, by
        default True

    Returns
    -------
    frames : Tensor
        Shape (N, win_length, T'), the layout conv1d produces by default.
    num_frames : Tensor
        Shape (N, ) number of valid frames
    """
    assert hop_length <= win_length
    num_frames = (num_samples - win_length) // hop_length
    padding = (0, 0)
    if not clip:
        num_frames += 1
        # NOTE: pad hop_length - 1 to the right to ensure that there is at most
        # one frame dangling to the right edge
        padding = (0, hop_length - 1)

    # (win_length, 1, win_length): identity kernels, so output channel i
    # copies the sample at offset i within each frame
    weight = paddle.eye(win_length).unsqueeze(1)

    frames = F.conv1d(
        x.unsqueeze(1), weight, padding=padding, stride=(hop_length, ))
    return frames, num_frames


class STFT(nn.Layer):
    """A module for computing stft transformation in a differentiable way.

    Parameters
    ------------
    n_fft : int
        Number of samples in a frame.
    hop_length : int
        Number of samples shifted between adjacent frames.
    win_length : int
        Length of the window.
    window_type : str, optional
        Window function: None (rectangular), "hann" or "hamming", by
        default None
    clip: bool
        Whether to clip audio that does not fit into the last frame, by
        default True
    """

    def __init__(self,
                 n_fft: int,
                 hop_length: int,
                 win_length: int,
                 window_type: str=None,
                 clip: bool=True):
        super().__init__()

        self.hop_length = hop_length
        # forward() needs win_length to count frames
        self.win_length = win_length
        self.n_bin = 1 + n_fft // 2
        self.n_fft = n_fft
        self.clip = clip

        # calculate window
        if window_type is None:
            window = np.ones(win_length)
        elif window_type == "hann":
            window = np.hanning(win_length)
        elif window_type == "hamming":
            window = np.hamming(win_length)
        else:
            raise ValueError("Not supported yet!")

        # The kernel spans min(n_fft, win_length) samples: a window shorter
        # than n_fft is implicitly zero-padded, a longer one is truncated.
        kernel_size = min(n_fft, win_length)
        window = window[:kernel_size]

        # (n_bin, kernel_size) complex
        weight = np.fft.fft(np.eye(n_fft))[:self.n_bin, :kernel_size]
        w_real = weight.real
        w_imag = weight.imag

        # (2 * n_bin, kernel_size)
        w = np.concatenate([w_real, w_imag], axis=0)
        w = w * window

        # (2 * n_bin, 1, kernel_size) # (C_out, C_in, kernel_size)
        w = np.expand_dims(w, 1)
        weight = paddle.cast(paddle.to_tensor(w), paddle.get_default_dtype())
        self.register_buffer("weight", weight)

    def forward(self, x: Tensor, num_samples: Tensor) -> Tuple[Tensor, Tensor]:
        """Compute the stft transform.

        Parameters
        ------------
        x : Tensor [shape=(B, T)]
            The input waveform.
        num_samples : Tensor
            Number of samples of each waveform.

        Returns
        ------------
        D : Tensor
            Shape (N, T', n_bins, 2) Spectrogram.
        num_frames: Tensor
            Shape (N,) number of frames of each spectrogram
        """
        num_frames = (num_samples - self.win_length) // self.hop_length
        padding = (0, 0)
        if not self.clip:
            num_frames += 1
            padding = (0, self.hop_length - 1)

        # x is 2-D (B, T), so only the batch dimension is read here
        batch_size = paddle.shape(x)[0]
        # (B, T) -> (B, T, 1), i.e. "NLC" layout with one input channel
        x = x.unsqueeze(-1)
        # (B, T', 2 * n_bin): real parts first, then imaginary parts
        D = F.conv1d(
            x,
            self.weight,
            stride=(self.hop_length, ),
            padding=padding,
            data_format="NLC")
        # (B, T', 2, n_bin) -> (B, T', n_bin, 2)
        D = paddle.reshape(D, [batch_size, -1, 2, self.n_bin])
        D = paddle.transpose(D, [0, 1, 3, 2])
        return D, num_frames
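
The two ideas in `frontend.py` (framing as a strided slice, and the STFT as a product with truncated rows of the DFT matrix) can be checked without Paddle. The following is a minimal pure-Python sketch, not part of the commit. The names `frame_signal`, `dft_rows` and `stft_sketch` are hypothetical, the window is assumed rectangular, and `win_length <= n_fft` is assumed. It counts frames the way an unpadded strided convolution does, `(T - win_length) // hop_length + 1`.

```python
import cmath
from typing import List, Sequence


def frame_signal(x: Sequence[float], win_length: int,
                 hop_length: int) -> List[List[float]]:
    """Slice x into overlapping frames, dropping a trailing partial frame."""
    if len(x) < win_length:
        return []
    # Same count as an unpadded strided convolution over x.
    num_frames = (len(x) - win_length) // hop_length + 1
    return [
        list(x[i * hop_length:i * hop_length + win_length])
        for i in range(num_frames)
    ]


def dft_rows(n_fft: int, kernel_size: int) -> List[List[complex]]:
    """First n_fft // 2 + 1 rows of the DFT matrix, cut to kernel_size columns."""
    n_bin = 1 + n_fft // 2
    return [[
        cmath.exp(-2j * cmath.pi * k * n / n_fft) for n in range(kernel_size)
    ] for k in range(n_bin)]


def stft_sketch(x: Sequence[float], n_fft: int, win_length: int,
                hop_length: int) -> List[List[complex]]:
    """Rectangular-window STFT; assumes win_length <= n_fft."""
    rows = dft_rows(n_fft, win_length)
    # Each output bin is the dot product of one DFT row with one frame,
    # which is what a conv1d with those rows as kernels computes.
    return [[sum(c * s for c, s in zip(row, fr)) for row in rows]
            for fr in frame_signal(x, win_length, hop_length)]


signal = list(range(10))
frames = frame_signal(signal, win_length=4, hop_length=2)
spec = stft_sketch(signal, n_fft=4, win_length=4, hop_length=2)
print(len(frames), frames[-1])  # 4 [6, 7, 8, 9]
print(len(spec), len(spec[0]))  # 4 3
```

Keeping only the first `win_length` columns of each DFT row gives the same result as zero-padding every frame on the right to `n_fft` samples before the transform, which is why a window shorter than `n_fft` needs no explicit padding.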