PaddlePaddle / DeepSpeech
Commit b769579e
Authored Mar 22, 2021 by Hui Zhang

add audio utils

Parent: 7635f98b

Showing 4 changed files with 87 additions and 2 deletions
deepspeech/frontend/utility.py  +84 -0
deepspeech/modules/encoder_layer.py  +1 -1
deepspeech/utils/common.py  +1 -1
requirements.txt  +1 -0

deepspeech/frontend/utility.py @ b769579e
@@ -13,6 +13,8 @@
 # limitations under the License.
 """Contains data helper functions."""
 
+import numpy as np
+import math
 import json
 import codecs
 import os
@@ -50,3 +52,85 @@ def read_manifest(manifest_path, max_duration=float('inf'), min_duration=0.0):
                 json_data["duration"] >= min_duration):
             manifest.append(json_data)
     return manifest
+
+
+def rms_to_db(rms: float):
+    """Root Mean Square to dB.
+
+    Args:
+        rms ([float]): root mean square
+
+    Returns:
+        float: dB
+    """
+    return 20.0 * math.log10(max(1e-16, rms))
+
+
+def rms_to_dbfs(rms: float):
+    """Root Mean Square to dBFS.
+    https://fireattack.wordpress.com/2017/02/06/replaygain-loudness-normalization-and-applications/
+    Audio is a mix of sine waves; a sine wave with amplitude 1 has an RMS of 0.7071, i.e. -3.0103 dB.
+
+    dB = dBFS + 3.0103
+    dBFS = dB - 3.0103
+    e.g. 0 dB = -3.0103 dBFS
+
+    Args:
+        rms ([float]): root mean square
+
+    Returns:
+        float: dBFS
+    """
+    return rms_to_db(rms) - 3.0103
+
+
+def max_dbfs(sample_data: np.ndarray):
+    """Peak dBFS based on the maximum energy sample.
+
+    Args:
+        sample_data ([np.ndarray]): float array, [-1, 1].
+
+    Returns:
+        float: dBFS
+    """
+    # Peak dBFS based on the maximum energy sample. Will prevent overdrive if used for normalization.
+    return rms_to_dbfs(max(abs(np.min(sample_data)), abs(np.max(sample_data))))
+
+
+def mean_dbfs(sample_data):
+    """Mean dBFS based on the RMS energy.
+
+    Args:
+        sample_data ([np.ndarray]): float array, [-1, 1].
+
+    Returns:
+        float: dBFS
+    """
+    return rms_to_dbfs(
+        math.sqrt(np.mean(np.square(sample_data, dtype=np.float64))))
+
+
+def gain_db_to_ratio(gain_db: float):
+    """dB to ratio
+
+    Args:
+        gain_db (float): gain in dB
+
+    Returns:
+        float: scale in amp
+    """
+    return math.pow(10.0, gain_db / 20.0)
+
+
+def normalize_audio(sample_data: np.ndarray, dbfs: float = -3.0103):
+    """Normalize audio to dBFS.
+
+    Args:
+        sample_data (np.ndarray): input wave samples, [-1, 1].
+        dbfs (float, optional): target dBFS. Defaults to -3.0103.
+
+    Returns:
+        np.ndarray: normalized wave
+    """
+    return np.maximum(
+        np.minimum(sample_data * gain_db_to_ratio(dbfs - max_dbfs(sample_data)),
+                   1.0), -1.0)
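
The helpers added in this file can be tried on a synthetic tone. What follows is a minimal sketch and not part of the commit: it re-declares the functions locally with the same arithmetic so that it runs without the DeepSpeech package, and the 440 Hz tone, 16 kHz sample rate, and 0.25 peak amplitude are arbitrary assumptions chosen for illustration.

```python
import math

import numpy as np


# Local copies of the helpers from the diff, so the sketch is self-contained.
def rms_to_db(rms: float) -> float:
    return 20.0 * math.log10(max(1e-16, rms))


def rms_to_dbfs(rms: float) -> float:
    return rms_to_db(rms) - 3.0103


def max_dbfs(sample_data: np.ndarray) -> float:
    return rms_to_dbfs(max(abs(np.min(sample_data)), abs(np.max(sample_data))))


def gain_db_to_ratio(gain_db: float) -> float:
    return math.pow(10.0, gain_db / 20.0)


def normalize_audio(sample_data: np.ndarray, dbfs: float = -3.0103) -> np.ndarray:
    gain = gain_db_to_ratio(dbfs - max_dbfs(sample_data))
    # Clamp to [-1, 1] after applying the gain, as the diff does.
    return np.maximum(np.minimum(sample_data * gain, 1.0), -1.0)


# Hypothetical input: one second of a 440 Hz tone at 16 kHz, peak amplitude 0.25.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
quiet = 0.25 * np.sin(2.0 * np.pi * 440.0 * t)

normalized = normalize_audio(quiet)  # default target: -3.0103 dBFS

peak_before = float(np.max(np.abs(quiet)))
peak_after = float(np.max(np.abs(normalized)))
print(f"peak before: {peak_before:.4f}, peak after: {peak_after:.4f}")
print(f"max_dbfs before: {max_dbfs(quiet):.4f}")
```

With a peak of 0.25, `max_dbfs` is about -15.05, so the applied gain is about 12.04 dB, a ratio of roughly 4. Under the convention used in this file, the default target of -3.0103 dBFS corresponds to a peak amplitude of 1.0, which is why the final clamp matters only for rounding error here.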

deepspeech/modules/encoder_layer.py @ b769579e
@@ -133,7 +133,7 @@ class ConformerEncoderLayer(nn.Layer):
     def __init__(
             self,
             size: int,
-            self_attn: int,
+            self_attn: nn.Layer,
             feed_forward: Optional[nn.Layer] = None,
             feed_forward_macaron: Optional[nn.Layer] = None,
             conv_module: Optional[nn.Layer] = None,

deepspeech/utils/common.py @ b769579e
@@ -110,4 +110,4 @@ def log_add(args: List[int]) -> float:
         return -float('inf')
     a_max = max(args)
     lsp = math.log(sum(math.exp(a - a_max) for a in args))
-    return a_max + lsp
+    return a_max + lsp
\ No newline at end of file
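
The context lines of `log_add` show the usual log-sum-exp trick: subtract the maximum before exponentiating, then add it back. The sketch below illustrates why that matters. It is a hypothetical stand-in named `log_add_sketch`, not the repository's function; in particular, the guard condition before `return -float('inf')` is not visible in the hunk, so the one used here is an assumption.

```python
import math
from typing import List


def log_add_sketch(args: List[float]) -> float:
    """Stable log(sum(exp(a) for a in args)); illustrative stand-in only."""
    # Assumed guard (not shown in the hunk): all inputs are log(0).
    if all(a == -float('inf') for a in args):
        return -float('inf')
    a_max = max(args)
    # Shifting by a_max keeps the largest exponent at exp(0) == 1.0.
    lsp = math.log(sum(math.exp(a - a_max) for a in args))
    return a_max + lsp


log_probs = [-1000.0, -1000.0]

stable = log_add_sketch(log_probs)

try:
    naive = math.log(sum(math.exp(a) for a in log_probs))
except ValueError:
    # exp(-1000.0) underflows to 0.0, so log(0.0) raises a math domain error.
    naive = None

print(stable, naive)
```

The stable version returns `-1000 + log(2)`, while the naive form fails because both exponentials underflow to zero.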

requirements.txt @ b769579e
@@ -6,3 +6,4 @@ tensorboardX
 yacs
 typeguard
 pre-commit
+paddlepaddle-gpu==2.0.0