PaddlePaddle / DeepSpeech
Commit ebfe3e6b
Authored on April 03, 2022 by xiongxinlei
test.py update the CSVDataset, test=doc
Parent: acebfad7
Showing 2 changed files with 32 additions and 3 deletions:
  paddlespeech/vector/io/batch.py    (+1, -1)
  paddlespeech/vector/io/dataset.py  (+31, -2)
paddlespeech/vector/io/batch.py @ ebfe3e6b

@@ -60,7 +60,7 @@ def pad_right_2d(x, target_length, axis=-1, mode='constant', **kwargs):
 
 
 def batch_feature_normalize(batch, mean_norm: bool=True, std_norm: bool=True):
-    ids = [item['id'] for item in batch]
+    ids = [item['utt_id'] for item in batch]
     lengths = np.asarray([item['feat'].shape[1] for item in batch])
     feats = list(
         map(lambda x: pad_right_2d(x, lengths.max()),
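The one-line change above only works if every item handed to the collate function carries its identifier under `utt_id`. The following is a minimal sketch of that contract, not the PaddleSpeech implementation: `collate_ids_and_lengths` is a hypothetical helper, plain nested lists stand in for NumPy arrays, and the explicit key check is an addition for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def collate_ids_and_lengths(
        batch: Sequence[Dict],
        id_key: str = "utt_id") -> Tuple[List[str], List[int]]:
    """Hypothetical stand-in for the first two lines of batch_feature_normalize.

    Assumes each item stores its utterance id under `id_key` and a 2-D
    feature (a list of equally long rows) under 'feat'. The length of a
    feature is its second dimension, mirroring `item['feat'].shape[1]`.
    """
    # Fail early with a readable message if the dataset and the collate
    # function disagree on the id field name.
    missing = [i for i, item in enumerate(batch) if id_key not in item]
    if missing:
        raise KeyError(f"items {missing} have no '{id_key}' field")

    ids = [item[id_key] for item in batch]
    lengths = [len(item["feat"][0]) for item in batch]
    return ids, lengths


batch = [
    {"utt_id": "spk1-utt1", "feat": [[0.1, 0.2, 0.3]]},
    {"utt_id": "spk2-utt7", "feat": [[0.4, 0.5]]},
]
ids, lengths = collate_ids_and_lengths(batch)
print(ids, lengths)
```

Calling the helper with `id_key="id"` on the same batch raises `KeyError`, which is the kind of mismatch the rename in this commit avoids.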
paddlespeech/vector/io/dataset.py @ ebfe3e6b

@@ -16,6 +16,7 @@ from dataclasses import fields
 from paddle.io import Dataset
 
 from paddleaudio import load as load_audio
+from paddleaudio.compliance.librosa import melspectrogram
 from paddlespeech.s2t.utils.log import Log
 
 logger = Log(__name__).getlog()
@@ -48,19 +49,39 @@ class meta_info:
     label: str
 
 
+# csv dataset support feature type
+# raw: return the pcm data sample point
+# melspectrogram: fbank feature
+feat_funcs = {
+    'raw': None,
+    'melspectrogram': melspectrogram,
+}
+
+
 class CSVDataset(Dataset):
-    def __init__(self, csv_path, label2id_path=None, config=None):
+    def __init__(self,
+                 csv_path,
+                 label2id_path=None,
+                 config=None,
+                 random_chunk=True,
+                 feat_type: str="raw",
+                 **kwargs):
         """Implement the CSV Dataset
 
         Args:
             csv_path (str): csv dataset file path
             label2id_path (str): the utterance label to integer id map file path
             config (CfgNode): yaml config
+            feat_type (str): dataset feature type. if it is raw, it return pcm data.
+            kwargs : feature type args
         """
         super().__init__()
         self.csv_path = csv_path
         self.label2id_path = label2id_path
         self.config = config
+        self.random_chunk = random_chunk
+        self.feat_type = feat_type
+        self.feat_config = kwargs
         self.id2label = {}
         self.label2id = {}
         self.data = self.load_data_csv()
@@ -128,7 +149,15 @@ class CSVDataset(Dataset):
 
         # we only return the waveform as feat
         waveform = waveform[start:stop]
-        record.update({'feat': waveform})
+
+        # all availabel feature type is in feat_funcs
+        assert self.feat_type in feat_funcs.keys(), \
+            f"Unknown feat_type: {self.feat_type}, it must be one in {list(feat_funcs.keys())}"
+        feat_func = feat_funcs[self.feat_type]
+        feat = feat_func(
+            waveform, sr=sr,
+            **self.feat_config) if feat_func else waveform
+        record.update({'feat': feat})
         if self.label2id:
             record.update({'label': self.label2id[record['label']]})
 
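The dataset change selects a feature extractor by name from a module-level table, with `None` meaning "return the raw waveform", and forwards the extra keyword arguments to the extractor. The sketch below shows that dispatch pattern in isolation, under stated assumptions: `frame_energy` is a hypothetical pure-Python stand-in for `melspectrogram`, `extract_feat` and `FEAT_FUNCS` are illustrative names, and the unknown-type check raises `ValueError` where the commit uses `assert`.

```python
from typing import Callable, Dict, List, Optional


def frame_energy(waveform: List[float], sr: int,
                 frame_len: int = 4) -> List[float]:
    """Hypothetical feature: sum of squares over non-overlapping frames.

    `sr` is accepted only to match the call shape used in the commit
    (`feat_func(waveform, sr=sr, **feat_config)`); it is unused here.
    """
    return [
        sum(x * x for x in waveform[i:i + frame_len])
        for i in range(0, len(waveform), frame_len)
    ]


# Name-to-function table; None means "no transform, keep the raw samples".
FEAT_FUNCS: Dict[str, Optional[Callable]] = {
    "raw": None,
    "frame_energy": frame_energy,
}


def extract_feat(waveform: List[float], sr: int, feat_type: str = "raw",
                 **feat_config):
    """Look up the extractor by name and apply it with the stored options."""
    if feat_type not in FEAT_FUNCS:
        raise ValueError(f"Unknown feat_type: {feat_type}, "
                         f"it must be one in {list(FEAT_FUNCS)}")
    feat_func = FEAT_FUNCS[feat_type]
    return feat_func(waveform, sr=sr, **feat_config) if feat_func else waveform


waveform = [1.0, -1.0, 2.0, 0.0, 3.0, 1.0]
raw = extract_feat(waveform, sr=16000)
energy = extract_feat(waveform, sr=16000, feat_type="frame_energy", frame_len=2)
print(raw, energy)
```

One design note: an `assert` is skipped when Python runs with `-O`, so an explicit exception keeps the check active in optimized runs. Whether that matters for this dataset class is a judgment call for the maintainers.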