PaddlePaddle
DeepSpeech
Commit bf3eb498
Authored Sep 29, 2022 by YangZhou

add different tess feature config

Parent: 58197709
Showing 6 changed files with 174 additions and 9 deletions (+174 -9)
Changed files:
examples/tess/README.md (+34 -0)
examples/tess/cls0/conf/panns_logmelspectrogram.yaml (+32 -0)
examples/tess/cls0/conf/panns_melspectrogram.yaml (+32 -0)
examples/tess/cls0/conf/panns_mfcc.yaml (+33 -0)
examples/tess/cls0/conf/panns_spectrogram.yaml (+28 -0)
examples/tess/cls0/local/train.py (+15 -9)

examples/tess/README.md (new file, 0 → 100644) @ bf3eb498
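# Background
For the task and an introduction to the model, see examples/esc50. This directory exists to validate and test the feature, backend, and related modules of paddle.audio.
## Dataset
[TESS: Toronto emotional speech set](https://tspace.library.utoronto.ca/handle/1807/24487) is a dataset of 2 to 3 second audio clips covering 200 target words spoken in seven emotions. It was recorded by two actresses (aged 24 and 64). The emotions are anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral.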
## Model metrics
Using the fold information provided by `TESS`, the dataset is split into 5 folds, and the model is fine-tuned for 2 epochs and then evaluated. Dev accuracy:

|Model|feat_type|Acc|
|--|--|--|
|CNN14| mfcc | 0.8304 |
|CNN14| logmelspectrogram | 0.9893 |
|CNN14| spectrogram| 0.1304 |
|CNN14| melspectrogram| 0.1339 |

Because this is a functional check, the configs train for only 2 epochs.
With the logmelspectrogram feature, accuracy reaches 0.9983 after 3 epochs.
With the mfcc feature, accuracy reaches 0.9983 after 3 epochs.
With the spectrogram feature, accuracy reaches 0.95 after 11 epochs.
With the melspectrogram feature, accuracy reaches 0.9375 after 17 epochs.
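
To make the 5-fold protocol concrete, here is a minimal sketch of one way a fold-based train/dev split can work. It assumes a round-robin fold assignment by file index, which may differ from how `paddle.audio.datasets:TESS` actually assigns folds. `assign_folds` and `select_split` are hypothetical helper names, and the file names are made up.

```python
from typing import List, Tuple


def assign_folds(files: List[str], n_folds: int = 5) -> List[Tuple[str, int]]:
    """Pair each file with a fold id in 1..n_folds (round-robin by index; an assumption)."""
    return [(name, idx % n_folds + 1) for idx, name in enumerate(files)]


def select_split(files: List[str], mode: str, split: int = 1,
                 n_folds: int = 5) -> List[str]:
    """Return the held-out fold when mode == 'dev', otherwise every other fold."""
    if mode not in ("train", "dev"):
        raise ValueError("mode must be 'train' or 'dev'")
    folded = assign_folds(files, n_folds)
    if mode == "dev":
        return [name for name, fold in folded if fold == split]
    return [name for name, fold in folded if fold != split]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = [f"clip_{i:03d}.wav" for i in range(10)]
    print(select_split(files, mode="dev"))
    print(len(select_split(files, mode="train")))
```

With `split: 1`, as in the configs of this commit, one fold is held out for dev and the remaining four are used for training.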
### Model training
Start training:
```shell
$ CUDA_VISIBLE_DEVICES=0 ./run.sh 1 conf/panns_mfcc.yaml
$ CUDA_VISIBLE_DEVICES=0 ./run.sh 1 conf/panns_logmelspectrogram.yaml
$ CUDA_VISIBLE_DEVICES=0 ./run.sh 1 conf/panns_melspectrogram.yaml
$ CUDA_VISIBLE_DEVICES=0 ./run.sh 1 conf/panns_spectrogram.yaml
```

examples/tess/cls0/conf/panns_logmelspectrogram.yaml (new file, 0 → 100644) @ bf3eb498
data:
  dataset: 'paddle.audio.datasets:TESS'
  num_classes: 7
  train:
    mode: 'train'
    split: 1
    feat_type: 'logmelspectrogram'
  dev:
    mode: 'dev'
    split: 1
    feat_type: 'logmelspectrogram'

model:
  backbone: 'paddlespeech.cls.models:cnn14'

feature:
  n_fft: 1024
  hop_length: 320
  window: 'hann'
  win_length: 1024
  f_min: 50.0
  f_max: 14000.0
  n_mels: 64

training:
  epochs: 2
  learning_rate: 0.0005
  num_workers: 2
  batch_size: 128
  checkpoint_dir: './checkpoint_logmelspectrogram'
  save_freq: 1
  log_freq: 1
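
The `dataset` and `backbone` values use a `'module.path:Name'` form, which train.py resolves with `paddlespeech.utils.dynamic_import.dynamic_import`. As a rough illustration of how such a string can be resolved, the sketch below uses only `importlib`. `resolve_import_path` is a hypothetical stand-in, not the PaddleSpeech implementation, and the demo resolves a standard-library class so that it runs without Paddle installed.

```python
import importlib


def resolve_import_path(import_path: str):
    """Resolve 'package.module:attribute' to the named object."""
    if ":" not in import_path:
        raise ValueError("expected 'module.path:attribute', got %r" % import_path)
    module_name, attr_name = import_path.split(":", 1)
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


if __name__ == "__main__":
    # Standard-library stand-in for a value such as 'paddle.audio.datasets:TESS'.
    cls = resolve_import_path("collections:OrderedDict")
    print(cls.__name__)
```

Keeping class names in the config as strings lets each YAML file select the dataset and backbone without any change to the training script.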

examples/tess/cls0/conf/panns_melspectrogram.yaml (new file, 0 → 100644) @ bf3eb498
data:
  dataset: 'paddle.audio.datasets:TESS'
  num_classes: 7
  train:
    mode: 'train'
    split: 1
    feat_type: 'melspectrogram'
  dev:
    mode: 'dev'
    split: 1
    feat_type: 'melspectrogram'

model:
  backbone: 'paddlespeech.cls.models:cnn14'

feature:
  n_fft: 1024
  hop_length: 320
  window: 'hann'
  win_length: 1024
  f_min: 50.0
  f_max: 14000.0
  n_mels: 64

training:
  epochs: 2
  learning_rate: 0.0005
  num_workers: 2
  batch_size: 128
  checkpoint_dir: './checkpoint_melspectrogram'
  save_freq: 1
  log_freq: 1

examples/tess/cls0/conf/panns_mfcc.yaml (new file, 0 → 100644) @ bf3eb498
data:
  dataset: 'paddle.audio.datasets:TESS'
  num_classes: 7
  train:
    mode: 'train'
    split: 1
    feat_type: 'mfcc'
  dev:
    mode: 'dev'
    split: 1
    feat_type: 'mfcc'

model:
  backbone: 'paddlespeech.cls.models:cnn14'

feature:
  n_fft: 1024
  hop_length: 320
  window: 'hann'
  win_length: 1024
  f_min: 50.0
  f_max: 14000.0
  n_mfcc: 64
  n_mels: 64

training:
  epochs: 2
  learning_rate: 0.0005
  num_workers: 2
  batch_size: 128
  checkpoint_dir: './checkpoint_mfcc'
  save_freq: 1
  log_freq: 1

examples/tess/cls0/conf/panns_spectrogram.yaml (new file, 0 → 100644) @ bf3eb498
data:
  dataset: 'paddle.audio.datasets:TESS'
  num_classes: 7
  train:
    mode: 'train'
    split: 1
    feat_type: 'spectrogram'
  dev:
    mode: 'dev'
    split: 1
    feat_type: 'spectrogram'

model:
  backbone: 'paddlespeech.cls.models:cnn14'

feature:
  n_fft: 126
  hop_length: 320
  window: 'hann'

training:
  epochs: 2
  learning_rate: 0.0005
  num_workers: 2
  batch_size: 128
  checkpoint_dir: './checkpoint_spectrogram'
  save_freq: 1
  log_freq: 1
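
For orientation, the STFT settings in these configs determine the size of the frequency axis. The sketch below applies the standard one-sided STFT relation, bins = n_fft // 2 + 1, and a centered-framing frame count of 1 + num_samples // hop_length. Whether paddle.audio uses centered framing with these configs is an assumption, the helper names are hypothetical, and the 48000-sample clip length is made up for the example.

```python
def num_freq_bins(n_fft: int) -> int:
    """Number of bins in a one-sided STFT of a real-valued signal."""
    return n_fft // 2 + 1


def num_frames(num_samples: int, hop_length: int) -> int:
    """Frame count when the signal is padded so that frames are centered (assumed)."""
    return 1 + num_samples // hop_length


if __name__ == "__main__":
    for n_fft in (1024, 126):
        print(n_fft, num_freq_bins(n_fft))
    # Hypothetical clip length in samples.
    print(num_frames(48000, 320))
```

Under this relation, `n_fft: 126` gives 64 frequency bins, the same size as `n_mels: 64` in the other three configs, while `n_fft: 1024` gives 513.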

examples/tess/cls0/local/train.py @ bf3eb498
@@ -22,6 +22,7 @@ from paddleaudio.utils import Timer
 from paddlespeech.cls.models import SoundClassifier
 from paddlespeech.utils.dynamic_import import dynamic_import
+
 # yapf: disable
 parser = argparse.ArgumentParser(__doc__)
 parser.add_argument("--cfg_path", type=str, required=True)
@@ -61,12 +62,17 @@ if __name__ == "__main__":
     model_conf = config['model']
     data_conf = config['data']
+    feat_conf = config['feature']
+    feat_type = data_conf['train']['feat_type']
     training_conf = config['training']
     # Dataset
+    # set audio backend, make sure paddleaudio >= 1.0.2 installed.
+    paddle.audio.backends.set_backend('soundfile')
     ds_class = dynamic_import(data_conf['dataset'])
-    train_ds = ds_class(**data_conf['train'])
-    dev_ds = ds_class(**data_conf['dev'])
+    train_ds = ds_class(**data_conf['train'], **feat_conf)
+    dev_ds = ds_class(**data_conf['dev'], **feat_conf)
     train_sampler = paddle.io.DistributedBatchSampler(
         train_ds,
         batch_size=training_conf['batch_size'],
@@ -101,7 +107,7 @@ if __name__ == "__main__":
         num_corrects = 0
         num_samples = 0
         for batch_idx, batch in enumerate(train_loader):
-            feats, labels, length = batch  # feats(N, length, n_mels)
+            feats, labels, length = batch  # feats-->(N, length, n_mels)
             logits = model(feats)
@@ -129,7 +135,7 @@ if __name__ == "__main__":
                 avg_loss /= training_conf['log_freq']
                 avg_acc = num_corrects / num_samples
-                print_msg = 'Epoch={}/{}, Step={}/{}'.format(
+                print_msg = feat_type + ' Epoch={}/{}, Step={}/{}'.format(
                     epoch, training_conf['epochs'], batch_idx + 1, steps_per_epoch)
                 print_msg += ' loss={:.4f}'.format(avg_loss)
@@ -153,23 +159,23 @@ if __name__ == "__main__":
                 dev_ds,
                 batch_sampler=dev_sampler,
                 num_workers=training_conf['num_workers'],
-                return_list=True, )
+                return_list=True,
+                use_buffer_reader=True,
+                collate_fn=_collate_features)
             model.eval()
             num_corrects = 0
             num_samples = 0
             with logger.processing('Evaluation on validation dataset'):
                 for batch_idx, batch in enumerate(dev_loader):
-                    waveforms, labels = batch
-                    feats = feature_extractor(waveforms)
+                    feats, labels, length = batch
                     logits = model(feats)
                     preds = paddle.argmax(logits, axis=1)
                     num_corrects += (preds == labels).numpy().sum()
                     num_samples += feats.shape[0]
-            print_msg = '[Evaluation result]'
+            print_msg = '[Evaluation result] ' + str(feat_type)
             print_msg += ' dev_acc={:.4f}'.format(num_corrects / num_samples)
             logger.eval(print_msg)
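
The key change in this diff is that each dataset is now built from two config sections at once, `ds_class(**data_conf['train'], **feat_conf)`. A minimal sketch of that unpacking pattern follows. It uses a hypothetical `FakeDataset` class in place of the real dataset, and a config dict with values copied from panns_mfcc.yaml (trimmed to a few keys).

```python
class FakeDataset:
    """Hypothetical stand-in that only records its keyword arguments."""

    def __init__(self, mode, split, feat_type, **feat_kwargs):
        self.mode = mode
        self.split = split
        self.feat_type = feat_type
        # Everything from the 'feature' section ends up here.
        self.feat_kwargs = feat_kwargs


config = {
    "data": {
        "train": {"mode": "train", "split": 1, "feat_type": "mfcc"},
        "dev": {"mode": "dev", "split": 1, "feat_type": "mfcc"},
    },
    "feature": {"n_fft": 1024, "hop_length": 320, "n_mfcc": 64, "n_mels": 64},
}

data_conf = config["data"]
feat_conf = config["feature"]

# Both sections are unpacked into one call, as in the updated train.py.
train_ds = FakeDataset(**data_conf["train"], **feat_conf)
dev_ds = FakeDataset(**data_conf["dev"], **feat_conf)

if __name__ == "__main__":
    print(train_ds.mode, train_ds.feat_type, sorted(train_ds.feat_kwargs))
```

Python raises `TypeError` if the same key appears in both unpacked dicts, so the `data` and `feature` sections need disjoint key names for this pattern to work.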