PaddlePaddle / DeepSpeech
Commit 8d349432
Authored Nov 10, 2022 by Zth9730
Committed by GitHub on Nov 10, 2022
[ASR] wav2vec2_en, test=asr (#2637)
* wav2vec2_en, test=asr
* wav2vec2_en, test=asr
* wav2vec2_en, test=asr
Parent: 07279848
Showing 4 changed files with 12 additions and 12 deletions
docs/source/released_model.md (+1 -1)
examples/librispeech/asr3/conf/wav2vec2ASR.yaml (+6 -4)
paddlespeech/s2t/exps/wav2vec2/model.py (+5 -2)
paddlespeech/s2t/models/wav2vec2/processing/speech_augmentation.py (+0 -5)
docs/source/released_model.md
@@ -22,7 +22,7 @@ Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER |
 Model | Pre-Train Method | Pre-Train Data | Finetune Data | Size | Descriptions | CER | WER | Example Link |
 :-------------:| :------------:| :-----: | -----: | :-----: |:-----:| :-----: | :-----: | :-----: |
 [Wav2vec2-large-960h-lv60-self Model](https://paddlespeech.bj.bcebos.com/wav2vec/wav2vec2-large-960h-lv60-self.pdparams) | wav2vec2 | Librispeech and LV-60k Dataset (5.3w h) | - | 1.18 GB |Pre-trained Wav2vec2.0 Model | - | - | - |
-[Wav2vec2ASR-large-960h-librispeech Model](https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr3/wav2vec2ASR-large-960h-librispeech_ckpt_1.3.0.model.tar.gz) | wav2vec2 | Librispeech and LV-60k Dataset (5.3w h) | Librispeech (960 h) | 1.18 GB |Encoder: Wav2vec2.0, Decoder: CTC, Decoding method: Greedy search | - | 0.0189 | [Wav2vecASR Librispeech ASR3](../../examples/librispeech/asr3) |
+[Wav2vec2ASR-large-960h-librispeech Model](https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr3/wav2vec2ASR-large-960h-librispeech_ckpt_1.3.1.model.tar.gz) | wav2vec2 | Librispeech and LV-60k Dataset (5.3w h) | Librispeech (960 h) | 718 MB |Encoder: Wav2vec2.0, Decoder: CTC, Decoding method: Greedy search | - | 0.0189 | [Wav2vecASR Librispeech ASR3](../../examples/librispeech/asr3) |
 ### Language Model based on NGram
 Language Model | Training Data | Token-based | Size | Descriptions
examples/librispeech/asr3/conf/wav2vec2ASR.yaml
@@ -70,7 +70,6 @@ train_manifest: data/manifest.train
 dev_manifest: data/manifest.dev
 test_manifest: data/manifest.test-clean
 ###########################################
 # Dataloader #
 ###########################################
@@ -95,6 +94,12 @@ dist_sampler: True
 shortest_first: True
 return_lens_rate: True
+############################################
+# Data Augmentation #
+############################################
+audio_augment:  # for raw audio
+  sample_rate: 16000
+  speeds: [95, 100, 105]
 ###########################################
 # Training #
@@ -115,6 +120,3 @@ log_interval: 1
 checkpoint:
   kbest_n: 50
   latest_n: 5
-augment: True
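The `audio_augment` section added above is handed to the augmentation constructor as keyword arguments (`TimeDomainSpecAugment(**config.audio_augment)` in `model.py`). The following is a minimal sketch of that pattern only. `FakeAugment` is a hypothetical stand-in, not the PaddleSpeech layer, a plain dict replaces the parsed YAML, and the reading of `speeds` as percentages of the original rate is an assumption based on SpeechBrain's module of the same name.

```python
# Hypothetical stand-in for TimeDomainSpecAugment. Only the two keys that
# appear in the YAML above are modelled; the defaults are assumptions.
class FakeAugment:
    def __init__(self, sample_rate=16000, speeds=(100,)):
        self.sample_rate = sample_rate
        # Assumption: speeds are percentages of the original rate (95 -> 0.95x).
        self.speed_factors = [s / 100.0 for s in speeds]


# What the parsed `audio_augment` section looks like as a mapping.
audio_augment = {"sample_rate": 16000, "speeds": [95, 100, 105]}

# Each YAML key becomes a keyword argument of the constructor.
augment = FakeAugment(**audio_augment)
print(augment.sample_rate, augment.speed_factors)
```

A side effect of this style is that any key in the YAML section that the constructor does not accept raises a `TypeError` at setup time, so typos in the config surface early.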
paddlespeech/s2t/exps/wav2vec2/model.py
@@ -71,7 +71,8 @@ class Wav2Vec2ASRTrainer(Trainer):
         wavs_lens_rate = wavs_lens / wav.shape[1]
         target_lens_rate = target_lens / target.shape[1]
         wav = wav[:, :, 0]
-        wav = self.speech_augmentation(wav, wavs_lens_rate)
+        if hasattr(train_conf, 'speech_augment'):
+            wav = self.speech_augmentation(wav, wavs_lens_rate)
         loss = self.model(wav, wavs_lens_rate, target, target_lens_rate)
         # loss div by `batch_size * accum_grad`
         loss /= train_conf.accum_grad
@@ -277,7 +278,9 @@ class Wav2Vec2ASRTrainer(Trainer):
         logger.info("Setup model!")
         # setup speech augmentation for wav2vec2
-        self.speech_augmentation = TimeDomainSpecAugment()
+        if hasattr(config, 'audio_augment') and self.train:
+            self.speech_augmentation = TimeDomainSpecAugment(
+                **config.audio_augment)
         if not self.train:
             return
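Taken together, the two `model.py` hunks make augmentation opt-in: the layer is built only when the config has an `audio_augment` section, and the training step applies it only when a `hasattr` check passes. As shown in the first hunk, that check tests for `speech_augment`, whereas the YAML key added in this commit is `audio_augment`; if `train_conf` is the same config object, the guard would not fire. The sketch below illustrates the relative-length computation and both guard outcomes using stand-ins only: `SimpleNamespace` for the config, plain lists for tensors, and a hypothetical `negate` function in place of the augmentation layer. None of these helper names come from PaddleSpeech.

```python
from types import SimpleNamespace


def relative_lengths(lengths, padded_len):
    """Express each absolute length as a fraction of the padded length."""
    return [n / padded_len for n in lengths]


def maybe_augment(conf, wav, augment_fn, key):
    """Apply augment_fn only if the config object defines attribute `key`."""
    if hasattr(conf, key):
        return augment_fn(wav)
    return wav


def negate(wav):
    """Hypothetical stand-in for the augmentation layer."""
    return [-x for x in wav]


# A config that defines `audio_augment`, as the YAML in this commit does.
conf = SimpleNamespace(
    audio_augment={"sample_rate": 16000, "speeds": [95, 100, 105]})

# Two utterances of 16000 and 8000 samples, padded to 16000.
rates = relative_lengths([16000, 8000], padded_len=16000)

skipped = maybe_augment(conf, [1, 2], negate, key="speech_augment")
applied = maybe_augment(conf, [1, 2], negate, key="audio_augment")
print(rates, skipped, applied)
```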
paddlespeech/s2t/models/wav2vec2/processing/speech_augmentation.py
@@ -641,14 +641,11 @@ class DropChunk(nn.Layer):
 class TimeDomainSpecAugment(nn.Layer):
     """A time-domain approximation of the SpecAugment algorithm.
     This augmentation module implements three augmentations in
     the time-domain.
     1. Drop chunks of the audio (zero amplitude or white noise)
     2. Drop frequency bands (with band-drop filters)
     3. Speed peturbation (via resampling to slightly different rate)
     Arguments
     ---------
     perturb_prob : float from 0 to 1
@@ -677,7 +674,6 @@ class TimeDomainSpecAugment(nn.Layer):
     drop_chunk_noise_factor : float
         The noise factor used to scale the white noise inserted, relative to
         the average amplitude of the utterance. Default 0 (no noise inserted).
     Example
     -------
     >>> inputs = paddle.randn([10, 16000])
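The docstring above describes chunk dropping with an optional noise factor scaled by the utterance's average amplitude. A rough pure-Python sketch of that idea follows. `drop_chunk` is a hypothetical helper, the uniform noise distribution and the fixed chunk position are assumptions made for illustration, and the real layer operates on batched tensors with randomly chosen chunks.

```python
import random


def drop_chunk(wave, start, length, noise_factor=0.0, rng=None):
    """Replace wave[start:start+length] with zeros or scaled white noise."""
    rng = rng or random.Random(0)
    out = list(wave)
    # Average absolute amplitude of the whole utterance.
    mean_amp = sum(abs(x) for x in wave) / len(wave)
    for i in range(start, min(start + length, len(out))):
        # noise_factor == 0 yields silence; otherwise uniform noise in
        # [-1, 1] scaled relative to the utterance's average amplitude.
        out[i] = noise_factor * mean_amp * rng.uniform(-1.0, 1.0)
    return out


wave = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
silenced = drop_chunk(wave, start=2, length=2)
noisy = drop_chunk(wave, start=2, length=2, noise_factor=0.1)
print(silenced)
print(noisy)
```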
@@ -718,7 +714,6 @@ class TimeDomainSpecAugment(nn.Layer):
     def forward(self, waveforms, lengths):
         """Returns the distorted waveforms.
         Arguments
         ---------
         waveforms : tensor
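The third augmentation listed in the class docstring, speed perturbation via resampling, ties back to the `speeds: [95, 100, 105]` setting in the YAML. The sketch below shows the general effect on signal length. It assumes the speeds are percentages and uses plain linear interpolation for brevity, which is a simplification and not the resampler used by the module; `speed_perturb` is a hypothetical helper name.

```python
def speed_perturb(wave, speed_percent):
    """Resample `wave` so it plays at speed_percent% of the original speed.

    Linear interpolation is used here for brevity; it is not the resampler
    used by the real module.
    """
    factor = speed_percent / 100.0
    new_len = max(1, int(round(len(wave) / factor)))
    last = len(wave) - 1
    out = []
    for i in range(new_len):
        pos = min(i * factor, float(last))  # position in the input signal
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(wave[lo] * (1.0 - frac) + wave[hi] * frac)
    return out


wave = [float(i) for i in range(100)]
for speed in (95, 100, 105):
    # Slower speeds lengthen the signal, faster speeds shorten it.
    print(speed, len(speed_perturb(wave, speed)))
```

Because the output length changes with the speed factor, the relative lengths passed alongside the waveforms stay meaningful only if they are expressed as fractions of the padded length, which is what `wavs_lens_rate` provides in the trainer.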