PaddlePaddle / DeepSpeech
Commit ad40dafa (unverified)
Authored on Jan 12, 2023
Author: zxcd
Committer: GitHub, Jan 12, 2023
fix some bug. (#2825)
Parent: faa2f866
Showing 2 changed files with 11 additions and 7 deletions.
paddlespeech/s2t/models/whisper/tokenizer.py (+4 -0)
paddlespeech/s2t/models/whisper/whipser.py (+7 -7)
--- a/paddlespeech/s2t/models/whisper/tokenizer.py
+++ b/paddlespeech/s2t/models/whisper/tokenizer.py
@@ -155,6 +155,10 @@ class Tokenizer:
                 if ids < len(self.tokenizer):
                     ids_list.append(ids)
             token_ids = ids_list
+        elif len(token_ids) == 1:
+            token_ids = token_ids[0]
+        else:
+            raise ValueError(f"token_ids {token_ids} load error.")
         return self.tokenizer.decode(token_ids, **kwargs)
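The branch added to `Tokenizer` above can be exercised without Paddle. The following is a minimal sketch, not the project's code: `normalize_token_ids` and `vocab_size` are hypothetical names, and the leading `len(token_ids) > 1` condition is an assumption, since the opening `if` lies outside the visible hunk.

```python
from typing import List, Sequence, Union


def normalize_token_ids(token_ids: Sequence[int],
                        vocab_size: int) -> Union[int, List[int]]:
    """Hypothetical helper mirroring the branch structure in the hunk."""
    if len(token_ids) > 1:  # assumed condition; not visible in the diff
        # Keep only ids that fall inside the vocabulary.
        return [ids for ids in token_ids if ids < vocab_size]
    elif len(token_ids) == 1:
        # A single id is unwrapped to a scalar before decoding.
        return token_ids[0]
    else:
        # An empty sequence is rejected, as in the added `else` branch.
        raise ValueError(f"token_ids {token_ids} load error.")
```

With a vocabulary of 10 entries, `normalize_token_ids([1, 5, 99], 10)` drops the out-of-range id, while `normalize_token_ids([7], 10)` returns the bare integer.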
--- a/paddlespeech/s2t/models/whisper/whipser.py
+++ b/paddlespeech/s2t/models/whisper/whipser.py
@@ -17,12 +17,11 @@ from typing import Union
 import numpy as np
 import paddle
 import paddle.nn.functional as F
-import paddlespeech.s2t.modules.align as paddlespeech_nn
 import soundfile
 import tqdm
 from paddle import nn
 from paddle.distribution import Categorical
+import paddlespeech.s2t.modules.align as paddlespeech_nn
 from paddlespeech.s2t.models.whisper import utils
 from paddlespeech.s2t.models.whisper.tokenizer import get_tokenizer
 from paddlespeech.s2t.models.whisper.tokenizer import LANGUAGES
@@ -771,8 +770,10 @@ class GreedyDecoder(TokenDecoder):
         if temperature == 0:
             next_tokens = paddle.argmax(logits, axis=-1)
         else:
-            next_tokens = Categorical(logits=logits / temperature).sample(
-                shape=logits.shape)
+            next_tokens = Categorical(logits=logits / temperature).sample([1])
+            next_tokens = paddle.reshape(next_tokens, [
+                next_tokens.shape[0] * next_tokens.shape[1],
+            ])
         logprobs = F.log_softmax(logits, axis=-1, dtype=paddle.float32)
         current_logprobs = logprobs[paddle.arange(logprobs.shape[0]),
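The `GreedyDecoder` change draws one sample per batch row and then flattens the result to a 1-D list of token ids. Below is a standard-library sketch of that selection step, assuming the draw comes back shaped `[1, batch]`, which is what the `shape[0] * shape[1]` reshape in the hunk implies. `select_next_tokens` and `softmax` are hypothetical helpers. The sketch uses textbook softmax temperature sampling and makes no claim about how `paddle.distribution.Categorical` normalizes its `logits` argument.

```python
import math
import random
from typing import List


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def select_next_tokens(logits: List[List[float]], temperature: float,
                       rng: random.Random) -> List[int]:
    """Pick one token id per batch row (hypothetical stand-in)."""
    if temperature == 0:
        # Greedy path: argmax over the vocabulary axis.
        return [max(range(len(row)), key=row.__getitem__) for row in logits]
    # One draw per row, kept as a [1, batch] nested list to mimic the
    # assumed shape of a sample([1]) call.
    draws = [[
        rng.choices(range(len(row)),
                    weights=softmax([x / temperature for x in row]))[0]
        for row in logits
    ]]
    # Flatten [1, batch] to [batch], the role paddle.reshape plays above.
    return [token for sample in draws for token in sample]
```

Passing a seeded `random.Random` keeps the sampled path reproducible, which is convenient when comparing it against the greedy path.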
@@ -1205,9 +1206,8 @@ class DecodingTask:
                 DecodingResult(
                     audio_features=features,
                     language=language,
                     language_probs=probs)
                 for features, language, probs in
                 zip(audio_features, languages,
                     language_probs)
             ]
             # repeat the audio & text tensors by the group size, for beam search or best-of-n sampling
(No token-level change is visible in this hunk; the 9-to-8 line count in the header suggests a whitespace-only change.)
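The `DecodingTask` hunk builds one result per audio item by zipping three parallel sequences. The sketch below shows that pattern in isolation, under assumptions: the `DecodingResult` dataclass here is a stand-in containing only the three fields visible in the diff, and `build_results` is a hypothetical wrapper. The real class in `whipser.py` is not reproduced.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DecodingResult:
    """Stand-in with only the fields visible in the hunk."""
    audio_features: List[float]
    language: str
    language_probs: Dict[str, float]


def build_results(audio_features: List[List[float]], languages: List[str],
                  language_probs: List[Dict[str, float]]
                  ) -> List[DecodingResult]:
    """Pair up the i-th feature, language, and probability entries."""
    return [
        DecodingResult(
            audio_features=features,
            language=language,
            language_probs=probs)
        for features, language, probs in zip(audio_features, languages,
                                             language_probs)
    ]
```

Note that `zip` stops at the shortest input, so the three sequences are expected to have the same length.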