PaddlePaddle / DeepSpeech
Commit e8f2d8f1 (unverified)
Authored by Hui Zhang on Mar 01, 2022
Committed by GitHub on Mar 01, 2022
Merge pull request #1507 from zh794390558/cli
[cli] add cli batch/pipe example to readme
Parents: 2517df92, 335638ba
Showing 14 changed files with 68 additions and 30 deletions
.gitignore  +4 -0
README.md  +13 -2
README_cn.md  +11 -0
demos/speech_recognition/.gitignore  +1 -0
demos/speech_recognition/README.md  +2 -0
demos/speech_recognition/README_cn.md  +2 -0
demos/speech_recognition/run.sh  +6 -0
demos/text_to_speech/README.md  +4 -1
demos/text_to_speech/README_cn.md  +4 -0
demos/text_to_speech/run.sh  +4 -0
paddlespeech/s2t/io/sampler.py  +1 -1
paddlespeech/s2t/models/u2_st/u2_st.py  +1 -3
paddlespeech/t2s/modules/transformer/repeat.py  +1 -1
tests/unit/asr/deepspeech2_online_model_test.py  +14 -22
.gitignore
@@ -2,6 +2,7 @@
 *.pyc
 .vscode
 *log
+*.wav
 *.pdmodel
 *.pdiparams*
 *.zip
@@ -30,5 +31,8 @@ tools/OpenBLAS/
 tools/Miniconda3-latest-Linux-x86_64.sh
 tools/activate_python.sh
 tools/miniconda.sh
+tools/CRF++-0.58/
+speechx/fc_patch/
 *output/
README.md
@@ -196,16 +196,18 @@ Developers can have a try of our models with [PaddleSpeech Command Line](./paddl
 ```shell
 paddlespeech cls --input input.wav
 ```
 **Automatic Speech Recognition**
 ```shell
 paddlespeech asr --lang zh --input input_16k.wav
 ```
 **Speech Translation** (English to Chinese)
 (not support for Mac and Windows now)
 ```shell
 paddlespeech st --input input_16k.wav
 ```
 **Text-to-Speech**
 ```shell
 paddlespeech tts --input "你好,欢迎使用飞桨深度学习框架!" --output output.wav
@@ -218,7 +220,16 @@ paddlespeech tts --input "你好,欢迎使用飞桨深度学习框架!" --ou
 paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
 ```
+**Batch Process**
+```
+echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
+```
+**Shell Pipeline**
+ASR + Punc:
+```
+paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
+```
 For more command lines, please see: [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos)
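The batch example added to the README feeds `paddlespeech tts` one utterance per line on standard input, each line holding an ID, a space, and the sentence. A minimal Python sketch of producing that input is shown here. The helper name `build_batch_input` is hypothetical, and the "ID, space, text" layout is inferred from the `echo -e` example in the diff, not from a documented specification.

```python
def build_batch_input(sentences):
    """Join (utterance_id, text) pairs into newline-separated "id text" lines.

    Hypothetical helper: the line layout mirrors the README's echo example
    and is an assumption about what the CLI expects on stdin.
    """
    lines = []
    for utt_id, text in sentences:
        if "\n" in text:
            # One utterance per line, so embedded newlines would split it.
            raise ValueError("each sentence must fit on a single line")
        lines.append(f"{utt_id} {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Prints the same two lines that the README example pipes into the CLI.
    print(build_batch_input([(1, "欢迎光临。"), (2, "谢谢惠顾。")]))
```

If this were saved as a script (say `make_batch.py`, a made-up name), its output could presumably be piped into `paddlespeech tts` the same way the `echo -e` line is.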
README_cn.md
@@ -216,6 +216,17 @@ paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!
 paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
 ```
+**批处理**
+```
+echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
+```
+**Shell管道**
+ASR + Punc:
+```
+paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
+```
 更多命令行命令请参考 [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos)
 > Note: 如果需要训练或者微调,请查看[语音识别](./docs/source/asr/quick_start.md), [语音合成](./docs/source/tts/quick_start.md)。
demos/speech_recognition/.gitignore
0 → 100644
+*.wav
demos/speech_recognition/README.md
@@ -27,6 +27,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 paddlespeech asr --input ./zh.wav
 # English
 paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav
+# Chinese ASR + Punctuation Restoration
+paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
 ```
 (It doesn't matter if package `paddlespeech-ctcdecoders` is not found, this package is optional.)
demos/speech_recognition/README_cn.md
@@ -25,6 +25,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
 paddlespeech asr --input ./zh.wav
 # 英文
 paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav
+# 中文 + 标点恢复
+paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
 ```
 (如果显示 `paddlespeech-ctcdecoders` 这个 python 包没有找到的 Error,没有关系,这个包是非必须的。)
demos/speech_recognition/run.sh
 #!/bin/bash
 wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+# asr
 paddlespeech asr --input ./zh.wav
+# asr + punc
+paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
\ No newline at end of file
demos/text_to_speech/README.md
@@ -17,11 +17,14 @@ The input of this demo should be a text of the specific language that can be pas
 ### 3. Usage
 - Command Line (Recommended)
   - Chinese
     The default acoustic model is `Fastspeech2`, and the default vocoder is `Parallel WaveGAN`.
     ```bash
     paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!"
     ```
+  - Batch Process
+    ```bash
+    echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
+    ```
   - Chinese, use `SpeedySpeech` as the acoustic model
     ```bash
     paddlespeech tts --am speedyspeech_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!"
demos/text_to_speech/README_cn.md
@@ -24,6 +24,10 @@
     ```bash
     paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!"
     ```
+  - 批处理
+    ```bash
+    echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
+    ```
   - 中文,使用 `SpeedySpeech` 作为声学模型
     ```bash
     paddlespeech tts --am speedyspeech_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!"
demos/text_to_speech/run.sh
 #!/bin/bash
+# single process
 paddlespeech tts --input 今天的天气不错啊
+# Batch process
+echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
\ No newline at end of file
paddlespeech/s2t/io/sampler.py
@@ -51,7 +51,7 @@ def _batch_shuffle(indices, batch_size, epoch, clipped=False):
     """
     rng = np.random.RandomState(epoch)
     shift_len = rng.randint(0, batch_size - 1)
     batch_indices = list(zip(*[iter(indices[shift_len:])] * batch_size))
     rng.shuffle(batch_indices)
     batch_indices = [item for batch in batch_indices for item in batch]
     assert clipped is False
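The context lines of `_batch_shuffle` rely on the `zip(*[iter(seq)] * n)` idiom to cut a sequence into consecutive batches before shuffling the batches as units. The sketch below isolates that step under stated assumptions: it swaps in the standard library's `random.Random(epoch)` for `np.random.RandomState(epoch)` so it runs without NumPy, the function name `batch_shuffle_sketch` is made up, and it covers only the lines visible in the hunk (items that do not fill a final batch are simply dropped here).

```python
import random


def batch_shuffle_sketch(indices, batch_size, epoch):
    """Shuffle whole batches while keeping each batch's items adjacent.

    Stand-in sketch, not the project's implementation: random.Random
    replaces np.random.RandomState, and leftover items are dropped.
    """
    rng = random.Random(epoch)  # seeding with the epoch makes the order reproducible
    # Like RandomState.randint, randrange excludes the upper bound,
    # so batch_size must be at least 2.
    shift_len = rng.randrange(0, batch_size - 1)
    # batch_size references to ONE iterator: zip pulls consecutive items,
    # producing non-overlapping tuples of length batch_size.
    shared_iter = iter(indices[shift_len:])
    batches = list(zip(*[shared_iter] * batch_size))
    rng.shuffle(batches)
    # Flatten the shuffled batches back into a single index list.
    return [item for batch in batches for item in batch]


if __name__ == "__main__":
    print(batch_shuffle_sketch(list(range(10)), batch_size=3, epoch=0))
```

Because the generator is seeded with `epoch`, calling the function twice with the same arguments yields the same ordering, which is presumably why the original seeds its generator that way.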
paddlespeech/s2t/models/u2_st/u2_st.py
@@ -33,8 +33,6 @@ from paddlespeech.s2t.modules.decoder import TransformerDecoder
 from paddlespeech.s2t.modules.encoder import ConformerEncoder
 from paddlespeech.s2t.modules.encoder import TransformerEncoder
 from paddlespeech.s2t.modules.loss import LabelSmoothingLoss
-from paddlespeech.s2t.modules.mask import mask_finished_preds
-from paddlespeech.s2t.modules.mask import mask_finished_scores
 from paddlespeech.s2t.modules.mask import subsequent_mask
 from paddlespeech.s2t.utils import checkpoint
 from paddlespeech.s2t.utils import layer_tools
@@ -291,7 +289,7 @@ class U2STBaseModel(nn.Layer):
         device = speech.place
         # Let's assume B = batch_size and N = beam_size
         # 1. Encoder and init hypothesis
         encoder_out, encoder_mask = self._forward_encoder(
             speech, speech_lengths, decoding_chunk_size,
             num_decoding_left_chunks,
paddlespeech/t2s/modules/transformer/repeat.py
@@ -36,4 +36,4 @@ def repeat(N, fn):
     Returns:
         MultiSequential: Repeated model instance.
     """
     return MultiSequential(*[fn(n) for n in range(N)])
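The `repeat(N, fn)` helper calls the factory `fn` once per layer index and chains the results. A framework-free sketch of that pattern follows. `SequentialStandIn` is a hypothetical replacement for `MultiSequential` (which belongs to the project and is not reproduced here), and plain callables stand in for network layers.

```python
class SequentialStandIn:
    """Hypothetical stand-in for MultiSequential: applies layers in order."""

    def __init__(self, *layers):
        self.layers = list(layers)

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


def repeat_sketch(N, fn):
    """Build N layers by calling fn(0), fn(1), ..., fn(N - 1) and chain them."""
    return SequentialStandIn(*[fn(n) for n in range(N)])


def make_adder(n):
    """Toy layer factory: the n-th layer adds n to its input."""
    return lambda x: x + n


if __name__ == "__main__":
    model = repeat_sketch(3, make_adder)
    print(model(10))  # 10 + 0 + 1 + 2
```

Passing the index `n` to the factory lets each layer be configured by its position, and calling the factory N times produces N separate layer objects instead of one object reused N times.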
tests/unit/asr/deepspeech2_online_model_test.py
@@ -11,16 +11,17 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+import os
+import pickle
 import unittest
 import numpy as np
 import paddle
-import pickle
-import os
 from paddle import inference
-from paddlespeech.s2t.models.ds2_online import DeepSpeech2ModelOnline
 from paddlespeech.s2t.models.ds2_online import DeepSpeech2InferModelOnline
+from paddlespeech.s2t.models.ds2_online import DeepSpeech2ModelOnline
 class TestDeepSpeech2ModelOnline(unittest.TestCase):
     def setUp(self):
@@ -185,15 +186,12 @@ class TestDeepSpeech2ModelOnline(unittest.TestCase):
             paddle.allclose(final_state_c_box, final_state_c_box_chk), True)
 class TestDeepSpeech2StaticModelOnline(unittest.TestCase):
     def setUp(self):
         export_prefix = "exp/deepspeech2_online/checkpoints/test_export"
         if not os.path.exists(os.path.dirname(export_prefix)):
             os.makedirs(os.path.dirname(export_prefix), mode=0o755)
         infer_model = DeepSpeech2InferModelOnline(
             feat_size=161,
             dict_size=4233,
             num_conv_layers=2,
@@ -207,27 +205,25 @@ class TestDeepSpeech2StaticModelOnline(unittest.TestCase):
         with open("test_data/static_ds2online_inputs.pickle", "rb") as f:
             self.data_dict = pickle.load(f)
         self.setup_model(export_prefix)
     def setup_model(self, export_prefix):
         deepspeech_config = inference.Config(
             export_prefix + ".pdmodel",
             export_prefix + ".pdiparams")
         if ('CUDA_VISIBLE_DEVICES' in os.environ.keys() and
                 os.environ['CUDA_VISIBLE_DEVICES'].strip() != ''):
             deepspeech_config.enable_use_gpu(100, 0)
             deepspeech_config.enable_memory_optim()
         deepspeech_predictor = inference.create_predictor(deepspeech_config)
         self.predictor = deepspeech_predictor
     def test_unit(self):
         input_names = self.predictor.get_input_names()
         audio_handle = self.predictor.get_input_handle(input_names[0])
         audio_len_handle = self.predictor.get_input_handle(input_names[1])
         h_box_handle = self.predictor.get_input_handle(input_names[2])
         c_box_handle = self.predictor.get_input_handle(input_names[3])
         x_chunk = self.data_dict["audio_chunk"]
         x_chunk_lens = self.data_dict["audio_chunk_lens"]
@@ -246,13 +242,9 @@ class TestDeepSpeech2StaticModelOnline(unittest.TestCase):
         c_box_handle.reshape(chunk_state_c_box.shape)
         c_box_handle.copy_from_cpu(chunk_state_c_box)
         output_names = self.predictor.get_output_names()
         output_handle = self.predictor.get_output_handle(
             output_names[0])
         output_lens_handle = self.predictor.get_output_handle(output_names[1])
         output_state_h_handle = self.predictor.get_output_handle(
             output_names[2])
         output_state_c_handle = self.predictor.get_output_handle(
@@ -264,7 +256,7 @@ class TestDeepSpeech2StaticModelOnline(unittest.TestCase):
         chunk_state_h_box = output_state_h_handle.copy_to_cpu()
         chunk_state_c_box = output_state_c_handle.copy_to_cpu()
         return True
 if __name__ == '__main__':
     unittest.main()
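The GPU branch in `setup_model` is taken only when `CUDA_VISIBLE_DEVICES` is set to a non-blank value. That condition can be isolated as a small predicate, sketched below. The function name `gpu_requested` and the optional `environ` parameter are assumptions added for illustration and testability; they are not part of the test file.

```python
import os


def gpu_requested(environ=None):
    """Return True when CUDA_VISIBLE_DEVICES is present and not blank.

    Hypothetical helper mirroring the condition in the diff:
    'CUDA_VISIBLE_DEVICES' in os.environ.keys() and
    os.environ['CUDA_VISIBLE_DEVICES'].strip() != ''
    """
    env = os.environ if environ is None else environ
    # A missing variable falls back to "", which fails the check like a blank one.
    return env.get("CUDA_VISIBLE_DEVICES", "").strip() != ""


if __name__ == "__main__":
    print(gpu_requested({"CUDA_VISIBLE_DEVICES": "0"}))
    print(gpu_requested({"CUDA_VISIBLE_DEVICES": "  "}))
    print(gpu_requested({}))
```

Accepting a mapping as an argument keeps the check testable without mutating the real process environment.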