update readme cn add python api
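
The README text in this diff notes that punctuation restoration can be used together with the ASR model. A minimal sketch of that pairing follows, assuming the call conventions shown in the diff (`asr(audio_file=...)` and `text_punc(text=...)`). The helper names `transcribe_with_punctuation` and `run_with_paddlespeech` are hypothetical and not part of PaddleSpeech.

```python
def transcribe_with_punctuation(audio_file, asr, punc):
    """Run speech recognition, then restore punctuation on the transcript.

    `asr` and `punc` are callables that follow the keyword conventions used
    in the README examples: asr(audio_file=...) and punc(text=...).
    """
    transcript = asr(audio_file=audio_file)
    return punc(text=transcript)


def run_with_paddlespeech(audio_file="zh.wav"):
    """Hypothetical wrapper that wires in the executors named in the README."""
    # Imported lazily so the helper above stays usable without PaddleSpeech.
    from paddlespeech.cli.asr.infer import ASRExecutor
    from paddlespeech.cli.text.infer import TextExecutor

    return transcribe_with_punctuation(audio_file, ASRExecutor(), TextExecutor())
```

Passing the executors in as arguments keeps the chaining logic separate from model loading, so it can be exercised with stand-in callables.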

d7a75527 · iftaken · 4a58ff6c · d7a75527

Showing with 200 additions and 148 deletions

README_cn.md +200 -148
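
The quick-start text in this diff states that the command-line tools accept 16k wav audio. The sketch below checks a file's sample rate before it is passed to `--input`, using only Python's standard `wave` module. The helper name `is_16k_wav` is hypothetical and not part of PaddleSpeech.

```python
import wave


def is_16k_wav(path, expected_rate=16000):
    """Return True if `path` is a readable PCM wav file sampled at `expected_rate` Hz."""
    try:
        with wave.open(path, "rb") as wav_file:
            return wav_file.getframerate() == expected_rate
    except (wave.Error, EOFError, FileNotFoundError):
        # Not a wav file, truncated, or missing: treat as unsuitable input.
        return False
```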

--- a/README_cn.md
+++ b/README_cn.md
@@ -39,7 +39,117 @@

 **PaddleSpeech** 荣获 [NAACL2022 Best Demo Award](https://2022.naacl.org/blog/best-demo-award/), 请访问 [Arxiv](https://arxiv.org/abs/2205.12007) 论文。
  
+### 效果展示

+##### 语音识别
+
+<div align = "center">
+<table style="width:100%">
+  <thead>
+    <tr>
+      <th> 输入音频  </th>
+      <th width="550"> 识别结果 </th>
+    </tr>
+  </thead>
+  <tbody>
+   <tr>
+      <td align = "center">
+      <a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav" rel="nofollow">
+            <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
+      </td>
+      <td >I knocked at the door on the ancient side of the building.</td>
+    </tr>
+    <tr>
+      <td align = "center">
+      <a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav" rel="nofollow">
+            <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
+      </td>
+      <td>我认为跑步最重要的就是给我带来了身体健康。</td>
+    </tr>
+  </tbody>
+</table>
+
+</div>
+
+##### 语音翻译 (英译中)
+
+<div align = "center">
+<table style="width:100%">
+  <thead>
+    <tr>
+      <th> 输入音频 </th>
+      <th width="550"> 翻译结果 </th>
+    </tr>
+  </thead>
+  <tbody>
+   <tr>
+      <td align = "center">
+      <a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav" rel="nofollow">
+            <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
+      </td>
+      <td >我 在 这栋 建筑 的 古老 门上 敲门。</td>
+    </tr>
+  </tbody>
+</table>
+
+</div>
+
+##### 语音合成
+<div align = "center">
+<table style="width:100%">
+  <thead>
+    <tr>
+      <th width="550">输入文本</th>
+      <th>合成音频</th>
+    </tr>
+  </thead>
+  <tbody>
+   <tr>
+      <td >Life was like a box of chocolates, you never know what you're gonna get.</td>
+      <td align = "center">
+      <a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/tacotron2_ljspeech_waveflow_samples_0.2/sentence_1.wav" rel="nofollow">
+            <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
+      </td>
+    </tr>
+    <tr>
+      <td >早上好，今天是2020/10/29，最低温度是-3°C。</td>
+      <td align = "center">
+      <a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/001.wav" rel="nofollow">
+            <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
+      </td>
+    </tr>
+    <tr>
+      <td >季姬寂，集鸡，鸡即棘鸡。棘鸡饥叽，季姬及箕稷济鸡。鸡既济，跻姬笈，季姬忌，急咭鸡，鸡急，继圾几，季姬急，即籍箕击鸡，箕疾击几伎，伎即齑，鸡叽集几基，季姬急极屐击鸡，鸡既殛，季姬激，即记《季姬击鸡记》。</td>
+      <td align = "center">
+      <a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/jijiji.wav" rel="nofollow">
+            <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
+      </td>
+    </tr>
+  </tbody>
+</table>
+
+</div>
+
+更多合成音频，可以参考 [PaddleSpeech 语音合成音频示例](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html)。
+
+##### 标点恢复
+<div align = "center">
+<table style="width:100%">
+  <thead>
+    <tr>
+      <th width="390"> 输入文本 </th>
+      <th width="390"> 输出文本 </th>
+    </tr>
+  </thead>
+  <tbody>
+   <tr>
+      <td>今天的天气真不错啊你下午有空吗我想约你一起去吃饭</td>
+      <td>今天的天气真不错啊！你下午有空吗？我想约你一起去吃饭。</td>
+    </tr>
+  </tbody>
+</table>
+
+</div>


 ### 特性
@@ -47,7 +157,7 @@
 本项目采用了易用、高效、灵活以及可扩展的实现，旨在为工业应用、学术研究提供更好的支持，实现的功能包含训练、推断以及测试模块，以及部署过程，主要包括
 - 📦 **易用性**: 安装门槛低，可使用 [CLI](#quick-start) 快速开始。
 - 🏆 **对标 SoTA**: 提供了高速、轻量级模型，且借鉴了最前沿的技术。
-- 🏆 **流式ASR和TTS系统**：工业级的端到端流式识别、流式合成系统。
+- 🏆 **流式 ASR 和 TTS 系统**：工业级的端到端流式识别、流式合成系统。
 - 💯 **基于规则的中文前端**: 我们的前端包含文本正则化和字音转换（G2P）。此外，我们使用自定义语言规则来适应中文语境。
 - **多种工业界以及学术界主流功能支持**:
  - 🛎️ 典型音频任务: 本工具包提供了音频任务如音频分类、语音翻译、自动语音识别、文本转语音、语音合成、声纹识别、KWS等任务的实现。
@@ -82,7 +192,7 @@

 ### 相关依赖
 + gcc >= 4.8.5
-+ paddlepaddle >= 2.2.0
++ paddlepaddle >= 2.3.1
 + python >= 3.7
 + linux(推荐), mac, windows

@@ -108,12 +218,14 @@ pip install pytest-runner
 pip install .
 ```

-更多关于安装问题，如conda环境，librosa依赖的系统库，gcc环境问题，kaldi安装等，可以参考这篇[安装文档](docs/source/install_cn.md)，如安装上遇到问题可以在[#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150)上留言以及查找相关问题
+更多关于安装问题，如 conda 环境，librosa 依赖的系统库，gcc 环境问题，kaldi 安装等，可以参考这篇[安装文档](docs/source/install_cn.md)，如安装上遇到问题可以在 [#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150) 上留言以及查找相关问题。

 <a name="快速开始"></a>
 ## 快速开始

-安装完成后，开发者可以通过命令行或者Python快速开始，改变 `--input` 可以尝试用自己的音频或文本测试，支持16k wav格式音频。你也可以在`aistudio`中进行体验[PaddleSpeech API Demo ](https://aistudio.baidu.com/aistudio/projectdetail/4281335?shared=1)。
+安装完成后，开发者可以通过命令行或者 Python 快速开始，命令行模式下改变 `--input` 可以尝试用自己的音频或文本测试，支持 16k wav 格式音频。
+
+你也可以在 `aistudio` 中快速体验 👉🏻[PaddleSpeech API Demo](https://aistudio.baidu.com/aistudio/projectdetail/4281335?shared=1)。

 测试音频示例下载
 ```shell
@@ -121,95 +233,147 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
 ```

-**语音识别**
+### 语音识别
+<details><summary>&emsp;（点击可展开）开源中文语音识别</summary>
+
+命令行一键体验
+
 ```shell
 paddlespeech asr --lang zh --input zh.wav
 ```

+Python API 一键预测
+
 ```python
-from paddlespeech.cli.asr.infer import ASRExecutor
-asr = ASRExecutor()
-result = asr(audio_file="zh.wav")
+>>> from paddlespeech.cli.asr.infer import ASRExecutor
+>>> asr = ASRExecutor()
+>>> result = asr(audio_file="zh.wav")
+>>> print(result)
+我认为跑步最重要的就是给我带来了身体健康
 ```
+</details>
+
+### 语音合成
+
+<details><summary>&emsp;开源中文语音合成</summary>
+
+输出 24k 采样率wav格式音频

-**语音合成** 

-输出24k采样率wav格式音频
+命令行一键体验

 ```shell
 paddlespeech tts --input "你好，欢迎使用百度飞桨深度学习框架！" --output output.wav
 ```

+Python API 一键预测
+
 ```python
-from paddlespeech.cli.tts.infer import TTSExecutor
-tts = TTSExecutor()
-tts(text="今天天气十分不错。", output="output.wav")
+>>> from paddlespeech.cli.tts.infer import TTSExecutor
+>>> tts = TTSExecutor()
+>>> tts(text="今天天气十分不错。", output="output.wav")
 ```
-- 语音合成的 web demo 已经集成进了 [Huggingface Spaces](https://huggingface.co/spaces). 请参考: [TTS Demo](https://huggingface.co/spaces/akhaliq/paddlespeech)
+- 语音合成的 web demo 已经集成进了 [Huggingface Spaces](https://huggingface.co/spaces). 请参考: [TTS Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS)

-**声音分类**   
+</details>
+
+### 声音分类   
+
+<details><summary>&emsp;适配多场景的开放领域声音分类工具</summary>

-输出声音分类结果
+基于AudioSet数据集527个类别的声音分类模型
+
+命令行一键体验

 ```shell
 paddlespeech cls --input zh.wav
 ```

+Python API 一键预测
+
 ```python
-from paddlespeech.cli.cls.infer import CLSExecutor
-cls = CLSExecutor()
-result = cls(audio_file="zh.wav")
+>>> from paddlespeech.cli.cls.infer import CLSExecutor
+>>> cls = CLSExecutor()
+>>> result = cls(audio_file="zh.wav")
+>>> print(result)
+Speech 0.9027186632156372
 ```

-**声纹提取**
+</details>
+
+### 声纹提取

-输出声音提取向量
+<details><summary>&emsp;工业级声纹提取工具</summary>
+
+命令行一键体验

 ```shell
 paddlespeech vector --task spk --input zh.wav
 ```

+Python API 一键预测
+
 ```python
-from paddlespeech.cli.vector import VectorExecutor
-vec = VectorExecutor()
-result = vec(audio_file="zh.wav")
+>>> from paddlespeech.cli.vector import VectorExecutor
+>>> vec = VectorExecutor()
+>>> result = vec(audio_file="zh.wav")
+>>> print(result) # 187维向量
+[ -0.19083306   9.474295   -14.122263    -2.0916545    0.04848729
+   4.9295826    1.4780062    0.3733844   10.695862     3.2697146
+  -4.48199     -0.6617882   -9.170393   -11.1568775   -1.2358263 ...]
 ```

-**标点恢复** 
+</details>
+
+### 标点恢复 

-输出添加标点的文本
+<details><summary>&emsp;一键恢复文本标点，可与ASR模型配合使用</summary>
+
+命令行一键体验

 ```shell
 paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
 ```

+Python API 一键预测
+
 ```python
-from paddlespeech.cli.text.infer import TextExecutor
-text_punc = TextExecutor()
-result = text_punc(text="今天的天气真不错啊你下午有空吗我想约你一起去吃饭")
+>>> from paddlespeech.cli.text.infer import TextExecutor
+>>> text_punc = TextExecutor()
+>>> result = text_punc(text="今天的天气真不错啊你下午有空吗我想约你一起去吃饭")
+>>> print(result)
+今天的天气真不错啊！你下午有空吗？我想约你一起去吃饭。
 ```

-**语音翻译**
+</details>
+
+### 语音翻译
+
+<details><summary>&emsp;端到端英译中语音翻译工具</summary>

 使用预编译的kaldi相关工具，只支持在Ubuntu系统中体验

+命令行一键体验
+
 ```shell
 paddlespeech st --input en.wav
 ```

+Python API 一键预测
+
 ```python
-from paddlespeech.cli.st.infer import STExecutor
-st = STExecutor()
-result = st(audio_file="en.wav")
+>>> from paddlespeech.cli.st.infer import STExecutor
+>>> st = STExecutor()
+>>> result = st(audio_file="en.wav")
+>>> print(result)
+['我 在 这栋 建筑 的 古老 门上 敲门 。']
 ```

+</details>
+

-更多命令行命令请参考 [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos)
-> Note: 如果需要训练或者微调，请查看[语音识别](./docs/source/asr/quick_start.md)， [语音合成](./docs/source/tts/quick_start.md)。

 <a name="快速使用服务"></a>
 ## 快速使用服务
-安装完成后，开发者可以通过命令行快速使用服务。
+安装完成后，开发者可以通过命令行一键启动语音识别，语音合成，音频分类三种服务。

 **启动服务**     
 ```shell
@@ -587,118 +751,6 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声

 语音合成模块最初被称为 [Parakeet](https://github.com/PaddlePaddle/Parakeet)，现在与此仓库合并。如果您对该任务的学术研究感兴趣，请参阅 [TTS 研究概述](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview)。此外，[模型介绍](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) 是了解语音合成流程的一个很好的指南。

-## 效果展示
-
-##### 语音识别
-
-<div align = "center">
-<table style="width:100%">
-  <thead>
-    <tr>
-      <th> 输入音频  </th>
-      <th width="550"> 识别结果 </th>
-    </tr>
-  </thead>
-  <tbody>
-   <tr>
-      <td align = "center">
-      <a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav" rel="nofollow">
-            <img align="center" src="./docs/images/audio_icon.png" width="200 style="max-width: 100%;"></a><br>
-      </td>
-      <td >I knocked at the door on the ancient side of the building.</td>
-    </tr>
-    <tr>
-      <td align = "center">
-      <a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav" rel="nofollow">
-            <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
-      </td>
-      <td>我认为跑步最重要的就是给我带来了身体健康。</td>
-    </tr>
-  </tbody>
-</table>
-
-</div>
-
-##### 语音翻译 (英译中)
-
-<div align = "center">
-<table style="width:100%">
-  <thead>
-    <tr>
-      <th> 输入音频 </th>
-      <th width="550"> 翻译结果 </th>
-    </tr>
-  </thead>
-  <tbody>
-   <tr>
-      <td align = "center">
-      <a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav" rel="nofollow">
-            <img align="center" src="./docs/images/audio_icon.png" width="200 style="max-width: 100%;"></a><br>
-      </td>
-      <td >我 在 这栋 建筑 的 古老 门上 敲门。</td>
-    </tr>
-  </tbody>
-</table>
-
-</div>
-
-##### 语音合成
-<div align = "center">
-<table style="width:100%">
-  <thead>
-    <tr>
-      <th width="550">输入文本</th>
-      <th>合成音频</th>
-    </tr>
-  </thead>
-  <tbody>
-   <tr>
-      <td >Life was like a box of chocolates, you never know what you're gonna get.</td>
-      <td align = "center">
-      <a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/tacotron2_ljspeech_waveflow_samples_0.2/sentence_1.wav" rel="nofollow">
-            <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
-      </td>
-    </tr>
-    <tr>
-      <td >早上好，今天是2020/10/29，最低温度是-3°C。</td>
-      <td align = "center">
-      <a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/001.wav" rel="nofollow">
-            <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
-      </td>
-    </tr>
-    <tr>
-      <td >季姬寂，集鸡，鸡即棘鸡。棘鸡饥叽，季姬及箕稷济鸡。鸡既济，跻姬笈，季姬忌，急咭鸡，鸡急，继圾几，季姬急，即籍箕击鸡，箕疾击几伎，伎即齑，鸡叽集几基，季姬急极屐击鸡，鸡既殛，季姬激，即记《季姬击鸡记》。</td>
-      <td align = "center">
-      <a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/jijiji.wav" rel="nofollow">
-            <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
-      </td>
-    </tr>
-  </tbody>
-</table>
-
-</div>
-
-更多合成音频，可以参考 [PaddleSpeech 语音合成音频示例](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html)。
-
-##### 标点恢复
-<div align = "center">
-<table style="width:100%">
-  <thead>
-    <tr>
-      <th width="390"> 输入文本 </th>
-      <th width="390"> 输出文本 </th>
-    </tr>
-  </thead>
-  <tbody>
-   <tr>
-      <td>今天的天气真不错啊你下午有空吗我想约你一起去吃饭</td>
-      <td>今天的天气真不错啊！你下午有空吗？我想约你一起去吃饭。</td>
-    </tr>
-  </tbody>
-</table>
-
-</div>
-

 ## ⭐ 应用案例
 - **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): 使用 PaddleSpeech 的语音合成模块生成虚拟人的声音。**