diff --git a/README.md b/README.md
index da413001a51a74be9de4c70f6c1b8f23732a3597..a1d6777e123b65c7242e13259de1440733bfa682 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
4.What is the goal of this project?
-->
-**PaddleSpeech** is an open-source toolkit on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech, with the state-of-art and influential models.
+**PaddleSpeech** is an open-source toolkit on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.
##### Speech-to-Text
@@ -86,77 +86,76 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
For more synthesized audio samples, please refer to [PaddleSpeech Text-to-Speech samples](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html).
+##### Speech Translation
+
+| Input Audio | Translation Result |
+| :---------: | :----------------: |
+|  | 我 在 这栋 建筑 的 古老 门上 敲门。 |
+
Via the easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial application and academic research, including training, inference & testing modules, and deployment process. To be more specific, this toolkit features:
-- **Fast and Light-weight**: we provide high-speed and ultra-lightweight models that are convenient for industrial deployment.
+- **Ease of Use**: a low barrier to installation, and a [CLI](#quick-start) is available to quick-start your journey.
+- **Aligned with the State-of-the-Art**: we provide high-speed and ultra-lightweight models, as well as cutting-edge technology.
- **Rule-based Chinese frontend**: our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt to the Chinese context (a short sketch follows this list).
- **Varieties of Functions that Vitalize both Industrial and Academia**:
- - *Implementation of critical audio tasks*: this toolkit contains audio functions like Speech Translation, Automatic Speech Recognition, Text-to-Speech Synthesis, Voice Cloning, etc.
+ - *Implementation of critical audio tasks*: this toolkit contains audio functions like Audio Classification, Speech Translation, Automatic Speech Recognition, Text-to-Speech Synthesis, etc.
- *Integration of mainstream models and datasets*: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also [model list](#model-list) for more details.
- - *Cascaded models application*: as an extension of the application of traditional audio tasks, we combine the workflows of aforementioned tasks with other fields like Natural language processing (NLP), like Punctuation Restoration.
+ - *Cascaded models application*: as an extension of typical audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural Language Processing (NLP) and Computer Vision (CV).
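+
+The rule-based Chinese frontend can also be exercised on its own. A minimal sketch, assuming the `Frontend` class in `paddlespeech.t2s.frontend.zh_frontend` and its `get_phonemes` method (the exact module path and signature may differ across versions):
+
+```python
+from paddlespeech.t2s.frontend.zh_frontend import Frontend  # assumed module path
+
+# Build the frontend with its default G2P backend.
+frontend = Frontend()
+
+# Text Normalization + G2P (with Polyphone and Tone Sandhi handling) in one call.
+phones = frontend.get_phonemes("你好,欢迎使用百度飞桨!")
+print(phones)
+```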
## Installation
-The base environment in this page is
-- Ubuntu 16.04
-- python>=3.7
-- paddlepaddle>=2.2.0
-
-If you want to set up PaddleSpeech in other environment, please see the [installation](./docs/source/install.md) documents for all the alternatives.
+We strongly recommend installing PaddleSpeech on *Linux* with *python>=3.7* and *paddlepaddle>=2.2.0*, where `paddlespeech` can be installed easily with `pip`:
+```shell
+pip install paddlespeech
+```
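+
+To confirm the backend is in place, you can run PaddlePaddle's built-in self-check; `paddle.utils.run_check()` is a standard PaddlePaddle utility, and the extra import simply verifies that `paddlespeech` is importable:
+
+```python
+import paddle
+import paddlespeech  # should import cleanly after `pip install paddlespeech`
+
+# PaddlePaddle's own installation self-check.
+paddle.utils.run_check()
+```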
+If you want to set up PaddleSpeech in another environment, please see the [installation](./docs/source/install.md) documents for all the alternatives.
## Quick Start
-Developers can have a try of our model with only a few lines of code.
-
-A tiny DeepSpeech2 **Speech-to-Text** model training on toy set of LibriSpeech:
+Developers can try our models with the [PaddleSpeech Command Line](./paddlespeech/cli/README.md). Change `--input` to test your own audio or text.
+**Audio Classification**
+```shell
+paddlespeech cls --input input.wav
+```
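+
+The same task is also callable from Python. A minimal sketch, assuming the `CLSExecutor` class exported by `paddlespeech.cli` (see the [CLI README](./paddlespeech/cli/README.md) for the exact API):
+
+```python
+from paddlespeech.cli import CLSExecutor  # assumed export
+
+cls_executor = CLSExecutor()
+# Classify an audio clip; returns the predicted tag(s) as text.
+result = cls_executor(audio_file='input.wav')
+print(result)
+```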
+**Automatic Speech Recognition**
```shell
-cd examples/tiny/asr0/
-# source the environment
-source path.sh
-source ../../../utils/parse_options.sh
-# prepare data
-bash ./local/data.sh
-# train model, all `ckpt` under `exp` dir, if you use paddlepaddle-gpu, you can set CUDA_VISIBLE_DEVICES before the train script
-./local/train.sh conf/deepspeech2.yaml deepspeech2 offline
-# avg n best model to get the test model, in this case, n = 1
-avg.sh best exp/deepspeech2/checkpoints 1
-# evaluate the test model
-./local/test.sh conf/deepspeech2.yaml exp/deepspeech2/checkpoints/avg_1 offline
+paddlespeech asr --lang zh --input input_16k.wav
```
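+
+A Python equivalent, under the same assumption about the `paddlespeech.cli` executors:
+
+```python
+from paddlespeech.cli import ASRExecutor  # assumed export
+
+asr_executor = ASRExecutor()
+# Transcribe a 16 kHz Mandarin recording with the default model.
+text = asr_executor(audio_file='input_16k.wav')
+print(text)
+```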
+**Speech Translation** (English to Chinese)
-For **Text-to-Speech**, try pretrained FastSpeech2 + Parallel WaveGAN on CSMSC:
+(not supported on Windows yet)
+```shell
+paddlespeech st --input input_16k.wav
+```
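+
+And from Python, with the same caveat about the executor API:
+
+```python
+from paddlespeech.cli import STExecutor  # assumed export
+
+st_executor = STExecutor()
+# English speech in, Chinese text out (the default direction).
+translation = st_executor(audio_file='input_16k.wav')
+print(translation)
+```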
+**Text-to-Speech**
```shell
-cd examples/csmsc/tts3
-# download the pretrained models and unaip them
-wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_ckpt_0.4.zip
-unzip pwg_baker_ckpt_0.4.zip
-wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_ckpt_0.4.zip
-unzip fastspeech2_nosil_baker_ckpt_0.4.zip
-# source the environment
-source path.sh
-# run end-to-end synthesize
-FLAGS_allocator_strategy=naive_best_fit \
-FLAGS_fraction_of_gpu_memory_to_use=0.01 \
-python3 ${BIN_DIR}/synthesize_e2e.py \
- --fastspeech2-config=fastspeech2_nosil_baker_ckpt_0.4/default.yaml \
- --fastspeech2-checkpoint=fastspeech2_nosil_baker_ckpt_0.4/snapshot_iter_76000.pdz \
- --fastspeech2-stat=fastspeech2_nosil_baker_ckpt_0.4/speech_stats.npy \
- --pwg-config=pwg_baker_ckpt_0.4/pwg_default.yaml \
- --pwg-checkpoint=pwg_baker_ckpt_0.4/pwg_snapshot_iter_400000.pdz \
- --pwg-stat=pwg_baker_ckpt_0.4/pwg_stats.npy \
- --text=${BIN_DIR}/../sentences.txt \
- --output-dir=exp/default/test_e2e \
- --inference-dir=exp/default/inference \
- --phones-dict=fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt
+paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!" --output output.wav
```
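+
+From Python, under the same executor-API assumption:
+
+```python
+from paddlespeech.cli import TTSExecutor  # assumed export
+
+tts_executor = TTSExecutor()
+# Synthesize Mandarin speech and write the result to a wav file.
+tts_executor(text="你好,欢迎使用百度飞桨深度学习框架!", output="output.wav")
+```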
-If you want to try more functions like training and tuning, please see [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-to-Speech Quick Start](./docs/source/tts/quick_start.md).
+If you want to try more functions like training and tuning, please have a look at [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-to-Speech Quick Start](./docs/source/tts/quick_start.md).
## Model List
-PaddleSpeech supports a series of most popular models, summarized in [released models](./docs/source/released_model.md) with available pretrained models.
+PaddleSpeech supports a series of the most popular models, summarized in [released models](./docs/source/released_model.md) with available pretrained models attached.
-Speech-to-Text module contains *Acoustic Model* and *Language Model*, with the following details:
+**Speech-to-Text** contains an *Acoustic Model* and a *Language Model*, with the following details: