([简体中文](./README_cn.md)|English)
# Speech Translation
## Introduction
Speech translation is the process by which conversational spoken phrases are instantly translated and spoken aloud in a second language.

This demo recognizes the speech in a given audio file and translates it into the target language. It can be run with a single command or a few lines of Python using `PaddleSpeech`.

## Usage
### 1. Installation
See the [installation guide](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).

You can install paddlespeech in one of three ways: easy, medium, or hard.

### 2. Prepare Input File
The input of this demo should be a WAV file (`.wav`).

Sample files for this demo can be downloaded with:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
```
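
Before running the demo on your own audio, it can help to confirm that the file is a readable WAV file and to see its sample rate (the `sample_rate` argument of this demo defaults to `16000`). The following is a minimal sketch that uses only the Python standard library. `wav_info` and `matches_sample_rate` are hypothetical helpers, not part of PaddleSpeech, and the script writes its own one-second silent file so it runs without any download:

```python
import os
import tempfile
import wave


def wav_info(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / float(sample_rate)
    return channels, sample_rate, duration


def matches_sample_rate(path, expected_rate=16000):
    """Return True if the file's sample rate equals expected_rate."""
    return wav_info(path)[1] == expected_rate


# Self-contained check: write one second of 16 kHz mono silence, then inspect it.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "silence.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 16-bit samples
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

print(wav_info(demo_path))             # (1, 16000, 1.0)
print(matches_sample_rate(demo_path))  # True
```

To check a downloaded sample, call `wav_info('./en.wav')` instead of using the generated file.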

### 3. Usage (not supported on Windows yet)
- Command Line (Recommended)
  ```bash
  paddlespeech st --input ./en.wav
  ```
  Usage:
  ```bash
  paddlespeech st --help
  ```
  Arguments:
  - `input` (required): Audio file to recognize and translate.
  - `model`: Model type of the ST task. Default: `fat_st_ted`.
  - `src_lang`: Source language. Default: `en`.
  - `tgt_lang`: Target language. Default: `zh`.
  - `sample_rate`: Sample rate of the model. Default: `16000`.
  - `config`: Config file of the ST task. The pretrained model is used when it is `None`. Default: `None`.
  - `ckpt_path`: Model checkpoint. The pretrained model is used when it is `None`. Default: `None`.
  - `device`: Device on which to run model inference. Default: the default PaddlePaddle device in the current environment.

  Output:
  ```bash
  [2021-12-09 11:13:03,178] [    INFO] [utils.py] [L225] - ST Result: ['我 在 这栋 建筑 的 古老 门上 敲门 。']
  ```

- Python API
  ```python
  import paddle
  from paddlespeech.cli import STExecutor

  st_executor = STExecutor()
  text = st_executor(
      model='fat_st_ted',
      src_lang='en',
      tgt_lang='zh',
      sample_rate=16000,
      config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
      ckpt_path=None,
      audio_file='./en.wav',
      device=paddle.get_device())
  print('ST Result: \n{}'.format(text))
  ```

  Output:
  ```bash
  ST Result:
  ['我 在 这栋 建筑 的 古老 门上 敲门 。'] 
  ```
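
The result shown above is a list containing one string whose tokens are separated by spaces. If plain Chinese text is preferred, the spaces can be removed after the call. This is a minimal sketch, assuming the returned value has the same list-of-strings shape as the sample output. `join_tokens` is a hypothetical helper, not part of PaddleSpeech:

```python
def join_tokens(st_result):
    """Remove the spaces between tokens in each translated sentence.

    `st_result` is assumed to be a list of strings shaped like the sample
    output above. Only suitable for target languages written without
    spaces between words, such as Chinese.
    """
    return ["".join(sentence.split()) for sentence in st_result]


# Sample result copied from the output above.
sample = ['我 在 这栋 建筑 的 古老 门上 敲门 。']
print(join_tokens(sample))  # ['我在这栋建筑的古老门上敲门。']
```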

### 4. Pretrained Models
Here is a list of pretrained models released by PaddleSpeech that can be used from both the command line and the Python API:

| Model | Source Language | Target Language |
| :--- | :---: | :---: |
| fat_st_ted | en | zh |
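
To call the command line tool with a model from this table named explicitly, for example when looping over several files, the command can be assembled in Python. This is a sketch only. It assumes that the `model`, `src_lang` and `tgt_lang` arguments listed in the Usage section are passed as `--model`, `--src_lang` and `--tgt_lang`, in the same way `input` is passed as `--input`; `paddlespeech st --help` shows the authoritative flags. `build_st_command` is a hypothetical helper:

```python
import shlex


def build_st_command(audio_path, model="fat_st_ted", src_lang="en", tgt_lang="zh"):
    """Build the argument list for one `paddlespeech st` call.

    Assumption: `model`, `src_lang` and `tgt_lang` are exposed as
    `--model`, `--src_lang` and `--tgt_lang`, mirroring `--input`.
    """
    return [
        "paddlespeech", "st",
        "--input", audio_path,
        "--model", model,
        "--src_lang", src_lang,
        "--tgt_lang", tgt_lang,
    ]


# Print the commands instead of running them, so this works
# even where PaddleSpeech is not installed.
for path in ["./en.wav"]:
    print(shlex.join(build_st_command(path)))
```

To run a command rather than print it, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where `paddlespeech` is installed.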