([简体中文](./README_cn.md)|English)
# Speaker Verification

## Introduction

Speaker verification is the task of confirming a speaker's identity from their voice. In this demo, it refers to the step of extracting a speaker embedding from an audio file.

This demo extracts a speaker embedding from a given audio file. It can be run with a single command or a few lines of Python using `PaddleSpeech`.

## Usage
### 1. Installation
See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).

You can choose one of the three installation methods (easy, medium, or hard) to install paddlespeech.

### 2. Prepare Input File
The input of this demo should be a WAV file (`.wav`), and its sample rate must match the sample rate of the model.

Here is a sample file for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
```
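
Since the sample rate of the input must match the model, it can help to check the WAV header before running the demo. The following is a minimal sketch using only the Python standard library. The helper names `wav_sample_rate` and `matches_model_rate` are hypothetical and not part of PaddleSpeech, and the default of 16000 Hz is taken from the `sample_rate` default used in this demo.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(path, expected_rate=16000):
    """Return True if the file's sample rate equals the rate the model expects.

    expected_rate defaults to 16000, the sample rate used in this demo.
    """
    return wav_sample_rate(path) == expected_rate
```

For example, `matches_model_rate("85236145389.wav")` returns `True` when the downloaded file is sampled at 16 kHz. If it returns `False`, the file would need to be resampled before it is passed to the demo.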

### 3. Usage
- Command Line (Recommended)
  ```bash
  paddlespeech vector --task spk --input 85236145389.wav

  echo -e "demo1 85236145389.wav" > vec.job
  paddlespeech vector --task spk --input vec.job

  echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk
  ```
  
  Usage:
  ```bash
  paddlespeech vector --help
  ```
  Arguments:
  - `input` (required): Audio file to extract the speaker embedding from. As the examples above show, a job file with one `utt_id path` pair per line is also accepted.
  - `model`: Model type of vector task. Default: `ecapatdnn_voxceleb12`.
  - `sample_rate`: Sample rate of the model. Default: `16000`.
  - `config`: Config of vector task. Use pretrained model when it is None. Default: `None`.
  - `ckpt_path`: Model checkpoint. Use pretrained model when it is None. Default: `None`.
  - `device`: Choose device to execute model inference. Default: default device of paddlepaddle in current environment.

  Output:

  ```bash
  demo [ -5.749211     9.505463    -8.200284    -5.2075014    5.3940268
  -3.04878      1.611095    10.127234   -10.534177   -15.821609
   1.2032688   -0.35080156   1.2629458  -12.643498    -2.5758228
 -11.343508     2.3385992   -8.719341    14.213509    15.404744
  -0.39327756   6.338786     2.688887     8.7104025   17.469526
  -8.77959      7.0576906    4.648855    -1.3089896  -23.294737
   8.013747    13.891729    -9.926753     5.655307    -5.9422326
 -22.842539     0.6293588  -18.46266    -10.811862     9.8192625
   3.0070958    3.8072643   -2.3861165    3.0821571  -14.739942
   1.7594414   -0.6485091    4.485623     2.0207152    7.264915
  -6.40137     23.63524      2.9711294  -22.708025     9.93719
  20.354511   -10.324688    -0.700492    -8.783211    -5.27593
  15.999649     3.3004563   12.747926    15.429879     4.7849145
   5.6699696   -2.3826702   10.605882     3.9112158    3.1500628
  15.859915    -2.1832209  -23.908653    -6.4799504   -4.5365124
  -9.224193    14.568347   -10.568833     4.982321    -4.342062
   0.0914714   12.645902    -5.74285     -3.2141201   -2.7173362
  -6.680575     0.4757669   -5.035051    -6.7964664   16.865469
 -11.54324      7.681869     0.44475392   9.708182    -8.932846
   0.4123232   -4.361452     1.3948607    9.511665     0.11667654
   2.9079323    6.049952     9.275183   -18.078873     6.2983274
  -0.7500531   -2.725033    -7.6027865    3.3404543    2.990815
   4.010979    11.000591    -2.8873312    7.1352735  -16.79663
  18.495346   -14.293832     7.89578      2.2714825   22.976387
  -4.875734    -3.0836344   -2.9999814   13.751918     6.448228
 -11.924197     2.171869     2.0423572   -6.173772    10.778437
  25.77281     -4.9495463   14.57806      0.3044315    2.6132357
  -7.591999    -2.076944     9.025118     1.7834753   -3.1799617
  -4.9401326   23.465864     5.1685796   -9.018578     9.037825
  -4.4150195    6.859591   -12.274467    -0.88911164   5.186309
  -3.9988663  -13.638606    -9.925445    -0.06329413  -3.6709652
 -12.397416   -12.719869    -1.395601     2.1150916    5.7381287
  -4.4691963   -3.82819     -0.84233856  -1.1604277  -13.490127
   8.731719   -20.778936   -11.495662     5.8033476   -4.752041
  10.833007    -6.717991     4.504732    13.4244375    1.1306485
   7.3435574    1.400918    14.704036    -9.501399     7.2315617
  -6.417456     1.3333273   11.872697    -0.30664724   8.8845
   6.5569253    4.7948146    0.03662816  -8.704245     6.224871
  -3.2701402  -11.508579  ]
  ```

- Python API
  ```python
  import paddle
  from paddlespeech.cli import VectorExecutor

  vector_executor = VectorExecutor()
  audio_emb = vector_executor(
      model='ecapatdnn_voxceleb12',
      sample_rate=16000,
      config=None, 
      ckpt_path=None,
      audio_file='./85236145389.wav',
      force_yes=False,
      device=paddle.get_device())
  print('Audio embedding Result: \n{}'.format(audio_emb))
  ```

  Output:
  ```bash
  # Audio embedding Result:
  [ -5.749211     9.505463    -8.200284    -5.2075014    5.3940268
  -3.04878      1.611095    10.127234   -10.534177   -15.821609
   1.2032688   -0.35080156   1.2629458  -12.643498    -2.5758228
 -11.343508     2.3385992   -8.719341    14.213509    15.404744
  -0.39327756   6.338786     2.688887     8.7104025   17.469526
  -8.77959      7.0576906    4.648855    -1.3089896  -23.294737
   8.013747    13.891729    -9.926753     5.655307    -5.9422326
 -22.842539     0.6293588  -18.46266    -10.811862     9.8192625
   3.0070958    3.8072643   -2.3861165    3.0821571  -14.739942
   1.7594414   -0.6485091    4.485623     2.0207152    7.264915
  -6.40137     23.63524      2.9711294  -22.708025     9.93719
  20.354511   -10.324688    -0.700492    -8.783211    -5.27593
  15.999649     3.3004563   12.747926    15.429879     4.7849145
   5.6699696   -2.3826702   10.605882     3.9112158    3.1500628
  15.859915    -2.1832209  -23.908653    -6.4799504   -4.5365124
  -9.224193    14.568347   -10.568833     4.982321    -4.342062
   0.0914714   12.645902    -5.74285     -3.2141201   -2.7173362
  -6.680575     0.4757669   -5.035051    -6.7964664   16.865469
 -11.54324      7.681869     0.44475392   9.708182    -8.932846
   0.4123232   -4.361452     1.3948607    9.511665     0.11667654
   2.9079323    6.049952     9.275183   -18.078873     6.2983274
  -0.7500531   -2.725033    -7.6027865    3.3404543    2.990815
   4.010979    11.000591    -2.8873312    7.1352735  -16.79663
  18.495346   -14.293832     7.89578      2.2714825   22.976387
  -4.875734    -3.0836344   -2.9999814   13.751918     6.448228
 -11.924197     2.171869     2.0423572   -6.173772    10.778437
  25.77281     -4.9495463   14.57806      0.3044315    2.6132357
  -7.591999    -2.076944     9.025118     1.7834753   -3.1799617
  -4.9401326   23.465864     5.1685796   -9.018578     9.037825
  -4.4150195    6.859591   -12.274467    -0.88911164   5.186309
  -3.9988663  -13.638606    -9.925445    -0.06329413  -3.6709652
 -12.397416   -12.719869    -1.395601     2.1150916    5.7381287
  -4.4691963   -3.82819     -0.84233856  -1.1604277  -13.490127
   8.731719   -20.778936   -11.495662     5.8033476   -4.752041
  10.833007    -6.717991     4.504732    13.4244375    1.1306485
   7.3435574    1.400918    14.704036    -9.501399     7.2315617
  -6.417456     1.3333273   11.872697    -0.30664724   8.8845
   6.5569253    4.7948146    0.03662816  -8.704245     6.224871
  -3.2701402  -11.508579  ]
  ```
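
The embedding printed above is a plain vector of floats, so two embeddings can be compared numerically. The following is a minimal sketch of cosine scoring, one common way to compare speaker embeddings; it is not stated in this demo as the method PaddleSpeech uses. The function names and the `threshold` value are hypothetical, and a real threshold would have to be tuned on held-out data.

```python
import numpy as np


def cosine_similarity(emb_a, emb_b):
    """Return the cosine similarity of two embedding vectors, in [-1, 1]."""
    a = np.asarray(emb_a, dtype=np.float64).ravel()
    b = np.asarray(emb_b, dtype=np.float64).ravel()
    norm_product = np.linalg.norm(a) * np.linalg.norm(b)
    if norm_product == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / norm_product)


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Decide whether two embeddings likely come from the same speaker.

    threshold=0.5 is an illustrative placeholder, not a recommended value.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold
```

With two embeddings obtained from `vector_executor` for two different audio files, `cosine_similarity(emb_1, emb_2)` gives a score where higher values indicate more similar voices.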

### 4. Pretrained Models

Here is a list of pretrained models released by PaddleSpeech that can be used from the command line and the Python API:

| Model | Sample Rate |
| :--- | :---: |
| ecapatdnn_voxceleb12 | 16k |