README_cn.md 7.5 KB
Newer Older
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
(简体中文|[English](./README.md))

# 声纹识别
## 介绍
声纹识别是一项用计算机程序自动提取说话人特征的技术。

这个 demo 是一个从给定音频文件提取说话人特征,它可以通过使用 `PaddleSpeech` 的单个命令或 python 中的几行代码来实现。

## 使用方法
### 1. 安装
请看[安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md)

你可以从 easy,medium,hard 三中方式中选择一种方式安装。

### 2. 准备输入
这个 demo 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。

可以下载此 demo 的示例音频:
```bash
# 该音频的内容是数字串 85236145389
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
```
### 3. 使用方法
- 命令行 (推荐使用)
  ```bash
  paddlespeech vector --task spk --input 85236145389.wav

  echo -e "demo1 85236145389.wav" > vec.job
  paddlespeech vector --task spk --input vec.job

  echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk
  ```
  
  使用方法:
  ```bash
36
  paddlespeech vector --help
37 38 39
  ```
  参数:
  - `input`(必须输入):用于识别的音频文件。
40
  - `task` (必须输入): 用于指定 `vector` 处理的具体任务,默认是 `spk`
41 42 43 44 45 46 47 48
  - `model`:声纹任务的模型,默认值:`ecapatdnn_voxceleb12`
  - `sample_rate`:音频采样率,默认值:`16000`
  - `config`:声纹任务的参数文件,若不设置则使用预训练模型中的默认配置,默认值:`None`
  - `ckpt_path`:模型参数文件,若不设置则下载预训练模型使用,默认值:`None`
  - `device`:执行预测的设备,默认值:当前系统下 paddlepaddle 的默认 device。

  输出:
  ```bash
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87
  demo  [  1.4217498    5.626253    -5.342073     1.1773866    3.308055
    1.756596     5.167894    10.80636     -3.8226728   -5.6141334
    2.623845    -0.8072968    1.9635103   -7.3128724    0.01103897
    -9.723131     0.6619743   -6.976803    10.213478     7.494748
    2.9105635    3.8949256    3.7999806    7.1061673   16.905321
    -7.1493764    8.733103     3.4230042   -4.831653   -11.403367
    11.232214     7.1274667   -4.2828417    2.452362    -5.130748
    -18.177666    -2.6116815  -11.000337    -6.7314315    1.6564683
    0.7618269    1.1253023   -2.083836     4.725744    -8.782597
    -3.539873     3.814236     5.1420674    2.162061     4.096431
    -6.4162116   12.747448     1.9429878  -15.152943     6.417416
    16.097002    -9.716668    -1.9920526   -3.3649497   -1.871939
    11.567354     3.69788     11.258265     7.442363     9.183411
    4.5281515   -1.2417862    4.3959084    6.6727695    5.8898783
    7.627124    -0.66919386 -11.889693    -9.208865    -7.4274073
    -3.7776625    6.917234    -9.848748    -2.0944717   -5.135116
    0.49563864   9.317534    -5.9141874   -1.8098574   -0.11738578
    -7.169265    -1.0578263   -5.7216787   -5.1173844   16.137651
    -4.473626     7.6624317   -0.55381083   9.631587    -6.4704556
    -8.548508     4.3716145   -0.79702514   4.478997    -2.9758704
    3.272176     2.8382776    5.134597    -9.190781    -0.5657382
    -4.8745747    2.3165567   -5.984303    -2.1798875    0.35541576
    -0.31784213   9.493548     2.1144536    4.358092   -12.089823
    8.451689    -7.925461     4.6242585    4.4289427   18.692003
    -2.6204622   -5.149185    -0.35821092   8.488551     4.981496
    -9.32683     -2.2544234    6.6417594    1.2119585   10.977129
    16.555033     3.3238444    9.551863    -1.6676947   -0.79539716
    -8.605674    -0.47356385   2.6741948   -5.359179    -2.6673796
    0.66607     15.443222     4.740594    -3.4725387   11.592567
    -2.054497     1.7361217   -8.265324    -9.30447      5.4068313
    -1.5180256   -7.746615    -6.089606     0.07112726  -0.34904733
    -8.649895    -9.998958    -2.564841    -0.53999114   2.601808
    -0.31927416  -1.8815292   -2.07215     -3.4105783   -8.2998085
    1.483641   -15.365992    -8.288208     3.8847756   -3.4876456
    7.3629923    0.4657332    3.132599    12.438889    -1.8337058
    4.532936     2.7264361   10.145339    -6.521951     2.897153
    -3.3925855    5.079156     7.759716     4.677565     5.8457737
    2.402413     7.7071047    3.9711342   -6.390043     6.1268735
    -3.7760346  -11.118123  ]
88 89 90 91 92 93 94 95 96 97 98 99 100
  ```

- Python API
  ```python
  import paddle
  from paddlespeech.cli import VectorExecutor

  vector_executor = VectorExecutor()
  audio_emb = vector_executor(
      model='ecapatdnn_voxceleb12',
      sample_rate=16000,
      config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
      ckpt_path=None,
101
      audio_file='./85236145389.wav',
102 103 104 105 106 107 108
      device=paddle.get_device())
  print('Audio embedding Result: \n{}'.format(audio_emb))
  ```

  输出:
  ```bash
  # Vector Result:
109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148
   Audio embedding Result:
    [  1.4217498    5.626253    -5.342073     1.1773866    3.308055
    1.756596     5.167894    10.80636     -3.8226728   -5.6141334
    2.623845    -0.8072968    1.9635103   -7.3128724    0.01103897
    -9.723131     0.6619743   -6.976803    10.213478     7.494748
    2.9105635    3.8949256    3.7999806    7.1061673   16.905321
    -7.1493764    8.733103     3.4230042   -4.831653   -11.403367
    11.232214     7.1274667   -4.2828417    2.452362    -5.130748
    -18.177666    -2.6116815  -11.000337    -6.7314315    1.6564683
    0.7618269    1.1253023   -2.083836     4.725744    -8.782597
    -3.539873     3.814236     5.1420674    2.162061     4.096431
    -6.4162116   12.747448     1.9429878  -15.152943     6.417416
    16.097002    -9.716668    -1.9920526   -3.3649497   -1.871939
    11.567354     3.69788     11.258265     7.442363     9.183411
    4.5281515   -1.2417862    4.3959084    6.6727695    5.8898783
    7.627124    -0.66919386 -11.889693    -9.208865    -7.4274073
    -3.7776625    6.917234    -9.848748    -2.0944717   -5.135116
    0.49563864   9.317534    -5.9141874   -1.8098574   -0.11738578
    -7.169265    -1.0578263   -5.7216787   -5.1173844   16.137651
    -4.473626     7.6624317   -0.55381083   9.631587    -6.4704556
    -8.548508     4.3716145   -0.79702514   4.478997    -2.9758704
    3.272176     2.8382776    5.134597    -9.190781    -0.5657382
    -4.8745747    2.3165567   -5.984303    -2.1798875    0.35541576
    -0.31784213   9.493548     2.1144536    4.358092   -12.089823
    8.451689    -7.925461     4.6242585    4.4289427   18.692003
    -2.6204622   -5.149185    -0.35821092   8.488551     4.981496
    -9.32683     -2.2544234    6.6417594    1.2119585   10.977129
    16.555033     3.3238444    9.551863    -1.6676947   -0.79539716
    -8.605674    -0.47356385   2.6741948   -5.359179    -2.6673796
    0.66607     15.443222     4.740594    -3.4725387   11.592567
    -2.054497     1.7361217   -8.265324    -9.30447      5.4068313
    -1.5180256   -7.746615    -6.089606     0.07112726  -0.34904733
    -8.649895    -9.998958    -2.564841    -0.53999114   2.601808
    -0.31927416  -1.8815292   -2.07215     -3.4105783   -8.2998085
    1.483641   -15.365992    -8.288208     3.8847756   -3.4876456
    7.3629923    0.4657332    3.132599    12.438889    -1.8337058
    4.532936     2.7264361   10.145339    -6.521951     2.897153
    -3.3925855    5.079156     7.759716     4.677565     5.8457737
    2.402413     7.7071047    3.9711342   -6.390043     6.1268735
    -3.7760346  -11.118123  ]
149 150 151 152 153 154 155 156
  ```

### 4.预训练模型
以下是 PaddleSpeech 提供的可以被命令行和 python API 使用的预训练模型列表:

| 模型 | 采样率
| :--- | :---: |
| ecapatdnn_voxceleb12 | 16k