# Audio Tagging

## Introduction
Audio tagging is the task of assigning one or more labels (tags) to an audio clip. It covers music tagging, acoustic scene classification, audio event classification, and similar tasks.

This demo tags an audio file with the 527 [AudioSet](https://research.google.com/audioset/) labels. It can be run with a single command or a few lines of Python using `PaddleSpeech`.

## Usage
### 1. Installation
```bash
pip install paddlespeech
```

### 2. Prepare Input File
The input of this demo should be a WAV file (`.wav`).

Sample files for this demo can be downloaded with:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/dog.wav
```
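
Before tagging, it can help to confirm that a file really is a readable WAV and to see its sample rate and duration. The snippet below is a minimal sketch using only Python's standard `wave` module; `describe_wav` is a hypothetical helper, not part of `PaddleSpeech`, and `wave` only reads uncompressed PCM WAV files, so other encodings will raise `wave.Error`.

```python
import wave


def describe_wav(path):
    """Return basic header information for a PCM WAV file.

    Hypothetical helper for illustration; not part of PaddleSpeech.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            # Duration is the frame count divided by frames per second.
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

For example, `describe_wav("./cat.wav")` returns a dictionary describing the downloaded sample file.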

### 3. Usage
- Command Line (Recommended)
  ```bash
  paddlespeech cls --input ./cat.wav --topk 10
  ```
  Usage:
  ```bash
  paddlespeech cls --help
  ```
  Arguments:
  - `input` (required): The audio file to tag.
  - `model`: Model type of the tagging task. Default: `panns_cnn14`.
  - `config`: Config of the tagging task. A pretrained model is used when it is `None`. Default: `None`.
  - `ckpt_path`: Model checkpoint. A pretrained model is used when it is `None`. Default: `None`.
  - `label_file`: Label file of the tagging task. The AudioSet labels are used when it is `None`. Default: `None`.
  - `topk`: Number of top-scoring labels to show in the result. Default: `1`.
  - `device`: Device on which to run model inference. Default: the default PaddlePaddle device in the current environment.

  Output:
  ```bash
  [2021-12-08 14:49:40,671] [    INFO] [utils.py] [L225] - CLS Result:
  Cat: 0.8991316556930542
  Domestic animals, pets: 0.8806838393211365
  Meow: 0.8784668445587158
  Animal: 0.8776564598083496
  Caterwaul: 0.2232048511505127
  Speech: 0.03101264126598835
  Music: 0.02870696596801281
  Inside, small room: 0.016673989593982697
  Purr: 0.008387474343180656
  Bird: 0.006304860580712557
  ```

- Python API
  ```python
  import paddle
  from paddlespeech.cli import CLSExecutor

  cls_executor = CLSExecutor()
  result = cls_executor(
      model='panns_cnn14',
      config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
      label_file=None,
      ckpt_path=None,
      audio_file='./cat.wav',
      topk=10,
      device=paddle.get_device())
  print('CLS Result: \n{}'.format(result))
  ```
  Output:
  ```bash
  CLS Result:
  Cat: 0.8991316556930542
  Domestic animals, pets: 0.8806838393211365
  Meow: 0.8784668445587158
  Animal: 0.8776564598083496
  Caterwaul: 0.2232048511505127
  Speech: 0.03101264126598835
  Music: 0.02870696596801281
  Inside, small room: 0.016673989593982697
  Purr: 0.008387474343180656
  Bird: 0.006304860580712557
  ```
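
The output above is plain text with one `Label: score` pair per line. To filter or sort the tags in code, that text can be parsed into tuples. The snippet below is a minimal sketch that assumes the result is a newline-separated string in exactly the format printed above; other `PaddleSpeech` versions may return a different format. `parse_cls_result` is a hypothetical helper, not part of the library.

```python
def parse_cls_result(text):
    """Parse 'Label: score' lines into a list of (label, score) tuples.

    Hypothetical helper; assumes the text format shown in the output above.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the last colon so the whole label text is preserved.
        label, _, score = line.rpartition(":")
        try:
            pairs.append((label.strip(), float(score)))
        except ValueError:
            # Lines without a numeric score (e.g. "CLS Result:") are skipped.
            continue
    return pairs


sample = """CLS Result:
Cat: 0.8991316556930542
Domestic animals, pets: 0.8806838393211365
Meow: 0.8784668445587158
Caterwaul: 0.2232048511505127"""

# Keep only the labels whose score is at least 0.5.
confident = [label for label, score in parse_cls_result(sample) if score >= 0.5]
print(confident)  # ['Cat', 'Domestic animals, pets', 'Meow']
```

Splitting on the last colon rather than the first keeps labels that contain punctuation, such as `Domestic animals, pets`, in one piece.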

### 4. Pretrained Models

Here is a list of pretrained models released by PaddleSpeech that can be used from both the command line and the Python API:

| Model | Sample Rate (Hz) |
| :--- | :---: |
| panns_cnn6 | 32000 |
| panns_cnn10 | 32000 |
| panns_cnn14 | 32000 |