# Audio Tagging

## Introduction
Audio tagging is the task of labelling an audio clip with one or more labels or tags. It covers tasks such as music tagging, acoustic scene classification, and audio event classification.

This demo tags an audio file with the 527 [AudioSet](https://research.google.com/audioset/) labels. It can be run with a single command or a few lines of Python using `PaddleSpeech`.

## Usage
### 1. Installation
```bash
pip install paddlespeech
```

### 2. Prepare Input File
The input of this demo should be a WAV file (`.wav`).

Sample files for this demo can be downloaded with:
```bash
wget https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/dog.wav
```
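Since the demo expects a WAV file, it can be useful to inspect an input before tagging it. Below is a minimal sketch using Python's standard `wave` module. The helper `describe_wav` is hypothetical (it is not part of PaddleSpeech), and `wave` only reads uncompressed PCM WAV files, so some other WAV encodings may raise `wave.Error` even if PaddleSpeech itself can read them.

```python
import os
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file.

    Hypothetical helper for illustration; it is not part of PaddleSpeech.
    Raises ValueError if the extension is not .wav, and wave.Error if the
    file is not a PCM WAV file that the `wave` module can read.
    """
    if os.path.splitext(path)[1].lower() != ".wav":
        raise ValueError(f"expected a .wav file, got {path!r}")
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    # Assumes cat.wav and dog.wav were downloaded to the current directory.
    for name in ("cat.wav", "dog.wav"):
        if os.path.exists(name):
            print(name, describe_wav(name))
```

Run in the directory where the sample files were downloaded, the script prints the sample rate, channel count, sample width, and duration of each file.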

### 3. Usage
- Command Line (Recommended)
  ```bash
  paddlespeech cls --input ./cat.wav --topk 10
  ```
  Usage:
  ```bash
  paddlespeech cls --help
  ```
  Arguments:
  - `input` (required): Audio file to tag.
  - `model`: Model type of the tagging task. Default: `panns_cnn14`.
  - `config`: Config of the tagging task. The pretrained model is used when it is `None`. Default: `None`.
  - `ckpt_path`: Model checkpoint. The pretrained model is used when it is `None`. Default: `None`.
  - `label_file`: Label file of the tagging task. The AudioSet labels are used when it is `None`. Default: `None`.
  - `topk`: Number of top-scoring labels to show in the result. Default: `1`.
  - `device`: Device used for model inference. Default: the default device of PaddlePaddle in the current environment.

  Output:
  ```bash
  [2021-12-08 14:49:40,671] [    INFO] [utils.py] [L225] - CLS Result:
  Cat: 0.8991316556930542
  Domestic animals, pets: 0.8806838393211365
  Meow: 0.8784668445587158
  Animal: 0.8776564598083496
  Caterwaul: 0.2232048511505127
  Speech: 0.03101264126598835
  Music: 0.02870696596801281
  Inside, small room: 0.016673989593982697
  Purr: 0.008387474343180656
  Bird: 0.006304860580712557
  ```

- Python API
  ```python
  import paddle
  from paddlespeech.cli import CLSExecutor

  cls_executor = CLSExecutor()
  result = cls_executor(
      model='panns_cnn14',
      config=None,  # Set `config` and `ckpt_path` to None to use the pretrained model.
      label_file=None,
      ckpt_path=None,
      audio_file='./cat.wav',
      topk=10,
      device=paddle.get_device())
  print('CLS Result: \n{}'.format(result))
  ```
  Output:
  ```bash
  CLS Result:
  Cat: 0.8991316556930542
  Domestic animals, pets: 0.8806838393211365
  Meow: 0.8784668445587158
  Animal: 0.8776564598083496
  Caterwaul: 0.2232048511505127
  Speech: 0.03101264126598835
  Music: 0.02870696596801281
  Inside, small room: 0.016673989593982697
  Purr: 0.008387474343180656
  Bird: 0.006304860580712557
  ```
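The `topk` argument controls how many labels are reported. In the sample output the scores are not a probability distribution over labels: `Cat` (0.899) and `Domestic animals, pets` (0.881) together already exceed 1, so each label is scored on its own, and selecting the top *k* amounts to ranking labels by score. The sketch below illustrates that selection in plain Python, assuming one score per label; the helper `top_k_labels` is hypothetical and is not taken from PaddleSpeech, which may implement it differently.

```python
import heapq


def top_k_labels(scores, labels, k=1):
    """Return the k highest-scoring (label, score) pairs, best first.

    Hypothetical helper for illustration; it is not PaddleSpeech's own code.
    Assumes scores[i] is the score of labels[i].
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # nlargest returns at most k items, sorted from highest to lowest score.
    return heapq.nlargest(k, zip(labels, scores), key=lambda pair: pair[1])


# A few label scores taken from the sample output (not all 527 labels).
labels = ["Speech", "Cat", "Music", "Meow", "Bird"]
scores = [0.03101264126598835, 0.8991316556930542, 0.02870696596801281,
          0.8784668445587158, 0.006304860580712557]

for label, score in top_k_labels(scores, labels, k=2):
    print(f"{label}: {score}")
```

With `k=2` this prints `Cat: 0.8991316556930542` followed by `Meow: 0.8784668445587158`. If `k` exceeds the number of labels, every label is returned in descending order of score.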


### 4. Pretrained Models

Here is a list of pretrained models released by PaddleSpeech that can be used with the command line and the Python API:

| Model | Sample Rate (Hz) |
| :--- | :---: |
| panns_cnn6 | 32000 |
| panns_cnn10 | 32000 |
| panns_cnn14 | 32000 |
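The table can also be used to check whether an input file matches the sample rate of the chosen model. The sketch below works under stated assumptions: `MODEL_SAMPLE_RATES` simply copies the table, `sample_rate_matches` is a hypothetical helper (not a PaddleSpeech API), and the input is assumed to be a PCM WAV file readable by Python's standard `wave` module.

```python
import os
import wave

# Sample rates (Hz) copied from the pretrained-model table.
MODEL_SAMPLE_RATES = {
    "panns_cnn6": 32000,
    "panns_cnn10": 32000,
    "panns_cnn14": 32000,
}


def sample_rate_matches(path, model="panns_cnn14"):
    """Return (file_rate, model_rate, matches) for a PCM WAV file.

    Hypothetical helper for illustration; it is not a PaddleSpeech API.
    Raises KeyError for a model name that is not in the table.
    """
    model_rate = MODEL_SAMPLE_RATES[model]
    with wave.open(path, "rb") as wav:
        file_rate = wav.getframerate()
    return file_rate, model_rate, file_rate == model_rate


if __name__ == "__main__":
    # Assumes cat.wav was downloaded to the current directory.
    if os.path.exists("cat.wav"):
        file_rate, model_rate, matches = sample_rate_matches("cat.wav")
        print(f"cat.wav: {file_rate} Hz, model expects {model_rate} Hz, match={matches}")
```

If `matches` is `False`, consider resampling the file to 32000 Hz before tagging; this README does not state whether PaddleSpeech resamples inputs automatically.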