# Audio Tagging

## Introduction
Audio tagging is the task of labelling an audio clip with one or more labels or tags. It includes music tagging, acoustic scene classification, audio event classification, etc.

This demo tags an audio file with the 527 [AudioSet](https://research.google.com/audioset/) labels. It can be run with a single command or a few lines of Python using `PaddleSpeech`.

## Usage
### 1. Installation
```bash
pip install paddlespeech
```

### 2. Prepare Input File
The input to this demo should be a WAV file (`.wav`).

Sample files for this demo can be downloaded with:
```bash
wget https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/dog.wav
```
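
Before tagging, it can help to confirm that a file really is a readable WAV and to see its sample rate (the pretrained models listed in this README are given with a sample rate of 32000). The following is a minimal sketch using only Python's standard `wave` module; the helper names `wav_info` and `write_silence` are made up for this illustration and are not part of PaddleSpeech. Whether PaddleSpeech resamples inputs with a different rate is not stated here, so treat the check as informational only.

```python
import os
import tempfile
import wave


def wav_info(path):
    """Return basic header information for a PCM WAV file.

    Hypothetical helper for illustration; raises wave.Error if the
    file is not a WAV file that the standard library can read.
    """
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        return {
            "sample_rate": sample_rate,
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / sample_rate,
        }


def write_silence(path, sample_rate=32000, seconds=1):
    """Write a mono, 16-bit silent WAV file so the demo needs no download."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * (sample_rate * seconds))


# Demo on a generated file; replace the path with ./cat.wav to inspect a sample.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "silence.wav")
    write_silence(demo_path)
    print(wav_info(demo_path))
```

The standard `wave` module only reads uncompressed PCM data, so an error here does not necessarily mean the file is unusable with other audio loaders.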

### 3. Usage
- Command Line (Recommended)
  ```bash
  paddlespeech cls --input ./cat.wav --topk 10
  ```
  Usage:
  ```bash
  paddlespeech cls --help
  ```
  Arguments:
  - `input` (required): Audio file to tag.
  - `model`: Model type of tagging task. Default: `panns_cnn14`.
  - `config`: Config file of the tagging task. The pretrained model's config is used when it is None. Default: `None`.
  - `ckpt_path`: Model checkpoint. The pretrained model is used when it is None. Default: `None`.
  - `label_file`: Label file of the tagging task. The AudioSet labels are used when it is None. Default: `None`.
  - `topk`: Number of top-scoring labels to show in the result. Default: `1`.
  - `device`: Device on which to run model inference. Default: the default PaddlePaddle device in the current environment.

  Output:
  ```bash
  [2021-12-08 14:49:40,671] [    INFO] [utils.py] [L225] - CLS Result:
  Cat: 0.8991316556930542
  Domestic animals, pets: 0.8806838393211365
  Meow: 0.8784668445587158
  Animal: 0.8776564598083496
  Caterwaul: 0.2232048511505127
  Speech: 0.03101264126598835
  Music: 0.02870696596801281
  Inside, small room: 0.016673989593982697
  Purr: 0.008387474343180656
  Bird: 0.006304860580712557
  ```

- Python API
  ```python
  import paddle
  from paddlespeech.cli import CLSExecutor

  cls_executor = CLSExecutor()
  result = cls_executor(
      model='panns_cnn14',
      config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
      label_file=None,
      ckpt_path=None,
      audio_file='./cat.wav',
      topk=10,
      device=paddle.get_device())
  print('CLS Result: \n{}'.format(result))
  ```
  Output:
  ```bash
  CLS Result:
  Cat: 0.8991316556930542
  Domestic animals, pets: 0.8806838393211365
  Meow: 0.8784668445587158
  Animal: 0.8776564598083496
  Caterwaul: 0.2232048511505127
  Speech: 0.03101264126598835
  Music: 0.02870696596801281
  Inside, small room: 0.016673989593982697
  Purr: 0.008387474343180656
  Bird: 0.006304860580712557
  ```
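
The printed result is easier to post-process once it is turned into `(label, score)` pairs. The sketch below assumes the result is a string with one `label: score` entry per line, as in the output shown above; that format is inferred from this README rather than from a documented contract, and `parse_cls_result` is a hypothetical helper, not a PaddleSpeech function. It splits on the last `: ` in each line so that labels containing commas or colons stay intact.

```python
from typing import List, Tuple


def parse_cls_result(result: str) -> List[Tuple[str, float]]:
    """Parse 'label: score' lines into (label, score) pairs.

    Assumes the line format shown in this README; lines that do not
    match (blank lines, headers such as 'CLS Result:') are skipped.
    """
    pairs = []
    for line in result.splitlines():
        line = line.strip()
        # Split on the last ': ' so labels like
        # 'Domestic animals, pets' are kept whole.
        label, sep, score = line.rpartition(": ")
        if not sep:
            continue
        try:
            pairs.append((label, float(score)))
        except ValueError:
            continue  # Not a numeric score; ignore the line.
    return pairs


# Sample text copied from the output above (truncated to three labels).
sample = (
    "Cat: 0.8991316556930542\n"
    "Domestic animals, pets: 0.8806838393211365\n"
    "Caterwaul: 0.2232048511505127\n"
)
tags = parse_cls_result(sample)
confident = [label for label, score in tags if score >= 0.5]
print(confident)  # ['Cat', 'Domestic animals, pets']
```

The 0.5 cutoff is arbitrary and only illustrates filtering; a suitable threshold depends on the application.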


### 4. Pretrained Models

The following pretrained models are released by PaddleSpeech and can be used from both the command line and the Python API:

| Model | Sample Rate (Hz) |
| :--- | :---: |
| panns_cnn6 | 32000 |
| panns_cnn10 | 32000 |
| panns_cnn14 | 32000 |