diff --git a/docs/source/cls/custom_dataset.md b/docs/source/cls/custom_dataset.md
new file mode 100644
index 0000000000000000000000000000000000000000..0b9ed72683ada953022b7ec72bd3440ee678220f
--- /dev/null
+++ b/docs/source/cls/custom_dataset.md
@@ -0,0 +1,117 @@
+# Customize Dataset for Audio Classification
+
+By following this tutorial, you can customize your dataset for the audio classification task with `paddlespeech` and `paddleaudio`.
+
+The base class for classification datasets is `paddleaudio.datasets.dataset.AudioClassificationDataset`. To customize your dataset, write a dataset class derived from `AudioClassificationDataset`.
+
+Assuming you have some wave files stored in a directory of your own, you should prepare a meta file that lists the file paths and labels. For example, its absolute path could be `/PATH/TO/META_FILE.txt`:
+```
+/PATH/TO/WAVE_FILE/1.wav cat
+/PATH/TO/WAVE_FILE/2.wav cat
+/PATH/TO/WAVE_FILE/3.wav dog
+/PATH/TO/WAVE_FILE/4.wav dog
+```
+Here is an example of building your custom dataset in `custom_dataset.py`:
+
+```python
+from paddleaudio.datasets.dataset import AudioClassificationDataset
+
+class CustomDataset(AudioClassificationDataset):
+    # All *.wav files should share the same sample rate, e.g. 16k/24k/32k/44k.
+    sample_rate = 16000
+    meta_file = '/PATH/TO/META_FILE.txt'
+    # List all the class labels
+    label_list = [
+        'cat',
+        'dog',
+    ]
+
+    def __init__(self):
+        files, labels = self._get_data()
+        super(CustomDataset, self).__init__(
+            files=files, labels=labels, feat_type='raw')
+
+    def _get_data(self):
+        '''
+        This method returns the wave file paths and their label ids.
+        '''
+        files = []
+        labels = []
+
+        with open(self.meta_file) as f:
+            for line in f:
+                file, label_str = line.strip().split(' ')
+                files.append(file)
+                labels.append(self.label_list.index(label_str))
+
+        return files, labels
+```
+
+Then you can build a dataset and a data loader from `CustomDataset`:
+```python
+import paddle
+from paddleaudio.features import LogMelSpectrogram
+
+from custom_dataset import CustomDataset
+
+train_ds = CustomDataset()
+feature_extractor = LogMelSpectrogram(sr=train_ds.sample_rate)
+
+train_sampler = paddle.io.DistributedBatchSampler(
+    train_ds, batch_size=4, shuffle=True, drop_last=False)
+train_loader = paddle.io.DataLoader(
+    train_ds,
+    batch_sampler=train_sampler,
+    return_list=True,
+    use_buffer_reader=True)
+```
+
+Train a model with `CustomDataset`:
+```python
+from paddlespeech.cls.models import cnn14
+from paddlespeech.cls.models import SoundClassifier
+
+backbone = cnn14(pretrained=True, extract_embedding=True)
+model = SoundClassifier(backbone, num_class=len(train_ds.label_list))
+optimizer = paddle.optimizer.Adam(
+    learning_rate=1e-6, parameters=model.parameters())
+criterion = paddle.nn.loss.CrossEntropyLoss()
+
+steps_per_epoch = len(train_sampler)
+epochs = 10
+for epoch in range(1, epochs + 1):
+    model.train()
+
+    for batch_idx, batch in enumerate(train_loader):
+        waveforms, labels = batch
+        # Padding is needed when the lengths of waveforms differ in a batch
+        # (a minimal sketch follows).
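+        # `feature_extractor` expects a batch of equal-length waveforms. Below
+        # is a minimal zero-padding sketch (an assumption for illustration,
+        # not the tutorial's built-in behavior) for the case where a custom
+        # `collate_fn` yields a list of variable-length 1-D tensors:
+        if isinstance(waveforms, (list, tuple)):
+            max_len = max(w.shape[-1] for w in waveforms)
+            waveforms = paddle.stack([
+                paddle.nn.functional.pad(w, [0, max_len - w.shape[-1]])
+                for w in waveforms
+            ])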
+        feats = feature_extractor(waveforms)
+        feats = paddle.transpose(feats, [0, 2, 1])
+        logits = model(feats)
+        loss = criterion(logits, labels)
+        loss.backward()
+        optimizer.step()
+        if isinstance(optimizer._learning_rate,
+                      paddle.optimizer.lr.LRScheduler):
+            optimizer._learning_rate.step()
+        optimizer.clear_grad()
+
+        # Calculate loss
+        avg_loss = loss.numpy()[0]
+
+        # Calculate metrics
+        preds = paddle.argmax(logits, axis=1)
+        num_corrects = (preds == labels).numpy().sum()
+        num_samples = feats.shape[0]
+
+        avg_acc = num_corrects / num_samples
+
+        print_msg = 'Epoch={}/{}, Step={}/{}'.format(
+            epoch, epochs, batch_idx + 1, steps_per_epoch)
+        print_msg += ' loss={:.4f}'.format(avg_loss)
+        print_msg += ' acc={:.4f}'.format(avg_acc)
+        print_msg += ' lr={:.6f}'.format(optimizer.get_lr())
+        print(print_msg)
+```
+
+If you want to save model checkpoints and evaluate on a specific dataset, please see `paddlespeech/cls/exp/panns/train.py` for more details.
diff --git a/docs/source/cls/quick_start.md b/docs/source/cls/quick_start.md
new file mode 100644
index 0000000000000000000000000000000000000000..e173255cf9dfd1df5627c51e81c88c198f72bab6
--- /dev/null
+++ b/docs/source/cls/quick_start.md
@@ -0,0 +1,51 @@
+# Quick Start of Audio Classification
+Several shell scripts provided in `./examples/esc50/cls0` will help you quickly try out the major modules, including data preparation, model training, and model evaluation, with the [ESC50](https://github.com/karolpiczak/ESC-50) dataset.
+
+Some scripts in `./examples` are not configured to use GPUs. If you want to train with 8 GPUs, please set `CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`. If you don't have any GPU available, please set `CUDA_VISIBLE_DEVICES=` to use CPUs instead.
+
+Let's start an audio classification task with the following steps:
+
+- Go to the directory
+
+  ```bash
+  cd examples/esc50/cls0
+  ```
+
+- Source env
+  ```bash
+  source path.sh
+  ```
+
+- Main entry point
+  ```bash
+  CUDA_VISIBLE_DEVICES=0 ./run.sh 1
+  ```
+
+This demo includes fine-tuning, evaluating, and deploying an audio classification model. More detailed information is provided in the following sections.
+
+## Fine-tuning a model
+PANNs ([PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition](https://arxiv.org/pdf/1912.10211.pdf)) are models pretrained on [Audioset](https://research.google.com/audioset/). They can easily be used to extract audio embeddings for the audio classification task.
+
+To start fine-tuning a model, please run:
+```bash
+ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
+feat_backend=numpy
+./local/train.sh ${ngpu} ${feat_backend}
+```
+
+## Deploy a model
+Once you have saved a model checkpoint, you can export it to a static graph and deploy it with Python scripts:
+
+- Export to a static graph
+  ```bash
+  ./local/export.sh ${ckpt_dir} ./export
+  ```
+  The argument `ckpt_dir` should be a directory in which a model checkpoint is stored, for example `checkpoint/epoch_50`.
+
+  The static graph will be exported to `./export`.
+
+- Inference
+  ```bash
+  ./local/static_model_infer.sh ${infer_device} ./export ${audio_file}
+  ```
+  The argument `infer_device` can be `cpu` or `gpu`, and it specifies the device used for inference. `audio_file` should be a wave file named `*.wav`.
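+
+For reference, the snippet below is a minimal Python sketch of running the exported static graph directly with the `paddle.inference` API. The file names `inference.pdmodel`/`inference.pdiparams`, the input feature shape, and the dummy input are assumptions for illustration; the actual input signature is defined by the export script, so `./local/static_model_infer.sh` remains the recommended entry point.
+
+```python
+import numpy as np
+from paddle.inference import Config, create_predictor
+
+# Assumed file names produced by the export step; adjust them to your output.
+config = Config('./export/inference.pdmodel', './export/inference.pdiparams')
+config.disable_gpu()  # or config.enable_use_gpu(1000, 0) to run on GPU 0
+predictor = create_predictor(config)
+
+# Feed a dummy feature batch; real inputs should use the same feature
+# extraction as training (e.g. a log-mel spectrogram of the audio file).
+input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
+feats = np.random.randn(1, 100, 64).astype('float32')  # [batch, time, n_mels] (assumed)
+input_handle.copy_from_cpu(feats)
+
+predictor.run()
+
+output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
+logits = output_handle.copy_to_cpu()
+print('Predicted class id:', np.argmax(logits, axis=-1))
+```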