{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. PP-TSM Introduction\n", "\n", "Video classification, like image classification, is a recognition task: given an input video, the model outputs a predicted category label. When all labels are action categories, the task is also called action recognition. Unlike image classification, video classification usually needs to exploit temporal information across multiple frames. PP-TSM is a practical, industrial-grade video classification model developed by PaddleVideo. Building on state-of-the-art algorithms, it reduces model size and improves accuracy while balancing the trade-off between speed and precision.\n", "\n", "PP-TSM is built on a ResNet-50 backbone. Its optimizations include data augmentation, network structure fine-tuning, training strategy, precise BN, pretrained model selection, and model distillation. With almost no increase in computation, and using the center-sampling evaluation method, PP-TSM reaches 76.16% accuracy on Kinetics-400, 3.95 points higher than the original paper. This exceeds the 3D model with the same backbone network, and the inference speed is 4.5 times faster.\n", "\n", "More information about PaddleVideo can be found here: https://github.com/PaddlePaddle/PaddleVideo\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Model Effects and Application Scenarios\n", "### 2.1 Action Recognition Tasks:\n", "\n", "#### 2.1.1 Datasets:\n", "\n", "The main dataset is Kinetics-400, which is divided into a training set and a test set.\n", "\n", "#### 2.1.2 Model Effects:\n", "\n", "The recognition effect of PP-TSM is shown in the following picture:\n", "\n", "