# Video Classification Based on Temporal Segment Network
Video classification has drawn a significant amount of attention in the past few years. This page introduces how to perform video classification with PaddlePaddle Fluid on the public UCF-101 dataset, based on the state-of-the-art Temporal Segment Network (TSN) method.
Running the sample code in this directory requires PaddlePaddle Fluid v0.13.0 or later. If the version of PaddlePaddle installed on your device is older, please follow the instructions in the <a href="http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html" rel="nofollow">installation document</a> to update it.
### Data preparation
#### Download the UCF-101 dataset
Users can download the UCF-101 dataset using the script provided in <code>data/download.sh</code>.
#### Decode videos into frames
To avoid decoding videos during network training, we decode them offline into frames and save them in the <code>pickle</code> format, which is easy to read in Python.
Users can refer to the script <code>data/video_decode.py</code> for video decoding.
#### Split data into training and test sets
We follow split 1 of the UCF-101 dataset, which yields 9537 videos for training and 3783 videos for validation. The reference script is <code>data/split_data.py</code>.
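The splitting step can be sketched as follows. This is a minimal illustration assuming the official UCF-101 split files (<code>classInd.txt</code>, <code>trainlist01.txt</code>, <code>testlist01.txt</code>) as input; the function names are hypothetical and not taken from <code>data/split_data.py</code>, and the conversion to 0-based labels is an assumption about what the training code expects.

```python
import os


def load_class_index(lines):
    """Map class name -> 0-based label from classInd.txt lines like '1 ApplyEyeMakeup'."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        idx, name = line.split()
        mapping[name] = int(idx) - 1  # assumption: labels are shifted to start at 0
    return mapping


def parse_split(lines, class_index):
    """Return (video_id, label) pairs from trainlist01/testlist01 lines.

    Train lines carry a label ('Class/v_..._c01.avi 1') but test lines do not,
    so the label is always recovered from the class directory prefix.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        path = line.split()[0]
        class_name = path.split("/")[0]
        video_id = os.path.splitext(os.path.basename(path))[0]
        samples.append((video_id, class_index[class_name]))
    return samples


classes = load_class_index(["1 ApplyEyeMakeup", "2 ApplyLipstick"])
train = parse_split(["ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1"], classes)
test = parse_split(["ApplyLipstick/v_ApplyLipstick_g01_c01.avi"], classes)
print(train, test)  # [('v_ApplyEyeMakeup_g08_c01', 0)] [('v_ApplyLipstick_g01_c01', 1)]
```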
#### Save pickle files for training
As stated above, we save all data in <code>pickle</code> format for training. All information about each video, including the video id, the binary frame data and the label, is saved into a single pickle file. Please refer to the script <code>data/generate_train_data.py</code>.
After this step, one obtains two directories containing the training and test data in <code>pickle</code> format, plus two list files, <em>train.list</em> and <em>test.list</em>, in which the fields on each line are separated by a space.
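A minimal sketch of the per-video pickle layout follows. It assumes each file stores a <code>(video_id, label, frames)</code> tuple, where <code>frames</code> is a list of encoded image bytes, and that each list line holds the pickle path and the label. Both the tuple order and the list-line fields are assumptions for illustration; <code>data/generate_train_data.py</code> defines the actual layout.

```python
import os
import pickle
import tempfile


def save_video_pickle(out_dir, video_id, label, frames):
    """Write one video's (id, label, frames) tuple to <out_dir>/<video_id>.pkl."""
    path = os.path.join(out_dir, video_id + ".pkl")
    with open(path, "wb") as f:
        pickle.dump((video_id, label, frames), f, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_video_pickle(path):
    """Read back the (id, label, frames) tuple written by save_video_pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as out_dir:
    # Placeholder bytes stand in for JPEG-encoded frames.
    frames = [b"frame0", b"frame1", b"frame2"]
    pkl_path = save_video_pickle(out_dir, "v_ApplyEyeMakeup_g08_c01", 0, frames)

    # Hypothetical list-file layout: "<pickle path> <label>", separated by a space.
    list_path = os.path.join(out_dir, "train.list")
    with open(list_path, "w") as f:
        f.write("{} {}\n".format(pkl_path, 0))
    with open(list_path) as f:
        list_fields = f.readline().split()

    video_id, label, loaded = load_video_pickle(pkl_path)

print(video_id, label, len(loaded))  # v_ApplyEyeMakeup_g08_c01 0 3
```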
### Training
After data preparation, users can start PaddlePaddle Fluid training with:
```
python train.py \
--batch_size=128 \
--total_videos=9537 \
--class_dim=101 \
--num_epochs=60 \
--image_shape=3,224,224 \
--model_save_dir=output/ \
--with_mem_opt=True \
--lr_init=0.01 \
--num_layers=50 \
--seg_num=7 \
--pretrained_model={path_to_pretrained_model}
```
<strong>Parameter introduction:</strong>
<ul>
<li>batch_size: the size of each mini-batch.</li>
<li>total_videos: the total number of videos in the training set.</li>
<li>class_dim: the number of classes in the classification task.</li>
<li>num_epochs: the number of training epochs.</li>
<li>image_shape: the input shape of the network, given as channels,height,width.</li>
<li>model_save_dir: the directory in which the trained model is saved.</li>
<li>with_mem_opt: whether to enable memory optimization.</li>
<li>lr_init: the initial learning rate.</li>
<li>num_layers: the number of layers of the ResNet backbone.</li>
<li>seg_num: the number of segments per video in TSN.</li>
<li>pretrained_model: the path to the pretrained model used to initialize the network.</li>
</ul>
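The <code>seg_num</code> parameter controls TSN's sparse temporal sampling: each video is divided into <code>seg_num</code> equal-length segments and one frame is taken from each. The sketch below draws a random frame per segment during training, as in the TSN paper, and takes the middle frame of each segment at test time, which is a common implementation choice. The function name is hypothetical and <code>reader.py</code> may differ in detail.

```python
import random


def sample_segment_indices(num_frames, seg_num, training, rng=random):
    """Return one frame index per segment (TSN-style sparse sampling).

    The frame range [0, num_frames) is cut into seg_num near-equal segments.
    Training draws a random frame from each segment; testing takes the middle one.
    """
    if num_frames <= 0 or seg_num <= 0:
        raise ValueError("num_frames and seg_num must be positive")
    indices = []
    for k in range(seg_num):
        start = k * num_frames // seg_num
        # Guarantee a non-empty segment even for videos shorter than seg_num frames.
        end = max((k + 1) * num_frames // seg_num, start + 1)
        if training:
            indices.append(rng.randrange(start, end))
        else:
            indices.append((start + end - 1) // 2)
    return indices


print(sample_segment_indices(70, 7, training=False))  # [4, 14, 24, 34, 44, 54, 64]
print(sample_segment_indices(70, 7, training=True, rng=random.Random(0)))
```

Integer arithmetic is used for the segment boundaries so that the last segment always ends exactly at <code>num_frames</code>, with no floating-point rounding at the edges.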
<strong>Data reader introduction:</strong>
The data reader is defined in <code>reader.py</code>. Note that group operations are applied to all frames of one video, i.e., each transformation is applied identically to every frame sampled from that video.
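A group operation samples its random parameters once per video and applies them to every frame, so the frames of a clip stay spatially consistent. The minimal pure-Python sketch below shows a group random crop and a group horizontal flip, with frames represented as nested lists (H x W); the function names are hypothetical, and <code>reader.py</code> works on real images and may use other transforms.

```python
import random


def group_random_crop(frames, crop_h, crop_w, rng=random):
    """Crop every frame of one video at the same random offset."""
    h, w = len(frames[0]), len(frames[0][0])
    top = rng.randint(0, h - crop_h)  # randint is inclusive on both ends
    left = rng.randint(0, w - crop_w)
    return [[row[left:left + crop_w] for row in frame[top:top + crop_h]]
            for frame in frames]


def group_random_flip(frames, rng=random):
    """Horizontally flip all frames of one video, or none of them."""
    if rng.random() < 0.5:
        return [[row[::-1] for row in frame] for frame in frames]
    return frames


# Three toy 5x6 "frames"; frame i holds the values of frame 0 shifted by 100 * i.
frames = [[[100 * i + 10 * r + c for c in range(6)] for r in range(5)] for i in range(3)]
rng = random.Random(1)
clip = group_random_flip(group_random_crop(frames, 3, 4, rng), rng)
```

Because the crop offset and the flip decision are shared, the difference between any two cropped frames is still a constant, which is exactly what a per-frame random crop would break.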
### Evaluation
Evaluation measures the performance of a trained model. One can download the pretrained model and set <code>path_to_pretrained_model</code> to its path. The top-1/top-5 accuracy can then be obtained by running the following command:
```
python eval.py \
--batch_size=128 \
--class_dim=101 \
--image_shape=3,224,224 \
--with_mem_opt=True \
--num_layers=50 \
--seg_num=7 \
--test_model={path_to_pretrained_model}
```
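Top-1/top-5 accuracy is computed per video. The sketch below assumes TSN's average consensus, i.e., the class scores of the <code>seg_num</code> segments are averaged before the classes are ranked. It is an illustration with hypothetical helper names, not the code in <code>eval.py</code>.

```python
def segmental_consensus(segment_scores):
    """Average the per-class scores of all segments of one video."""
    num_classes = len(segment_scores[0])
    return [sum(seg[c] for seg in segment_scores) / len(segment_scores)
            for c in range(num_classes)]


def topk_accuracy(videos, k):
    """Fraction of videos whose label is among the k top-scoring classes.

    `videos` is a list of (segment_scores, label) pairs, where segment_scores
    holds one score vector per segment.
    """
    hits = 0
    for segment_scores, label in videos:
        scores = segmental_consensus(segment_scores)
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        hits += (label in ranked[:k])
    return hits / len(videos)


# Two toy videos, 2 segments each, 3 classes.
videos = [
    ([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]], 0),  # consensus ranks class 0 first
    ([[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]], 1),  # class 1 is only ranked second
]
print(topk_accuracy(videos, 1), topk_accuracy(videos, 2))  # 0.5 1.0
```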
With the <code>eval.py</code> configuration above, the output log looks like: