introduction_en.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. PP-msvsr Introduction\n",
    "Video super-resolution originates from image super-resolution, which aims to recover high-resolution (HR) images from one or more low resolution (LR) images. The difference between them is that the video is composed of multiple frames, so the video super-resolution usually uses the information between frames to repair. PP-MSVSR is a multi-stage VSR deep architecture,  with local fusion module, auxiliary loss and refined align module to refine the enhanced result progressively. Specifically, in order to strengthen the fusion of features across frames in feature propagation, a local fusion module is designed in stage-1 to perform local feature fusion before feature propagation. Moreover, an auxiliary loss in stage-2 is introduced to make the features obtained by the propagation module reserve more correlated information connected to the HR space, and introduced a refined align module in stage-3 to make full use of the feature information of the previous stage. Extensive experiments substantiate that PP-MSVSR achieves a promising performance of Vid4 datasets, which PSNR metric can achieve 28.13 with only 1.45M parameters.\n",
    "\n",
    "The PP-MSVSR model is officially produced by PaddlePaddle and is a video super-resolution model developed by PaddleGan. More information about PaddleGAN can be found here https://github.com/PaddlePaddle/PaddleGAN.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Model Effects and Application Scenarios\n",
    "### 2.1 Video Super-Resolution Tasks:\n",
    "\n",
    "#### 2.1.1 Datasets:\n",
    "\n",
    "The commonly used video super-resolution dataset Vid4 is taken as an example.\n",
    "\n",
    "#### 2.1.2 Model Effects:\n",
    "\n",
    "PP-MSVSR在图片上的超分效果为：\n",
    "The video super-resolution effect of PP-msvsr on the video is:\n",
    "\n",
    "<div align='center'>\n",
    "<img src='https://user-images.githubusercontent.com/48054808/144848981-00c6ad21-0702-4381-9544-becb227ed9f0.gif' width = \"80%\"  />\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. How to Use the Model\n",
    "\n",
    "### 3.1 Model Inference:\n",
    "* Download \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "%cd /home/aistudio/work\n",
    "!git clone https://github.com/PaddlePaddle/PaddleGAN.git"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Installation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# The script needs to be run in the PaddleGAN directory\n",
    "%cd /home/aistudio/work/PaddleGAN/\n",
    "\n",
    "# Install the required dependencies [already persisted, no need to install again].\n",
    "!pip install -r requirements.txt\n",
    "\n",
    "# The script needs to be run in the PaddleGAN directory\n",
    "%cd /home/aistudio/work/PaddleGAN/\n",
    "\n",
    "# Download PaddleGAN \n",
    "!python setup.py develop  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Quick experience\n",
    "\n",
    "Congratulations! Now that you've successfully installed PaddleGAN, let's get a quick feel at video super-resolution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Predict a video on the GPU.\n",
    "# Low resolution video download address: https://user-images.githubusercontent.com/79366697/200290225-7fdd364c-2fbe-48b6-a3bf-87349aedec98.mp4\n",
    "%cd ~/work/PaddleGAN/applications/\n",
    "!python tools/video-enhance.py --input demo/Peking_input360p_clip6_5s.mp4 \\\n",
    "                               --process_order PPMSVSR \\\n",
    "                               --output output_dir"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A video with the predicted result is generated under the output_dir folder.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 Model Training\n",
    "* Clone the PaddleGAN repository (see 3.1 for details)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Prepare the datasets\n",
    "\n",
    "Here are 4 commonly used video super-resolution dataset, REDS, Vimeo90K, Vid4, UDM10. The REDS and Vimeo90K dataset include train dataset and test dataset, Vid4 and UDM10 are test dataset. Download and decompress the required dataset and place it under the ``PaddleGAN/data``.\n",
    "\n",
    "REDS（[download](https://seungjunnah.github.io/Datasets/reds.html)）is a newly proposed high-quality (720p) video dataset in the NTIRE19 Competition. REDS consists of 240 training clips, 30 validation clips and 30 testing clips (each with 100 consecutive frames). Since the test ground truth is not available, we select four representative clips (they are '000', '011', '015', '020', with diverse scenes and motions) as our test set, denoted by REDS4. The remaining training and validation clips are re-grouped as our training dataset (a total of 266 clips).\n",
    "\n",
    "The structure of the processed REDS is as follows:\n",
    "```\n",
    "  PaddleGAN\n",
    "    ├── data\n",
    "        ├── REDS\n",
    "              ├── train_sharp\n",
    "              |    └──X4\n",
    "              ├── train_sharp_bicubic\n",
    "              |    └──X4\n",
    "              ├── REDS4_test_sharp\n",
    "              |    └──X4\n",
    "              └── REDS4_test_sharp_bicubic\n",
    "                    └──X4\n",
    "            ...\n",
    "```\n",
    "\n",
    "Vimeo90K ([download](http://toflow.csail.mit.edu/)) is designed by Tianfan Xue etc. for the following four video processing tasks: temporal frame interpolation, video denoising, video deblocking, and video super-resolution. Vimeo90K is a large-scale, high-quality video dataset. This dataset consists of 89,800 video clips downloaded from vimeo.com, which covers large variaty of scenes and actions.\n",
    "\n",
    "The structure of the processed Vimeo90K is as follows:\n",
    "```\n",
    "  PaddleGAN\n",
    "    ├── data\n",
    "        ├── Vimeo90K\n",
    "              ├── vimeo_septuplet\n",
    "              |    |──sequences\n",
    "              |    └──sep_trainlist.txt\n",
    "              ├── vimeo_septuplet_BD_matlabLRx4\n",
    "              |    └──sequences\n",
    "              └── vimeo_super_resolution_test\n",
    "                    |──low_resolution\n",
    "                    |──target\n",
    "                    └──sep_testlist.txt\n",
    "            ...\n",
    "```\n",
    "\n",
    "Vid4 ([Data Download](https://paddlegan.bj.bcebos.com/datasets/Vid4.zip)) is a commonly used test dataset for VSR, which contains 4 video segments.\n",
    "The structure of the processed Vid4 is as follows:\n",
    "```\n",
    "  PaddleGAN\n",
    "    ├── data\n",
    "        ├── Vid4\n",
    "              ├── BDx4\n",
    "              └── GT\n",
    "            ...\n",
    "```\n",
    "\n",
    "UDM10 ([Data Download](https://paddlegan.bj.bcebos.com/datasets/udm10_paddle.tar)) is a commonly used test dataset for VSR, which contains 10 video segments.\n",
    "The structure of the processed UDM10 is as follows:\n",
    "```\n",
    "  PaddleGAN\n",
    "  ├── data\n",
    "        ├── udm10\n",
    "              ├── BDx4\n",
    "              └── GT\n",
    "            ...\n",
    "```\n",
    "\n",
    "Using the REDS dataset as an example, verify that the dataset is ready by using the following command.\n",
    " "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Review the extract directory\n",
    "#%cd /home/aistudio/work/PaddleGAN/\n",
    "#!tree -d data/REDS"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Change yaml configurations files.\n",
    "\n",
    "Change yaml configurations files``` configs/msvsr_reds.yaml```\n",
    "\n",
    "```\n",
    "total_iters: 150000  \n",
    "output_dir: output_dir  \n",
    "find_unused_parameters: True\n",
    "checkpoints_dir: checkpoints\n",
    "use_dataset: True\n",
    "# tensor range for function tensor2img\n",
    "min_max:\n",
    "  (0., 1.)\n",
    "\n",
    "model:\n",
    "  name: MultiStageVSRModel\n",
    "  fix_iter: 2500\n",
    "  generator:\n",
    "    name: MSVSR\n",
    "    mid_channels: 32\n",
    "    num_init_blocks: 2\n",
    "    num_blocks: 3\n",
    "    num_reconstruction_blocks: 2\n",
    "    only_last: True\n",
    "    use_tiny_spynet: True\n",
    "    deform_groups: 4\n",
    "    stage1_groups: 8\n",
    "    auxiliary_loss: True\n",
    "    use_refine_align: True\n",
    "    aux_reconstruction_blocks: 1\n",
    "    use_local_connnect: True\n",
    "  pixel_criterion:\n",
    "    name: CharbonnierLoss\n",
    "    reduction: mean\n",
    "\n",
    "dataset:\n",
    "  train:\n",
    "    name: RepeatDataset\n",
    "    times: 1000\n",
    "    num_workers: 6\n",
    "    batch_size: 2  \n",
    "    dataset:\n",
    "      name: VSRREDSMultipleGTDataset\n",
    "      lq_folder: data/REDS/train_sharp_bicubic/X4  \n",
    "      gt_folder: data/REDS/train_sharp/X4  \n",
    "      ann_file: data/REDS/meta_info_REDS_GT.txt  \n",
    "      num_frames: 20  \n",
    "      preprocess:\n",
    "        - name: GetNeighboringFramesIdx\n",
    "          interval_list: [1]\n",
    "        - name: ReadImageSequence\n",
    "          key: lq\n",
    "        - name: ReadImageSequence\n",
    "          key: gt\n",
    "        - name: Transforms\n",
    "          input_keys: [lq, gt]\n",
    "          pipeline:\n",
    "            - name: SRPairedRandomCrop\n",
    "              gt_patch_size: 256\n",
    "              scale: 4\n",
    "              keys: [image, image]\n",
    "            - name: PairedRandomHorizontalFlip\n",
    "              keys: [image, image]\n",
    "            - name: PairedRandomVerticalFlip\n",
    "              keys: [image, image]\n",
    "            - name: PairedRandomTransposeHW\n",
    "              keys: [image, image]\n",
    "            - name: TransposeSequence\n",
    "              keys: [image, image]\n",
    "            - name: NormalizeSequence\n",
    "              mean: [0., 0., 0.]\n",
    "              std: [255., 255., 255.]\n",
    "              keys: [image, image]\n",
    "\n",
    "  test:\n",
    "    name: VSRREDSMultipleGTDataset\n",
    "    lq_folder: data/REDS/REDS4_test_sharp_bicubic/X4  \n",
    "    gt_folder: data/REDS/REDS4_test_sharp/X4  \n",
    "    ann_file: data/REDS/meta_info_REDS_GT.txt \n",
    "    num_frames: 100  \n",
    "    test_mode: True\n",
    "    preprocess:\n",
    "        - name: GetNeighboringFramesIdx\n",
    "          interval_list: [1]\n",
    "        - name: ReadImageSequence\n",
    "          key: lq\n",
    "        - name: ReadImageSequence\n",
    "          key: gt\n",
    "        - name: Transforms\n",
    "          input_keys: [lq, gt]\n",
    "          pipeline:\n",
    "            - name: TransposeSequence\n",
    "              keys: [image, image]\n",
    "            - name: NormalizeSequence\n",
    "              mean: [0., 0., 0.]\n",
    "              std: [255., 255., 255.]\n",
    "              keys: [image, image]\n",
    "\n",
    "lr_scheduler:\n",
    "  name: CosineAnnealingRestartLR\n",
    "  learning_rate: !!float 2e-4  \n",
    "  periods: [150000]\n",
    "  restart_weights: [1]\n",
    "  eta_min: !!float 1e-7\n",
    "\n",
    "optimizer:\n",
    "  name: Adam\n",
    "  # add parameters of net_name to optim\n",
    "  # name should in self.nets\n",
    "  net_names:\n",
    "    - generator\n",
    "  beta1: 0.9\n",
    "  beta2: 0.99\n",
    "\n",
    "validate:\n",
    "  interval: 5000 \n",
    "  save_img: false \n",
    "\n",
    "  metrics:\n",
    "    psnr: # metric name, can be arbitrary\n",
    "      name: PSNR  \n",
    "      crop_border: 0\n",
    "      test_y_channel: false\n",
    "    ssim:\n",
    "      name: SSIM  \n",
    "      crop_border: 0\n",
    "      test_y_channel: false\n",
    "\n",
    "log_config:\n",
    "  interval: 10  \n",
    "  visiual_interval: 5000 \n",
    "\n",
    "snapshot_config:\n",
    "  interval: 5000 \n",
    "\n",
    "export_model:\n",
    "  - {name: 'generator', inputs_num: 1}\n",
    "\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Train the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "%cd ~/work/PaddleGAN/\n",
    "%export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7,8\n",
    "# Beginning training\n",
    "!ppython -m paddle.distributed.launch tools/main.py --config-file configs/msvsr_reds.yaml"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Model evaluation\n",
    "\n",
    "We provide ```configs/msvsr_reds.yaml```for evaluating the effect of REDS dataset, to evaluate the effect of REDS dataset, you must first download the REDS dataset from the REDS dataset download page, and extract it to ```PaddleGAN/data/REDS```. Using the following command can evaluate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%cd ~/work/PaddleGAN/\n",
    "%env CUDA_VISIBLE_DEVICES=0\n",
    "#Download address for model: https://paddlegan.bj.bcebos.com/models/PP-MSVSR_reds_x4.pdparams\n",
    "!python tools/main.py --config-file configs/msvsr_reds.yaml --evaluate-only --load ${PATH_OF_WEIGHT}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Model Principles\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Different from the Single Image Super-Resolution(SISR) task, the key for Video Super-Resolution(VSR) task is to make full use of complementary information across frames to reconstruct the high-resolution sequence. Since images from different frames with diverse motion and scene, accurately aligning multiple frames and effectively fusing different frames has always been the key research work of VSR tasks. To utilizes rich complementary information of neighboring frames, PP-MSVSR propose a multi-stage VSR deep architecture, with local fusion module, auxiliary loss and refined align module to refine the enhanced result progressively. \n",
    "* PP-MSVSR propose a multi-stage network that combines the idea of sliding-window framework and recurrent framework, with local fusion module, auxiliary loss and refined align module to refine the enhanced result progressively. \n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/79366697/200302378-cd0a5f1b-5d89-44a5-aae3-8f4c587a3575.png\" width = \"80%\"  />\n",
    "</div>\n",
    "\n",
    "* Inspired by the idea of sliding-window VSR, PP-MSVSR designed a local fusion module in stage-1, denoted as LFM, to perform local feature fusion before feature propagation, which can strengthen cross-frame feature fusion in feature propagation. Specifically, the purpose of the LFM is to let the features of the current frame fuse the information of its neighboring frames first, and then send the fused features to the propagation module.\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/79366697/200302612-6ac31db1-ac32-4b14-af3e-479f66746ac9.png\" width = \"80%\"  />\n",
    "</div>\n",
    "\n",
    "* Inspired by the power of recurrent vsr network, PP-MSVSR use a same structure as basicvsr++ to fusion the information from different video frame and local merged feature and then propagates the underlying information between each video frame at the stage-2. In addition, PP-MSVSR add a auxiliary loss to make feature more closed to HR space. To be specific, the auxiliary loss is used after upsampling the feature, which propagation in stage-2\n",
    "* Different from SISR, in order to better integrate the in- formation of adjacent frames, VSR usually aligns adjacent frames with the current frame. In some large motion video restoration tasks, the role of alignment is particularly ob- vious. In the process of using a two-way recurrent neural network, there are often multiple identical alignment opera- tions. In order to make full use of the results of the previous alignment operations, PP-MSVSR propose a Refined Align Module, denoted as RAM, that can utilize the previously aligned parameters and achieve a better alignment result.\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/79366697/200302619-87b10406-580b-4a65-a6bd-bbb1525005b1.png\" width = \"80%\"  />\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Related papers and citations\n",
    "```\n",
    "@article{jiang2021PP-MSVSR,\n",
    "  author = {Jiang, Lielin and Wang, Na and Dang, Qingqing and Liu, Rui and Lai, Baohua},\n",
    "  title = {PP-MSVSR: Multi-Stage Video Super-Resolution},\n",
    "  booktitle = {arXiv preprint arXiv:2112.02828},\n",
    "  year = {2021}\n",
    "  }\n",
    "```\n"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}