From 8317fbe53f5004949e1ff5b7b9c405f9b921e1ed Mon Sep 17 00:00:00 2001
From: wangna11BD <79366697+wangna11BD@users.noreply.github.com>
Date: Thu, 17 Nov 2022 15:27:17 +0800
Subject: [PATCH] add ppmsvsr introduction_cn.ipynb (#5554)

* add ppmsvsr introduction_cn.ipynb
* add benchmark info and download
* add en doc
* fix info.yaml
* Update info.yaml

Co-authored-by: liuTINA0907 <65896652+liuTINA0907@users.noreply.github.com>
---
 modelcenter/PP-MSVSR/.gitkeep              |   0
 modelcenter/PP-MSVSR/benchmark_cn.md       |  31 ++
 modelcenter/PP-MSVSR/benchmark_en.md       |  31 ++
 modelcenter/PP-MSVSR/download_cn.md        |   4 +
 modelcenter/PP-MSVSR/download_en.md        |   5 +
 modelcenter/PP-MSVSR/info.yaml             |  29 ++
 modelcenter/PP-MSVSR/introduction_cn.ipynb | 465 +++++++++++++++++++++
 modelcenter/PP-MSVSR/introduction_en.ipynb | 458 ++++++++++++++++++++
 8 files changed, 1023 insertions(+)
 delete mode 100644 modelcenter/PP-MSVSR/.gitkeep
 create mode 100644 modelcenter/PP-MSVSR/benchmark_cn.md
 create mode 100644 modelcenter/PP-MSVSR/benchmark_en.md
 create mode 100644 modelcenter/PP-MSVSR/download_cn.md
 create mode 100644 modelcenter/PP-MSVSR/download_en.md
 create mode 100644 modelcenter/PP-MSVSR/info.yaml
 create mode 100644 modelcenter/PP-MSVSR/introduction_cn.ipynb
 create mode 100644 modelcenter/PP-MSVSR/introduction_en.ipynb

diff --git a/modelcenter/PP-MSVSR/.gitkeep b/modelcenter/PP-MSVSR/.gitkeep
deleted file mode 100644
index e69de29b..00000000
diff --git a/modelcenter/PP-MSVSR/benchmark_cn.md b/modelcenter/PP-MSVSR/benchmark_cn.md
new file mode 100644
index 00000000..2133e4bc
--- /dev/null
+++ b/modelcenter/PP-MSVSR/benchmark_cn.md
@@ -0,0 +1,31 @@
+## 1. Training Benchmark
+
+### 1.1 Environment
+
+* PP-MSVSR is trained on 8 GPUs with a per-GPU batch size of 2. If you train with a different number of GPUs or a different batch size, refer to the FAQ to adjust the learning rate and the number of iterations.
+
+### 1.2 Datasets
+PP-MSVSR is trained and validated on the REDS dataset. REDS consists of 240 training clips, 30 validation clips and 30 testing clips (each with 100 consecutive frames). Since the test ground truth is not available, four representative clips from the training set ('000', '011', '015' and '020', with diverse scenes and motions) are selected as the test set, denoted REDS4. The remaining training and validation clips are regrouped into the training dataset (266 clips in total).
+
+### 1.3 Metrics
+
+| Model | Task | Dataset | Parameters (M) |
+|---|---|---|---|
+|PP-MSVSR | Video Super-Resolution | REDS | 1.45 |
+
+## 2. Inference Benchmark
+
+### 2.1 Environment
+
+* PP-MSVSR inference is benchmarked on a single V100 GPU with batch size 1, using CUDA 10.2 and cuDNN 7.5.1.
+
+### 2.2 Datasets
+PP-MSVSR is trained and validated on the REDS dataset. REDS consists of 240 training clips, 30 validation clips and 30 testing clips (each with 100 consecutive frames). Since the test ground truth is not available, four representative clips from the training set ('000', '011', '015' and '020', with diverse scenes and motions) are selected as the test set, denoted REDS4. The remaining training and validation clips are regrouped into the training dataset (266 clips in total).
+
+### 2.3 Metrics
+| Model | Task | Dataset | Parameters (M) | FLOPs (G) | PSNR | SSIM |
+|---|---|---|---|---|---|---|
+|PP-MSVSR | Video Super-Resolution | REDS4 | 1.45 | 111 | 31.2535 | 0.8884 |
+
+## 3. Usage Notes
+Please refer to: https://github.com/PaddlePaddle/PaddleGAN/blob/develop/docs/zh_CN/tutorials/video_super_resolution.md
diff --git a/modelcenter/PP-MSVSR/benchmark_en.md b/modelcenter/PP-MSVSR/benchmark_en.md
new file mode 100644
index 00000000..ee2e9fd5
--- /dev/null
+++ b/modelcenter/PP-MSVSR/benchmark_en.md
@@ -0,0 +1,31 @@
+## 1. Training Benchmark
+
+### 1.1 Environment
+
+* The PP-MSVSR model is trained on 8 GPUs with a per-GPU batch size of 2. If the number of GPUs or the batch size differs from this configuration, refer to the FAQ to adjust the learning rate and the number of iterations.
+
+### 1.2 Datasets
+The PP-MSVSR model uses the REDS dataset for training and testing. REDS consists of 240 training clips, 30 validation clips and 30 testing clips (each with 100 consecutive frames). Since the test ground truth is not available, we select four representative clips ('000', '011', '015' and '020', with diverse scenes and motions) as our test set, denoted by REDS4. The remaining training and validation clips are re-grouped as our training dataset (a total of 266 clips).
+
+### 1.3 Benchmark
+
+| model | task | dataset | Parameters (M) |
+|---|---|---|---|
+|PP-MSVSR | Video Super-Resolution | REDS | 1.45 |
+
+## 2. Inference Benchmark
+
+### 2.1 Environment
+
+* The PP-MSVSR inference test runs on a single V100 GPU with batch size 1, using CUDA 10.2 and cuDNN 7.5.1.
+
+### 2.2 Datasets
+The PP-MSVSR model uses the REDS dataset for training and testing. REDS consists of 240 training clips, 30 validation clips and 30 testing clips (each with 100 consecutive frames). Since the test ground truth is not available, we select four representative clips ('000', '011', '015' and '020', with diverse scenes and motions) as our test set, denoted by REDS4. The remaining training and validation clips are re-grouped as our training dataset (a total of 266 clips).
+
+### 2.3 Benchmark
+| model | task | dataset | Parameters (M) | FLOPs (G) | PSNR | SSIM |
+|---|---|---|---|---|---|---|
+|PP-MSVSR | Video Super-Resolution | REDS4 | 1.45 | 111 | 31.2535 | 0.8884 |
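+
+As a reference for reproducing the metrics above, the sketch below shows a minimal PSNR computation between a super-resolved frame and its ground truth (PSNR follows its standard definition for 8-bit frames; a full evaluation aggregates per-frame scores over the REDS4 clips, and `skimage.metrics.structural_similarity` offers a reference implementation for SSIM):
+
+```python
+import numpy as np
+
+def psnr(sr: np.ndarray, gt: np.ndarray, max_val: float = 255.0) -> float:
+    """Peak signal-to-noise ratio in dB between two 8-bit frames."""
+    mse = np.mean((sr.astype(np.float64) - gt.astype(np.float64)) ** 2)
+    if mse == 0:
+        return float("inf")
+    return 10.0 * np.log10(max_val ** 2 / mse)
+
+# Toy usage with synthetic frames; a real evaluation iterates over all
+# frames of the REDS4 clips and averages the per-frame scores.
+gt = np.random.randint(0, 256, (180, 320, 3), dtype=np.uint8)
+sr = np.clip(gt + np.random.normal(0, 5, gt.shape), 0, 255).astype(np.uint8)
+print(f"PSNR: {psnr(sr, gt):.2f} dB")
+```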
+
+## 3. Reference
+Ref: https://github.com/PaddlePaddle/PaddleGAN/blob/develop/docs/zh_CN/tutorials/video_super_resolution.md
diff --git a/modelcenter/PP-MSVSR/download_cn.md b/modelcenter/PP-MSVSR/download_cn.md
new file mode 100644
index 00000000..bc38c9c6
--- /dev/null
+++ b/modelcenter/PP-MSVSR/download_cn.md
@@ -0,0 +1,4 @@
+# Download:
+| Model | Task | Dataset | Parameters (M) | FLOPs (G) | PSNR | SSIM | Download |
+|---|---|---|---|---|---|---|---|
+|PP-MSVSR_reds_x4 | Video Super-Resolution | REDS/REDS4 | 1.45 | 111 | 31.2535 | 0.8884 |[inference model](https://paddlegan.bj.bcebos.com/static_model/msvsr_reds_infer.zip)/[pretrained model](https://paddlegan.bj.bcebos.com/models/PP-MSVSR_reds_x4.pdparams) |
diff --git a/modelcenter/PP-MSVSR/download_en.md b/modelcenter/PP-MSVSR/download_en.md
new file mode 100644
index 00000000..38fcb38a
--- /dev/null
+++ b/modelcenter/PP-MSVSR/download_en.md
@@ -0,0 +1,5 @@
+# Download
+
+| model | task | dataset | Parameters (M) | FLOPs (G) | PSNR | SSIM | download |
+|---|---|---|---|---|---|---|---|
+|PP-MSVSR_reds_x4 | Video Super-Resolution | REDS/REDS4 | 1.45 | 111 | 31.2535 | 0.8884 |[inference_model](https://paddlegan.bj.bcebos.com/static_model/msvsr_reds_infer.zip)/[Pretrained_model](https://paddlegan.bj.bcebos.com/models/PP-MSVSR_reds_x4.pdparams) |
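+
+The pretrained weights above can also be driven from Python. A minimal sketch, assuming `ppgan.apps.PPMSVSRPredictor` (the predictor behind `applications/tools/video-enhance.py`; treat the constructor argument names as assumptions, since they may differ across PaddleGAN versions):
+
+```python
+# Sketch: super-resolve a video with the downloaded pretrained weights.
+# Assumes PaddleGAN is installed and PP-MSVSR_reds_x4.pdparams is local.
+from ppgan.apps import PPMSVSRPredictor
+
+predictor = PPMSVSRPredictor(
+    output="output_dir",                      # where results are written
+    weight_path="PP-MSVSR_reds_x4.pdparams",  # downloaded pretrained model
+)
+predictor.run("input_360p.mp4")               # low-resolution input video
+```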
diff --git a/modelcenter/PP-MSVSR/info.yaml b/modelcenter/PP-MSVSR/info.yaml
new file mode 100644
index 00000000..37f79662
--- /dev/null
+++ b/modelcenter/PP-MSVSR/info.yaml
@@ -0,0 +1,29 @@
+---
+Model_Info:
+  name: "PP-MSVSR"
+  description: "视频超分"
+  description_en: "Video super-resolution"
+  icon: "@placeholder: to be stored on BOS once the unified UE design is finalized"
+  from_repo: "PaddleGAN"
+Task:
+- tag_en: "Computer Vision"
+  tag: "计算机视觉"
+  sub_tag_en: "Image Super-resolution"
+  sub_tag: "图像超分辨"
+Example:
+- tag_en: "Internet"
+  tag: "互联网"
+  sub_tag_en: "Video Restoration"
+  sub_tag: "视频修复"
+  title: "老北京城影像修复"
+  title_en: "Restoration of old Beijing videos"
+  url: "https://aistudio.baidu.com/aistudio/projectdetail/1161285"
+  url_en:
+Datasets: "REDS, Vimeo90K, Vid4, UDM10"
+Publisher: "Baidu"
+License: "Apache-2.0"
+Paper:
+- title: "PP-MSVSR: Multi-Stage Video Super-Resolution"
+  url: "https://arxiv.org/pdf/2112.02828.pdf"
+IfTraining: 1
+IfOnlineDemo: 1
diff --git a/modelcenter/PP-MSVSR/introduction_cn.ipynb b/modelcenter/PP-MSVSR/introduction_cn.ipynb
new file mode 100644
index 00000000..18b2d217
--- /dev/null
+++ b/modelcenter/PP-MSVSR/introduction_cn.ipynb
@@ -0,0 +1,465 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. PP-MSVSR Introduction\n",
+    "Video super-resolution (VSR) grew out of image super-resolution; its goal is to recover high-resolution (HR) images from one or more low-resolution (LR) images. The key difference is that a video consists of multiple frames, so VSR usually exploits inter-frame information for restoration. PP-MSVSR is a multi-stage VSR deep architecture with a local fusion module, an auxiliary loss and a refined alignment module that progressively refine the enhanced result. Specifically, stage 1 introduces a local fusion module that performs local feature fusion before feature propagation, strengthening cross-frame feature fusion during propagation. Stage 2 introduces an auxiliary loss so that the features produced by the propagation module retain more information related to the HR space. Stage 3 introduces a refined alignment module that makes full use of the feature information from the propagation module of the previous stage. Experiments confirm that PP-MSVSR performs excellently on the Vid4 dataset, reaching a PSNR of 28.13 dB with only 1.45M parameters.\n",
+    "\n",
+    "The PP-MSVSR model is officially produced by PaddlePaddle and is a video super-resolution model developed by PaddleGAN.\n",
+    "More information about PaddleGAN is available at https://github.com/PaddlePaddle/PaddleGAN."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Model Effects and Application Scenarios\n",
+    "### 2.1 Video Super-Resolution Tasks:\n",
+    "\n",
+    "#### 2.1.1 Datasets:\n",
+    "\n",
+    "The commonly used video super-resolution validation dataset Vid4 is taken as an example.\n",
+    "\n",
+    "#### 2.1.2 Model Effects:\n",
+    "\n",
+    "The super-resolution results of PP-MSVSR:\n",
+    "\n",
+    "(figure: PP-MSVSR super-resolution demo)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. How to Use the Model\n",
+    "\n",
+    "### 3.1 Model Inference:\n",
+    "* Download\n",
+    "\n",
+    "(Remove the \"!\" or \"%\" when running outside Jupyter Notebook.)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "jupyter": {
+     "outputs_hidden": false
+    },
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "%cd ~/work\n",
+    "# Clone PaddleGAN (cloning from Gitee can be faster); this project has already been persisted, so cloning again is unnecessary.\n",
+    "!git clone https://github.com/PaddlePaddle/PaddleGAN.git"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Installation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# The script must be run in the PaddleGAN directory\n",
+    "%cd ~/work/PaddleGAN/\n",
+    "\n",
+    "# Install the required dependencies (already persisted; no need to install again)\n",
+    "!pip install -r requirements.txt\n",
+    "\n",
+    "# The script must be run in the PaddleGAN directory\n",
+    "%cd ~/work/PaddleGAN/\n",
+    "\n",
+    "# Install PaddleGAN\n",
+    "!python setup.py develop  # If the installation hangs for a long time, interrupt it and run this cell again\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Quick experience\n",
+    "\n",
+    "Congratulations! You have successfully installed PaddleGAN. Next, let's quickly try out video super-resolution."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# Super-resolve a video on the GPU\n",
+    "# Low-resolution video download address: https://user-images.githubusercontent.com/79366697/200290225-7fdd364c-2fbe-48b6-a3bf-87349aedec98.mp4\n",
+    "%cd ~/work/PaddleGAN/applications/\n",
+    "!python tools/video-enhance.py --input demo/Peking_input360p_clip6_5s.mp4 \\\n",
+    "                               --process_order PPMSVSR \\\n",
+    "                               --output output_dir"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "After execution, a super-resolved video is generated under the output_dir folder.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3.2 Model Training:\n",
+    "* Clone the PaddleGAN repository (see 3.1 for details)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Prepare the datasets\n",
+    "\n",
+    "Four datasets commonly used for video super-resolution are introduced here: REDS, Vimeo90K, Vid4 and UDM10. REDS and Vimeo90K include both training and test sets, while Vid4 and UDM10 are test-only datasets. Download and extract the required dataset into the ``PaddleGAN/data`` folder.\n",
+    "\n",
+    "REDS ([download](https://seungjunnah.github.io/Datasets/reds.html)) is a high-quality (720p) video dataset proposed in the NTIRE19 competition. It consists of 240 training clips, 30 validation clips and 30 testing clips (each with 100 consecutive frames). Since the test ground truth is not available, four representative clips from the training set ('000', '011', '015' and '020', with diverse scenes and motions) are selected as the test set, denoted REDS4. The remaining training and validation clips are regrouped into the training dataset (266 clips in total).\n",
+    "\n",
+    "The processed REDS dataset is organized as follows:\n",
+    "```\n",
+    "PaddleGAN\n",
+    "  ├── data\n",
+    "      ├── REDS\n",
+    "            ├── train_sharp\n",
+    "            |    └── X4\n",
+    "            ├── train_sharp_bicubic\n",
+    "            |    └── X4\n",
+    "            ├── REDS4_test_sharp\n",
+    "            |    └── X4\n",
+    "            └── REDS4_test_sharp_bicubic\n",
+    "                 └── X4\n",
+    "          ...\n",
+    "```\n",
+    "\n",
+    "Vimeo90K ([download](http://toflow.csail.mit.edu/)) was built by Tianfan Xue et al. for video super-resolution, video denoising, video deblocking and video frame interpolation. Vimeo90K is a large-scale, high-quality video dataset containing 89,800 video clips downloaded from vimeo.com, covering a wide variety of scenes and actions.\n",
+    "\n",
+    "The processed Vimeo90K dataset is organized as follows:\n",
+    "```\n",
+    "PaddleGAN\n",
+    "  ├── data\n",
+    "      ├── Vimeo90K\n",
+    "            ├── vimeo_septuplet\n",
+    "            |    |── sequences\n",
+    "            |    └── sep_trainlist.txt\n",
+    "            ├── vimeo_septuplet_BD_matlabLRx4\n",
+    "            |    └── sequences\n",
+    "            └── vimeo_super_resolution_test\n",
+    "                 |── low_resolution\n",
+    "                 |── target\n",
+    "                 └── sep_testlist.txt\n",
+    "          ...\n",
+    "```\n",
+    "\n",
+    "Vid4 ([download](https://paddlegan.bj.bcebos.com/datasets/Vid4.zip)) is a commonly used video super-resolution validation dataset containing 4 video clips.\n",
+    "\n",
+    "The processed Vid4 dataset is organized as follows:\n",
+    "```\n",
+    "PaddleGAN\n",
+    "  ├── data\n",
+    "      ├── Vid4\n",
+    "            ├── BDx4\n",
+    "            └── GT\n",
+    "          ...\n",
+    "```\n",
+    "\n",
+    "UDM10 ([download](https://paddlegan.bj.bcebos.com/datasets/udm10_paddle.tar)) is a commonly used video super-resolution validation dataset containing 10 video clips.\n",
+    "\n",
+    "The processed UDM10 dataset is organized as follows:\n",
+    "```\n",
+    "PaddleGAN\n",
+    "  ├── data\n",
+    "      ├── udm10\n",
+    "            ├── BDx4\n",
+    "            └── GT\n",
+    "          ...\n",
+    "```\n",
+    "Taking REDS as an example, confirm that the dataset is ready with the following command.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "# Inspect the extracted directory\n",
+    "#%cd ~/work/PaddleGAN/\n",
+    "#!tree -d data/REDS"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Modify the yaml configuration file\n",
+    "\n",
+    "Modify the configuration file ``` configs/msvsr_reds.yaml```\n",
+    "\n",
+    "```\n",
+    "total_iters: 150000  # total number of training iterations\n",
+    "output_dir: output_dir  # directory for saving model parameters\n",
+    "find_unused_parameters: True\n",
+    "checkpoints_dir: checkpoints\n",
+    "use_dataset: True\n",
+    "# tensor range for function tensor2img\n",
+    "min_max:\n",
+    "  (0., 1.)\n",
+    "\n",
+    "model:\n",
+    "  name: MultiStageVSRModel\n",
+    "  fix_iter: 2500\n",
+    "  generator:\n",
+    "    name: MSVSR\n",
+    "    mid_channels: 32\n",
+    "    num_init_blocks: 2\n",
+    "    num_blocks: 3\n",
+    "    num_reconstruction_blocks: 2\n",
+    "    only_last: True\n",
+    "    use_tiny_spynet: True\n",
+    "    deform_groups: 4\n",
+    "    stage1_groups: 8\n",
+    "    auxiliary_loss: True\n",
+    "    use_refine_align: True\n",
+    "    aux_reconstruction_blocks: 1\n",
+    "    use_local_connnect: True\n",
+    "  pixel_criterion:\n",
+    "    name: CharbonnierLoss\n",
+    "    reduction: mean\n",
+    "\n",
+    "dataset:\n",
+    "  train:\n",
+    "    name: RepeatDataset\n",
+    "    times: 1000\n",
+    "    num_workers: 6\n",
+    "    batch_size: 2  # 8 GPUs with a per-GPU batch size of 2 are recommended\n",
+    "    dataset:\n",
+    "      name: VSRREDSMultipleGTDataset\n",
+    "      lq_folder: data/REDS/train_sharp_bicubic/X4  # low-resolution training images\n",
+    "      gt_folder: data/REDS/train_sharp/X4  # high-resolution (GT) training images\n",
+    "      ann_file: data/REDS/meta_info_REDS_GT.txt  # dataset annotation file\n",
+    "      num_frames: 20  # number of input video frames during training\n",
+    "      preprocess:\n",
+    "        - name: GetNeighboringFramesIdx\n",
+    "          interval_list: [1]\n",
+    "        - name: ReadImageSequence\n",
+    "          key: lq\n",
+    "        - name: ReadImageSequence\n",
+    "          key: gt\n",
+    "        - name: Transforms\n",
+    "          input_keys: [lq, gt]\n",
+    "          pipeline:\n",
+    "            - name: SRPairedRandomCrop\n",
+    "              gt_patch_size: 256\n",
+    "              scale: 4\n",
+    "              keys: [image, image]\n",
+    "            - name: PairedRandomHorizontalFlip\n",
+    "              keys: [image, image]\n",
+    "            - name: PairedRandomVerticalFlip\n",
+    "              keys: [image, image]\n",
+    "            - name: PairedRandomTransposeHW\n",
+    "              keys: [image, image]\n",
+    "            - name: TransposeSequence\n",
+    "              keys: [image, image]\n",
+    "            - name: NormalizeSequence\n",
+    "              mean: [0., 0., 0.]\n",
+    "              std: [255., 255., 255.]\n",
+    "              keys: [image, image]\n",
+    "\n",
+    "  test:\n",
+    "    name: VSRREDSMultipleGTDataset\n",
+    "    lq_folder: data/REDS/REDS4_test_sharp_bicubic/X4  # low-resolution test/validation images\n",
+    "    gt_folder: data/REDS/REDS4_test_sharp/X4  # high-resolution (GT) test/validation images\n",
+    "    ann_file: data/REDS/meta_info_REDS_GT.txt  # dataset annotation file\n",
+    "    num_frames: 100  # number of input video frames during testing/validation\n",
+    "    test_mode: True\n",
+    "    preprocess:\n",
+    "      - name: GetNeighboringFramesIdx\n",
+    "        interval_list: [1]\n",
+    "      - name: ReadImageSequence\n",
+    "        key: lq\n",
+    "      - name: ReadImageSequence\n",
+    "        key: gt\n",
+    "      - name: Transforms\n",
+    "        input_keys: [lq, gt]\n",
+    "        pipeline:\n",
+    "          - name: TransposeSequence\n",
+    "            keys: [image, image]\n",
+    "          - name: NormalizeSequence\n",
+    "            mean: [0., 0., 0.]\n",
+    "            std: [255., 255., 255.]\n",
+    "            keys: [image, image]\n",
+    "\n",
+    "lr_scheduler:\n",
+    "  name: CosineAnnealingRestartLR\n",
+    "  learning_rate: !!float 2e-4  # learning rate\n",
+    "  periods: [150000]\n",
+    "  restart_weights: [1]\n",
+    "  eta_min: !!float 1e-7\n",
+    "\n",
+    "optimizer:\n",
+    "  name: Adam\n",
+    "  # add parameters of net_name to optim\n",
+    "  # the names should be in self.nets\n",
+    "  net_names:\n",
+    "    - generator\n",
+    "  beta1: 0.9\n",
+    "  beta2: 0.99\n",
+    "\n",
+    "validate:\n",
+    "  interval: 5000  # validate every 5000 iterations\n",
+    "  save_img: false  # whether to save images during validation\n",
+    "\n",
+    "  metrics:\n",
+    "    psnr: # metric name, can be arbitrary\n",
+    "      name: PSNR  # validation metric PSNR\n",
+    "      crop_border: 0\n",
+    "      test_y_channel: false\n",
+    "    ssim:\n",
+    "      name: SSIM  # validation metric SSIM\n",
+    "      crop_border: 0\n",
+    "      test_y_channel: false\n",
+    "\n",
+    "log_config:\n",
+    "  interval: 10  # log every 10 iterations\n",
+    "  visiual_interval: 5000  # visualize every 5000 iterations\n",
+    "\n",
+    "snapshot_config:\n",
+    "  interval: 5000  # save a checkpoint every 5000 iterations\n",
+    "\n",
+    "export_model:\n",
+    "  - {name: 'generator', inputs_num: 1}\n",
+    "```\n"
+   ]
+  },
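+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To illustrate how `CosineAnnealingRestartLR` shapes the learning rate over the 150,000 iterations configured above, the following cell sketches the cosine-annealing-with-restarts formula (an illustration of the schedule's shape, not PaddleGAN's exact implementation):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "import math\n",
+    "\n",
+    "def cosine_annealing_restart_lr(step, base_lr=2e-4, eta_min=1e-7,\n",
+    "                                periods=(150000,), restart_weights=(1.0,)):\n",
+    "    # Find the restart period that the current step falls into.\n",
+    "    start = 0\n",
+    "    for period, weight in zip(periods, restart_weights):\n",
+    "        if step < start + period:\n",
+    "            progress = (step - start) / period\n",
+    "            peak = eta_min + (base_lr - eta_min) * weight\n",
+    "            return eta_min + 0.5 * (peak - eta_min) * (1 + math.cos(math.pi * progress))\n",
+    "        start += period\n",
+    "    return eta_min\n",
+    "\n",
+    "for step in [0, 75000, 149999]:\n",
+    "    print(step, f'{cosine_annealing_restart_lr(step):.2e}')"
+   ]
+  },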
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Train the model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "%cd ~/work/PaddleGAN/\n",
+    "%env CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7\n",
+    "# Start training\n",
+    "!python -m paddle.distributed.launch tools/main.py --config-file configs/msvsr_reds.yaml"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Model evaluation\n",
+    "\n",
+    "Model evaluation also uses the configuration file ```configs/msvsr_reds.yaml```. Taking the REDS dataset as an example, download the REDS validation data and extract it to PaddleGAN/data/REDS/. After updating the validation dataset paths in the configuration file, run the following command to evaluate."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "%cd ~/work/PaddleGAN/\n",
+    "%env CUDA_VISIBLE_DEVICES=0\n",
+    "# Evaluate after training finishes\n",
+    "# Model parameter download address: https://paddlegan.bj.bcebos.com/models/PP-MSVSR_reds_x4.pdparams\n",
+    "!python tools/main.py --config-file configs/msvsr_reds.yaml --evaluate-only --load ${PATH_OF_WEIGHT}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Model Principles\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Unlike the single image super-resolution (SISR) task, the key to the video super-resolution (VSR) task is to make full use of complementary information across frames to reconstruct a high-resolution image sequence. Since frames differ in motion and scene, accurately aligning multiple frames and effectively fusing them has long been a research focus of VSR. To exploit the rich complementary information of neighboring frames, PP-MSVSR adopts a multi-stage VSR deep architecture that includes a local fusion module, an auxiliary loss and a refined alignment module to progressively enhance the super-resolution result.\n",
+    "* PP-MSVSR combines the ideas of sliding-window and recurrent methods, using a multi-stage strategy for video super-resolution, with a local fusion module, an auxiliary loss and a refined alignment module that progressively refine the enhanced result.\n",
+    "\n",
+    "(figure: PP-MSVSR overall architecture)\n",
+    "\n",
+    "* Inspired by sliding-window methods, PP-MSVSR designs a local fusion module, LFM, in the first stage. The module performs local feature fusion before feature propagation to strengthen cross-frame feature fusion during propagation. Specifically, the purpose of the LFM is to let the features of the current frame first fuse the information of its neighboring frames, and then send the fused features to the propagation module of the next stage.\n",
+    "\n",
+    "(figure: Local Fusion Module (LFM))\n",
+    "\n",
+    "* Inspired by recurrent networks, PP-MSVSR uses the same bidirectional recurrent structure as BasicVSR++ in the second stage to fuse and propagate features, and designs an auxiliary loss that pulls the features closer to the real high-resolution feature space. Specifically, the auxiliary loss is applied after upsampling the features propagated in the second stage.\n",
+    "* Unlike image super-resolution, video super-resolution usually needs to align neighboring frames with the current frame to better integrate their information. In video super-resolution tasks with large motion, the role of alignment is especially obvious. When a bidirectional recurrent network is used, the same alignment operation is often performed multiple times. To make full use of the results of previous alignment operations, PP-MSVSR proposes a refined alignment module, RAM, which reuses previously computed alignment parameters and achieves a better alignment result.\n",
+    "\n",
+    "(figure: Refined Align Module (RAM))\n"
+   ]
+  },
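+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The pixel loss used for training (`CharbonnierLoss` in the configuration above) is a smooth, robust variant of the L1 loss. A minimal NumPy sketch of the idea (the epsilon value and the mean reduction follow common practice and are assumptions, not PaddleGAN's exact code):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "def charbonnier_loss(pred, target, eps=1e-8):\n",
+    "    # sqrt((x - y)^2 + eps^2), averaged over all elements.\n",
+    "    diff = pred.astype(np.float64) - target.astype(np.float64)\n",
+    "    return np.mean(np.sqrt(diff * diff + eps * eps))\n",
+    "\n",
+    "pred = np.random.rand(4, 3, 64, 64)    # fake model output (N, C, H, W)\n",
+    "target = np.random.rand(4, 3, 64, 64)  # fake ground truth\n",
+    "print(charbonnier_loss(pred, target))"
+   ]
+  },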
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Related Papers and Citations\n",
+    "```\n",
+    "@article{jiang2021PP-MSVSR,\n",
+    "  author = {Jiang, Lielin and Wang, Na and Dang, Qingqing and Liu, Rui and Lai, Baohua},\n",
+    "  title = {PP-MSVSR: Multi-Stage Video Super-Resolution},\n",
+    "  journal = {arXiv preprint arXiv:2112.02828},\n",
+    "  year = {2021}\n",
+    "}\n",
+    "```\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  },
+  "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/modelcenter/PP-MSVSR/introduction_en.ipynb b/modelcenter/PP-MSVSR/introduction_en.ipynb
new file mode 100644
index 00000000..039aab38
--- /dev/null
+++ b/modelcenter/PP-MSVSR/introduction_en.ipynb
@@ -0,0 +1,458 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. PP-MSVSR Introduction\n",
+    "Video super-resolution originates from image super-resolution and aims to recover high-resolution (HR) images from one or more low-resolution (LR) images. The difference between them is that a video is composed of multiple frames, so video super-resolution usually exploits inter-frame information for restoration. PP-MSVSR is a multi-stage VSR deep architecture, with a local fusion module, an auxiliary loss and a refined alignment module that refine the enhanced result progressively. Specifically, in order to strengthen the fusion of features across frames during feature propagation, a local fusion module is designed in stage-1 to perform local feature fusion before feature propagation. Moreover, an auxiliary loss is introduced in stage-2 so that the features obtained by the propagation module retain more information correlated with the HR space, and a refined alignment module is introduced in stage-3 to make full use of the feature information from the previous stage. Extensive experiments substantiate that PP-MSVSR achieves promising performance on the Vid4 dataset, reaching a PSNR of 28.13 dB with only 1.45M parameters.\n",
+    "\n",
+    "The PP-MSVSR model is officially produced by PaddlePaddle and is a video super-resolution model developed by PaddleGAN. More information about PaddleGAN can be found here https://github.com/PaddlePaddle/PaddleGAN.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Model Effects and Application Scenarios\n",
+    "### 2.1 Video Super-Resolution Tasks:\n",
+    "\n",
+    "#### 2.1.1 Datasets:\n",
+    "\n",
+    "The commonly used video super-resolution dataset Vid4 is taken as an example.\n",
+    "\n",
+    "#### 2.1.2 Model Effects:\n",
+    "\n",
+    "The video super-resolution effect of PP-MSVSR is:\n",
+    "\n",
+    "(figure: PP-MSVSR video super-resolution demo)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. How to Use the Model\n",
+    "\n",
+    "### 3.1 Model Inference:\n",
+    "* Download\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "jupyter": {
+     "outputs_hidden": false
+    },
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "%cd /home/aistudio/work\n",
+    "!git clone https://github.com/PaddlePaddle/PaddleGAN.git"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Installation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# The script needs to be run in the PaddleGAN directory\n",
+    "%cd /home/aistudio/work/PaddleGAN/\n",
+    "\n",
+    "# Install the required dependencies (already persisted, no need to install again)\n",
+    "!pip install -r requirements.txt\n",
+    "\n",
+    "# The script needs to be run in the PaddleGAN directory\n",
+    "%cd /home/aistudio/work/PaddleGAN/\n",
+    "\n",
+    "# Install PaddleGAN\n",
+    "!python setup.py develop"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Quick experience\n",
+    "\n",
+    "Congratulations! Now that you've successfully installed PaddleGAN, let's get a quick feel for video super-resolution."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# Super-resolve a video on the GPU.\n",
+    "# Low-resolution video download address: https://user-images.githubusercontent.com/79366697/200290225-7fdd364c-2fbe-48b6-a3bf-87349aedec98.mp4\n",
+    "%cd ~/work/PaddleGAN/applications/\n",
+    "!python tools/video-enhance.py --input demo/Peking_input360p_clip6_5s.mp4 \\\n",
+    "                               --process_order PPMSVSR \\\n",
+    "                               --output output_dir"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A video with the predicted result is generated under the output_dir folder.\n"
+   ]
+  },
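+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To confirm the 4x upscaling, the following sketch (assuming `opencv-python` is installed) compares the input and output resolutions; the output file name is an assumption, so check `output_dir` for the actual name written by `video-enhance.py`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "import cv2\n",
+    "\n",
+    "def video_info(path):\n",
+    "    # Return (width, height, frame count) of a video file.\n",
+    "    cap = cv2.VideoCapture(path)\n",
+    "    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))\n",
+    "    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))\n",
+    "    n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))\n",
+    "    cap.release()\n",
+    "    return w, h, n\n",
+    "\n",
+    "print('input :', video_info('demo/Peking_input360p_clip6_5s.mp4'))\n",
+    "print('output:', video_info('output_dir/Peking_input360p_clip6_5s_out.mp4'))"
+   ]
+  },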
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3.2 Model Training\n",
+    "* Clone the PaddleGAN repository (see 3.1 for details)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Prepare the datasets\n",
+    "\n",
+    "Here are 4 commonly used video super-resolution datasets: REDS, Vimeo90K, Vid4 and UDM10. REDS and Vimeo90K include both training and test sets, while Vid4 and UDM10 are test-only datasets. Download and decompress the required dataset and place it under ``PaddleGAN/data``.\n",
+    "\n",
+    "REDS ([download](https://seungjunnah.github.io/Datasets/reds.html)) is a high-quality (720p) video dataset proposed in the NTIRE19 competition. REDS consists of 240 training clips, 30 validation clips and 30 testing clips (each with 100 consecutive frames). Since the test ground truth is not available, we select four representative clips ('000', '011', '015' and '020', with diverse scenes and motions) as our test set, denoted by REDS4. The remaining training and validation clips are re-grouped as our training dataset (a total of 266 clips).\n",
+    "\n",
+    "The structure of the processed REDS is as follows:\n",
+    "```\n",
+    "PaddleGAN\n",
+    "  ├── data\n",
+    "      ├── REDS\n",
+    "            ├── train_sharp\n",
+    "            |    └── X4\n",
+    "            ├── train_sharp_bicubic\n",
+    "            |    └── X4\n",
+    "            ├── REDS4_test_sharp\n",
+    "            |    └── X4\n",
+    "            └── REDS4_test_sharp_bicubic\n",
+    "                 └── X4\n",
+    "          ...\n",
+    "```\n",
+    "\n",
+    "Vimeo90K ([download](http://toflow.csail.mit.edu/)) is designed by Tianfan Xue et al. for the following four video processing tasks: temporal frame interpolation, video denoising, video deblocking, and video super-resolution. Vimeo90K is a large-scale, high-quality video dataset. This dataset consists of 89,800 video clips downloaded from vimeo.com, which cover a wide variety of scenes and actions.\n",
+    "\n",
+    "The structure of the processed Vimeo90K is as follows:\n",
+    "```\n",
+    "PaddleGAN\n",
+    "  ├── data\n",
+    "      ├── Vimeo90K\n",
+    "            ├── vimeo_septuplet\n",
+    "            |    |── sequences\n",
+    "            |    └── sep_trainlist.txt\n",
+    "            ├── vimeo_septuplet_BD_matlabLRx4\n",
+    "            |    └── sequences\n",
+    "            └── vimeo_super_resolution_test\n",
+    "                 |── low_resolution\n",
+    "                 |── target\n",
+    "                 └── sep_testlist.txt\n",
+    "          ...\n",
+    "```\n",
+    "\n",
+    "Vid4 ([download](https://paddlegan.bj.bcebos.com/datasets/Vid4.zip)) is a commonly used test dataset for VSR, which contains 4 video segments.\n",
+    "The structure of the processed Vid4 is as follows:\n",
+    "```\n",
+    "PaddleGAN\n",
+    "  ├── data\n",
+    "      ├── Vid4\n",
+    "            ├── BDx4\n",
+    "            └── GT\n",
+    "          ...\n",
+    "```\n",
+    "\n",
+    "UDM10 ([download](https://paddlegan.bj.bcebos.com/datasets/udm10_paddle.tar)) is a commonly used test dataset for VSR, which contains 10 video segments.\n",
+    "The structure of the processed UDM10 is as follows:\n",
+    "```\n",
+    "PaddleGAN\n",
+    "  ├── data\n",
+    "      ├── udm10\n",
+    "            ├── BDx4\n",
+    "            └── GT\n",
+    "          ...\n",
+    "```\n",
+    "\n",
+    "Using the REDS dataset as an example, verify that the dataset is ready by using the following command.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "# Review the extracted directory\n",
+    "#%cd /home/aistudio/work/PaddleGAN/\n",
+    "#!tree -d data/REDS"
+   ]
+  },
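+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Beyond `tree`, the layout can also be checked programmatically. The following sketch (standard library only; paths follow the directory structure above) counts the clips in each REDS split and the frames of the first clip:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "import glob, os\n",
+    "\n",
+    "root = 'data/REDS'\n",
+    "for split in ['train_sharp/X4', 'train_sharp_bicubic/X4',\n",
+    "              'REDS4_test_sharp/X4', 'REDS4_test_sharp_bicubic/X4']:\n",
+    "    clips = sorted(glob.glob(os.path.join(root, split, '*')))\n",
+    "    print(f'{split}: {len(clips)} clips')\n",
+    "    if clips:\n",
+    "        frames = glob.glob(os.path.join(clips[0], '*.png'))\n",
+    "        print(f'  first clip {os.path.basename(clips[0])}: {len(frames)} frames')"
+   ]
+  },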
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Change the yaml configuration file\n",
+    "\n",
+    "Change the configuration file ``` configs/msvsr_reds.yaml```\n",
+    "\n",
+    "```\n",
+    "total_iters: 150000\n",
+    "output_dir: output_dir\n",
+    "find_unused_parameters: True\n",
+    "checkpoints_dir: checkpoints\n",
+    "use_dataset: True\n",
+    "# tensor range for function tensor2img\n",
+    "min_max:\n",
+    "  (0., 1.)\n",
+    "\n",
+    "model:\n",
+    "  name: MultiStageVSRModel\n",
+    "  fix_iter: 2500\n",
+    "  generator:\n",
+    "    name: MSVSR\n",
+    "    mid_channels: 32\n",
+    "    num_init_blocks: 2\n",
+    "    num_blocks: 3\n",
+    "    num_reconstruction_blocks: 2\n",
+    "    only_last: True\n",
+    "    use_tiny_spynet: True\n",
+    "    deform_groups: 4\n",
+    "    stage1_groups: 8\n",
+    "    auxiliary_loss: True\n",
+    "    use_refine_align: True\n",
+    "    aux_reconstruction_blocks: 1\n",
+    "    use_local_connnect: True\n",
+    "  pixel_criterion:\n",
+    "    name: CharbonnierLoss\n",
+    "    reduction: mean\n",
+    "\n",
+    "dataset:\n",
+    "  train:\n",
+    "    name: RepeatDataset\n",
+    "    times: 1000\n",
+    "    num_workers: 6\n",
+    "    batch_size: 2\n",
+    "    dataset:\n",
+    "      name: VSRREDSMultipleGTDataset\n",
+    "      lq_folder: data/REDS/train_sharp_bicubic/X4\n",
+    "      gt_folder: data/REDS/train_sharp/X4\n",
+    "      ann_file: data/REDS/meta_info_REDS_GT.txt\n",
+    "      num_frames: 20\n",
+    "      preprocess:\n",
+    "        - name: GetNeighboringFramesIdx\n",
+    "          interval_list: [1]\n",
+    "        - name: ReadImageSequence\n",
+    "          key: lq\n",
+    "        - name: ReadImageSequence\n",
+    "          key: gt\n",
+    "        - name: Transforms\n",
+    "          input_keys: [lq, gt]\n",
+    "          pipeline:\n",
+    "            - name: SRPairedRandomCrop\n",
+    "              gt_patch_size: 256\n",
+    "              scale: 4\n",
+    "              keys: [image, image]\n",
+    "            - name: PairedRandomHorizontalFlip\n",
+    "              keys: [image, image]\n",
+    "            - name: PairedRandomVerticalFlip\n",
+    "              keys: [image, image]\n",
+    "            - name: PairedRandomTransposeHW\n",
+    "              keys: [image, image]\n",
+    "            - name: TransposeSequence\n",
+    "              keys: [image, image]\n",
+    "            - name: NormalizeSequence\n",
+    "              mean: [0., 0., 0.]\n",
+    "              std: [255., 255., 255.]\n",
+    "              keys: [image, image]\n",
+    "\n",
+    "  test:\n",
+    "    name: VSRREDSMultipleGTDataset\n",
+    "    lq_folder: data/REDS/REDS4_test_sharp_bicubic/X4\n",
+    "    gt_folder: data/REDS/REDS4_test_sharp/X4\n",
+    "    ann_file: data/REDS/meta_info_REDS_GT.txt\n",
+    "    num_frames: 100\n",
+    "    test_mode: True\n",
+    "    preprocess:\n",
+    "      - name: GetNeighboringFramesIdx\n",
+    "        interval_list: [1]\n",
+    "      - name: ReadImageSequence\n",
+    "        key: lq\n",
+    "      - name: ReadImageSequence\n",
+    "        key: gt\n",
+    "      - name: Transforms\n",
+    "        input_keys: [lq, gt]\n",
+    "        pipeline:\n",
+    "          - name: TransposeSequence\n",
+    "            keys: [image, image]\n",
+    "          - name: NormalizeSequence\n",
+    "            mean: [0., 0., 0.]\n",
+    "            std: [255., 255., 255.]\n",
+    "            keys: [image, image]\n",
+    "\n",
+    "lr_scheduler:\n",
+    "  name: CosineAnnealingRestartLR\n",
+    "  learning_rate: !!float 2e-4\n",
+    "  periods: [150000]\n",
+    "  restart_weights: [1]\n",
+    "  eta_min: !!float 1e-7\n",
+    "\n",
+    "optimizer:\n",
+    "  name: Adam\n",
+    "  # add parameters of net_name to optim\n",
+    "  # the names should be in self.nets\n",
+    "  net_names:\n",
+    "    - generator\n",
+    "  beta1: 0.9\n",
+    "  beta2: 0.99\n",
+    "\n",
+    "validate:\n",
+    "  interval: 5000\n",
+    "  save_img: false\n",
+    "\n",
+    "  metrics:\n",
+    "    psnr: # metric name, can be arbitrary\n",
+    "      name: PSNR\n",
+    "      crop_border: 0\n",
+    "      test_y_channel: false\n",
+    "    ssim:\n",
+    "      name: SSIM\n",
+    "      crop_border: 0\n",
+    "      test_y_channel: false\n",
+    "\n",
+    "log_config:\n",
+    "  interval: 10\n",
+    "  visiual_interval: 5000\n",
+    "\n",
+    "snapshot_config:\n",
+    "  interval: 5000\n",
+    "\n",
+    "export_model:\n",
+    "  - {name: 'generator', inputs_num: 1}\n",
+    "```\n"
+   ]
+  },
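+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you train with a different number of GPUs or batch size than the 8 GPUs x batch size 2 configured above, a common convention is the linear scaling rule: scale the learning rate with the total batch size and stretch the iteration count accordingly. A sketch of the arithmetic (check the PaddleGAN FAQ for the project's exact guidance):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "# Reference setup: 8 GPUs x batch size 2 = total batch 16.\n",
+    "ref_gpus, ref_bs, ref_lr, ref_iters = 8, 2, 2e-4, 150000\n",
+    "\n",
+    "gpus, bs = 4, 2  # example: training on 4 GPUs instead\n",
+    "scale = (gpus * bs) / (ref_gpus * ref_bs)\n",
+    "print('learning rate:', ref_lr * scale)         # 1e-4\n",
+    "print('iterations  :', int(ref_iters / scale))  # 300000"
+   ]
+  },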
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Train the model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true,
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "%cd ~/work/PaddleGAN/\n",
+    "%env CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7\n",
+    "# Begin training\n",
+    "!python -m paddle.distributed.launch tools/main.py --config-file configs/msvsr_reds.yaml"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Model evaluation\n",
+    "\n",
+    "We provide ```configs/msvsr_reds.yaml``` for evaluating on the REDS dataset. First download the REDS validation data from the REDS download page and extract it to ```PaddleGAN/data/REDS```, then run the following command to evaluate."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "%cd ~/work/PaddleGAN/\n",
+    "%env CUDA_VISIBLE_DEVICES=0\n",
+    "# Download address for the model weights: https://paddlegan.bj.bcebos.com/models/PP-MSVSR_reds_x4.pdparams\n",
+    "!python tools/main.py --config-file configs/msvsr_reds.yaml --evaluate-only --load ${PATH_OF_WEIGHT}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Model Principles\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Different from the single image super-resolution (SISR) task, the key to the video super-resolution (VSR) task is to make full use of complementary information across frames to reconstruct the high-resolution sequence. Since images from different frames have diverse motion and scenes, accurately aligning multiple frames and effectively fusing different frames has always been a key research topic of VSR tasks. To utilize the rich complementary information of neighboring frames, PP-MSVSR proposes a multi-stage VSR deep architecture, with a local fusion module, an auxiliary loss and a refined alignment module to refine the enhanced result progressively.\n",
+    "* PP-MSVSR is a multi-stage network that combines the ideas of the sliding-window framework and the recurrent framework, with a local fusion module, an auxiliary loss and a refined alignment module that refine the enhanced result progressively.\n",
+    "\n",
+    "(figure: PP-MSVSR overall architecture)\n",
+    "\n",
+    "* Inspired by the idea of sliding-window VSR, PP-MSVSR designs a local fusion module in stage-1, denoted LFM, to perform local feature fusion before feature propagation, which strengthens cross-frame feature fusion during propagation. Specifically, the purpose of the LFM is to let the features of the current frame first fuse the information of its neighboring frames, and then send the fused features to the propagation module.\n",
+    "\n",
+    "(figure: Local Fusion Module (LFM))\n",
+    "\n",
+    "* Inspired by the power of recurrent VSR networks, PP-MSVSR uses the same bidirectional recurrent structure as BasicVSR++ in stage-2 to fuse the information from different video frames together with the locally merged features, and to propagate the underlying information between frames. In addition, PP-MSVSR adds an auxiliary loss to pull the features closer to the HR space; specifically, the auxiliary loss is applied after upsampling the features propagated in stage-2.\n",
+    "* Different from SISR, VSR usually aligns adjacent frames with the current frame to better integrate the information of adjacent frames. In video restoration tasks with large motion, the role of alignment is particularly obvious. In the process of using a bidirectional recurrent network, there are often multiple identical alignment operations. In order to make full use of the results of the previous alignment operations, PP-MSVSR proposes a Refined Align Module, denoted RAM, that can utilize the previously aligned parameters and achieve a better alignment result.\n",
+    "\n",
+    "(figure: Refined Align Module (RAM))\n"
+   ]
+  },
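+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a toy illustration of the stage-2 auxiliary loss described above: intermediate features are upsampled and penalized against the HR target alongside the main output loss. In the following sketch, the shapes, the nearest-neighbour upsampling and the 0.5 weight are illustrative assumptions, not PaddleGAN's exact code:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "def charbonnier(x, y, eps=1e-8):\n",
+    "    d = x - y\n",
+    "    return np.mean(np.sqrt(d * d + eps * eps))\n",
+    "\n",
+    "def nearest_upsample(x, scale=4):\n",
+    "    # (H, W, C) -> (H*scale, W*scale, C) by nearest-neighbour repetition.\n",
+    "    return np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)\n",
+    "\n",
+    "hr_gt = np.random.rand(64, 64, 3)        # high-resolution ground truth\n",
+    "sr_out = np.random.rand(64, 64, 3)       # final super-resolved output\n",
+    "stage2_feat = np.random.rand(16, 16, 3)  # toy stage-2 'feature'\n",
+    "\n",
+    "main_loss = charbonnier(sr_out, hr_gt)\n",
+    "aux_loss = charbonnier(nearest_upsample(stage2_feat), hr_gt)\n",
+    "print(main_loss + 0.5 * aux_loss)  # 0.5 is an illustrative weight"
+   ]
+  },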
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Related papers and citations\n",
+    "```\n",
+    "@article{jiang2021PP-MSVSR,\n",
+    "  author = {Jiang, Lielin and Wang, Na and Dang, Qingqing and Liu, Rui and Lai, Baohua},\n",
+    "  title = {PP-MSVSR: Multi-Stage Video Super-Resolution},\n",
+    "  journal = {arXiv preprint arXiv:2112.02828},\n",
+    "  year = {2021}\n",
+    "}\n",
+    "```\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  },
+  "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--
GitLab