Unverified Commit 87537ad9 authored by FutureSI, committed by GitHub

Photopen (#483)

* add photopen 

* update photopen.md

* fix photopen_model.py

* fix the ci problem
Co-authored-by: qingqing01 <dangqingqing@baidu.com>
Parent 283c8916
import paddle
import os
import sys
sys.path.insert(0, os.getcwd())
from ppgan.apps import PhotoPenPredictor
import argparse
from ppgan.utils.config import get_config
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--semantic_label_path",
type=str,
default=None,
help="path to input semantic label")
parser.add_argument("--output_path",
type=str,
default=None,
help="path to output image dir")
parser.add_argument("--weight_path",
type=str,
default=None,
help="path to model weight")
parser.add_argument("--config-file",
type=str,
default=None,
help="path to yaml file")
parser.add_argument("--cpu",
dest="cpu",
action="store_true",
help="cpu mode.")
args = parser.parse_args()
if args.cpu:
paddle.set_device('cpu')
cfg = get_config(args.config_file)
predictor = PhotoPenPredictor(output_path=args.output_path,
weight_path=args.weight_path,
gen_cfg=cfg.predict)
predictor.run(semantic_label_path=args.semantic_label_path)
total_iters: 1
output_dir: output_dir
checkpoints_dir: checkpoints
model:
name: PhotoPenModel
generator:
name: SPADEGenerator
ngf: 24
num_upsampling_layers: normal
crop_size: 256
aspect_ratio: 1.0
norm_G: spectralspadebatch3x3
semantic_nc: 14
use_vae: False
nef: 16
discriminator:
name: MultiscaleDiscriminator
ndf: 128
num_D: 4
crop_size: 256
label_nc: 12
output_nc: 3
contain_dontcare_label: True
no_instance: False
n_layers_D: 6
criterion:
name: PhotoPenPerceptualLoss
crop_size: 224
lambda_vgg: 1.6
label_nc: 12
contain_dontcare_label: True
batchSize: 1
crop_size: 256
lambda_feat: 10.0
dataset:
train:
name: PhotoPenDataset
content_root: test/coco_stuff
load_size: 286
crop_size: 256
num_workers: 0
batch_size: 1
test:
name: PhotoPenDataset_test
content_root: test/coco_stuff
load_size: 286
crop_size: 256
num_workers: 0
batch_size: 1
lr_scheduler: # abandoned
name: LinearDecay
learning_rate: 0.0001
start_epoch: 99999
decay_epochs: 99999
# will get from real dataset
iters_per_epoch: 1
optimizer:
lr: 0.0001
optimG:
name: Adam
net_names:
- net_gen
beta1: 0.9
beta2: 0.999
optimD:
name: Adam
net_names:
- net_des
beta1: 0.9
beta2: 0.999
log_config:
interval: 1
visiual_interval: 1
snapshot_config:
interval: 1
predict:
name: SPADEGenerator
ngf: 24
num_upsampling_layers: normal
crop_size: 256
aspect_ratio: 1.0
norm_G: spectralspadebatch3x3
semantic_nc: 14
use_vae: False
nef: 16
contain_dontcare_label: True
label_nc: 12
batchSize: 1
# GauGAN (improved with SimAM attention)
## 1. Introduction
The model in this application comes from the paper "Semantic Image Synthesis with Spatially-Adaptive Normalization". It extends Pix2PixHD, an image-to-image translation network that generates photo-realistic pictures from semantic segmentation labels. To counter the loss of label semantics caused by the normalization layers, the authors add SPADE (Spatially-Adaptive Normalization) modules to the Pix2PixHD generator: two convolution layers predict the scale and bias of the normalization per spatial location, preserving the spatial layout of the labels and improving the quality of the generated images.
![](https://ai-studio-static-online.cdn.bcebos.com/4fc3036fdc18443a9dcdcddb960b5da1c689725bbfa84de2b92421a8640e0ee5)
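Schematically, SPADE replaces the fixed affine parameters of a plain normalization layer with per-pixel scale and bias maps predicted from the segmentation input. A sketch of the computation, matching the `SPADE.forward` implementation in this commit:
```
normalized = param_free_norm(x)        # e.g. BatchNorm2D without affine parameters
actv = mlp_shared(segmap)              # shared conv + GELU on the resized label map
gamma, beta = mlp_gamma(actv), mlp_beta(actv)
out = normalized * (1 + gamma) + beta  # spatially varying denormalization
```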
This model adds the parameter-free SimAM attention module on top of GauGAN's SPADE blocks, giving the generated images a stronger sense of depth and texture.
![](https://ai-studio-static-online.cdn.bcebos.com/94731023eab94b1b97b9ca80bd3b30830c918cf162d046bd88540dda450295a3)
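SimAM needs no extra parameters: it derives a per-activation energy from each activation's squared deviation from its channel mean and gates the features with its sigmoid. The `simam` helper added in this commit implements it as:
```
def simam(x, e_lambda=1e-4):
    # x: feature map of shape [N, C, H, W]
    b, c, h, w = x.shape
    n = w * h - 1
    x_minus_mu_square = (x - x.mean(axis=[2, 3], keepdim=True)) ** 2
    y = x_minus_mu_square / (4 * (x_minus_mu_square.sum(axis=[2, 3], keepdim=True) / n + e_lambda)) + 0.5
    return x * nn.functional.sigmoid(y)
```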
## 2. Quick Start
The pretrained model can be downloaded from: (https://paddlegan.bj.bcebos.com/models/photopen.pdparams)
Given a semantic label image in png format, the model outputs a photo-style picture generated according to the label semantics. Prediction command:
```
python applications/tools/photopen.py \
--semantic_label_path test/sem.png \
--weight_path test/n_g.pdparams \
--output_path test/pic.jpg \
--config-file configs/photopen.yaml
```
**Arguments:**
* semantic_label_path: path to the input semantic label, a png image file
* weight_path: path to the trained model weights, a Paddle state-dict weight file (.pdparams)
* output_path: path where the predicted image is written
* config-file: path to the yaml file holding the parameter settings; the same yaml file is used for training, and the prediction parameters are taken from its predict section (see the Python sketch below)
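The same prediction can also be scripted from Python; a minimal sketch mirroring `applications/tools/photopen.py` (the paths are placeholders):
```
from ppgan.apps import PhotoPenPredictor
from ppgan.utils.config import get_config

cfg = get_config('configs/photopen.yaml')
predictor = PhotoPenPredictor(output_path='test/pic.jpg',
                              weight_path='test/n_g.pdparams',
                              gen_cfg=cfg.predict)
predictor.run(semantic_label_path='test/sem.png')
```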
## 3. Training
**Data preparation:**
The dataset directory is organized as follows:
```
└─coco_stuff
├─train_img
└─train_inst
```
coco_stuff is the dataset root and can be renamed freely. Its train_img subdirectory holds the training landscape photos (usually jpg), and train_inst holds the semantic label images (usually png), one per photo, with matching file names and identical sizes. A quick pairing check is sketched below.
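To sanity-check the pairing before training, something like the following can be used (a hypothetical helper, not part of this commit):
```
import os

root = 'test/coco_stuff'  # dataset root; adjust as needed
imgs = {os.path.splitext(f)[0] for f in os.listdir(os.path.join(root, 'train_img'))}
insts = {os.path.splitext(f)[0] for f in os.listdir(os.path.join(root, 'train_inst'))}
assert imgs == insts, 'unpaired files: %s' % (imgs ^ insts)
```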
### 3.1 Single-GPU training
`python -u tools/main.py --config-file configs/photopen.yaml`
* config-file: path to the yaml file with the training hyper-parameters
### 3.2 Multi-GPU training
```
python -m paddle.distributed.launch \
tools/main.py \
--config-file configs/photopen.yaml \
-o model.generator.norm_G=spectralspadesyncbatch3x3 \
model.batchSize=4 \
dataset.train.batch_size=4
```
* config-file: path to the yaml file with the training hyper-parameters
* model.generator.norm_G: switch to syncbatch normalization so that the data on all GPUs is normalized together
* model.batchSize: the model batch size, normally an integer multiple of the number of GPUs
* dataset.train.batch_size: the data-loading batch size, which must match the model batch size
### 3.3 Resuming training
`python -u tools/main.py --config-file configs/photopen.yaml --resume output_dir/photopen-2021-09-30-15-59/iter_3_checkpoint.pdparams`
* config-file: path to the yaml file with the training hyper-parameters
* resume: path to the checkpoint to resume from
## 4. Results
![](https://ai-studio-static-online.cdn.bcebos.com/72a4a6ede506436ebaa6fb6982aa899607a80e20a54f4b138fb7ae9673e12e6e)
## 5. References
```
@inproceedings{park2019SPADE,
title={Semantic Image Synthesis with Spatially-Adaptive Normalization},
author={Park, Taesung and Liu, Ming-Yu and Wang, Ting-Chun and Zhu, Jun-Yan},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2019}
}
@InProceedings{pmlr-v139-yang21o,
title = {SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks},
author = {Yang, Lingxiao and Zhang, Ru-Yuan and Li, Lida and Xie, Xiaohua},
booktitle = {Proceedings of the 38th International Conference on Machine Learning},
pages = {11863--11874},
year = {2021},
editor = {Meila, Marina and Zhang, Tong},
volume = {139},
series = {Proceedings of Machine Learning Research},
month = {18--24 Jul},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v139/yang21o/yang21o.pdf},
url = {http://proceedings.mlr.press/v139/yang21o.html}
}
```
@@ -30,3 +30,4 @@ from .pixel2style2pixel_predictor import Pixel2Style2PixelPredictor
from .wav2lip_predictor import Wav2LipPredictor
from .mpr_predictor import MPRPredictor
from .lapstyle_predictor import LapStylePredictor
from .photopen_predictor import PhotoPenPredictor
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from PIL import Image, ImageOps
import cv2
import numpy as np
import os
import paddle
from .base_predictor import BasePredictor
from ppgan.models.generators import SPADEGenerator
from ppgan.utils.photopen import data_onehot_pro
from ..utils.filesystem import load
class PhotoPenPredictor(BasePredictor):
def __init__(self,
output_path,
weight_path,
gen_cfg):
        # initialize the generator and load the trained weights
gen = SPADEGenerator(
gen_cfg.ngf,
gen_cfg.num_upsampling_layers,
gen_cfg.crop_size,
gen_cfg.aspect_ratio,
gen_cfg.norm_G,
gen_cfg.semantic_nc,
gen_cfg.use_vae,
gen_cfg.nef,
)
gen.eval()
para = load(weight_path)
if 'net_gen' in para:
gen.set_state_dict(para['net_gen'])
else:
gen.set_state_dict(para)
self.gen = gen
self.output_path = output_path
self.gen_cfg = gen_cfg
def run(self, semantic_label_path):
sem = Image.open(semantic_label_path)
sem = sem.resize((self.gen_cfg.crop_size, self.gen_cfg.crop_size), Image.NEAREST)
sem = np.array(sem).astype('float32')
sem = paddle.to_tensor(sem)
sem = sem.reshape([1, 1, self.gen_cfg.crop_size, self.gen_cfg.crop_size])
one_hot = data_onehot_pro(sem, self.gen_cfg)
predicted = self.gen(one_hot)
        pic = predicted.numpy()[0].reshape((3, self.gen_cfg.crop_size, self.gen_cfg.crop_size)).transpose((1, 2, 0))
pic = ((pic + 1.) / 2. * 255).astype('uint8')
pic = cv2.cvtColor(pic,cv2.COLOR_BGR2RGB)
        path, _ = os.path.split(self.output_path)
        if path and not os.path.exists(path):
            os.makedirs(path)
cv2.imwrite(self.output_path, pic)
@@ -26,3 +26,4 @@ from .firstorder_dataset import FirstOrderDataset
from .lapstyle_dataset import LapStyleDataset
from .sr_reds_multiple_gt_dataset import SRREDSMultipleGTDataset
from .mpr_dataset import MPRTrain, MPRVal, MPRTest
from .photopen_dataset import PhotoPenDataset
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import os
import numpy as np
from PIL import Image, ImageOps
import paddle
import paddle.vision.transforms as T
from paddle.io import Dataset
import cv2
import random
from .builder import DATASETS
logger = logging.getLogger(__name__)
def data_transform(img, resize_w, resize_h, load_size=286, pos=[0, 0, 256, 256], flip=True, is_image=True):
if is_image:
resized = img.resize((resize_w, resize_h), Image.BICUBIC)
else:
resized = img.resize((resize_w, resize_h), Image.NEAREST)
croped = resized.crop((pos[0], pos[1], pos[2], pos[3]))
fliped = ImageOps.mirror(croped) if flip else croped
fliped = np.array(fliped) # transform to numpy array
expanded = np.expand_dims(fliped, 2) if len(fliped.shape) < 3 else fliped
transposed = np.transpose(expanded, (2, 0, 1)).astype('float32')
if is_image:
normalized = transposed / 255. * 2. - 1.
else:
normalized = transposed
return normalized
@DATASETS.register()
class PhotoPenDataset(Dataset):
def __init__(self, content_root, load_size, crop_size):
super(PhotoPenDataset, self).__init__()
inst_dir = os.path.join(content_root, 'train_inst')
_, _, inst_list = next(os.walk(inst_dir))
self.inst_list = np.sort(inst_list)
self.content_root = content_root
self.load_size = load_size
self.crop_size = crop_size
def __getitem__(self, idx):
ins = Image.open(os.path.join(self.content_root, 'train_inst', self.inst_list[idx]))
img = Image.open(os.path.join(self.content_root, 'train_img', self.inst_list[idx].replace(".png", ".jpg")))
img = img.convert('RGB')
w, h = img.size
resize_w, resize_h = 0, 0
if w < h:
resize_w, resize_h = self.load_size, int(h * self.load_size / w)
else:
resize_w, resize_h = int(w * self.load_size / h), self.load_size
left = random.randint(0, resize_w - self.crop_size)
top = random.randint(0, resize_h - self.crop_size)
flip = False
img = data_transform(img, resize_w, resize_h, load_size=self.load_size,
pos=[left, top, left + self.crop_size, top + self.crop_size], flip=flip, is_image=True)
ins = data_transform(ins, resize_w, resize_h, load_size=self.load_size,
pos=[left, top, left + self.crop_size, top + self.crop_size], flip=flip, is_image=False)
return {'img': img, 'ins': ins, 'img_path': self.inst_list[idx]}
def __len__(self):
return len(self.inst_list)
def name(self):
return 'PhotoPenDataset'
@DATASETS.register()
class PhotoPenDataset_test(Dataset):
def __init__(self, content_root, load_size, crop_size):
super(PhotoPenDataset_test, self).__init__()
inst_dir = os.path.join(content_root, 'test_inst')
_, _, inst_list = next(os.walk(inst_dir))
self.inst_list = np.sort(inst_list)
self.content_root = content_root
self.load_size = load_size
self.crop_size = crop_size
def __getitem__(self, idx):
ins = Image.open(os.path.join(self.content_root, 'test_inst', self.inst_list[idx]))
w, h = ins.size
resize_w, resize_h = 0, 0
if w < h:
resize_w, resize_h = self.load_size, int(h * self.load_size / w)
else:
resize_w, resize_h = int(w * self.load_size / h), self.load_size
left = random.randint(0, resize_w - self.crop_size)
top = random.randint(0, resize_h - self.crop_size)
flip = False
ins = data_transform(ins, resize_w, resize_h, load_size=self.load_size,
pos=[left, top, left + self.crop_size, top + self.crop_size], flip=flip, is_image=False)
return {'ins': ins, 'img_path': self.inst_list[idx]}
def __len__(self):
return len(self.inst_list)
def name(self):
        return 'PhotoPenDataset_test'
@@ -32,3 +32,4 @@ from .firstorder_model import FirstOrderModel
from .lapstyle_model import LapStyleDraModel, LapStyleRevFirstModel, LapStyleRevSecondModel
from .basicvsr_model import BasicVSRModel
from .mpr_model import MPRModel
from .photopen_model import PhotoPenModel
@@ -3,5 +3,6 @@ from .perceptual_loss import PerceptualLoss
from .pixel_loss import L1Loss, MSELoss, CharbonnierLoss, \
CalcStyleEmdLoss, CalcContentReltLoss, \
CalcContentLoss, CalcStyleLoss, EdgeLoss
from .photopen_perceptual_loss import PhotoPenPerceptualLoss
from .builder import build_criterion
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import paddle.vision.models.vgg as vgg
from paddle import ParamAttr
from paddle.nn import Conv2D, BatchNorm, Linear, Dropout
from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
from ppgan.utils.download import get_path_from_url
from .builder import CRITERIONS
class ConvBlock(nn.Layer):
def __init__(self, input_channels, output_channels, groups, name=None):
super(ConvBlock, self).__init__()
self.groups = groups
self._conv_1 = Conv2D(
in_channels=input_channels,
out_channels=output_channels,
kernel_size=3,
stride=1,
padding=1,
bias_attr=False)
if groups == 2 or groups == 3 or groups == 4:
self._conv_2 = Conv2D(
in_channels=output_channels,
out_channels=output_channels,
kernel_size=3,
stride=1,
padding=1,
bias_attr=False)
if groups == 3 or groups == 4:
self._conv_3 = Conv2D(
in_channels=output_channels,
out_channels=output_channels,
kernel_size=3,
stride=1,
padding=1,
bias_attr=False)
if groups == 4:
self._conv_4 = Conv2D(
in_channels=output_channels,
out_channels=output_channels,
kernel_size=3,
stride=1,
padding=1,
bias_attr=False)
self._pool = MaxPool2D(kernel_size=2, stride=2, padding=0)
def forward(self, inputs):
x = self._conv_1(inputs)
x = F.relu(x)
if self.groups == 2 or self.groups == 3 or self.groups == 4:
x = self._conv_2(x)
x = F.relu(x)
if self.groups == 3 or self.groups == 4:
x = self._conv_3(x)
x = F.relu(x)
if self.groups == 4:
x = self._conv_4(x)
x = F.relu(x)
x = self._pool(x)
return x
class VGG19(nn.Layer):
def __init__(self, layers=19, class_dim=1000):
super(VGG19, self).__init__()
self.layers = layers
self.vgg_configure = {
11: [1, 1, 2, 2, 2],
13: [2, 2, 2, 2, 2],
16: [2, 2, 3, 3, 3],
19: [2, 2, 4, 4, 4]
}
        assert self.layers in self.vgg_configure.keys(), \
            "supported layers are {} but input layer is {}".format(
                self.vgg_configure.keys(), layers)
self.groups = self.vgg_configure[self.layers]
self._conv_block_1 = ConvBlock(3, 64, self.groups[0], name="conv1_")
self._conv_block_2 = ConvBlock(64, 128, self.groups[1], name="conv2_")
self._conv_block_3 = ConvBlock(128, 256, self.groups[2], name="conv3_")
self._conv_block_4 = ConvBlock(256, 512, self.groups[3], name="conv4_")
self._conv_block_5 = ConvBlock(512, 512, self.groups[4], name="conv5_")
self._drop = Dropout(p=0.5, mode="downscale_in_infer")
self._fc1 = Linear(
7 * 7 * 512,
4096,)
self._fc2 = Linear(
4096,
4096,)
self._out = Linear(
4096,
class_dim,)
def forward(self, inputs):
features = []
features.append(inputs)
x = self._conv_block_1(inputs)
features.append(x)
x = self._conv_block_2(x)
features.append(x)
x = self._conv_block_3(x)
features.append(x)
x = self._conv_block_4(x)
features.append(x)
x = self._conv_block_5(x)
x = paddle.reshape(x, [0, -1])
x = self._fc1(x)
x = F.relu(x)
x = self._drop(x)
x = self._fc2(x)
x = F.relu(x)
x = self._drop(x)
x = self._out(x)
return x, features
@CRITERIONS.register()
class PhotoPenPerceptualLoss(nn.Layer):
def __init__(self,
crop_size,
lambda_vgg,
# pretrained='test/vgg19pretrain.pdparams',
pretrained='https://paddlegan.bj.bcebos.com/models/vgg19pretrain.pdparams',
):
super(PhotoPenPerceptualLoss, self).__init__()
self.model = VGG19()
weight_path = get_path_from_url(pretrained)
vgg_weight = paddle.load(weight_path)
self.model.set_state_dict(vgg_weight)
print('PerceptualVGG loaded pretrained weight.')
self.rates = [1.0 / 32, 1.0 / 16, 1.0 / 8, 1.0 / 4, 1.0]
self.crop_size = crop_size
self.lambda_vgg = lambda_vgg
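    # Weighted sum of L1 distances between matching VGG-19 activations of the
    # real and generated images (the raw input plus the first four conv-block
    # outputs), weighted by self.rates and scaled by lambda_vgg.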
def forward(self, img_r, img_f):
img_r = F.interpolate(img_r, (self.crop_size, self.crop_size))
img_f = F.interpolate(img_f, (self.crop_size, self.crop_size))
_, feat_r = self.model(img_r)
_, feat_f = self.model(img_f)
g_vggloss = paddle.to_tensor(0.)
for i in range(len(feat_r)):
g_vggloss += self.rates[i] * nn.L1Loss()(feat_r[i], feat_f[i])
g_vggloss *= self.lambda_vgg
return g_vggloss
@@ -23,3 +23,4 @@ from .wav2lip_disc_qual import Wav2LipDiscQual
from .discriminator_starganv2 import StarGANv2Discriminator
from .discriminator_firstorder import FirstOrderDiscriminator
from .discriminator_lapstyle import LapStyleDiscriminator
from .discriminator_photopen import MultiscaleDiscriminator
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import re
import copy
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.utils import spectral_norm
from ppgan.utils.photopen import build_norm_layer, simam, Dict
from .builder import DISCRIMINATORS
class NLayersDiscriminator(nn.Layer):
def __init__(self, opt):
super(NLayersDiscriminator, self).__init__()
kw = 4
padw = int(np.ceil((kw - 1.0) / 2))
nf = opt.ndf
input_nc = self.compute_D_input_nc(opt)
layer_count = 0
layer = nn.Sequential(
nn.Conv2D(input_nc, nf, kw, 2, padw),
nn.GELU()
)
self.add_sublayer('block_'+str(layer_count), layer)
layer_count += 1
feat_size_prev = np.floor((opt.crop_size + padw * 2 - (kw - 2)) / 2).astype('int64')
InstanceNorm = build_norm_layer('instance')
for n in range(1, opt.n_layers_D):
nf_prev = nf
nf = min(nf * 2, 512)
stride = 1 if n == opt.n_layers_D - 1 else 2
feat_size = np.floor((feat_size_prev + padw * 2 - (kw - stride)) / stride).astype('int64')
feat_size_prev = feat_size
layer = nn.Sequential(
spectral_norm(nn.Conv2D(nf_prev, nf, kw, stride, padw,
weight_attr=None,
bias_attr=None)),
InstanceNorm(nf),
nn.GELU()
)
self.add_sublayer('block_'+str(layer_count), layer)
layer_count += 1
layer = nn.Conv2D(nf, 1, kw, 1, padw)
self.add_sublayer('block_'+str(layer_count), layer)
layer_count += 1
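    # Apply the blocks sequentially, gating every intermediate output with
    # SimAM attention; all outputs are returned for the feature-matching loss.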
def forward(self, input):
output = []
for layer in self._sub_layers.values():
output.append(simam(layer(input)))
input = output[-1]
return output
def compute_D_input_nc(self, opt):
input_nc = opt.label_nc + opt.output_nc
if opt.contain_dontcare_label:
input_nc += 1
if not opt.no_instance:
input_nc += 1
return input_nc
@DISCRIMINATORS.register()
class MultiscaleDiscriminator(nn.Layer):
def __init__(self,
ndf,
num_D,
crop_size,
label_nc,
output_nc,
contain_dontcare_label,
no_instance,
n_layers_D,
):
super(MultiscaleDiscriminator, self).__init__()
opt = {
'ndf': ndf,
'num_D': num_D,
'crop_size': crop_size,
'label_nc': label_nc,
'output_nc': output_nc,
'contain_dontcare_label': contain_dontcare_label,
'no_instance': no_instance,
'n_layers_D': n_layers_D,
}
opt = Dict(opt)
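        # Build num_D discriminators; branch i first downsamples the input i
        # times with AvgPool2D so that each branch judges a different scale.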
for i in range(opt.num_D):
sequence = []
crop_size_bkp = opt.crop_size
feat_size = opt.crop_size
for j in range(i):
sequence += [nn.AvgPool2D(3, 2, 1)]
feat_size = np.floor((feat_size + 1 * 2 - (3 - 2)) / 2).astype('int64')
opt.crop_size = feat_size
sequence += [NLayersDiscriminator(opt)]
opt.crop_size = crop_size_bkp
sequence = nn.Sequential(*sequence)
self.add_sublayer('nld_'+str(i), sequence)
def forward(self, input):
output = []
for layer in self._sub_layers.values():
output.append(layer(input))
return output
@@ -35,4 +35,5 @@ from .mpr import MPRNet
from .iconvsr import IconVSR
from .gpen import GPEN
from .pan import PAN
from .generater_photopen import SPADEGenerator
from .basicvsr_plus_plus import BasicVSRPlusPlus
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import re
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.utils import spectral_norm
from ppgan.utils.photopen import build_norm_layer, simam, Dict
from .builder import GENERATORS
class SPADE(nn.Layer):
def __init__(self, config_text, norm_nc, label_nc):
super(SPADE, self).__init__()
parsed = re.search(r'spade(\D+)(\d)x\d', config_text)
param_free_norm_type = str(parsed.group(1))
ks = int(parsed.group(2))
self.param_free_norm = build_norm_layer(param_free_norm_type)(norm_nc)
# The dimension of the intermediate embedding space. Yes, hardcoded.
nhidden = 128
pw = ks // 2
self.mlp_shared = nn.Sequential(*[
nn.Conv2D(label_nc, nhidden, ks, 1, pw),
nn.GELU(),
])
self.mlp_gamma = nn.Conv2D(nhidden, norm_nc, ks, 1, pw)
self.mlp_beta = nn.Conv2D(nhidden, norm_nc, ks, 1, pw)
def forward(self, x, segmap):
# Part 1. generate parameter-free normalized activations
normalized = self.param_free_norm(x)
# Part 2. produce scaling and bias conditioned on semantic map
segmap = F.interpolate(segmap, x.shape[2:])
actv = self.mlp_shared(segmap)
gamma = self.mlp_gamma(actv)
beta = self.mlp_beta(actv)
# apply scale and bias
out = normalized * (1 + gamma) + beta
return out
class SPADEResnetBlock(nn.Layer):
def __init__(self, fin, fout, opt):
super(SPADEResnetBlock, self).__init__()
# Attributes
self.learned_shortcut = (fin != fout)
fmiddle = min(fin, fout)
# define spade layers
spade_config_str = opt.norm_G.replace('spectral', '')
self.spade_0 = SPADE(spade_config_str, fin, opt.semantic_nc)
self.spade_1 = SPADE(spade_config_str, fmiddle, opt.semantic_nc)
if self.learned_shortcut:
self.spade_s = SPADE(spade_config_str, fin, opt.semantic_nc)
# define act_conv layers
self.act_conv_0 = nn.Sequential(*[
nn.GELU(),
spectral_norm(nn.Conv2D(fin, fmiddle, 3, 1, 1,
weight_attr=None,
bias_attr=None)),
])
self.act_conv_1 = nn.Sequential(*[
nn.GELU(),
spectral_norm(nn.Conv2D(fmiddle, fout, 3, 1, 1,
weight_attr=None,
bias_attr=None)),
])
if self.learned_shortcut:
self.act_conv_s = nn.Sequential(*[
spectral_norm(nn.Conv2D(fin, fout, 1, 1, 0, bias_attr=False,
weight_attr=None)),
])
def forward(self, x, seg):
x_s = self.shortcut(x, seg)
dx = self.act_conv_0(self.spade_0(x, seg))
dx = self.act_conv_1(self.spade_1(dx, seg))
return simam(dx + x_s)
def shortcut(self, x, seg):
if self.learned_shortcut:
x_s = self.act_conv_s(self.spade_s(x, seg))
else:
x_s = x
return x_s
@GENERATORS.register()
class SPADEGenerator(nn.Layer):
def __init__(self,
ngf,
num_upsampling_layers,
crop_size,
aspect_ratio,
norm_G,
semantic_nc,
use_vae,
nef,
):
super(SPADEGenerator, self).__init__()
opt = {
'ngf': ngf,
'num_upsampling_layers': num_upsampling_layers,
'crop_size': crop_size,
'aspect_ratio': aspect_ratio,
'norm_G': norm_G,
'semantic_nc': semantic_nc,
'use_vae': use_vae,
'nef': nef,
}
self.opt = Dict(opt)
nf = self.opt.ngf
self.sw, self.sh = self.compute_latent_vector_size(self.opt)
if self.opt.use_vae:
self.fc = nn.Linear(opt.z_dim, 16 * opt.nef * self.sw * self.sh)
self.head_0 = SPADEResnetBlock(16 * opt.nef, 16 * nf, self.opt)
else:
self.fc = nn.Conv2D(self.opt.semantic_nc, 16 * nf, 3, 1, 1)
self.head_0 = SPADEResnetBlock(16 * nf, 16 * nf, self.opt)
self.G_middle_0 = SPADEResnetBlock(16 * nf, 16 * nf, self.opt)
self.G_middle_1 = SPADEResnetBlock(16 * nf, 16 * nf, self.opt)
self.up_0 = SPADEResnetBlock(16 * nf, 8 * nf, self.opt)
self.up_1 = SPADEResnetBlock(8 * nf, 4 * nf, self.opt)
self.up_2 = SPADEResnetBlock(4 * nf, 2 * nf, self.opt)
self.up_3 = SPADEResnetBlock(2 * nf, 1 * nf, self.opt)
final_nc = nf
if self.opt.num_upsampling_layers == 'most':
self.up_4 = SPADEResnetBlock(1 * nf, nf // 2, self.opt)
final_nc = nf // 2
self.conv_img = nn.Conv2D(final_nc, 3, 3, 1, 1)
self.up = nn.Upsample(scale_factor=2)
def forward(self, input, z=None):
seg = input
if self.opt.use_vae:
x = self.fc(z)
x = paddle.reshape(x, [-1, 16 * self.opt.nef, self.sh, self.sw])
else:
x = F.interpolate(seg, (self.sh, self.sw))
x = self.fc(x)
x = self.head_0(x, seg)
x = self.up(x)
x = self.G_middle_0(x, seg)
if self.opt.num_upsampling_layers == 'more' or \
self.opt.num_upsampling_layers == 'most':
x = self.up(x)
x = self.G_middle_1(x, seg)
x = self.up(x)
x = self.up_0(x, seg)
x = self.up(x)
x = self.up_1(x, seg)
x = self.up(x)
x = self.up_2(x, seg)
x = self.up(x)
x = self.up_3(x, seg)
if self.opt.num_upsampling_layers == 'most':
x = self.up(x)
x = self.up_4(x, seg)
x = self.conv_img(F.gelu(x))
x = F.tanh(x)
return x
def compute_latent_vector_size(self, opt):
if opt.num_upsampling_layers == 'normal':
num_up_layers = 5
elif opt.num_upsampling_layers == 'more':
num_up_layers = 6
elif opt.num_upsampling_layers == 'most':
num_up_layers = 7
else:
raise ValueError('opt.num_upsampling_layers [%s] not recognized' %
opt.num_upsampling_layers)
sw = opt.crop_size // (2**num_up_layers)
sh = round(sw / opt.aspect_ratio)
return sw, sh
class VAE_Encoder(nn.Layer):
def __init__(self, opt):
super(VAE_Encoder, self).__init__()
kw = 3
pw = int(np.ceil((kw - 1.0) / 2))
ndf = opt.nef
InstanceNorm = build_norm_layer('instance')
model = [
spectral_norm(nn.Conv2D(3, ndf, kw, 2, pw,
weight_attr=None,
bias_attr=None)),
InstanceNorm(ndf),
nn.GELU(),
spectral_norm(nn.Conv2D(ndf * 1, ndf * 2, kw, 2, pw,
weight_attr=None,
bias_attr=None)),
InstanceNorm(ndf * 2),
nn.GELU(),
spectral_norm(nn.Conv2D(ndf * 2, ndf * 4, kw, 2, pw,
weight_attr=None,
bias_attr=None)),
InstanceNorm(ndf * 4),
nn.GELU(),
spectral_norm(nn.Conv2D(ndf * 4, ndf * 8, kw, 2, pw,
weight_attr=None,
bias_attr=None)),
InstanceNorm(ndf * 8),
nn.GELU(),
spectral_norm(nn.Conv2D(ndf * 8, ndf * 8, kw, 2, pw,
weight_attr=None,
bias_attr=None)),
InstanceNorm(ndf * 8),
]
if opt.crop_size >= 256:
model += [
nn.GELU(),
spectral_norm(nn.Conv2D(ndf * 8, ndf * 8, kw, 2, pw,
weight_attr=None,
bias_attr=None)),
InstanceNorm(ndf * 8),
]
model += [nn.GELU(),]
self.flatten = nn.Flatten(1, -1)
self.so = 4
self.fc_mu = nn.Linear(ndf * 8 * self.so * self.so, opt.z_dim)
self.fc_var = nn.Linear(ndf * 8 * self.so * self.so, opt.z_dim)
self.model = nn.Sequential(*model)
def forward(self, x):
x = self.model(x)
x = self.flatten(x)
return self.fc_mu(x), self.fc_var(x)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn.functional as F
from .base_model import BaseModel
from .builder import MODELS
from .generators.builder import build_generator
from .criterions import build_criterion
from .discriminators.builder import build_discriminator
from ..modules.init import init_weights
from ..solver import build_optimizer
from ppgan.utils.photopen import data_onehot_pro, Dict
@MODELS.register()
class PhotoPenModel(BaseModel):
def __init__(self,
generator,
discriminator,
criterion,
label_nc,
contain_dontcare_label,
batchSize,
crop_size,
lambda_feat,
):
super(PhotoPenModel, self).__init__()
opt = {
'label_nc': label_nc,
'contain_dontcare_label': contain_dontcare_label,
'batchSize': batchSize,
'crop_size': crop_size,
'lambda_feat': lambda_feat,
# 'semantic_nc': semantic_nc,
# 'use_vae': use_vae,
# 'nef': nef,
}
self.opt = Dict(opt)
# define nets
self.nets['net_gen'] = build_generator(generator)
# init_weights(self.nets['net_gen'])
self.nets['net_des'] = build_discriminator(discriminator)
# init_weights(self.nets['net_des'])
self.net_vgg = build_criterion(criterion)
def setup_input(self, input):
if 'img' in input.keys():
self.img = paddle.to_tensor(input['img'])
self.ins = paddle.to_tensor(input['ins'])
self.img_paths = input['img_path']
def forward(self):
self.one_hot = data_onehot_pro(self.ins, self.opt)
self.img_f = self.nets['net_gen'](self.one_hot)
self.visual_items['img_f'] = self.img_f
def backward_G(self):
fake_data = paddle.concat((self.one_hot, self.img_f), 1)
real_data = paddle.concat((self.one_hot, self.img), 1)
fake_and_real_data = paddle.concat((fake_data, real_data), 0)
pred = self.nets['net_des'](fake_and_real_data)
"""content loss"""
g_ganloss = 0.
for i in range(len(pred)):
pred_i = pred[i][-1][:self.opt.batchSize]
new_loss = -pred_i.mean() # hinge loss
g_ganloss += new_loss
g_ganloss /= len(pred)
g_featloss = 0.
for i in range(len(pred)):
            for j in range(len(pred[i]) - 1): # intermediate feature maps, excluding the final prediction layer
unweighted_loss = (pred[i][j][:self.opt.batchSize] - pred[i][j][self.opt.batchSize:]).abs().mean() # L1 loss
g_featloss += unweighted_loss * self.opt.lambda_feat / len(pred)
g_vggloss = self.net_vgg(self.img, self.img_f)
self.g_loss = g_ganloss + g_featloss + g_vggloss
self.g_loss.backward()
self.losses['g_ganloss'] = g_ganloss
self.losses['g_featloss'] = g_featloss
self.losses['g_vggloss'] = g_vggloss
def backward_D(self):
fake_data = paddle.concat((self.one_hot, self.img_f), 1)
real_data = paddle.concat((self.one_hot, self.img), 1)
fake_and_real_data = paddle.concat((fake_data, real_data), 0)
pred = self.nets['net_des'](fake_and_real_data)
"""content loss"""
df_ganloss = 0.
for i in range(len(pred)):
pred_i = pred[i][-1][:self.opt.batchSize]
            new_loss = -paddle.minimum(-pred_i - 1, paddle.zeros_like(pred_i)).mean() # hinge loss
df_ganloss += new_loss
df_ganloss /= len(pred)
dr_ganloss = 0.
for i in range(len(pred)):
pred_i = pred[i][-1][self.opt.batchSize:]
            new_loss = -paddle.minimum(pred_i - 1, paddle.zeros_like(pred_i)).mean() # hinge loss
dr_ganloss += new_loss
dr_ganloss /= len(pred)
self.d_loss = df_ganloss + dr_ganloss
self.d_loss.backward()
self.losses['df_ganloss'] = df_ganloss
self.losses['dr_ganloss'] = dr_ganloss
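    # One training iteration: update the generator first, then rerun the
    # forward pass and update the discriminator.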
def train_iter(self, optimizers=None):
self.forward()
self.optimizers['optimG'].clear_grad()
self.backward_G()
self.optimizers['optimG'].step()
self.forward()
self.optimizers['optimD'].clear_grad()
self.backward_D()
self.optimizers['optimD'].step()
def test_iter(self, metrics=None):
self.eval()
with paddle.no_grad():
self.forward()
self.train()
def setup_optimizers(self, lr, cfg):
for opt_name, opt_cfg in cfg.items():
if opt_name == 'lr':
learning_rate = opt_cfg
continue
cfg_ = opt_cfg.copy()
net_names = cfg_.pop('net_names')
parameters = []
for net_name in net_names:
parameters += self.nets[net_name].parameters()
if opt_name == 'optimG':
lr = learning_rate * 4
else:
lr = learning_rate
self.optimizers[opt_name] = build_optimizer(
cfg_, lr, parameters)
return self.optimizers
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import functools
import math
import os
import random
import numpy as np
from PIL import Image, ImageOps
import paddle
import paddle.nn as nn
from paddle.io import Dataset, DataLoader
from paddle.nn import Conv1DTranspose, Conv2DTranspose, Conv3DTranspose, Linear
# Preprocess an image: resize, crop, optional horizontal flip, HWC->CHW layout, normalization
def data_transform(img, resize_w, resize_h, load_size=286, pos=[0, 0, 256, 256], flip=True, is_image=True):
if is_image:
resized = img.resize((resize_w, resize_h), Image.BICUBIC)
else:
resized = img.resize((resize_w, resize_h), Image.NEAREST)
croped = resized.crop((pos[0], pos[1], pos[2], pos[3]))
fliped = ImageOps.mirror(croped) if flip else croped
fliped = np.array(fliped) # transform to numpy array
expanded = np.expand_dims(fliped, 2) if len(fliped.shape) < 3 else fliped
transposed = np.transpose(expanded, (2, 0, 1)).astype('float32')
if is_image:
normalized = transposed / 255. * 2. - 1.
else:
normalized = transposed
return normalized
# COCO-Stuff dataset
class COCODateset(Dataset):
def __init__(self, opt):
super(COCODateset, self).__init__()
inst_dir = opt.dataroot+'train_inst/'
_, _, inst_list = next(os.walk(inst_dir))
self.inst_list = np.sort(inst_list)
self.opt = opt
def __getitem__(self, idx):
ins = Image.open(self.opt.dataroot+'train_inst/'+self.inst_list[idx])
img = Image.open(self.opt.dataroot+'train_img/'+self.inst_list[idx].replace(".png", ".jpg"))
img = img.convert('RGB')
w, h = img.size
resize_w, resize_h = 0, 0
if w < h:
resize_w, resize_h = self.opt.load_size, int(h * self.opt.load_size / w)
else:
resize_w, resize_h = int(w * self.opt.load_size / h), self.opt.load_size
left = random.randint(0, resize_w - self.opt.crop_size)
top = random.randint(0, resize_h - self.opt.crop_size)
flip = False
        img = data_transform(img, resize_w, resize_h, load_size=self.opt.load_size,
            pos=[left, top, left + self.opt.crop_size, top + self.opt.crop_size], flip=flip, is_image=True)
        ins = data_transform(ins, resize_w, resize_h, load_size=self.opt.load_size,
            pos=[left, top, left + self.opt.crop_size, top + self.opt.crop_size], flip=flip, is_image=False)
return img, ins, self.inst_list[idx]
def __len__(self):
return len(self.inst_list)
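# Convert a label map of shape [N, 1, H, W] into one-hot semantics of shape
# [N, nc, H, W] (nc = label_nc, plus 1 when contain_dontcare_label is set),
# then append an edge channel marking pixels whose label differs from a
# horizontal or vertical neighbour.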
def data_onehot_pro(instance, opt):
    nc = opt.label_nc + 1 if opt.contain_dontcare_label \
        else opt.label_nc
semantics = paddle.nn.functional.one_hot(instance.astype('int64'). \
reshape([opt.batchSize, opt.crop_size, opt.crop_size]), nc). \
transpose((0, 3, 1, 2))
# edge
edge = np.zeros(instance.shape, 'int64')
t = instance.numpy()
edge[:, :, :, 1:] = edge[:, :, :, 1:] | (t[:, :, :, 1:] != t[:, :, :, :-1])
edge[:, :, :, :-1] = edge[:, :, :, :-1] | (t[:, :, :, 1:] != t[:, :, :, :-1])
edge[:, :, 1:, :] = edge[:, :, 1:, :] | (t[:, :, 1:, :] != t[:, :, :-1, :])
edge[:, :, :-1, :] = edge[:, :, :-1, :] | (t[:, :, 1:, :] != t[:, :, :-1, :])
edge = paddle.to_tensor(edge).astype('float32')
semantics = paddle.concat([semantics, edge], 1)
return semantics
# Build a normalization layer other than SPADE
def build_norm_layer(norm_type='instance'):
"""Return a normalization layer
Args:
norm_type (str) -- the name of the normalization layer: batch | instance | none
For BatchNorm, we do not use learnable affine parameters and track running statistics (mean/stddev).
For InstanceNorm, we do not use learnable affine parameters. We do not track running statistics.
"""
if norm_type == 'batch':
norm_layer = functools.partial(
nn.BatchNorm2D,
weight_attr=False,
bias_attr=False)
elif norm_type == 'syncbatch':
norm_layer = functools.partial(
nn.SyncBatchNorm,
weight_attr=False,
bias_attr=False)
elif norm_type == 'instance':
norm_layer = functools.partial(
nn.InstanceNorm2D,)
elif norm_type == 'spectral':
norm_layer = functools.partial(Spectralnorm)
elif norm_type == 'none':
def norm_layer(x):
return Identity()
else:
raise NotImplementedError('normalization layer [%s] is not found' %
norm_type)
return norm_layer
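# SimAM attention (Yang et al., ICML 2021): compute a parameter-free,
# per-activation energy from the squared deviation to the channel mean and
# modulate the features by its sigmoid; e_lambda guards against division by zero.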
def simam(x, e_lambda=1e-4):
b, c, h, w = x.shape
n = w * h - 1
x_minus_mu_square = (x - x.mean(axis=[2, 3], keepdim=True)) ** 2
y = x_minus_mu_square / (4 * (x_minus_mu_square.sum(axis=[2, 3], keepdim=True) / n + e_lambda)) + 0.5
return x * nn.functional.sigmoid(y)
class Dict(dict):
__setattr__ = dict.__setitem__
__getattr__ = dict.__getitem__
## All files under this directory are used for testing the training and prediction of the photopen model.
# Predict with the pretrained model
python applications/tools/photopen.py --semantic_label_path test/sem.png --weight_path test/generator.pdparams --output_path output_dir/pic.jpg --config-file configs/photopen.yaml
# Predict from a checkpoint
python applications/tools/photopen.py --semantic_label_path test/sem.png --weight_path output_dir/photopen-2021-10-05-14-38/iter_1_weight.pdparams --output_path output_dir/pic.jpg --config-file configs/photopen.yaml
# Train
python -u tools/main.py --config-file configs/photopen.yaml
# Resume training
python -u tools/main.py --config-file configs/photopen.yaml --resume output_dir/photopen-2021-09-30-15-59/iter_3_checkpoint.pdparams
# Train with overridden options
python -u tools/main.py --config-file configs/photopen.yaml -o model.generator.ngf=1 model.discriminator.ndf=1
# Evaluate
python -u tools/main.py --config-file configs/photopen.yaml --evaluate-only --load output_dir/photopen-2021-11-06-20-59/iter_1_checkpoint.pdparams