Unverified commit ee5fdb92, authored by lvmengsi, committed via GitHub

add AttGAN and STGAN (#2484)

Parent: 7feb739d
# Image Generation Model Library
Generative Adversarial Networks (GANs)\[[1](#references)\] are an unsupervised learning approach in which two neural networks learn by playing a game against each other; the method was proposed by Ian Goodfellow et al. in 2014. A GAN consists of a generator network and a discriminator network. The generator draws random samples from a latent space as input, and its outputs should imitate the real samples in the training set as closely as possible. The discriminator takes either a real sample or the generator's output as input, and its goal is to tell the generator's outputs apart from the real samples as reliably as possible, while the generator tries to fool the discriminator. The two networks compete against each other, continually adjusting their parameters.
GANs are commonly used to generate images that look deceptively real. They have also been used to generate videos, 3D object models, and more.\[[2](#references)\]
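The adversarial objective described above can be sketched numerically. Below is a minimal, illustrative numpy example; the logits are made-up values, not the output of any model in this library:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up discriminator scores (logits) for real and generated samples.
real_logits = np.array([2.0, 1.5, 3.0])
fake_logits = np.array([-1.0, 0.5, -2.0])

# Discriminator loss: push real samples toward 1 and fakes toward 0.
d_loss = (-np.mean(np.log(sigmoid(real_logits)))
          - np.mean(np.log(1.0 - sigmoid(fake_logits))))

# Non-saturating generator loss: push the discriminator's scores
# on the fakes toward 1, i.e. try to fool it.
g_loss = -np.mean(np.log(sigmoid(fake_logits)))
```

During training the two losses are minimized alternately, which is the generator/discriminator alternation implemented by the trainer scripts in this library.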
---
## Contents
- [Introduction](#introduction)
- [Quick Start](#quick-start)
- [References](#references)
## Introduction
This image generation model library includes CGAN\[[3](#references)\], DCGAN\[[4](#references)\], Pix2Pix\[[5](#references)\], CycleGAN\[[6](#references)\], StarGAN\[[7](#references)\], AttGAN\[[8](#references)\], and STGAN\[[9](#references)\].
The directory structure of the library is as follows:
```
├── download.py download datasets
├── data_reader.py data preprocessing
├── train.py training entry point
├── infer.py inference entry point
├── trainer training scripts for each model
│ ├── CGAN.py training script for Conditional GAN
│ ├── DCGAN.py training script for Deep Convolutional GAN
│ ├── Pix2pix.py training script for Pix2Pix GAN
│ ├── CycleGAN.py training script for CycleGAN
│ ├── StarGAN.py training script for StarGAN
│ ├── AttGAN.py training script for AttGAN
│ ├── STGAN.py training script for STGAN
├── network network definitions for each model
│ ├── base_network.py common building blocks shared by the GAN models
│ ├── CGAN_network.py network definition for Conditional GAN
│ ├── DCGAN_network.py network definition for Deep Convolutional GAN
│ ├── Pix2pix_network.py network definition for Pix2Pix GAN
│ ├── CycleGAN_network.py network definition for CycleGAN
│ ├── StarGAN_network.py network definition for StarGAN
│ ├── AttGAN_network.py network definition for AttGAN
│ ├── STGAN_network.py network definition for STGAN
├── util common configuration and utility modules
│ ├── config.py base configuration shared by the networks
│ ├── utility.py shared utilities such as model saving
├── scripts example scripts for launching training and testing
│ ├── run_....py training launch examples
│ ├── infer_....py testing launch examples
│ ├── make_pair_data.py script to generate the data list for Pix2pix GAN
```
## Quick Start
**Install [PaddlePaddle](https://github.com/PaddlePaddle/Paddle):**
Running the sample code in this directory requires PaddlePaddle Fluid v1.5 or later. If the PaddlePaddle version in your environment is lower than this, please update it by following the instructions in the [installation guide](http://paddlepaddle.org/documentation/docs/zh/1.4/beginners_guide/install/index_cn.html).
### Data Preparation
The library provides the download script download.py, which supports downloading the MNIST dataset as well as the datasets needed by CycleGAN and Pix2Pix. Download data with the following command:

```
python download.py --dataset=mnist
```

Specify the dataset argument to download the corresponding dataset.
The [CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) dataset needed by StarGAN, AttGAN, and STGAN can be downloaded separately.
**Custom datasets:**
You can use a custom dataset, as long as it is organized in the data format required by the corresponding generation model.
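For AttGAN/STGAN-style attribute data, the expected layout can be inferred from celeba_reader_creator in data_reader.py: the second line of the list file holds the attribute names, and each following line holds an image name plus one 1/-1 value per attribute. A minimal parsing sketch with made-up file contents:

```python
# Sketch of parsing a CelebA-style attribute list, mirroring
# celeba_reader_creator in data_reader.py (values are "1"/"-1").
lines = [
    "3",                      # image count header (ignored by the reader)
    "Bald Bangs Eyeglasses",  # attribute names
    "000001.jpg -1 1 -1",
    "000002.jpg 1 -1 -1",
    "000003.jpg -1 -1 1",
]
attr2idx = {name: i for i, name in enumerate(lines[1].split())}
selected = ["Bangs", "Eyeglasses"]  # cf. the selected_attrs argument
images = []
for line in lines[2:]:
    arr = line.strip().split()
    label = [arr[attr2idx[a] + 1] == "1" for a in selected]
    images.append((arr[0], label))
# images[0] -> ("000001.jpg", [True, False])
```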
Note: the list file used when preparing the Pix2pix dataset must be generated with make_pair_data.py in the scripts folder, e.g.:

```
python scripts/make_pair_data.py \
    --direction=A2B
```

The direction argument controls how the list file is generated, and thus the direction of the image-to-image translation.
### Training
**Start training:** Once the data is ready, launch training as follows:

```
python train.py \
    --model_net=$(name_of_model) \
    --dataset=$(name_of_dataset) \
    --data_dir=$(path_to_data) \
    --train_list=$(path_to_train_data_list) \
    --test_list=$(path_to_test_data_list) \
    --batch_size=$(batch_size)
```

Use the model_net argument to select the model to train, and the dataset argument to select the dataset to train on.
### Testing
Testing uses a trained generator to produce images. infer.py is the main entry point; an example invocation:

```
python infer.py \
    --model_net=$(name_of_model) \
    --init_model=$(path_to_model) \
    --dataset_dir=$(path_to_data)
```
## References
[1] [Goodfellow, Ian J.; Pouget-Abadie, Jean; Mirza, Mehdi; Xu, Bing; Warde-Farley, David; Ozair, Sherjil; Courville, Aaron; Bengio, Yoshua. Generative Adversarial Networks. 2014. arXiv:1406.2661 [stat.ML].](https://arxiv.org/abs/1406.2661)
[2] [https://zh.wikipedia.org/wiki/生成对抗网络](https://zh.wikipedia.org/wiki/生成对抗网络)
[3] [Conditional Generative Adversarial Nets](https://arxiv.org/abs/1411.1784)
[4] [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/1511.06434)
[5] [Image-to-Image Translation with Conditional Adversarial Networks](https://arxiv.org/abs/1611.07004)
[6] [Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593)
[7] [StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation](https://arxiv.org/abs/1711.09020)
[8] [AttGAN: Facial Attribute Editing by Only Changing What You Want](https://arxiv.org/abs/1711.10678)
[9] [STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing](https://arxiv.org/abs/1904.09709)
@@ -237,6 +237,113 @@ class pair_reader_creator(reader_creator):
return reader
class celeba_reader_creator(reader_creator):
'''Read and preprocess the dataset.'''
def __init__(self,
image_dir,
list_filename,
args,
batch_size=1,
drop_last=False):
self.image_dir = image_dir
self.list_filename = list_filename
self.batch_size = batch_size
self.drop_last = drop_last
print(self.image_dir, self.list_filename)
lines = open(self.list_filename).readlines()
all_attr_names = lines[1].split()
attr2idx = {}
for i, attr_name in enumerate(all_attr_names):
attr2idx[attr_name] = i
lines = lines[2:]
self.images = []
attr_names = args.selected_attrs.split(',')
for line in lines:
arr = line.strip().split()
name = './images/' + arr[0]
label = []
for attr_name in attr_names:
idx = attr2idx[attr_name]
label.append(arr[idx + 1] == "1")
self.images.append((name, label))
def len(self):
if self.drop_last or len(self.images) % self.batch_size == 0:
return len(self.images) // self.batch_size
else:
return len(self.images) // self.batch_size + 1
def get_train_reader(self, args, shuffle=False, return_name=False):
def reader():
batch_out_1 = []
batch_out_2 = []
while True:
if shuffle:
np.random.shuffle(self.images)
for file, label in self.images:
img = Image.open(os.path.join(self.image_dir,
file)).convert('RGB')
label = np.array(label).astype("float32")
label = (label + 1) // 2
img = CentorCrop(img, args.crop_size, args.crop_size)
img = img.resize((args.load_size, args.load_size),
Image.BILINEAR)
img = (np.array(img).astype('float32') / 255.0 - 0.5) / 0.5
img = img.transpose([2, 0, 1])
batch_out_1.append(img)
batch_out_2.append(label)
if len(batch_out_1) == self.batch_size:
yield batch_out_1, batch_out_2
batch_out_1 = []
batch_out_2 = []
if self.drop_last == False and len(batch_out_1) != 0:
yield batch_out_1, batch_out_2
return reader
def get_test_reader(self, args, shuffle=False, return_name=False):
def reader():
batch_out_1 = []
batch_out_2 = []
batch_out_3 = []
for file, label in self.images:
img = Image.open(os.path.join(self.image_dir, file)).convert(
'RGB')
label = np.array(label).astype("float32")
img = CentorCrop(img, 170, 170)
img = img.resize((args.image_size, args.image_size),
Image.BILINEAR)
img = (np.array(img).astype('float32') / 255.0 - 0.5) / 0.5
img = img.transpose([2, 0, 1])
if return_name:
batch_out_1.append(img)
batch_out_2.append(label)
batch_out_3.append(os.path.basename(file))
else:
batch_out_1.append(img)
batch_out_2.append(label)
if len(batch_out_1) == self.batch_size:
if return_name:
yield batch_out_1, batch_out_2, batch_out_3
batch_out_1 = []
batch_out_2 = []
batch_out_3 = []
else:
yield batch_out_1, batch_out_2
batch_out_1 = []
batch_out_2 = []
if len(batch_out_1) != 0:
if return_name:
yield batch_out_1, batch_out_2, batch_out_3
else:
yield batch_out_1, batch_out_2
return reader
def mnist_reader_creator(image_filename, label_filename, buffer_size):
def reader():
with gzip.GzipFile(image_filename, 'rb') as image_file:
@@ -346,6 +453,35 @@ class data_reader(object):
return a_reader, b_reader, a_reader_test, b_reader_test, batch_num
elif self.cfg.model_net == 'StarGAN' or self.cfg.model_net == 'STGAN' or self.cfg.model_net == 'AttGAN':
dataset_dir = os.path.join(self.cfg.data_dir, self.cfg.dataset)
train_list = os.path.join(dataset_dir, 'train.txt')
if self.cfg.train_list is not None:
train_list = self.cfg.train_list
train_reader = celeba_reader_creator(
image_dir=dataset_dir,
list_filename=train_list,
batch_size=self.cfg.batch_size,
args=self.cfg,
drop_last=self.cfg.drop_last)
reader_test = None
if self.cfg.run_test:
test_list = os.path.join(dataset_dir, "test.txt")
if self.cfg.test_list is not None:
test_list = self.cfg.test_list
test_reader = celeba_reader_creator(
image_dir=dataset_dir,
list_filename=test_list,
batch_size=self.cfg.n_samples,
drop_last=self.cfg.drop_last,
args=self.cfg)
reader_test = test_reader.get_test_reader(
self.cfg, shuffle=False, return_name=True)
batch_num = train_reader.len()
reader = train_reader.get_train_reader(
self.cfg, shuffle=self.shuffle)
return reader, reader_test, batch_num
else:
dataset_dir = os.path.join(self.cfg.data_dir, self.cfg.dataset)
train_list = os.path.join(dataset_dir, 'train.txt')
......
@@ -41,12 +41,29 @@ add_arg('use_gpu', bool, True, "Whether to use GPU to tr
add_arg('dropout', bool, False, "Whether to use dropout")
add_arg('data_shape', int, 256, "The shape of load image")
add_arg('g_base_dims', int, 64, "Base channels in CycleGAN generator")
add_arg('c_dim', int, 13, "the size of attrs")
add_arg('use_gru', bool, False, "Whether to use GRU")
add_arg('crop_size', int, 178, "crop size")
add_arg('image_size', int, 128, "image size")
add_arg('selected_attrs', str,
"Bald,Bangs,Black_Hair,Blond_Hair,Brown_Hair,Bushy_Eyebrows,Eyeglasses,Male,Mouth_Slightly_Open,Mustache,No_Beard,Pale_Skin,Young",
"the attributes we selected to change")
add_arg('batch_size', int, 16, "batch size when testing")
add_arg('test_list', str, "./data/celeba/test_list_attr_celeba.txt", "the test list file")
add_arg('dataset_dir', str, "./data/celeba/", "the dataset directory")
add_arg('n_layers', int, 5, "default number of layers in the generator")
add_arg('gru_n_layers', int, 4, "default number of GRU layers in the generator")
# yapf: enable
def infer(args):
data_shape = [-1, 3, args.data_shape, args.data_shape]
input = fluid.layers.data(name='input', shape=data_shape, dtype='float32')
label_org_ = fluid.layers.data(
name='label_org_', shape=[args.c_dim], dtype='float32')
label_trg_ = fluid.layers.data(
name='label_trg_', shape=[args.c_dim], dtype='float32')
model_name = 'net_G'
if args.model_net == 'cyclegan':
from network.CycleGAN_network import CycleGAN_model
@@ -62,10 +79,19 @@ def infer(args):
model = Pix2pix_model()
fake = model.network_G(input, "generator", cfg=args)
elif args.model_net == 'cgan':
pass
elif args.model_net == 'STGAN':
from network.STGAN_network import STGAN_model
model = STGAN_model()
fake, _ = model.network_G(
input, label_org_, label_trg_, cfg=args, name='net_G')
elif args.model_net == 'AttGAN':
from network.AttGAN_network import AttGAN_model
model = AttGAN_model()
fake, _ = model.network_G(
input, label_org_, label_trg_, cfg=args, name='net_G')
else:
pass
raise NotImplementedError("model_net {} is not supported".format(
args.model_net))
# prepare environment
place = fluid.CPUPlace()
@@ -82,24 +108,73 @@ def infer(args):
if not os.path.exists(args.output):
os.makedirs(args.output)
for file in glob.glob(args.input):
print("read {}".format(file))
image_name = os.path.basename(file)
image = Image.open(file).convert('RGB')
image = image.resize((256, 256), Image.BICUBIC)
image = np.array(image).transpose([2, 0, 1]).astype('float32')
image = image / 255.0
image = (image - 0.5) / 0.5
data = image[np.newaxis, :]
tensor = fluid.LoDTensor()
tensor.set(data, place)
if args.model_net == 'AttGAN' or args.model_net == 'STGAN':
test_reader = celeba_reader_creator(
image_dir=args.dataset_dir,
list_filename=args.test_list,
batch_size=args.batch_size,
drop_last=False,
args=args)
reader_test = test_reader.get_test_reader(
args, shuffle=False, return_name=True)
for data in zip(reader_test()):
real_img, label_org, name = data[0]
print("read {}".format(name))
label_trg = copy.deepcopy(label_org)
tensor_img = fluid.LoDTensor()
tensor_label_org = fluid.LoDTensor()
tensor_label_trg = fluid.LoDTensor()
tensor_label_org_ = fluid.LoDTensor()
tensor_label_trg_ = fluid.LoDTensor()
tensor_img.set(real_img, place)
tensor_label_org.set(label_org, place)
real_img_temp = np.squeeze(real_img).transpose([0, 2, 3, 1])
images = [real_img_temp]
for i in range(args.c_dim):
label_trg_tmp = copy.deepcopy(label_trg)
for j in range(args.batch_size):
label_trg_tmp[j][i] = 1.0 - label_trg_tmp[j][i]
label_trg_ = [((x * 2) - 1) * 0.5 for x in label_trg_tmp]  # list comprehension: map() is not indexable under Python 3
for j in range(args.batch_size):
label_trg_[j][i] = label_trg_[j][i] * 2.0
tensor_label_org_.set(label_org, place)
tensor_label_trg.set(label_trg, place)
tensor_label_trg_.set(label_trg_, place)
out = exe.run(feed={
"input": tensor_img,
"label_org_": tensor_label_org_,
"label_trg_": tensor_label_trg_
},
fetch_list=fake.name)
fake_temp = np.squeeze(out[0]).transpose([0, 2, 3, 1])
images.append(fake_temp)
images_concat = np.concatenate(images, 1)
images_concat = np.concatenate(images_concat, 1)
imsave(args.output + "/fake_img_" + name[0], (
(images_concat + 1) * 127.5).astype(np.uint8))
elif args.model_net == 'Pix2pix' or args.model_net == 'cyclegan':
for file in glob.glob(args.input):
print("read {}".format(file))
image_name = os.path.basename(file)
image = Image.open(file).convert('RGB')
image = image.resize((256, 256), Image.BICUBIC)
image = np.array(image).transpose([2, 0, 1]).astype('float32')
image = image / 255.0
image = (image - 0.5) / 0.5
data = image[np.newaxis, :]
tensor = fluid.LoDTensor()
tensor.set(data, place)
fake_temp = exe.run(fetch_list=[fake.name], feed={"input": tensor})
fake_temp = np.squeeze(fake_temp[0]).transpose([1, 2, 0])
input_temp = np.squeeze(data).transpose([1, 2, 0])
fake_temp = exe.run(fetch_list=[fake.name], feed={"input": tensor})
fake_temp = np.squeeze(fake_temp[0]).transpose([1, 2, 0])
input_temp = np.squeeze(data).transpose([1, 2, 0])
imsave(args.output + "/fake_" + image_name, (
(fake_temp + 1) * 127.5).astype(np.uint8))
imsave(args.output + "/fake_" + image_name, (
(fake_temp + 1) * 127.5).astype(np.uint8))
else:
raise NotImplementedError("model_net {} is not supported".format(
args.model_net))
if __name__ == "__main__":
......
#copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from .base_network import conv2d, deconv2d, norm_layer, linear
import paddle.fluid as fluid
import numpy as np
MAX_DIM = 64 * 16
class AttGAN_model(object):
def __init__(self):
pass
def network_G(self, input, label_org, label_trg, cfg, name="generator"):
_a = label_org
_b = label_trg
z = self.Genc(
input,
name=name + '_Genc',
dim=cfg.g_base_dims,
n_layers=cfg.n_layers)
fake_image = self.Gdec(z, _b, name=name + '_Gdec', dim=cfg.g_base_dims)
rec_image = self.Gdec(z, _a, name=name + '_Gdec', dim=cfg.g_base_dims)
return fake_image, rec_image
def network_D(self, input, cfg, name="discriminator"):
return self.D(input,
n_atts=cfg.c_dim,
name=name,
dim=cfg.d_base_dims,
fc_dim=cfg.d_fc_dim,
n_layers=cfg.n_layers)
def concat(self, z, a):
"""Concatenate attribute vector on feature map axis."""
ones = fluid.layers.fill_constant_batch_size_like(
z, [-1, a.shape[1], z.shape[2], z.shape[3]], "float32", 1.0)
return fluid.layers.concat([z, ones * a], axis=1)
def Genc(self, input, dim=64, n_layers=5, name='G_enc_'):
z = input
zs = []
for i in range(n_layers):
d = min(dim * 2**i, MAX_DIM)
#SAME padding
z = conv2d(
z,
d,
4,
2,
padding_type='SAME',
norm='batch_norm',
activation_fn='leaky_relu',
name=name + str(i),
use_bias=False,
relufactor=0.01,
initial='kaiming')
zs.append(z)
return zs
def Gdec(self,
zs,
a,
dim=64,
n_layers=5,
shortcut_layers=1,
inject_layers=1,
name='G_dec_'):
shortcut_layers = min(shortcut_layers, n_layers - 1)
inject_layers = min(inject_layers, n_layers - 1)
z = self.concat(zs[-1], a)
for i in range(n_layers):
if i < n_layers - 1:
d = min(dim * 2**(n_layers - 1 - i), MAX_DIM)
z = deconv2d(
z,
d,
4,
2,
padding_type='SAME',
name=name + str(i),
norm='batch_norm',
activation_fn='relu',
use_bias=False,
initial='kaiming')
if shortcut_layers > i:
z = fluid.layers.concat([z, zs[n_layers - 2 - i]], axis=1)
if inject_layers > i:
z = self.concat(z, a)
else:
x = z = deconv2d(
z,
3,
4,
2,
padding_type='SAME',
name=name + str(i),
activation_fn='tanh',
use_bias=True,
initial='kaiming')
return x
def D(self,
x,
n_atts=13,
dim=64,
fc_dim=1024,
n_layers=5,
norm='instance_norm',
name='D_'):
y = x
for i in range(n_layers):
d = min(dim * 2**i, MAX_DIM)
y = conv2d(
y,
d,
4,
2,
norm=None,
padding=1,
activation_fn='leaky_relu',
name=name + str(i),
use_bias=True,
relufactor=0.01,
initial='kaiming')
logit_gan = linear(
y,
fc_dim,
activation_fn='relu',
name=name + 'fc_adv_1',
initial='kaiming')
logit_gan = linear(
logit_gan, 1, name=name + 'fc_adv_2', initial='kaiming')
logit_att = linear(
y,
fc_dim,
activation_fn='relu',
name=name + 'fc_cls_1',
initial='kaiming')
logit_att = linear(logit_att, n_atts, name=name + 'fc_cls_2')
return logit_gan, logit_att
#copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from .base_network import conv2d, deconv2d, norm_layer, linear
import paddle.fluid as fluid
import numpy as np
MAX_DIM = 64 * 16
class STGAN_model(object):
def __init__(self):
pass
def network_G(self, input, label_org, label_trg, cfg, name="generator"):
_a = label_org
_b = label_trg
z = self.Genc(
input,
name=name + '_Genc',
n_layers=cfg.n_layers,
dim=cfg.g_base_dims)
zb = self.GRU(z,
fluid.layers.elementwise_sub(_b, _a),
name=name + '_GRU',
dim=cfg.g_base_dims,
n_layers=cfg.gru_n_layers) if cfg.use_gru else z
fake_image = self.Gdec(
zb,
fluid.layers.elementwise_sub(_b, _a),
name=name + '_Gdec',
dim=cfg.g_base_dims,
n_layers=cfg.n_layers)
za = self.GRU(z,
fluid.layers.elementwise_sub(_a, _a),
name=name + '_GRU',
dim=cfg.g_base_dims,
n_layers=cfg.gru_n_layers) if cfg.use_gru else z
rec_image = self.Gdec(
za,
fluid.layers.elementwise_sub(_a, _a),
name=name + '_Gdec',
dim=cfg.g_base_dims,
n_layers=cfg.n_layers)
return fake_image, rec_image
def network_D(self, input, cfg, name="discriminator"):
return self.D(input,
n_atts=cfg.c_dim,
dim=cfg.d_base_dims,
fc_dim=cfg.d_fc_dim,
n_layers=cfg.n_layers,
name=name)
def concat(self, z, a):
"""Concatenate attribute vector on feature map axis."""
ones = fluid.layers.fill_constant_batch_size_like(
z, [-1, a.shape[1], z.shape[2], z.shape[3]], "float32", 1.0)
return fluid.layers.concat([z, ones * a], axis=1)
def Genc(self, input, dim=64, n_layers=5, name='G_enc_'):
z = input
zs = []
for i in range(n_layers):
d = min(dim * 2**i, MAX_DIM)
z = conv2d(
z,
d,
4,
2,
padding_type='SAME',
norm="batch_norm",
activation_fn='leaky_relu',
name=name + str(i),
use_bias=False,
relufactor=0.01,
initial='kaiming')
zs.append(z)
return zs
def GRU(self,
zs,
a,
dim=64,
n_layers=4,
inject_layers=4,
kernel_size=3,
norm=None,
pass_state='lstate',
name='G_gru_'):
zs_ = [zs[-1]]
state = self.concat(zs[-1], a)
for i in range(n_layers):
d = min(dim * 2**(n_layers - 1 - i), MAX_DIM)
output = self.gru_cell(
zs[n_layers - 1 - i],
state,
d,
kernel_size=kernel_size,
norm=norm,
pass_state=pass_state,
name=name + str(i))
zs_.insert(0, output[0] + zs[n_layers - 1 - i])
if inject_layers > i:
state = self.concat(output[1], a)
else:
state = output[1]
return zs_
def Gdec(self,
zs,
a,
dim=64,
n_layers=5,
shortcut_layers=4,
inject_layers=4,
name='G_dec_'):
shortcut_layers = min(shortcut_layers, n_layers - 1)
inject_layers = min(inject_layers, n_layers - 1)
z = self.concat(zs[-1], a)
for i in range(n_layers):
if i < n_layers - 1:
d = min(dim * 2**(n_layers - 1 - i), MAX_DIM)
z = deconv2d(
z,
d,
4,
2,
padding_type='SAME',
name=name + str(i),
norm='batch_norm',
activation_fn='relu',
use_bias=False,
initial='kaiming')
if shortcut_layers > i:
z = fluid.layers.concat([z, zs[n_layers - 2 - i]], axis=1)
if inject_layers > i:
z = self.concat(z, a)
else:
x = z = deconv2d(
z,
3,
4,
2,
padding_type='SAME',
name=name + str(i),
activation_fn='tanh',
use_bias=True,
initial='kaiming')
return x
def D(self,
x,
n_atts=13,
dim=64,
fc_dim=1024,
n_layers=5,
norm='instance_norm',
name='D_'):
y = x
for i in range(n_layers):
d = min(dim * 2**i, MAX_DIM)
y = conv2d(
y,
d,
4,
2,
norm=None,
padding=1,
activation_fn='leaky_relu',
name=name + str(i),
use_bias=True,
relufactor=0.01,
initial='kaiming')
logit_gan = linear(
y,
fc_dim,
activation_fn='relu',
name=name + 'fc_adv_1',
initial='kaiming')
logit_gan = linear(
logit_gan, 1, name=name + 'fc_adv_2', initial='kaiming')
logit_att = linear(
y,
fc_dim,
activation_fn='relu',
name=name + 'fc_cls_1',
initial='kaiming')
logit_att = linear(
logit_att, n_atts, name=name + 'fc_cls_2', initial='kaiming')
return logit_gan, logit_att
def gru_cell(self,
in_data,
state,
out_channel,
kernel_size=3,
norm=None,
pass_state='lstate',
name='gru'):
state_ = deconv2d(
state,
out_channel,
4,
2,
padding_type='SAME',
name=name + '_deconv2d',
use_bias=True,
initial='kaiming'
) # upsample and make `channel` identical to `out_channel`
reset_gate = conv2d(
fluid.layers.concat(
[in_data, state_], axis=1),
out_channel,
kernel_size,
norm=norm,
activation_fn='sigmoid',
padding_type='SAME',
use_bias=True,
name=name + '_reset_gate',
initial='kaiming')
update_gate = conv2d(
fluid.layers.concat(
[in_data, state_], axis=1),
out_channel,
kernel_size,
norm=norm,
activation_fn='sigmoid',
padding_type='SAME',
use_bias=True,
name=name + '_update_gate',
initial='kaiming')
left_state = reset_gate * state_
new_info = conv2d(
fluid.layers.concat(
[in_data, left_state], axis=1),
out_channel,
kernel_size,
norm=norm,
activation_fn='tanh',
name=name + '_info',
padding_type='SAME',
use_bias=True,
initial='kaiming')
output = (1 - update_gate) * state_ + update_gate * new_info
if pass_state == 'output':
return output, output
elif pass_state == 'state':
return output, state_
else:
return output, left_state
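The gating arithmetic in gru_cell above is the standard convolutional GRU update. A toy numpy sketch of just the gating math, with elementwise sums standing in for the conv layers (those stand-ins are assumptions purely for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy stand-ins for feature maps; the real cell applies conv2d to
# the channel-wise concat of the input and the (upsampled) state.
in_data = np.array([0.2, -0.4])
state = np.array([0.5, 0.1])

reset_gate = sigmoid(in_data + state)       # conv([x, s]) -> sigmoid
update_gate = sigmoid(in_data - state)      # conv([x, s]) -> sigmoid
left_state = reset_gate * state             # gate the old state
new_info = np.tanh(in_data + left_state)    # conv([x, r*s]) -> tanh
# Blend old state and candidate, as in gru_cell's `output`.
output = (1.0 - update_gate) * state + update_gate * new_info
```

The `pass_state` argument then decides whether `output`, the upsampled `state_`, or the gated `left_state` is propagated to the next GRU layer.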
@@ -15,19 +15,29 @@
from __future__ import division
import paddle.fluid as fluid
import numpy as np
import math
import os
import warnings
use_cudnn = True
if 'ce_mode' in os.environ:
use_cudnn = False
def cal_padding(img_size, stride, filter_size, dilation=1):
"""Calculate padding size."""
valid_filter_size = dilation * (filter_size - 1) + 1
if img_size % stride == 0:
out_size = max(valid_filter_size - stride, 0)
else:
out_size = max(valid_filter_size - (img_size % stride), 0)
return out_size // 2, out_size - out_size // 2
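cal_padding mirrors TensorFlow-style SAME padding: the total padding keeps the output size at ceil(img_size / stride), and any odd remainder is split unevenly between the two sides (which is why conv2d/deconv2d below set need_crop when the sides differ). A minimal, self-contained restatement of the arithmetic for the default dilation=1:

```python
def cal_padding(img_size, stride, filter_size, dilation=1):
    """SAME-style padding: output size becomes ceil(img_size / stride)."""
    valid_filter_size = dilation * (filter_size - 1) + 1
    if img_size % stride == 0:
        out_size = max(valid_filter_size - stride, 0)
    else:
        out_size = max(valid_filter_size - (img_size % stride), 0)
    # Split the total padding between the two sides; the second side
    # gets the extra pixel when out_size is odd.
    return out_size // 2, out_size - out_size // 2

# Even split: 128 % 2 == 0, total padding 4 - 2 = 2 -> (1, 1).
print(cal_padding(128, 2, 4))
# Uneven split: 7 % 2 == 1, total padding 4 - 1 = 3 -> (1, 2),
# the case in which conv2d/deconv2d below set need_crop.
print(cal_padding(7, 2, 4))
```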
def norm_layer(input, norm_type='batch_norm', name=None):
if norm_type == 'batch_norm':
param_attr = fluid.ParamAttr(
name=name + '_w',
initializer=fluid.initializer.NormalInitializer(
loc=1.0, scale=0.02))
name=name + '_w', initializer=fluid.initializer.Constant(1.0))
bias_attr = fluid.ParamAttr(
name=name + '_b', initializer=fluid.initializer.Constant(value=0.0))
return fluid.layers.batch_norm(
@@ -49,8 +59,7 @@ def norm_layer(input, norm_type='batch_norm', name=None):
offset_name = name + "_offset"
scale_param = fluid.ParamAttr(
name=scale_name,
initializer=fluid.initializer.NormalInitializer(
loc=0.0, scale=0.02),
initializer=fluid.initializer.Constant(1.0),
trainable=True)
offset_param = fluid.ParamAttr(
name=offset_name,
@@ -69,6 +78,38 @@ def norm_layer(input, norm_type='batch_norm', name=None):
raise NotImplementedError("norm type: [%s] is not supported" % norm_type)
def initial_type(name,
init="normal",
use_bias=False,
f_in=0,
filter_size=0,
stddev=0.02):
if init == "kaiming":
fan_in = f_in * filter_size * filter_size
bound = 1 / math.sqrt(fan_in)
param_attr = fluid.ParamAttr(
name=name + "_w",
initializer=fluid.initializer.MSRAInitializer(uniform=True))
if use_bias == True:
bias_attr = fluid.ParamAttr(
name=name + '_b',
initializer=fluid.initializer.Uniform(
low=-bound, high=bound))
else:
bias_attr = False
else:
param_attr = fluid.ParamAttr(
name=name + "_w",
initializer=fluid.initializer.NormalInitializer(
loc=0.0, scale=stddev))
if use_bias == True:
bias_attr = fluid.ParamAttr(
name=name + "_b", initializer=fluid.initializer.Constant(0.0))
else:
bias_attr = False
return param_attr, bias_attr
def conv2d(input,
num_filters=64,
filter_size=7,
@@ -79,16 +120,42 @@ def conv2d(input,
norm=None,
activation_fn=None,
relufactor=0.0,
use_bias=False):
param_attr = fluid.ParamAttr(
name=name + "_w",
initializer=fluid.initializer.NormalInitializer(
loc=0.0, scale=stddev))
if use_bias == True:
bias_attr = fluid.ParamAttr(
name=name + "_b", initializer=fluid.initializer.Constant(0.0))
use_bias=False,
padding_type=None,
initial="normal"):
if padding != 0 and padding_type != None:
warnings.warn(
'Both a padding value and a padding_type were given; the final padding height and width are determined by padding_type'
)
param_attr, bias_attr = initial_type(
name=name,
init=initial,
use_bias=use_bias,
f_in=input.shape[1],
filter_size=filter_size,
stddev=stddev)
need_crop = False
if padding_type == "SAME":
top_padding, bottom_padding = cal_padding(input.shape[2], stride,
filter_size)
# width padding should be computed from input.shape[3], not shape[2]
left_padding, right_padding = cal_padding(input.shape[3], stride,
filter_size)
height_padding = bottom_padding
width_padding = right_padding
if top_padding != bottom_padding or left_padding != right_padding:
height_padding = top_padding + stride
width_padding = left_padding + stride
need_crop = True
padding = [height_padding, width_padding]
elif padding_type == "VALID":
height_padding = 0
width_padding = 0
padding = [height_padding, width_padding]
else:
bias_attr = False
padding = padding
conv = fluid.layers.conv2d(
input,
@@ -109,6 +176,8 @@ def conv2d(input,
conv, alpha=relufactor, name=name + '_leaky_relu')
elif activation_fn == 'tanh':
conv = fluid.layers.tanh(conv, name=name + '_tanh')
elif activation_fn == 'sigmoid':
conv = fluid.layers.sigmoid(conv, name=name + '_sigmoid')
elif activation_fn == None:
conv = conv
else:
@@ -123,23 +192,49 @@ def deconv2d(input,
filter_size=7,
stride=1,
stddev=0.02,
padding=[0, 0],
padding=0,
outpadding=[0, 0, 0, 0],
name="deconv2d",
norm=None,
activation_fn=None,
relufactor=0.0,
use_bias=False,
output_size=None):
param_attr = fluid.ParamAttr(
name=name + "_w",
initializer=fluid.initializer.NormalInitializer(
loc=0.0, scale=stddev))
if use_bias == True:
bias_attr = fluid.ParamAttr(
name=name + "_b", initializer=fluid.initializer.Constant(0.0))
padding_type=None,
output_size=None,
initial="normal"):
if padding != 0 and padding_type != None:
warnings.warn(
'Both a padding value and a padding_type were given; the final padding height and width are determined by padding_type'
)
param_attr, bias_attr = initial_type(
name=name,
init=initial,
use_bias=use_bias,
f_in=input.shape[1],
filter_size=filter_size,
stddev=stddev)
need_crop = False
if padding_type == "SAME":
top_padding, bottom_padding = cal_padding(input.shape[2], stride,
filter_size)
# width padding should be computed from input.shape[3], not shape[2]
left_padding, right_padding = cal_padding(input.shape[3], stride,
filter_size)
height_padding = bottom_padding
width_padding = right_padding
if top_padding != bottom_padding or left_padding != right_padding:
height_padding = top_padding + stride
width_padding = left_padding + stride
need_crop = True
padding = [height_padding, width_padding]
elif padding_type == "VALID":
height_padding = 0
width_padding = 0
padding = [height_padding, width_padding]
else:
bias_attr = False
padding = padding
conv = fluid.layers.conv2d_transpose(
input,
@@ -153,8 +248,9 @@ def deconv2d(input,
param_attr=param_attr,
bias_attr=bias_attr)
conv = fluid.layers.pad2d(
conv, paddings=outpadding, mode='constant', pad_value=0.0)
if outpadding != 0 and padding_type == None:
conv = fluid.layers.pad2d(
conv, paddings=outpadding, mode='constant', pad_value=0.0)
if norm is not None:
conv = norm_layer(input=conv, norm_type=norm, name=name + "_norm")
@@ -185,13 +281,17 @@ def linear(input,
stddev=0.02,
activation_fn=None,
relufactor=0.2,
name='linear'):
param_attr = fluid.ParamAttr(
name=name + '_w',
initializer=fluid.initializer.NormalInitializer(
loc=0.0, scale=stddev))
bias_attr = fluid.ParamAttr(
name=name + "_b", initializer=fluid.initializer.Constant(0.0))
name="linear",
initial="normal"):
param_attr, bias_attr = initial_type(
name=name,
init=initial,
use_bias=True,
f_in=input.shape[1],
filter_size=1,
stddev=stddev)
linear = fluid.layers.fc(input,
output_size,
param_attr=param_attr,
......
python train.py --model_net AttGAN --dataset celeba --crop_size 170 --load_size 128 --train_list ./data/celeba/list_attr_celeba.txt --test_list ./data/celeba/test_list_attr_celeba.txt --gan_mode wgan --batch_size 32 --print_freq 1 --num_discriminator_time 5 --epoch 200 >log.out #2>log_err
python train.py --model_net STGAN --dataset celeba --crop_size 170 --load_size 128 --train_list ./data/celeba/list_attr_celeba.txt --test_list ./data/celeba/test_list_attr_celeba.txt --gan_mode wgan --batch_size 32 --print_freq 1 --num_discriminator_time 5 --epoch 200 >log.out #2>log_err
@@ -56,6 +56,12 @@ def train(cfg):
elif cfg.model_net == 'Pix2pix':
from trainer.Pix2pix import Pix2pix
model = Pix2pix(cfg, train_reader, test_reader, batch_num)
elif cfg.model_net == 'AttGAN':
from trainer.AttGAN import AttGAN
model = AttGAN(cfg, train_reader, test_reader, batch_num)
elif cfg.model_net == 'STGAN':
from trainer.STGAN import STGAN
model = STGAN(cfg, train_reader, test_reader, batch_num)
else:
pass
@@ -65,7 +71,7 @@ def train(cfg):
if __name__ == "__main__":
cfg = config.parse_args()
config.print_arguments(cfg)
assert cfg.load_size >= cfg.crop_size, "Load Size CANNOT less than Crop Size!"
#assert cfg.load_size >= cfg.crop_size, "Load Size CANNOT less than Crop Size!"
if cfg.profile:
if cfg.use_gpu:
with profiler.profiler('All', 'total', '/tmp/profile') as prof:
......
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from network.AttGAN_network import AttGAN_model
from util import utility
import paddle.fluid as fluid
import sys
import time
import copy
import numpy as np
class GTrainer():
def __init__(self, image_real, label_org, label_org_, label_trg, label_trg_,
cfg, step_per_epoch):
self.program = fluid.default_main_program().clone()
with fluid.program_guard(self.program):
model = AttGAN_model()
self.fake_img, self.rec_img = model.network_G(
image_real, label_org_, label_trg_, cfg, name="generator")
self.fake_img.persistable = True
self.rec_img.persistable = True
self.infer_program = self.program.clone(for_test=True)
self.g_loss_rec = fluid.layers.mean(
fluid.layers.abs(
fluid.layers.elementwise_sub(
x=image_real, y=self.rec_img)))
self.pred_fake, self.cls_fake = model.network_D(
self.fake_img, cfg, name="discriminator")
#wgan
if cfg.gan_mode == "wgan":
self.g_loss_fake = -1 * fluid.layers.mean(self.pred_fake)
#lsgan
elif cfg.gan_mode == "lsgan":
ones = fluid.layers.fill_constant_batch_size_like(
input=self.pred_fake,
shape=self.pred_fake.shape,
value=1.0,
dtype='float32')
self.g_loss_fake = fluid.layers.mean(
fluid.layers.square(
fluid.layers.elementwise_sub(
x=self.pred_fake, y=ones)))
self.g_loss_cls = fluid.layers.mean(
fluid.layers.sigmoid_cross_entropy_with_logits(self.cls_fake,
label_trg))
self.g_loss = self.g_loss_fake + cfg.lambda_rec * self.g_loss_rec + cfg.lambda_cls * self.g_loss_cls
self.g_loss_fake.persistable = True
self.g_loss_rec.persistable = True
self.g_loss_cls.persistable = True
if cfg.epoch <= 100:
lr = cfg.g_lr
else:
lr = fluid.layers.piecewise_decay(
boundaries=[99 * step_per_epoch],
values=[cfg.g_lr, cfg.g_lr * 0.1], )
vars = []
for var in self.program.list_vars():
if fluid.io.is_parameter(var) and var.name.startswith(
"generator"):
vars.append(var.name)
self.param = vars
optimizer = fluid.optimizer.Adam(
learning_rate=lr, beta1=0.5, beta2=0.999, name="net_G")
optimizer.minimize(self.g_loss, parameter_list=vars)
with open('program_gen.txt', 'w') as f:
print(self.program, file=f)
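The generator optimizer above uses `piecewise_decay` with `boundaries=[99 * step_per_epoch]`, i.e. the base learning rate is kept for the first 99 epochs and then multiplied by 0.1. A minimal pure-Python sketch of that schedule (the `step_per_epoch` and rate values below are illustrative assumptions, not taken from a real config):

```python
def piecewise_lr(step, boundaries, values):
    """Return values[i] for the first boundary `step` has not yet reached,
    falling back to the last value once every boundary is passed."""
    for boundary, value in zip(boundaries, values):
        if step < boundary:
            return value
    return values[-1]

step_per_epoch = 100
boundaries = [99 * step_per_epoch]  # decay once, after epoch 99
values = [2e-4, 2e-5]               # base LR, then base LR * 0.1
print(piecewise_lr(0, boundaries, values))     # 0.0002
print(piecewise_lr(9900, boundaries, values))  # 2e-05
```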
class DTrainer():
def __init__(self, image_real, label_org, label_org_, label_trg, label_trg_,
cfg, step_per_epoch):
self.program = fluid.default_main_program().clone()
lr = cfg.d_lr
with fluid.program_guard(self.program):
model = AttGAN_model()
clone_image_real = []
for b in self.program.blocks:
if b.has_var('image_real'):
clone_image_real = b.var('image_real')
break
self.fake_img, _ = model.network_G(
image_real, label_org, label_trg_, cfg, name="generator")
self.pred_real, self.cls_real = model.network_D(
image_real, cfg, name="discriminator")
self.pred_fake, _ = model.network_D(
self.fake_img, cfg, name="discriminator")
self.d_loss_cls = fluid.layers.mean(
fluid.layers.sigmoid_cross_entropy_with_logits(self.cls_real,
label_org))
#wgan
if cfg.gan_mode == "wgan":
self.d_loss_fake = fluid.layers.reduce_mean(self.pred_fake)
self.d_loss_real = -1 * fluid.layers.reduce_mean(self.pred_real)
self.d_loss_gp = self.gradient_penalty(
model.network_D,
clone_image_real,
self.fake_img,
cfg=cfg,
name="discriminator")
self.d_loss = self.d_loss_real + self.d_loss_fake + 1.0 * self.d_loss_cls + cfg.lambda_gp * self.d_loss_gp
#lsgan
elif cfg.gan_mode == "lsgan":
ones = fluid.layers.fill_constant_batch_size_like(
input=self.pred_real,
shape=self.pred_real.shape,
value=1.0,
dtype='float32')
self.d_loss_real = fluid.layers.mean(
fluid.layers.square(
fluid.layers.elementwise_sub(
x=self.pred_real, y=ones)))
self.d_loss_fake = fluid.layers.mean(
fluid.layers.square(x=self.pred_fake))
self.d_loss = self.d_loss_real + self.d_loss_fake + self.d_loss_cls
self.d_loss_real.persistable = True
self.d_loss_fake.persistable = True
self.d_loss.persistable = True
self.d_loss_cls.persistable = True
# d_loss_gp is only defined in the wgan branch above
if cfg.gan_mode == "wgan":
self.d_loss_gp.persistable = True
vars = []
for var in self.program.list_vars():
if fluid.io.is_parameter(var) and var.name.startswith(
"discriminator"):
vars.append(var.name)
self.param = vars
if cfg.epoch <= 100:
lr = cfg.d_lr
else:
lr = fluid.layers.piecewise_decay(
boundaries=[99 * step_per_epoch],
values=[cfg.d_lr, cfg.d_lr * 0.1], )
optimizer = fluid.optimizer.Adam(
learning_rate=lr, beta1=0.5, beta2=0.999, name="net_D")
optimizer.minimize(self.d_loss, parameter_list=vars)
with open('program.txt', 'w') as f:
print(self.program, file=f)
def gradient_penalty(self, f, real, fake=None, cfg=None, name=None):
def _interpolate(a, b=None):
shape = [a.shape[0]]
alpha = fluid.layers.uniform_random_batch_size_like(
input=a, shape=shape, min=0.0, max=1.0)
tmp = fluid.layers.elementwise_mul(
fluid.layers.elementwise_sub(b, a), alpha, axis=0)
alpha.stop_gradient = True
tmp.stop_gradient = True
inner = fluid.layers.elementwise_add(a, tmp, axis=0)
return inner
x = _interpolate(real, fake)
pred, _ = f(x, cfg=cfg, name=name)
if isinstance(pred, tuple):
pred = pred[0]
vars = []
for var in fluid.default_main_program().list_vars():
if fluid.io.is_parameter(var) and var.name.startswith(
"discriminator"):
vars.append(var.name)
grad = fluid.gradients(pred, x, no_grad_set=vars)
grad_shape = grad.shape
grad = fluid.layers.reshape(
grad, [-1, grad_shape[1] * grad_shape[2] * grad_shape[3]])
epsilon = 1e-5
norm = fluid.layers.sqrt(
fluid.layers.reduce_sum(
fluid.layers.square(grad), dim=1) + epsilon)
gp = fluid.layers.reduce_mean(fluid.layers.square(norm - 1.0))
return gp
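The `gradient_penalty` helper above implements the WGAN-GP term `E[(||grad_x D(x_hat)||_2 - 1)^2]` on points `x_hat` interpolated between real and fake samples. A hedged NumPy sketch of the same interpolation and penalty, using a toy linear critic `D(x) = <w, x>` (whose gradient is `w` everywhere) in place of the repo's discriminator:

```python
import numpy as np

def interpolate(real, fake, rng):
    # one alpha per sample, broadcast over the feature dims (same idea as
    # uniform_random_batch_size_like + elementwise ops with axis=0)
    alpha = rng.uniform(size=(real.shape[0],) + (1,) * (real.ndim - 1))
    return real + alpha * (fake - real)

def gradient_penalty(w, real, fake, rng):
    # toy critic D(x) = <w, x>, so grad_x D(x) = w for every sample
    x_hat = interpolate(real, fake, rng)
    grad = np.broadcast_to(w, x_hat.shape)
    flat = grad.reshape(len(x_hat), -1)
    norm = np.sqrt(np.sum(np.square(flat), axis=1) + 1e-5)
    return np.mean(np.square(norm - 1.0))

rng = np.random.default_rng(0)
real = rng.normal(size=(4, 3, 8, 8)).astype("float32")
fake = rng.normal(size=(4, 3, 8, 8)).astype("float32")
# weights chosen so the critic's gradient has unit L2 norm
w = np.full((3, 8, 8), 1.0 / np.sqrt(3 * 8 * 8), dtype="float32")
gp = gradient_penalty(w, real, fake, rng)
print(round(float(gp), 4))  # ~0: a unit-gradient critic is not penalized
```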
class AttGAN(object):
def add_special_args(self, parser):
parser.add_argument(
'--g_lr',
type=float,
default=0.0002,
help="the base learning rate of generator")
parser.add_argument(
'--d_lr',
type=float,
default=0.0002,
help="the base learning rate of discriminator")
parser.add_argument(
'--c_dim',
type=int,
default=13,
help="the number of attributes we selected")
parser.add_argument(
'--d_fc_dim',
type=int,
default=1024,
help="the base fc dim in discriminator")
parser.add_argument(
'--lambda_cls',
type=float,
default=10.0,
help="the coefficient of classification")
parser.add_argument(
'--lambda_rec',
type=float,
default=100.0,
help="the coefficient of refactor")
parser.add_argument(
'--thres_int',
type=float,
default=0.5,
help="thresh change of attributes")
parser.add_argument(
'--lambda_gp',
type=float,
default=10.0,
help="the coefficient of gradient penalty")
parser.add_argument(
'--n_samples', type=int, default=16, help="batch size when testing")
parser.add_argument(
'--selected_attrs',
type=str,
default="Bald,Bangs,Black_Hair,Blond_Hair,Brown_Hair,Bushy_Eyebrows,Eyeglasses,Male,Mouth_Slightly_Open,Mustache,No_Beard,Pale_Skin,Young",
help="the attributes we selected to change")
parser.add_argument(
'--n_layers',
type=int,
default=5,
help="default layers in the network")
return parser
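`--selected_attrs` arrives as one comma-separated string, and `--c_dim` must equal the number of attributes in it (13 by default). A small sketch of the split-and-validate step; `parse_selected_attrs` is a hypothetical helper, not a function from this repo:

```python
def parse_selected_attrs(attrs_str, c_dim):
    """Split the comma-separated attribute string and check it matches c_dim."""
    attrs = [a.strip() for a in attrs_str.split(",") if a.strip()]
    if len(attrs) != c_dim:
        raise ValueError("c_dim (%d) does not match the number of "
                         "selected attributes (%d)" % (c_dim, len(attrs)))
    return attrs

default_attrs = ("Bald,Bangs,Black_Hair,Blond_Hair,Brown_Hair,Bushy_Eyebrows,"
                 "Eyeglasses,Male,Mouth_Slightly_Open,Mustache,No_Beard,"
                 "Pale_Skin,Young")
attrs = parse_selected_attrs(default_attrs, c_dim=13)
print(len(attrs))  # 13
```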
def __init__(self,
cfg=None,
train_reader=None,
test_reader=None,
batch_num=1):
self.cfg = cfg
self.train_reader = train_reader
self.test_reader = test_reader
self.batch_num = batch_num
def build_model(self):
data_shape = [-1, 3, self.cfg.load_size, self.cfg.load_size]
image_real = fluid.layers.data(
name='image_real', shape=data_shape, dtype='float32')
label_org = fluid.layers.data(
name='label_org', shape=[self.cfg.c_dim], dtype='float32')
label_trg = fluid.layers.data(
name='label_trg', shape=[self.cfg.c_dim], dtype='float32')
label_org_ = fluid.layers.data(
name='label_org_', shape=[self.cfg.c_dim], dtype='float32')
label_trg_ = fluid.layers.data(
name='label_trg_', shape=[self.cfg.c_dim], dtype='float32')
gen_trainer = GTrainer(image_real, label_org, label_org_, label_trg,
label_trg_, self.cfg, self.batch_num)
dis_trainer = DTrainer(image_real, label_org, label_org_, label_trg,
label_trg_, self.cfg, self.batch_num)
# prepare environment
place = fluid.CUDAPlace(0) if self.cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
if self.cfg.init_model:
utility.init_checkpoints(self.cfg, exe, gen_trainer, "net_G")
utility.init_checkpoints(self.cfg, exe, dis_trainer, "net_D")
### memory optim
build_strategy = fluid.BuildStrategy()
build_strategy.enable_inplace = False
build_strategy.memory_optimize = False
gen_trainer_program = fluid.CompiledProgram(
gen_trainer.program).with_data_parallel(
loss_name=gen_trainer.g_loss.name,
build_strategy=build_strategy)
dis_trainer_program = fluid.CompiledProgram(
dis_trainer.program).with_data_parallel(
loss_name=dis_trainer.d_loss.name,
build_strategy=build_strategy)
t_time = 0
for epoch_id in range(self.cfg.epoch):
batch_id = 0
for i in range(self.batch_num):
image, label_org = next(self.train_reader())
label_trg = copy.deepcopy(label_org)
np.random.shuffle(label_trg)
# map() is lazy in Python 3; materialize as float32 arrays
# before handing the labels to LoDTensor.set
label_org_ = np.array(
[(x * 2.0 - 1.0) * self.cfg.thres_int for x in label_org],
dtype='float32')
label_trg_ = np.array(
[(x * 2.0 - 1.0) * self.cfg.thres_int for x in label_trg],
dtype='float32')
tensor_img = fluid.LoDTensor()
tensor_label_org = fluid.LoDTensor()
tensor_label_trg = fluid.LoDTensor()
tensor_label_org_ = fluid.LoDTensor()
tensor_label_trg_ = fluid.LoDTensor()
tensor_img.set(image, place)
tensor_label_org.set(label_org, place)
tensor_label_trg.set(label_trg, place)
tensor_label_org_.set(label_org_, place)
tensor_label_trg_.set(label_trg_, place)
label_shape = tensor_label_trg.shape
s_time = time.time()
# optimize the discriminator network
if (batch_id + 1) % self.cfg.num_discriminator_time != 0:
fetches = [
dis_trainer.d_loss.name, dis_trainer.d_loss_real.name,
dis_trainer.d_loss_fake.name,
dis_trainer.d_loss_cls.name, dis_trainer.d_loss_gp.name
]
d_loss, d_loss_real, d_loss_fake, d_loss_cls, d_loss_gp = exe.run(
dis_trainer_program,
fetch_list=fetches,
feed={
"image_real": tensor_img,
"label_org": tensor_label_org,
"label_org_": tensor_label_org_,
"label_trg": tensor_label_trg,
"label_trg_": tensor_label_trg_
})
batch_time = time.time() - s_time
t_time += batch_time
print("epoch{}: batch{}: \n\
d_loss: {}; d_loss_real: {}; d_loss_fake: {}; d_loss_cls: {}; d_loss_gp: {} \n\
Batch_time_cost: {:.2f}"
.format(epoch_id, batch_id, d_loss[0], d_loss_real[
0], d_loss_fake[0], d_loss_cls[0], d_loss_gp[0],
batch_time))
# optimize the generator network
else:
d_fetches = [
gen_trainer.g_loss_fake.name,
gen_trainer.g_loss_rec.name,
gen_trainer.g_loss_cls.name, gen_trainer.fake_img.name
]
g_loss_fake, g_loss_rec, g_loss_cls, fake_img = exe.run(
gen_trainer_program,
fetch_list=d_fetches,
feed={
"image_real": tensor_img,
"label_org": tensor_label_org,
"label_org_": tensor_label_org_,
"label_trg": tensor_label_trg,
"label_trg_": tensor_label_trg_
})
print("epoch{}: batch{}: \n\
g_loss_fake: {}; g_loss_rec: {}; g_loss_cls: {}"
.format(epoch_id, batch_id, g_loss_fake[0],
g_loss_rec[0], g_loss_cls[0]))
sys.stdout.flush()
batch_id += 1
if self.cfg.run_test:
test_program = gen_trainer.infer_program
utility.save_test_image(epoch_id, self.cfg, exe, place,
test_program, gen_trainer,
self.test_reader)
if self.cfg.save_checkpoints:
utility.checkpoints(epoch_id, self.cfg, exe, gen_trainer,
"net_G")
utility.checkpoints(epoch_id, self.cfg, exe, dis_trainer,
"net_D")
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from network.STGAN_network import STGAN_model
from util import utility
import paddle.fluid as fluid
import sys
import time
import copy
import numpy as np
class GTrainer():
def __init__(self, image_real, label_org, label_org_, label_trg, label_trg_,
cfg, step_per_epoch):
self.program = fluid.default_main_program().clone()
with fluid.program_guard(self.program):
model = STGAN_model()
self.fake_img, self.rec_img = model.network_G(
image_real, label_org_, label_trg_, cfg, name="generator")
self.fake_img.persistable = True
self.rec_img.persistable = True
self.infer_program = self.program.clone(for_test=True)
self.g_loss_rec = fluid.layers.mean(
fluid.layers.abs(
fluid.layers.elementwise_sub(
x=image_real, y=self.rec_img)))
self.pred_fake, self.cls_fake = model.network_D(
self.fake_img, cfg, name="discriminator")
#wgan
if cfg.gan_mode == "wgan":
self.g_loss_fake = -1 * fluid.layers.mean(self.pred_fake)
#lsgan
elif cfg.gan_mode == "lsgan":
ones = fluid.layers.fill_constant_batch_size_like(
input=self.pred_fake,
shape=self.pred_fake.shape,
value=1.0,
dtype='float32')
self.g_loss_fake = fluid.layers.mean(
fluid.layers.square(
fluid.layers.elementwise_sub(
x=self.pred_fake, y=ones)))
self.g_loss_cls = fluid.layers.mean(
fluid.layers.sigmoid_cross_entropy_with_logits(self.cls_fake,
label_trg))
self.g_loss = self.g_loss_fake + cfg.lambda_rec * self.g_loss_rec + cfg.lambda_cls * self.g_loss_cls
self.g_loss_fake.persistable = True
self.g_loss_rec.persistable = True
self.g_loss_cls.persistable = True
lr = cfg.g_lr
vars = []
for var in self.program.list_vars():
if fluid.io.is_parameter(var) and var.name.startswith(
"generator"):
vars.append(var.name)
self.param = vars
optimizer = fluid.optimizer.Adam(
learning_rate=fluid.layers.piecewise_decay(
boundaries=[99 * step_per_epoch], values=[lr, lr * 0.1]),
beta1=0.5,
beta2=0.999,
name="net_G")
optimizer.minimize(self.g_loss, parameter_list=vars)
class DTrainer():
def __init__(self, image_real, label_org, label_org_, label_trg, label_trg_,
cfg, step_per_epoch):
self.program = fluid.default_main_program().clone()
lr = cfg.d_lr
with fluid.program_guard(self.program):
model = STGAN_model()
clone_image_real = []
for b in self.program.blocks:
if b.has_var('image_real'):
clone_image_real = b.var('image_real')
break
self.fake_img, _ = model.network_G(
image_real, label_org, label_trg_, cfg, name="generator")
self.pred_real, self.cls_real = model.network_D(
image_real, cfg, name="discriminator")
self.pred_real.persistable = True
self.cls_real.persistable = True
self.pred_fake, _ = model.network_D(
self.fake_img, cfg, name="discriminator")
self.d_loss_cls = fluid.layers.mean(
fluid.layers.sigmoid_cross_entropy_with_logits(self.cls_real,
label_org))
#wgan
if cfg.gan_mode == "wgan":
self.d_loss_fake = fluid.layers.reduce_mean(self.pred_fake)
self.d_loss_real = -1 * fluid.layers.reduce_mean(self.pred_real)
self.d_loss_gp = self.gradient_penalty(
model.network_D,
clone_image_real,
self.fake_img,
cfg=cfg,
name="discriminator")
self.d_loss = self.d_loss_real + self.d_loss_fake + 1.0 * self.d_loss_cls + cfg.lambda_gp * self.d_loss_gp
#lsgan
elif cfg.gan_mode == "lsgan":
ones = fluid.layers.fill_constant_batch_size_like(
input=self.pred_real,
shape=self.pred_real.shape,
value=1.0,
dtype='float32')
self.d_loss_real = fluid.layers.mean(
fluid.layers.square(
fluid.layers.elementwise_sub(
x=self.pred_real, y=ones)))
self.d_loss_fake = fluid.layers.mean(
fluid.layers.square(x=self.pred_fake))
self.d_loss = self.d_loss_real + self.d_loss_fake + self.d_loss_cls
self.d_loss_real.persistable = True
self.d_loss_fake.persistable = True
self.d_loss.persistable = True
self.d_loss_cls.persistable = True
# d_loss_gp is only defined in the wgan branch above
if cfg.gan_mode == "wgan":
self.d_loss_gp.persistable = True
vars = []
for var in self.program.list_vars():
if fluid.io.is_parameter(var) and (
var.name.startswith("discriminator")):
vars.append(var.name)
self.param = vars
optimizer = fluid.optimizer.Adam(
learning_rate=fluid.layers.piecewise_decay(
boundaries=[99 * step_per_epoch],
values=[lr, lr * 0.1], ),
beta1=0.5,
beta2=0.999,
name="net_D")
optimizer.minimize(self.d_loss, parameter_list=vars)
def gradient_penalty(self, f, real, fake=None, cfg=None, name=None):
def _interpolate(a, b=None):
shape = [a.shape[0]]
alpha = fluid.layers.uniform_random_batch_size_like(
input=a, shape=shape, min=0.0, max=1.0)
tmp = fluid.layers.elementwise_mul(
fluid.layers.elementwise_sub(b, a), alpha, axis=0)
alpha.stop_gradient = True
tmp.stop_gradient = True
inner = fluid.layers.elementwise_add(a, tmp, axis=0)
return inner
x = _interpolate(real, fake)
pred, _ = f(x, cfg=cfg, name=name)
if isinstance(pred, tuple):
pred = pred[0]
vars = []
for var in fluid.default_main_program().list_vars():
if fluid.io.is_parameter(var) and var.name.startswith(
"discriminator"):
vars.append(var.name)
grad = fluid.gradients(pred, x, no_grad_set=vars)
grad_shape = grad.shape
grad = fluid.layers.reshape(
grad, [-1, grad_shape[1] * grad_shape[2] * grad_shape[3]])
epsilon = 1e-5
norm = fluid.layers.sqrt(
fluid.layers.reduce_sum(
fluid.layers.square(grad), dim=1) + epsilon)
gp = fluid.layers.reduce_mean(fluid.layers.square(norm - 1.0))
return gp
class STGAN(object):
def add_special_args(self, parser):
parser.add_argument(
'--g_lr',
type=float,
default=0.0002,
help="the base learning rate of generator")
parser.add_argument(
'--d_lr',
type=float,
default=0.0002,
help="the base learning rate of discriminator")
parser.add_argument(
'--c_dim',
type=int,
default=13,
help="the number of attributes we selected")
parser.add_argument(
'--d_fc_dim',
type=int,
default=1024,
help="the base fc dim in discriminator")
parser.add_argument(
'--use_gru',
# argparse's type=bool treats any non-empty string as True,
# so parse the flag text explicitly
type=lambda v: str(v).lower() in ('true', 't', '1'),
default=True,
help="whether to use GRU")
parser.add_argument(
'--lambda_cls',
type=float,
default=10.0,
help="the coefficient of classification")
parser.add_argument(
'--lambda_rec',
type=float,
default=100.0,
help="the coefficient of refactor")
parser.add_argument(
'--thres_int',
type=float,
default=0.5,
help="thresh change of attributes")
parser.add_argument(
'--lambda_gp',
type=float,
default=10.0,
help="the coefficient of gradient penalty")
parser.add_argument(
'--n_samples', type=int, default=16, help="batch size when testing")
parser.add_argument(
'--selected_attrs',
type=str,
default="Bald,Bangs,Black_Hair,Blond_Hair,Brown_Hair,Bushy_Eyebrows,Eyeglasses,Male,Mouth_Slightly_Open,Mustache,No_Beard,Pale_Skin,Young",
help="the attributes we selected to change")
parser.add_argument(
'--n_layers',
type=int,
default=5,
help="default layers in generotor")
parser.add_argument(
'--gru_n_layers',
type=int,
default=4,
help="default layers of GRU in generotor")
return parser
def __init__(self,
cfg=None,
train_reader=None,
test_reader=None,
batch_num=1):
self.cfg = cfg
self.train_reader = train_reader
self.test_reader = test_reader
self.batch_num = batch_num
def build_model(self):
data_shape = [-1, 3, self.cfg.load_size, self.cfg.load_size]
image_real = fluid.layers.data(
name='image_real', shape=data_shape, dtype='float32')
label_org = fluid.layers.data(
name='label_org', shape=[self.cfg.c_dim], dtype='float32')
label_trg = fluid.layers.data(
name='label_trg', shape=[self.cfg.c_dim], dtype='float32')
label_org_ = fluid.layers.data(
name='label_org_', shape=[self.cfg.c_dim], dtype='float32')
label_trg_ = fluid.layers.data(
name='label_trg_', shape=[self.cfg.c_dim], dtype='float32')
gen_trainer = GTrainer(image_real, label_org, label_org_, label_trg,
label_trg_, self.cfg, self.batch_num)
dis_trainer = DTrainer(image_real, label_org, label_org_, label_trg,
label_trg_, self.cfg, self.batch_num)
# prepare environment
place = fluid.CUDAPlace(0) if self.cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
if self.cfg.init_model:
utility.init_checkpoints(self.cfg, exe, gen_trainer, "net_G")
utility.init_checkpoints(self.cfg, exe, dis_trainer, "net_D")
### memory optim
build_strategy = fluid.BuildStrategy()
build_strategy.enable_inplace = False
build_strategy.memory_optimize = False
gen_trainer_program = fluid.CompiledProgram(
gen_trainer.program).with_data_parallel(
loss_name=gen_trainer.g_loss.name,
build_strategy=build_strategy)
dis_trainer_program = fluid.CompiledProgram(
dis_trainer.program).with_data_parallel(
loss_name=dis_trainer.d_loss.name,
build_strategy=build_strategy)
t_time = 0
for epoch_id in range(self.cfg.epoch):
batch_id = 0
for i in range(self.batch_num):
image, label_org = next(self.train_reader())
label_trg = copy.deepcopy(label_org)
np.random.shuffle(label_trg)
# map() is lazy in Python 3; materialize as float32 arrays
# before handing the labels to LoDTensor.set
label_org_ = np.array(
[(x * 2.0 - 1.0) * self.cfg.thres_int for x in label_org],
dtype='float32')
label_trg_ = np.array(
[(x * 2.0 - 1.0) * self.cfg.thres_int for x in label_trg],
dtype='float32')
tensor_img = fluid.LoDTensor()
tensor_label_org = fluid.LoDTensor()
tensor_label_trg = fluid.LoDTensor()
tensor_label_org_ = fluid.LoDTensor()
tensor_label_trg_ = fluid.LoDTensor()
tensor_img.set(image, place)
tensor_label_org.set(label_org, place)
tensor_label_trg.set(label_trg, place)
tensor_label_org_.set(label_org_, place)
tensor_label_trg_.set(label_trg_, place)
label_shape = tensor_label_trg.shape
s_time = time.time()
# optimize the discriminator network
if (batch_id + 1) % self.cfg.num_discriminator_time != 0:
fetches = [
dis_trainer.d_loss.name, dis_trainer.d_loss_real.name,
dis_trainer.d_loss_fake.name,
dis_trainer.d_loss_cls.name, dis_trainer.d_loss_gp.name
]
d_loss, d_loss_real, d_loss_fake, d_loss_cls, d_loss_gp = exe.run(
dis_trainer_program,
fetch_list=fetches,
feed={
"image_real": tensor_img,
"label_org": tensor_label_org,
"label_org_": tensor_label_org_,
"label_trg": tensor_label_trg,
"label_trg_": tensor_label_trg_
})
batch_time = time.time() - s_time
t_time += batch_time
print("epoch{}: batch{}: \n\
d_loss: {}; d_loss_real: {}; d_loss_fake: {}; d_loss_cls: {}; d_loss_gp: {} \n\
Batch_time_cost: {:.2f}"
.format(epoch_id, batch_id, d_loss[0], d_loss_real[
0], d_loss_fake[0], d_loss_cls[0], d_loss_gp[0],
batch_time))
# optimize the generator network
else:
d_fetches = [
gen_trainer.g_loss_fake.name,
gen_trainer.g_loss_rec.name, gen_trainer.g_loss_cls.name
]
g_loss_fake, g_loss_rec, g_loss_cls = exe.run(
gen_trainer_program,
fetch_list=d_fetches,
feed={
"image_real": tensor_img,
"label_org": tensor_label_org,
"label_org_": tensor_label_org_,
"label_trg": tensor_label_trg,
"label_trg_": tensor_label_trg_
})
print("epoch{}: batch{}: \n\
g_loss_fake: {}; g_loss_rec: {}; g_loss_cls: {}"
.format(epoch_id, batch_id, g_loss_fake[0],
g_loss_rec[0], g_loss_cls[0]))
sys.stdout.flush()
batch_id += 1
if self.cfg.run_test:
test_program = gen_trainer.infer_program
utility.save_test_image(epoch_id, self.cfg, exe, place,
test_program, gen_trainer,
self.test_reader)
if self.cfg.save_checkpoints:
utility.checkpoints(epoch_id, self.cfg, exe, gen_trainer,
"net_G")
utility.checkpoints(epoch_id, self.cfg, exe, dis_trainer,
"net_D")
@@ -98,6 +98,8 @@ def base_parse_args(parser):
add_arg('lambda_L1', float, 100.0, "the initialize lambda parameter for L1 loss")
add_arg('num_generator_time', int, 1,
"the generator run times in training each epoch")
add_arg('num_discriminator_time', int, 1,
"the discriminator run times in training each epoch")
add_arg('print_freq', int, 10, "the frequency of print loss")
# yapf: enable
......
@@ -101,40 +101,84 @@ def save_test_image(epoch,
imsave(out_path + "/inputB_" + str(epoch) + "_" + name, (
(input_B_temp + 1) * 127.5).astype(np.uint8))
else:
if cfg.model_net == 'AttGAN' or cfg.model_net == 'STGAN':
for data in zip(A_test_reader()):
real_img, label_org, name = data[0]
label_trg = copy.deepcopy(label_org)
tensor_img = fluid.LoDTensor()
tensor_label_org = fluid.LoDTensor()
tensor_label_trg = fluid.LoDTensor()
tensor_label_org_ = fluid.LoDTensor()
tensor_label_trg_ = fluid.LoDTensor()
tensor_img.set(real_img, place)
tensor_label_org.set(label_org, place)
real_img_temp = np.squeeze(real_img).transpose([0, 2, 3, 1])
images = [real_img_temp]
for i in range(cfg.c_dim):
label_trg_tmp = copy.deepcopy(label_trg)
imsave(out_path + "/fakeB_" + str(epoch) + "_" + A_name, (
(fake_B_temp + 1) * 127.5).astype(np.uint8))
imsave(out_path + "/fakeA_" + str(epoch) + "_" + B_name, (
(fake_A_temp + 1) * 127.5).astype(np.uint8))
imsave(out_path + "/cycA_" + str(epoch) + "_" + A_name, (
(cyc_A_temp + 1) * 127.5).astype(np.uint8))
imsave(out_path + "/cycB_" + str(epoch) + "_" + B_name, (
(cyc_B_temp + 1) * 127.5).astype(np.uint8))
imsave(out_path + "/inputA_" + str(epoch) + "_" + A_name, (
(input_A_temp + 1) * 127.5).astype(np.uint8))
imsave(out_path + "/inputB_" + str(epoch) + "_" + B_name, (
(input_B_temp + 1) * 127.5).astype(np.uint8))
for j in range(len(label_org)):
label_trg_tmp[j][i] = 1.0 - label_trg_tmp[j][i]
# materialize map() (lazy in Python 3) so it can be indexed below
label_trg_ = np.array(
[((x * 2) - 1) * 0.5 for x in label_trg_tmp],
dtype='float32')
for j in range(len(label_org)):
label_trg_[j][i] = label_trg_[j][i] * 2.0
tensor_label_org_.set(label_org, place)
tensor_label_trg.set(label_trg, place)
tensor_label_trg_.set(label_trg_, place)
out = exe.run(test_program,
feed={
"image_real": tensor_img,
"label_org": tensor_label_org,
"label_org_": tensor_label_org_,
"label_trg": tensor_label_trg,
"label_trg_": tensor_label_trg_
},
fetch_list=[g_trainer.fake_img])
fake_temp = np.squeeze(out[0]).transpose([0, 2, 3, 1])
images.append(fake_temp)
images_concat = np.concatenate(images, 1)
images_concat = np.concatenate(images_concat, 1)
imsave(out_path + "/fake_img" + str(epoch) + '_' + name[0], (
(images_concat + 1) * 127.5).astype(np.uint8))
else:
for data_A, data_B in zip(A_test_reader(), B_test_reader()):
A_name = data_A[0][1]
B_name = data_B[0][1]
tensor_A = fluid.LoDTensor()
tensor_B = fluid.LoDTensor()
tensor_A.set(data_A[0][0], place)
tensor_B.set(data_B[0][0], place)
fake_A_temp, fake_B_temp, cyc_A_temp, cyc_B_temp = exe.run(
test_program,
fetch_list=[
g_trainer.fake_A, g_trainer.fake_B, g_trainer.cyc_A,
g_trainer.cyc_B
],
feed={"input_A": tensor_A,
"input_B": tensor_B})
fake_A_temp = np.squeeze(fake_A_temp[0]).transpose([1, 2, 0])
fake_B_temp = np.squeeze(fake_B_temp[0]).transpose([1, 2, 0])
cyc_A_temp = np.squeeze(cyc_A_temp[0]).transpose([1, 2, 0])
cyc_B_temp = np.squeeze(cyc_B_temp[0]).transpose([1, 2, 0])
input_A_temp = np.squeeze(data_A[0][0]).transpose([1, 2, 0])
input_B_temp = np.squeeze(data_B[0][0]).transpose([1, 2, 0])
imsave(out_path + "/fakeB_" + str(epoch) + "_" + A_name, (
(fake_B_temp + 1) * 127.5).astype(np.uint8))
imsave(out_path + "/fakeA_" + str(epoch) + "_" + B_name, (
(fake_A_temp + 1) * 127.5).astype(np.uint8))
imsave(out_path + "/cycA_" + str(epoch) + "_" + A_name, (
(cyc_A_temp + 1) * 127.5).astype(np.uint8))
imsave(out_path + "/cycB_" + str(epoch) + "_" + B_name, (
(cyc_B_temp + 1) * 127.5).astype(np.uint8))
imsave(out_path + "/inputA_" + str(epoch) + "_" + A_name, (
(input_A_temp + 1) * 127.5).astype(np.uint8))
imsave(out_path + "/inputB_" + str(epoch) + "_" + B_name, (
(input_B_temp + 1) * 127.5).astype(np.uint8))
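In the AttGAN/STGAN branch of `save_test_image` above, each test column flips one attribute `i` in the 0/1 target labels, rescales them to `+/-thres_int` (0.5 here), and doubles the flipped component to exaggerate the edit. A standalone sketch of that label manipulation; `flipped_target_labels` is a hypothetical helper name, not from this repo:

```python
import numpy as np

def flipped_target_labels(label_org, attr_idx, thres_int=0.5):
    """Flip attribute `attr_idx` for every sample, rescale all labels to
    +/-thres_int, then double the flipped component."""
    label_trg = np.array(label_org, dtype="float32")
    label_trg[:, attr_idx] = 1.0 - label_trg[:, attr_idx]
    label_trg_ = (label_trg * 2.0 - 1.0) * thres_int
    label_trg_[:, attr_idx] *= 2.0
    return label_trg_

label_org = np.array([[1, 0], [0, 1]], dtype="float32")
print(flipped_target_labels(label_org, attr_idx=0))
# row 0 -> [-1.0, -0.5], row 1 -> [1.0, 0.5]
```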
class ImagePool(object):
......