Commit 22a2500b, author: root

Merge branch 'develop' of https://github.com/PaddlePaddle/models into speedup

......@@ -13,3 +13,6 @@
[submodule "PaddleNLP/knowledge-driven-dialogue"]
path = PaddleNLP/knowledge-driven-dialogue
url = https://github.com/baidu/knowledge-driven-dialogue
[submodule "PaddleNLP/language_representations_kit"]
path = PaddleNLP/language_representations_kit
url = https://github.com/PaddlePaddle/LARK

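Running the example programs in this directory requires the latest PaddlePaddle develop version. If your installed PaddlePaddle version is older than this, please update it by following the instructions in the [installation documentation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).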
......@@ -73,8 +72,8 @@ env CUDA_VISIBLE_DEVICES=0 python train.py
Run the following command to read multiple images and run inference on them:
```
env CUDA_VISIBLE_DEVICE=0 python infer.py \
--init_model="checkpoints/1" --input="./data/inputA/*" \
env CUDA_VISIBLE_DEVICES=0 python infer.py \
--init_model="output/checkpoints/1" --input="./data/horse2zebra/trainA/*" \
--input_style A --output="./output"
```
......@@ -89,3 +88,5 @@ env CUDA_VISIBLE_DEVICE=0 python infer.py \
<img src="images/B2A.jpg" width="620" hspace='10'/> <br/>
<strong>图 3</strong>
</p>
>In all examples in this document, you can change which GPU is used by modifying `CUDA_VISIBLE_DEVICES`.
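
As a rough illustration of the note above, the same device selection can also be made from inside Python by setting the variable before the framework initializes CUDA. This is a minimal sketch and not part of the original scripts; the helper name `select_gpu` is hypothetical.

```python
import os


def select_gpu(device_ids):
    """Restrict the GPUs visible to this process (hypothetical helper).

    This has to run before the deep learning framework initializes CUDA;
    changing the variable afterwards does not affect an existing context.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Roughly equivalent to launching with `env CUDA_VISIBLE_DEVICES=1 python infer.py ...`
print(select_gpu([1]))
```

Setting the variable on the command line, as the README does, avoids any ordering concerns with imports.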
......@@ -2,12 +2,11 @@ import argparse
import functools
import os
from PIL import Image
from paddle.fluid import core
import paddle.fluid as fluid
import paddle
import numpy as np
from scipy.misc import imsave
from model import *
from model import build_generator_resnet_9blocks, build_gen_discriminator
import glob
from utility import add_arguments, print_arguments
......@@ -44,7 +43,6 @@ def infer(args):
if not os.path.exists(args.output):
os.makedirs(args.output)
for file in glob.glob(args.input):
print "read %s" % file
image_name = os.path.basename(file)
image = Image.open(file)
image = image.resize((256, 256))
......@@ -52,7 +50,7 @@ def infer(args):
if len(image.shape) != 3:
continue
data = image.transpose([2, 0, 1])[np.newaxis, :].astype("float32")
tensor = core.LoDTensor()
tensor = fluid.LoDTensor()
tensor.set(data, place)
fake_temp = exe.run(fetch_list=[fake.name], feed={"input": tensor})
......
from layers import *
from layers import conv2d, deconv2d
import paddle.fluid as fluid
......
......@@ -12,17 +12,16 @@ import numpy as np
from scipy.misc import imsave
import paddle.fluid as fluid
import paddle.fluid.profiler as profiler
from paddle.fluid import core
import data_reader
from utility import add_arguments, print_arguments, ImagePool
from trainer import *
from trainer import GATrainer, GBTrainer, DATrainer, DBTrainer
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('batch_size', int, 1, "Minibatch size.")
add_arg('epoch', int, 2, "The number of epochs to be trained.")
add_arg('output', str, "./output_0", "The directory the model and the test result to be saved to.")
add_arg('output', str, "./output", "The directory the model and the test result to be saved to.")
add_arg('init_model', str, None, "The init model file or directory.")
add_arg('save_checkpoints', bool, True, "Whether to save checkpoints.")
add_arg('run_test', bool, True, "Whether to run test.")
......@@ -82,8 +81,8 @@ def train(args):
for data_A, data_B in zip(A_test_reader(), B_test_reader()):
A_name = data_A[1]
B_name = data_B[1]
tensor_A = core.LoDTensor()
tensor_B = core.LoDTensor()
tensor_A = fluid.LoDTensor()
tensor_B = fluid.LoDTensor()
tensor_A.set(data_A[0], place)
tensor_B.set(data_B[0], place)
fake_A_temp, fake_B_temp, cyc_A_temp, cyc_B_temp = exe.run(
......@@ -168,8 +167,8 @@ def train(args):
for i in range(max_images_num):
data_A = next(A_reader)
data_B = next(B_reader)
tensor_A = core.LoDTensor()
tensor_B = core.LoDTensor()
tensor_A = fluid.LoDTensor()
tensor_B = fluid.LoDTensor()
tensor_A.set(data_A, place)
tensor_B.set(data_B, place)
s_time = time.time()
......
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from model import *
from model import build_generator_resnet_9blocks, build_gen_discriminator
import paddle.fluid as fluid
step_per_epoch = 1335
......
......@@ -21,7 +21,6 @@ import six
import random
import glob
import numpy as np
from paddle.fluid import core
def print_arguments(args):
......
#Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import print_function
from six.moves import range
from PIL import Image, ImageOps
import gzip
import numpy as np
import argparse
import struct
import os
import paddle
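def RandomCrop(img, crop_w, crop_h):
    # img is a PIL image, so read (width, height) from .size, not .shape
    w, h = img.size[0], img.size[1]
    # randint's upper bound is exclusive; +1 also handles w == crop_w
    i = np.random.randint(0, w - crop_w + 1)
    j = np.random.randint(0, h - crop_h + 1)
    return img.crop((i, j, i + crop_w, j + crop_h))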
def CentorCrop(img, crop_w, crop_h):
w, h = img.size[0], img.size[1]
i = int((w - crop_w) / 2.0)
j = int((h - crop_h) / 2.0)
return img.crop((i, j, i + crop_w, j + crop_h))
def RandomHorizonFlip(img):
    # flip horizontally with probability 0.5
    i = np.random.rand()
    if i > 0.5:
        img = ImageOps.mirror(img)
    return img
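
The crop helpers above only differ in how they pick the top-left corner of the crop box. The following is a minimal sketch of that box arithmetic without PIL; the function names and the 286 to 256 sizes are illustrative assumptions, not values taken from this commit.

```python
import random


def center_crop_box(w, h, crop_w, crop_h):
    """Return the (left, upper, right, lower) box of a centered crop."""
    i = (w - crop_w) // 2
    j = (h - crop_h) // 2
    return (i, j, i + crop_w, j + crop_h)


def random_crop_box(w, h, crop_w, crop_h, rng=random):
    """Return a randomly placed crop box that stays inside the image."""
    # random.randint is inclusive on both ends, so offset w - crop_w is valid.
    i = rng.randint(0, w - crop_w)
    j = rng.randint(0, h - crop_h)
    return (i, j, i + crop_w, j + crop_h)


# Assumed sizes: resize to 286x286, then crop to 256x256.
print(center_crop_box(286, 286, 256, 256))
print(random_crop_box(286, 286, 256, 256, random.Random(0)))
```

Either box can be passed directly to `Image.crop`, which expects the same four-tuple layout.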
class reader_creator(object):
''' read and preprocess dataset'''
def __init__(self, image_dir, list_filename, batch_size=1, drop_last=False):
self.image_dir = image_dir
self.list_filename = list_filename
self.batch_size = batch_size
self.drop_last = drop_last
self.lines = open(self.list_filename).readlines()
def len(self):
if self.drop_last or len(self.lines) % self.batch_size == 0:
return len(self.lines) // self.batch_size
else:
return len(self.lines) // self.batch_size + 1
def get_train_reader(self, args, shuffle=False, return_name=False):
print(self.image_dir, self.list_filename)
def reader():
batch_out = []
while True:
if shuffle:
np.random.shuffle(self.lines)
for file in self.lines:
file = file.strip('\n\r\t ')
img = Image.open(os.path.join(self.image_dir,
file)).convert('RGB')
img = img.resize((args.load_size, args.load_size),
Image.BICUBIC)
if args.crop_type == 'Centor':
img = CentorCrop(img, args.crop_size, args.crop_size)
elif args.crop_type == 'Random':
img = RandomCrop(img, args.crop_size, args.crop_size)
img = (np.array(img).astype('float32') / 255.0 - 0.5) / 0.5
img = img.transpose([2, 0, 1])
if return_name:
batch_out.append([img, os.path.basename(file)])
else:
batch_out.append(img)
if len(batch_out) == self.batch_size:
yield batch_out
batch_out = []
if self.drop_last == False and len(batch_out) != 0:
yield batch_out
return reader
def get_test_reader(self, args, shuffle=False, return_name=False):
print(self.image_dir, self.list_filename)
def reader():
batch_out = []
for file in self.lines:
file = file.strip('\n\r\t ')
img = Image.open(os.path.join(self.image_dir, file)).convert(
'RGB')
img = img.resize((args.crop_size, args.crop_size),
Image.BICUBIC)
img = (np.array(img).astype('float32') / 255.0 - 0.5) / 0.5
img = img.transpose([2, 0, 1])
if return_name:
batch_out.append(
[img[np.newaxis, :], os.path.basename(file)])
else:
batch_out.append(img)
if len(batch_out) == self.batch_size:
yield batch_out
batch_out = []
if len(batch_out) != 0:
yield batch_out
return reader
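
The batching behavior of `reader_creator` (the `len` method and the `drop_last` handling in the readers) can be sketched in isolation. This is a simplified stand-in under the assumption that items are consumed once in order; `num_batches` and `iter_batches` are hypothetical names.

```python
def num_batches(num_items, batch_size, drop_last):
    """Number of batches produced for one pass over num_items items."""
    if drop_last or num_items % batch_size == 0:
        return num_items // batch_size
    return num_items // batch_size + 1


def iter_batches(items, batch_size, drop_last=False):
    """Yield lists of batch_size items; keep the short tail unless drop_last."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if not drop_last and batch:
        yield batch


print(num_batches(10, 4, drop_last=False))
print(list(iter_batches(range(10), 4, drop_last=True)))
```

With `drop_last=True` the two leftover items are discarded, which keeps every batch the same shape.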
def mnist_reader_creator(image_filename, label_filename, buffer_size):
def reader():
with gzip.GzipFile(image_filename, 'rb') as image_file:
img_buf = image_file.read()
with gzip.GzipFile(label_filename, 'rb') as label_file:
lab_buf = label_file.read()
step_label = 0
offset_img = 0
# read from Big-endian
# get file info from magic byte
# image file : 16B
magic_byte_img = '>IIII'
magic_img, image_num, rows, cols = struct.unpack_from(
magic_byte_img, img_buf, offset_img)
offset_img += struct.calcsize(magic_byte_img)
offset_lab = 0
# label file : 8B
magic_byte_lab = '>II'
magic_lab, label_num = struct.unpack_from(magic_byte_lab,
lab_buf, offset_lab)
offset_lab += struct.calcsize(magic_byte_lab)
while True:
if step_label >= label_num:
break
fmt_label = '>' + str(buffer_size) + 'B'
labels = struct.unpack_from(fmt_label, lab_buf, offset_lab)
offset_lab += struct.calcsize(fmt_label)
step_label += buffer_size
fmt_images = '>' + str(buffer_size * rows * cols) + 'B'
images_temp = struct.unpack_from(fmt_images, img_buf,
offset_img)
images = np.reshape(images_temp, (buffer_size, rows *
cols)).astype('float32')
offset_img += struct.calcsize(fmt_images)
images = images / 255.0 * 2.0 - 1.0
for i in range(buffer_size):
yield images[i, :], int(
labels[i]) # get image and label
return reader
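
To make the header parsing in `mnist_reader_creator` concrete, here is a small sketch that builds a tiny IDX-style image buffer in memory and decodes it the same way. The pixel data is made up for illustration, and the helper name is hypothetical; only the 16-byte big-endian header layout comes from the code above.

```python
import struct


def parse_idx_image_header(buf):
    """Read the 16-byte big-endian header of an IDX image buffer."""
    fmt = ">IIII"
    magic, num_images, rows, cols = struct.unpack_from(fmt, buf, 0)
    return magic, num_images, rows, cols, struct.calcsize(fmt)


# Made-up buffer: header followed by two 2x2 images (8 pixel bytes).
header = struct.pack(">IIII", 2051, 2, 2, 2)
pixels = bytes(bytearray([0, 255, 128, 64, 1, 2, 3, 4]))
buf = header + pixels

magic, n, rows, cols, offset = parse_idx_image_header(buf)
raw = struct.unpack_from(">" + str(n * rows * cols) + "B", buf, offset)
# Scale bytes from [0, 255] to [-1, 1], as the reader does.
scaled = [p / 255.0 * 2.0 - 1.0 for p in raw]
print(magic, n, rows, cols, offset)
print(scaled[:2])
```

The label file follows the same pattern with an 8-byte header (`'>II'`) instead of 16 bytes.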
class data_reader(object):
def __init__(self, cfg):
self.cfg = cfg
self.shuffle = self.cfg.shuffle
def make_data(self):
if self.cfg.dataset == 'mnist':
train_images = os.path.join(self.cfg.data_dir, self.cfg.dataset,
"train-images-idx3-ubyte.gz")
train_labels = os.path.join(self.cfg.data_dir, self.cfg.dataset,
"train-labels-idx1-ubyte.gz")
train_reader = paddle.batch(
paddle.reader.shuffle(
mnist_reader_creator(train_images, train_labels, 100),
buf_size=60000),
batch_size=self.cfg.batch_size)
return train_reader
else:
if self.cfg.model_net == 'CycleGAN':
dataset_dir = os.path.join(self.cfg.data_dir, self.cfg.dataset)
trainA_list = os.path.join(dataset_dir, "trainA.txt")
trainB_list = os.path.join(dataset_dir, "trainB.txt")
a_train_reader = reader_creator(
image_dir=dataset_dir,
list_filename=trainA_list,
batch_size=self.cfg.batch_size,
drop_last=self.cfg.drop_last)
b_train_reader = reader_creator(
image_dir=dataset_dir,
list_filename=trainB_list,
batch_size=self.cfg.batch_size,
drop_last=self.cfg.drop_last)
a_reader_test = None
b_reader_test = None
if self.cfg.run_test:
testA_list = os.path.join(dataset_dir, "testA.txt")
testB_list = os.path.join(dataset_dir, "testB.txt")
a_test_reader = reader_creator(
image_dir=dataset_dir,
list_filename=testA_list,
batch_size=1,
drop_last=self.cfg.drop_last)
b_test_reader = reader_creator(
image_dir=dataset_dir,
list_filename=testB_list,
batch_size=1,
drop_last=self.cfg.drop_last)
a_reader_test = a_test_reader.get_test_reader(
self.cfg, shuffle=False, return_name=True)
b_reader_test = b_test_reader.get_test_reader(
self.cfg, shuffle=False, return_name=True)
batch_num = max(a_train_reader.len(), b_train_reader.len())
a_reader = a_train_reader.get_train_reader(
self.cfg, shuffle=self.shuffle)
b_reader = b_train_reader.get_train_reader(
self.cfg, shuffle=self.shuffle)
return a_reader, b_reader, a_reader_test, b_reader_test, batch_num
else:
dataset_dir = os.path.join(self.cfg.data_dir, self.cfg.dataset)
train_list = os.path.join(dataset_dir, 'train.txt')
if self.cfg.data_list is not None:
train_list = self.cfg.data_list
train_reader = reader_creator(
image_dir=dataset_dir, list_filename=train_list)
reader_test = None
if self.cfg.run_test:
test_list = os.path.join(dataset_dir, "test.txt")
test_reader = reader_creator(
image_dir=dataset_dir,
list_filename=test_list,
batch_size=1,
drop_last=self.cfg.drop_last)
reader_test = test_reader.get_test_reader(
self.cfg, shuffle=False, return_name=True)
batch_num = train_reader.len()
return train_reader, reader_test, batch_num
#Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import print_function
from PIL import Image
import numpy as np
import os
import sys
import gzip
import argparse
import requests
import six
import hashlib
import shutil
import zipfile
parser = argparse.ArgumentParser(description='Download dataset.')
#TODO add celeA dataset
parser.add_argument(
'--dataset',
type=str,
default='mnist',
help='name of dataset to download [mnist]')
def md5file(fname):
hash_md5 = hashlib.md5()
f = open(fname, "rb")
for chunk in iter(lambda: f.read(4096), b""):
hash_md5.update(chunk)
f.close()
return hash_md5.hexdigest()
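
A variant of the chunked hashing above that uses a context manager, so the file is closed even if reading fails, might look like the following. This is a sketch for comparison rather than the committed code; the temporary file exists only to make the example runnable.

```python
import hashlib
import os
import tempfile


def md5file(fname, chunk_size=4096):
    """Hash a file in fixed-size chunks so large files need little memory."""
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        # iter(callable, sentinel) keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()


# Demonstration on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path = tmp.name
try:
    digest = md5file(path)
finally:
    os.remove(path)
print(digest)
```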
def download_mnist(dir_path):
URL_DIC = {}
URL_PREFIX = 'http://yann.lecun.com/exdb/mnist/'
TEST_IMAGE_URL = URL_PREFIX + 't10k-images-idx3-ubyte.gz'
TEST_IMAGE_MD5 = '9fb629c4189551a2d022fa330f9573f3'
TEST_LABEL_URL = URL_PREFIX + 't10k-labels-idx1-ubyte.gz'
TEST_LABEL_MD5 = 'ec29112dd5afa0611ce80d1b7f02629c'
TRAIN_IMAGE_URL = URL_PREFIX + 'train-images-idx3-ubyte.gz'
TRAIN_IMAGE_MD5 = 'f68b3c2dcbeaaa9fbdd348bbdeb94873'
TRAIN_LABEL_URL = URL_PREFIX + 'train-labels-idx1-ubyte.gz'
TRAIN_LABEL_MD5 = 'd53e105ee54ea40749a09fcbcd1e9432'
URL_DIC[TRAIN_IMAGE_URL] = TRAIN_IMAGE_MD5
URL_DIC[TRAIN_LABEL_URL] = TRAIN_LABEL_MD5
URL_DIC[TEST_IMAGE_URL] = TEST_IMAGE_MD5
URL_DIC[TEST_LABEL_URL] = TEST_LABEL_MD5
### print(url)
for url in URL_DIC:
md5sum = URL_DIC[url]
data_dir = os.path.join(dir_path, 'mnist')
if not os.path.exists(data_dir):
os.makedirs(data_dir)
filename = os.path.join(data_dir, url.split('/')[-1])
retry = 0
retry_limit = 3
while not (os.path.exists(filename) and md5file(filename) == md5sum):
if os.path.exists(filename):
sys.stderr.write("file %s md5 %s" %
(md5file(filename), md5sum))
if retry < retry_limit:
retry += 1
else:
raise RuntimeError("Cannot download {0} within retry limit {1}".
format(url, retry_limit))
sys.stderr.write("Cache file %s not found, downloading %s" %
(filename, url))
r = requests.get(url, stream=True)
total_length = r.headers.get('content-length')
if total_length is None:
with open(filename, 'wb') as f:
shutil.copyfileobj(r.raw, f)
else:
with open(filename, 'wb') as f:
dl = 0
total_length = int(total_length)
for data in r.iter_content(chunk_size=4096):
if six.PY2:
data = six.b(data)
dl += len(data)
f.write(data)
done = int(50 * dl / total_length)
sys.stderr.write("\r[%s%s]" % ('=' * done,
' ' * (50 - done)))
sys.stdout.flush()
sys.stderr.write("\n")
sys.stdout.flush()
print(filename)
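
The control flow of `download_mnist` (verify the cached file, download again if the check fails, give up after a fixed number of attempts) can be sketched independently of the network. In this sketch `is_valid` and `download` are hypothetical callables standing in for the MD5 check and the HTTP request.

```python
def fetch_with_retry(is_valid, download, retry_limit=3):
    """Call download() until is_valid() holds or the retry budget is spent.

    Both callables are stand-ins: is_valid would check that the cached file
    exists and matches its MD5, download would fetch the URL to disk.
    Returns the number of download attempts that were made.
    """
    attempts = 0
    while not is_valid():
        if attempts >= retry_limit:
            raise RuntimeError(
                "Cannot download within retry limit {}".format(retry_limit))
        attempts += 1
        download()
    return attempts


# Simulate a download that only yields a valid file on the second attempt.
state = {"calls": 0}


def fake_download():
    state["calls"] += 1


def fake_is_valid():
    return state["calls"] >= 2


print(fetch_with_retry(fake_is_valid, fake_download))
```

Passing the check and the download as parameters keeps the retry logic testable without touching the network.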
def download_cycle_pix(dir_path, dataname):
URL_PREFIX = 'https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/'
IMAGE_URL = '{}.zip'.format(dataname)
url = URL_PREFIX + IMAGE_URL
if not os.path.exists(dir_path):
os.makedirs(dir_path)
r = requests.get(url, stream=True)
total_length = float(r.headers.get('content-length'))
filename = os.path.join(dir_path, IMAGE_URL)
print(filename)
if not os.path.exists(filename):
dl = 0
with open(filename, "wb") as f:
for data in r.iter_content(chunk_size=4096):
if six.PY2:
data = six.b(data)
dl += len(data)
f.write(data)
done = int(100 * dl / total_length)
sys.stderr.write("\r[{}{}] {}% ".format('=' * done, ' ' * (
100 - done), done))
sys.stdout.flush()
else:
sys.stderr.write('{}.zip already exists, no need to download it again.'.
format(dataname))
### unzip .zip file
if not os.path.exists(os.path.join(dir_path, '{}'.format(dataname))):
zip_f = zipfile.ZipFile(filename, 'r')
for zip_file in zip_f.namelist():
zip_f.extract(zip_file, dir_path)
### generate a .txt file list for each sub-directory
dirs = os.listdir(os.path.join(dir_path, '{}'.format(dataname)))
for d in dirs:
txt_file = d + '.txt'
txt_dir = os.path.join(dir_path, dataname)
f = open(os.path.join(txt_dir, txt_file), 'w')
for fil in os.listdir(os.path.join(txt_dir, d)):
wl = d + '/' + fil + '\n'
f.write(wl)
f.close()
sys.stderr.write("\n")
if __name__ == '__main__':
args = parser.parse_args()
cycle_pix_dataset = [
'apple2orange', 'summer2winter_yosemite', 'horse2zebra', 'monet2photo',
'cezanne2photo', 'ukiyoe2photo', 'vangogh2photo', 'maps', 'cityscapes',
'facades', 'iphone2dslr_flower', 'ae_photos', 'mini'
]
if args.dataset == 'mnist':
print('Download dataset: {}'.format(args.dataset))
download_mnist('./data/')
elif args.dataset in cycle_pix_dataset:
print('Download dataset: {}'.format(args.dataset))
download_cycle_pix('./data/', args.dataset)
else:
print('Please download by yourself, thanks')
#Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import functools
import os
from PIL import Image
import paddle.fluid as fluid
import paddle
import numpy as np
from scipy.misc import imsave
import glob
from util.config import add_arguments, print_arguments
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('model_net', str, 'cgan', "The model used")
add_arg('net_G', str, "resnet_9block", "Choose the CycleGAN generator's network, choose in [resnet_9block|resnet_6block|unet_128|unet_256]")
add_arg('input', str, None, "The images to be inferred.")
add_arg('init_model', str, None, "The init model file or directory.")
add_arg('output', str, "./infer_result", "The directory the infer result to be saved to.")
add_arg('input_style', str, "A", "The style of the input, A or B")
add_arg('norm_type', str, "batch_norm", "Which normalization to use")
add_arg('use_gpu', bool, True, "Whether to use GPU to train.")
add_arg('dropout', bool, False, "Whether to use dropout")
add_arg('data_shape', int, 256, "The shape of load image")
add_arg('g_base_dims', int, 64, "Base channels in CycleGAN generator")
# yapf: enable
def infer(args):
data_shape = [-1, 3, args.data_shape, args.data_shape]
input = fluid.layers.data(name='input', shape=data_shape, dtype='float32')
model_name = 'net_G'
if args.model_net == 'cyclegan':
from network.CycleGAN_network import network_G, network_D
if args.input_style == "A":
fake = network_G(input, name="GA", cfg=args)
elif args.input_style == "B":
fake = network_G(input, name="GB", cfg=args)
else:
raise ValueError("Input with style [%s] is not supported." % args.input_style)
elif args.model_net == 'cgan':
pass
else:
pass
# prepare environment
place = fluid.CPUPlace()
if args.use_gpu:
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
for var in fluid.default_main_program().global_block().all_parameters():
print(var.name)
print(args.init_model + '/' + model_name)
fluid.io.load_persistables(exe, args.init_model + "/" + model_name)
print('load params done')
if not os.path.exists(args.output):
os.makedirs(args.output)
for file in glob.glob(args.input):
print("read {}".format(file))
image_name = os.path.basename(file)
image = Image.open(file).convert('RGB')
image = image.resize((256, 256), Image.BICUBIC)
image = np.array(image).transpose([2, 0, 1]).astype('float32')
image = image / 255.0
image = (image - 0.5) / 0.5
data = image[np.newaxis, :]
tensor = fluid.LoDTensor()
tensor.set(data, place)
fake_temp = exe.run(fetch_list=[fake.name], feed={"input": tensor})
fake_temp = np.squeeze(fake_temp[0]).transpose([1, 2, 0])
input_temp = np.squeeze(data).transpose([1, 2, 0])
imsave(args.output + "/fake_" + image_name, (
(fake_temp + 1) * 127.5).astype(np.uint8))
if __name__ == "__main__":
args = parser.parse_args()
print_arguments(args)
infer(args)
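
The inference loop above normalizes pixels to [-1, 1] before feeding the generator and maps the output back with `(fake_temp + 1) * 127.5`. The following sketch shows the two mappings on plain numbers; the function names are hypothetical, and `int()` is used here as a stand-in for the truncating `astype(np.uint8)` cast.

```python
def normalize(pixel):
    """Map an 8-bit value in [0, 255] to [-1, 1]."""
    return (pixel / 255.0 - 0.5) / 0.5


def denormalize(value):
    """Map a generator output in [-1, 1] back to an 8-bit value (truncating)."""
    return int((value + 1) * 127.5)


for p in (0, 128, 255):
    print(p, normalize(p), denormalize(normalize(p)))
```

Because the cast truncates rather than rounds, intermediate values may come back one level lower after a round trip; the endpoints 0 and 255 are preserved.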
#Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from .base_network import linear, conv2d, deconv2d, conv_cond_concat
import paddle.fluid as fluid
import numpy as np
import time
import os
import sys
class CGAN_model(object):
def __init__(self, batch_size=1):
self.batch_size = batch_size
self.img_w = 28
self.img_h = 28
self.y_dim = 1
self.gf_dim = 128
self.df_dim = 64
self.leaky_relu_factor = 0.2
def network_G(self, input, label, name="generator"):
# concat noise and label
y = fluid.layers.reshape(label, shape=[-1, self.y_dim, 1, 1])
xy = fluid.layers.concat([input, y], 1)
o_l1 = linear(
xy,
self.gf_dim * 8,
norm='batch_norm',
activation_fn='relu',
name=name + '_l1')
o_c1 = fluid.layers.concat([o_l1, y], 1)
o_l2 = linear(
o_c1,
self.gf_dim * (self.img_w // 4) * (self.img_h // 4),
norm='batch_norm',
activation_fn='relu',
name=name + '_l2')
o_r1 = fluid.layers.reshape(
o_l2,
shape=[-1, self.gf_dim, self.img_w // 4, self.img_h // 4],
name=name + '_reshape')
o_c2 = conv_cond_concat(o_r1, y)
o_dc1 = deconv2d(
o_c2,
self.gf_dim,
4,
2,
padding=[1, 1],
norm='batch_norm',
activation_fn='relu',
name=name + '_dc1',
output_size=[self.img_w // 2, self.img_h // 2])
o_c3 = conv_cond_concat(o_dc1, y)
o_dc2 = deconv2d(
o_dc1,
1,
4,
2,
padding=[1, 1],
activation_fn='tanh',
name=name + '_dc2',
output_size=[self.img_w, self.img_h])
out = fluid.layers.reshape(o_dc2, [-1, self.img_w * self.img_h])
return o_dc2
def network_D(self, input, label, name="discriminator"):
# concat image and label
x = fluid.layers.reshape(input, shape=[-1, 1, self.img_w, self.img_h])
y = fluid.layers.reshape(label, shape=[-1, self.y_dim, 1, 1])
xy = conv_cond_concat(x, y)
o_l1 = conv2d(
xy,
self.df_dim,
3,
2,
name=name + '_l1',
activation_fn='leaky_relu')
o_c1 = conv_cond_concat(o_l1, y)
o_l2 = conv2d(
o_c1,
self.df_dim,
3,
2,
name=name + '_l2',
norm='batch_norm',
activation_fn='leaky_relu')
o_f1 = fluid.layers.flatten(o_l2, axis=1)
o_c2 = fluid.layers.concat([o_f1, y], 1)
o_l3 = linear(
o_c2,
self.df_dim * 16,
norm='batch_norm',
activation_fn='leaky_relu',
name=name + '_l3')
o_c3 = fluid.layers.concat([o_l3, y], 1)
o_logit = linear(o_c3, 1, activation_fn='sigmoid', name=name + '_l4')
return o_logit
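
The `output_size` arguments passed to `deconv2d` in `network_G` are consistent with the standard transposed-convolution size formula, `out = (in - 1) * stride - 2 * padding + kernel` (dilation 1, no extra output padding). A small sketch of that arithmetic, assuming the positional arguments `4, 2` are kernel size and stride:

```python
def deconv_output_size(in_size, kernel, stride, padding):
    """Spatial output size of a transposed convolution (dilation 1)."""
    return (in_size - 1) * stride - 2 * padding + kernel


# The generator starts from img_w // 4 = 7 and upsamples twice.
size = 28 // 4
sizes = [size]
for _ in range(2):
    size = deconv_output_size(size, kernel=4, stride=2, padding=1)
    sizes.append(size)
print(sizes)
```

With kernel 4, stride 2 and padding 1, each layer exactly doubles the spatial size, which is why two layers take 7x7 features to a 28x28 image.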
#Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from .base_network import conv2d, deconv2d, norm_layer
import paddle.fluid as fluid
class CycleGAN_model(object):
def __init__(self):
pass
def network_G(self, input, name, cfg):
if cfg.net_G == 'resnet_9block':
net = build_generator_resnet_blocks(
input,
name=name + "_resnet9block",
n_gen_res=9,
g_base_dims=cfg.g_base_dims,
use_dropout=cfg.dropout,
norm_type=cfg.norm_type)
elif cfg.net_G == 'resnet_6block':
net = build_generator_resnet_blocks(
input,
name=name + "_resnet6block",
n_gen_res=6,
g_base_dims=cfg.g_base_dims,
use_dropout=cfg.dropout,
norm_type=cfg.norm_type)
elif cfg.net_G == 'unet_128':
net = build_generator_Unet(
input,
name=name + "_unet128",
num_downsample=7,
g_base_dims=cfg.g_base_dims,
use_dropout=cfg.dropout,
norm_type=cfg.norm_type)
elif cfg.net_G == 'unet_256':
net = build_generator_Unet(
input,
name=name + "_unet256",
num_downsample=8,
g_base_dims=cfg.g_base_dims,
use_dropout=cfg.dropout,
norm_type=cfg.norm_type)
else:
raise NotImplementedError(
'network G: [%s] is wrong format, please check it' % cfg.net_G)
return net
def network_D(self, input, name, cfg):
if cfg.net_D == 'basic':
net = build_discriminator_Nlayers(
input,
name=name + '_basic',
d_nlayers=3,
d_base_dims=cfg.d_base_dims,
norm_type=cfg.norm_type)
elif cfg.net_D == 'nlayers':
net = build_discriminator_Nlayers(
input,
name=name + '_nlayers',
d_nlayers=cfg.d_nlayers,
d_base_dims=cfg.d_base_dims,
norm_type=cfg.norm_type)
elif cfg.net_D == 'pixel':
net = build_discriminator_Pixel(
input,
name=name + '_pixel',
d_base_dims=cfg.d_base_dims,
norm_type=cfg.norm_type)
else:
raise NotImplementedError(
'network D: [%s] is wrong format, please check it' % cfg.net_D)
return net
def build_resnet_block(inputres,
dim,
name="resnet",
use_bias=False,
use_dropout=False,
norm_type='batch_norm'):
out_res = fluid.layers.pad2d(inputres, [1, 1, 1, 1], mode="reflect")
out_res = conv2d(
out_res,
dim,
3,
1,
0.02,
name=name + "_c1",
norm=norm_type,
activation_fn='relu',
use_bias=use_bias)
if use_dropout:
out_res = fluid.layers.dropout(out_res, dropout_prob=0.5)
out_res = fluid.layers.pad2d(out_res, [1, 1, 1, 1], mode="reflect")
out_res = conv2d(
out_res,
dim,
3,
1,
0.02,
name=name + "_c2",
norm=norm_type,
use_bias=use_bias)
return out_res + inputres
def build_generator_resnet_blocks(inputgen,
name="generator",
n_gen_res=9,
g_base_dims=64,
use_dropout=False,
norm_type='batch_norm'):
''' generator use resnet block'''
'''The shape of input should be equal to the shape of output.'''
use_bias = norm_type == 'instance_norm'
pad_input = fluid.layers.pad2d(inputgen, [3, 3, 3, 3], mode="reflect")
o_c1 = conv2d(
pad_input,
g_base_dims,
7,
1,
0.02,
name=name + "_c1",
norm=norm_type,
activation_fn='relu')
o_c2 = conv2d(
o_c1,
g_base_dims * 2,
3,
2,
0.02,
1,
name=name + "_c2",
norm=norm_type,
activation_fn='relu')
res_input = conv2d(
o_c2,
g_base_dims * 4,
3,
2,
0.02,
1,
name=name + "_c3",
norm=norm_type,
activation_fn='relu')
for i in range(n_gen_res):
conv_name = name + "_r{}".format(i + 1)
res_output = build_resnet_block(
res_input,
g_base_dims * 4,
name=conv_name,
use_bias=use_bias,
use_dropout=use_dropout)
res_input = res_output
o_c4 = deconv2d(
res_output,
g_base_dims * 2,
3,
2,
0.02, [1, 1], [0, 1, 0, 1],
name=name + "_c4",
norm=norm_type,
activation_fn='relu')
o_c5 = deconv2d(
o_c4,
g_base_dims,
3,
2,
0.02, [1, 1], [0, 1, 0, 1],
name=name + "_c5",
norm=norm_type,
activation_fn='relu')
o_p2 = fluid.layers.pad2d(o_c5, [3, 3, 3, 3], mode="reflect")
o_c6 = conv2d(
o_p2,
3,
7,
1,
0.02,
name=name + "_c6",
activation_fn='tanh',
use_bias=True)
return o_c6
def Unet_block(inputunet,
i,
outer_dim,
inner_dim,
num_downsample,
innermost=False,
outermost=False,
norm_type='batch_norm',
use_bias=False,
use_dropout=False,
name=None):
if outermost == True:
downconv = conv2d(
inputunet,
inner_dim,
4,
2,
0.02,
1,
name=name + '_outermost_dc1',
use_bias=True)
i += 1
mid_block = Unet_block(
downconv,
i,
inner_dim,
inner_dim * 2,
num_downsample,
norm_type=norm_type,
use_bias=use_bias,
use_dropout=use_dropout,
name=name)
uprelu = fluid.layers.relu(mid_block, name=name + '_outermost_relu')
updeconv = deconv2d(
uprelu,
outer_dim,
4,
2,
0.02,
1,
name=name + '_outermost_uc1',
activation_fn='tanh',
use_bias=use_bias)
return updeconv
elif innermost == True:
downrelu = fluid.layers.leaky_relu(
inputunet, 0.2, name=name + '_innermost_leaky_relu')
upconv = conv2d(
downrelu,
inner_dim,
4,
2,
0.02,
1,
name=name + '_innermost_dc1',
activation_fn='relu',
use_bias=use_bias)
updeconv = deconv2d(
upconv,
outer_dim,
4,
2,
0.02,
1,
name=name + '_innermost_uc1',
norm=norm_type,
use_bias=use_bias)
return fluid.layers.concat([inputunet, updeconv], 1)
else:
downrelu = fluid.layers.leaky_relu(
inputunet, 0.2, name=name + '_leaky_relu')
downnorm = conv2d(
downrelu,
inner_dim,
4,
2,
0.02,
1,
name=name + 'dc1',
norm=norm_type,
use_bias=use_bias)
i += 1
if i < 4:
mid_block = Unet_block(
downnorm,
i,
inner_dim,
inner_dim * 2,
num_downsample,
norm_type=norm_type,
use_bias=use_bias,
name=name + '_mid{}'.format(i))
elif i < num_downsample - 1:
mid_block = Unet_block(
downnorm,
i,
inner_dim,
inner_dim,
num_downsample,
norm_type=norm_type,
use_bias=use_bias,
use_dropout=use_dropout,
name=name + '_mid{}'.format(i))
else:
mid_block = Unet_block(
downnorm,
i,
inner_dim,
inner_dim,
num_downsample,
innermost=True,
norm_type=norm_type,
use_bias=use_bias,
name=name + '_innermost')
uprelu = fluid.layers.relu(mid_block, name=name + '_relu')
updeconv = deconv2d(
uprelu,
outer_dim,
4,
2,
0.02,
1,
name=name + '_uc1',
norm=norm_type,
use_bias=use_bias)
if use_dropout:
updeconv = fluid.layers.dropout(updeconv, dropout_prob=0.5)
return fluid.layers.concat([inputunet, updeconv], 1)
def build_generator_Unet(inputgen,
name="generator",
num_downsample=7,
g_base_dims=64,
use_dropout=False,
norm_type='batch_norm'):
''' generator use Unet'''
use_bias = norm_type == 'instance_norm'
unet_block = Unet_block(
inputgen,
0,
3,
g_base_dims,
num_downsample,
outermost=True,
norm_type=norm_type,
use_bias=use_bias,
use_dropout=use_dropout,
name=name)
return unet_block
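
The recursion in `Unet_block` is easier to follow as a list of down-convolution widths. The sketch below replays only the channel bookkeeping of that recursion (the `i < 4` doubling, the constant-width middle blocks, and the innermost block); it builds no layers, and the function name is hypothetical.

```python
def unet_down_channels(num_downsample=7, g_base_dims=64):
    """Output channels of each down convolution, outermost first."""
    channels = [g_base_dims]          # outermost block
    i, inner = 1, g_base_dims * 2
    while True:
        channels.append(inner)        # down conv of an intermediate block
        i += 1
        if i < 4:
            inner *= 2                # widths double for the first few levels
        elif i < num_downsample - 1:
            pass                      # then stay constant
        else:
            channels.append(inner)    # innermost block
            break
    return channels


print(unet_down_channels(7))
print(unet_down_channels(8))
```

The list has exactly `num_downsample` entries. If each down convolution halves the spatial size (which the kernel 4, stride 2 arguments suggest, though `base_network` is not shown here), `unet_128` and `unet_256` would both reach a 1x1 bottleneck for 128- and 256-pixel inputs respectively.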
def build_discriminator_Nlayers(inputdisc,
name="discriminator",
d_nlayers=3,
d_base_dims=64,
norm_type='batch_norm'):
use_bias = norm_type != 'batch_norm'
dis_input = conv2d(
inputdisc,
d_base_dims,
4,
2,
0.02,
1,
name=name + "_c1",
activation_fn='leaky_relu',
relufactor=0.2,
use_bias=True)
d_dims = d_base_dims
for i in range(d_nlayers - 1):
conv_name = name + "_c{}".format(i + 2)
d_dims *= 2
dis_output = conv2d(
dis_input,
d_dims,
4,
2,
0.02,
1,
name=conv_name,
norm=norm_type,
activation_fn='leaky_relu',
relufactor=0.2,
use_bias=use_bias)
dis_input = dis_output
last_dims = min(2**d_nlayers, 8)
o_c4 = conv2d(
dis_output,
d_base_dims * last_dims,
4,
1,
0.02,
1,
name + "_c{}".format(d_nlayers + 1),
norm=norm_type,
activation_fn='leaky_relu',
relufactor=0.2,
use_bias=use_bias)
o_c5 = conv2d(
o_c4,
1,
4,
1,
0.02,
1,
name + "_c{}".format(d_nlayers + 2),
use_bias=True)
return o_c5
def build_discriminator_Pixel(inputdisc,
name="discriminator",
d_base_dims=64,
norm_type='batch_norm'):
use_bias = norm_type != 'instance_norm'
o_c1 = conv2d(
inputdisc,
d_base_dims,
1,
1,
0.02,
name=name + '_c1',
activation_fn='leaky_relu',
relufactor=0.2,
use_bias=True)
o_c2 = conv2d(
o_c1,
d_base_dims * 2,
1,
1,
0.02,
name=name + '_c2',
norm=norm_type,
activation_fn='leaky_relu',
relufactor=0.2,
use_bias=use_bias)
o_c3 = conv2d(o_c2, 1, 1, 1, 0.02, name=name + '_c3', use_bias=use_bias)
return o_c3
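
For `build_discriminator_Nlayers`, the channel widths follow a simple schedule: start at `d_base_dims`, double for each additional strided layer, cap the last feature layer at 8 times the base width, and finish with a single-channel output. A sketch of that schedule alone, with a hypothetical function name:

```python
def nlayers_channels(d_nlayers=3, d_base_dims=64):
    """Output channels of each conv in the N-layer discriminator."""
    channels = [d_base_dims]                      # first conv (_c1)
    d_dims = d_base_dims
    for _ in range(d_nlayers - 1):                # strided convs in the loop
        d_dims *= 2
        channels.append(d_dims)
    channels.append(d_base_dims * min(2 ** d_nlayers, 8))  # stride-1 conv
    channels.append(1)                            # final one-channel output
    return channels


print(nlayers_channels())
print(nlayers_channels(d_nlayers=4))
```

The `'basic'` discriminator option corresponds to the default `d_nlayers=3` case.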
#Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from .base_network import conv2d, deconv2d, linear
import paddle.fluid as fluid
import numpy as np
import os
class DCGAN_model(object):
def __init__(self, batch_size=1):
self.batch_size = batch_size
self.img_dim = 28
self.gfc_dim = 2048
self.dfc_dim = 1024
self.gf_dim = 64
self.df_dim = 64
def network_G(self, input, name="generator"):
o_l1 = linear(input, self.gfc_dim, norm='batch_norm', name=name + '_l1')
o_l2 = linear(
o_l1,
self.gf_dim * 2 * self.img_dim // 4 * self.img_dim // 4,
norm='batch_norm',
name=name + '_l2')
o_r1 = fluid.layers.reshape(
o_l2, [-1, self.gf_dim * 2, self.img_dim // 4, self.img_dim // 4])
o_dc1 = deconv2d(
o_r1,
self.gf_dim * 2,
4,
2,
padding=[1, 1],
activation_fn='relu',
output_size=[self.img_dim // 2, self.img_dim // 2],
name=name + '_dc1')
o_dc2 = deconv2d(
o_dc1,
1,
4,
2,
padding=[1, 1],
activation_fn='tanh',
output_size=[self.img_dim, self.img_dim],
name=name + '_dc2')
out = fluid.layers.reshape(o_dc2, shape=[-1, self.img_dim * self.img_dim])
return out
def network_D(self, input, name="discriminator"):
o_r1 = fluid.layers.reshape(
input, shape=[-1, 1, self.img_dim, self.img_dim])
o_c1 = conv2d(
o_r1,
self.df_dim,
4,
2,
padding=[1, 1],
activation_fn='leaky_relu',
name=name + '_c1')
o_c2 = conv2d(
o_c1,
self.df_dim * 2,
4,
2,
padding=[1, 1],
norm='batch_norm',
activation_fn='leaky_relu',
name=name + '_c2')
o_l1 = linear(
o_c2,
self.dfc_dim,
norm='batch_norm',
activation_fn='leaky_relu',
name=name + '_l1')
out = linear(o_l1, 1, activation_fn='sigmoid', name=name + '_l2')
return out
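The layer sizes in `DCGAN_model` can be checked by hand with the usual convolution and transposed-convolution size formulas. The sketch below is a plain-Python illustration, not part of the repository; `conv_out` and `deconv_out` are hypothetical helper names, and the transposed-convolution formula assumes no extra output padding.

```python
def conv_out(size, kernel, stride, pad):
    """Spatial size after a strided convolution (floor division)."""
    return (size + 2 * pad - kernel) // stride + 1


def deconv_out(size, kernel, stride, pad):
    """Spatial size after a transposed convolution without output padding."""
    return (size - 1) * stride - 2 * pad + kernel


img_dim, gf_dim = 28, 64  # values set in DCGAN_model.__init__

# Generator: 7x7 feature map -> 14x14 -> 28x28.
g_sizes = [img_dim // 4]
for _ in range(2):
    g_sizes.append(deconv_out(g_sizes[-1], kernel=4, stride=2, pad=1))

# Discriminator: 28x28 image -> 14x14 -> 7x7.
d_sizes = [img_dim]
for _ in range(2):
    d_sizes.append(conv_out(d_sizes[-1], kernel=4, stride=2, pad=1))

# Width of the second linear layer, which is reshaped to [N, 128, 7, 7].
l2_width = gf_dim * 2 * (img_dim // 4) * (img_dim // 4)

print(g_sizes, d_sizes, l2_width)
```

The generator sizes match the `output_size` arguments passed to the two `deconv2d` calls, and the linear width equals the element count of the reshaped tensor per sample.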
#copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import division
import paddle.fluid as fluid
import numpy as np
import os
use_cudnn = True
if 'ce_mode' in os.environ:
use_cudnn = False
def norm_layer(input, norm_type='batch_norm', name=None):
if norm_type == 'batch_norm':
param_attr = fluid.ParamAttr(
name=name + '_w',
initializer=fluid.initializer.NormalInitializer(
loc=1.0, scale=0.02))
bias_attr = fluid.ParamAttr(
name=name + '_b', initializer=fluid.initializer.Constant(value=0.0))
return fluid.layers.batch_norm(
input,
param_attr=param_attr,
bias_attr=bias_attr,
moving_mean_name=name + '_mean',
moving_variance_name=name + '_var')
elif norm_type == 'instance_norm':
helper = fluid.layer_helper.LayerHelper("instance_norm", **locals())
dtype = helper.input_dtype()
epsilon = 1e-5
mean = fluid.layers.reduce_mean(input, dim=[2, 3], keep_dim=True)
var = fluid.layers.reduce_mean(
fluid.layers.square(input - mean), dim=[2, 3], keep_dim=True)
if name is not None:
scale_name = name + "_scale"
offset_name = name + "_offset"
scale_param = fluid.ParamAttr(
name=scale_name,
initializer=fluid.initializer.NormalInitializer(
loc=0.0, scale=0.02),
trainable=True)
offset_param = fluid.ParamAttr(
name=offset_name,
initializer=fluid.initializer.Constant(0.0),
trainable=True)
scale = helper.create_parameter(
attr=scale_param, shape=input.shape[1:2], dtype=dtype)
offset = helper.create_parameter(
attr=offset_param, shape=input.shape[1:2], dtype=dtype)
tmp = fluid.layers.elementwise_mul(x=(input - mean), y=scale, axis=1)
tmp = tmp / fluid.layers.sqrt(var + epsilon)
tmp = fluid.layers.elementwise_add(tmp, offset, axis=1)
return tmp
else:
raise NotImplementedError("norm type: [%s] is not supported" % norm_type)
def conv2d(input,
num_filters=64,
filter_size=7,
stride=1,
stddev=0.02,
padding=0,
name="conv2d",
norm=None,
activation_fn=None,
relufactor=0.0,
use_bias=False):
param_attr = fluid.ParamAttr(
name=name + "_w",
initializer=fluid.initializer.NormalInitializer(
loc=0.0, scale=stddev))
if use_bias:
bias_attr = fluid.ParamAttr(
name=name + "_b", initializer=fluid.initializer.Constant(0.0))
else:
bias_attr = False
conv = fluid.layers.conv2d(
input,
num_filters,
filter_size,
name=name,
stride=stride,
padding=padding,
use_cudnn=use_cudnn,
param_attr=param_attr,
bias_attr=bias_attr)
if norm is not None:
conv = norm_layer(input=conv, norm_type=norm, name=name + "_norm")
if activation_fn == 'relu':
conv = fluid.layers.relu(conv, name=name + '_relu')
elif activation_fn == 'leaky_relu':
conv = fluid.layers.leaky_relu(
conv, alpha=relufactor, name=name + '_leaky_relu')
elif activation_fn == 'tanh':
conv = fluid.layers.tanh(conv, name=name + '_tanh')
elif activation_fn is None:
conv = conv
else:
raise NotImplementedError("activation: [%s] is not supported" %
activation_fn)
return conv
def deconv2d(input,
num_filters=64,
filter_size=7,
stride=1,
stddev=0.02,
padding=[0, 0],
outpadding=[0, 0, 0, 0],
name="deconv2d",
norm=None,
activation_fn=None,
relufactor=0.0,
use_bias=False,
output_size=None):
param_attr = fluid.ParamAttr(
name=name + "_w",
initializer=fluid.initializer.NormalInitializer(
loc=0.0, scale=stddev))
if use_bias:
bias_attr = fluid.ParamAttr(
name=name + "_b", initializer=fluid.initializer.Constant(0.0))
else:
bias_attr = False
conv = fluid.layers.conv2d_transpose(
input,
num_filters,
output_size=output_size,
name=name,
filter_size=filter_size,
stride=stride,
padding=padding,
use_cudnn=use_cudnn,
param_attr=param_attr,
bias_attr=bias_attr)
conv = fluid.layers.pad2d(
conv, paddings=outpadding, mode='constant', pad_value=0.0)
if norm is not None:
conv = norm_layer(input=conv, norm_type=norm, name=name + "_norm")
if activation_fn == 'relu':
conv = fluid.layers.relu(conv, name=name + '_relu')
elif activation_fn == 'leaky_relu':
if relufactor == 0.0:
raise Warning(
"the activation is leaky_relu, but the relufactor is 0")
conv = fluid.layers.leaky_relu(
conv, alpha=relufactor, name=name + '_leaky_relu')
elif activation_fn == 'tanh':
conv = fluid.layers.tanh(conv, name=name + '_tanh')
elif activation_fn == 'sigmoid':
conv = fluid.layers.sigmoid(conv, name=name + '_sigmoid')
elif activation_fn is None:
conv = conv
else:
raise NotImplementedError("activation: [%s] is not supported" %
activation_fn)
return conv
def linear(input,
output_size,
norm=None,
stddev=0.02,
activation_fn=None,
relufactor=0.2,
name='linear'):
param_attr = fluid.ParamAttr(
name=name + '_w',
initializer=fluid.initializer.NormalInitializer(
loc=0.0, scale=stddev))
bias_attr = fluid.ParamAttr(
name=name + "_b", initializer=fluid.initializer.Constant(0.0))
linear = fluid.layers.fc(input,
output_size,
param_attr=param_attr,
bias_attr=bias_attr,
name=name)
if norm is not None:
linear = norm_layer(input=linear, norm_type=norm, name=name + '_norm')
if activation_fn == 'relu':
linear = fluid.layers.relu(linear, name=name + '_relu')
elif activation_fn == 'leaky_relu':
if relufactor == 0.0:
raise Warning(
"the activation is leaky_relu, but the relufactor is 0")
linear = fluid.layers.leaky_relu(
linear, alpha=relufactor, name=name + '_leaky_relu')
elif activation_fn == 'tanh':
linear = fluid.layers.tanh(linear, name=name + '_tanh')
elif activation_fn == 'sigmoid':
linear = fluid.layers.sigmoid(linear, name=name + '_sigmoid')
elif activation_fn is None:
linear = linear
else:
raise NotImplementedError("activation: [%s] is not supported" %
activation_fn)
return linear
def conv_cond_concat(x, y):
ones = fluid.layers.fill_constant_batch_size_like(
x, [-1, y.shape[1], x.shape[2], x.shape[3]], "float32", 1.0)
out = fluid.layers.concat([x, ones * y], 1)
return out
def conv_and_pool(x, num_filters, name, stddev=0.02, act=None):
param_attr = fluid.ParamAttr(
name=name + '_w',
initializer=fluid.initializer.NormalInitializer(
loc=0.0, scale=stddev))
bias_attr = fluid.ParamAttr(
name=name + "_b", initializer=fluid.initializer.Constant(0.0))
out = fluid.nets.simple_img_conv_pool(
input=x,
filter_size=5,
num_filters=num_filters,
pool_size=2,
pool_stride=2,
param_attr=param_attr,
bias_attr=bias_attr,
act=act)
return out
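The `instance_norm` branch of `norm_layer` normalizes each channel of each sample over its spatial positions, then applies a per-channel scale and offset. A minimal plain-Python sketch of that arithmetic for a single channel follows; the function name and the scalar `scale`/`offset` arguments are illustrative stand-ins for the learned parameters, not the repository's API.

```python
import math


def instance_norm(feature_map, scale=1.0, offset=0.0, epsilon=1e-5):
    """Normalize one channel of one sample over its spatial positions.

    feature_map is a list of rows (H x W). scale and offset stand in for
    the learned per-channel parameters.
    """
    values = [v for row in feature_map for v in row]
    mean = sum(values) / len(values)
    # Biased variance, matching reduce_mean(square(input - mean)).
    var = sum((v - mean) ** 2 for v in values) / len(values)
    inv_std = 1.0 / math.sqrt(var + epsilon)
    return [[(v - mean) * scale * inv_std + offset for v in row]
            for row in feature_map]


normalized = instance_norm([[1.0, 2.0], [3.0, 4.0]])
flat = [v for row in normalized for v in row]
out_mean = sum(flat) / len(flat)
out_var = sum((v - out_mean) ** 2 for v in flat) / len(flat)
print(out_mean, out_var)
```

With unit scale and zero offset the output has mean close to zero and variance slightly below one, the gap coming from `epsilon`.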
#copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import division
from __future__ import print_function
from util import config, utility
from data_reader import data_reader
import os
import sys
import six
import time
import numpy as np
import paddle
import paddle.fluid as fluid
import paddle.fluid.profiler as profiler
def train(cfg):
reader = data_reader(cfg)
if cfg.model_net == 'CycleGAN':
a_reader, b_reader, a_reader_test, b_reader_test, batch_num = reader.make_data(
)
else:
if cfg.dataset == 'mnist':
train_reader = reader.make_data()
else:
train_reader, test_reader, batch_num = reader.make_data()
if cfg.model_net == 'CGAN':
from trainer.CGAN import CGAN
if cfg.dataset != 'mnist':
raise NotImplementedError('CGAN only supports mnist now!')
model = CGAN(cfg, train_reader)
elif cfg.model_net == 'DCGAN':
from trainer.DCGAN import DCGAN
if cfg.dataset != 'mnist':
raise NotImplementedError('DCGAN only supports mnist now!')
model = DCGAN(cfg, train_reader)
elif cfg.model_net == 'CycleGAN':
from trainer.CycleGAN import CycleGAN
model = CycleGAN(cfg, a_reader, b_reader, a_reader_test, b_reader_test,
batch_num)
else:
raise NotImplementedError('model_net [%s] is not supported' % cfg.model_net)
model.build_model()
if __name__ == "__main__":
cfg = config.parse_args()
config.print_arguments(cfg)
assert cfg.load_size >= cfg.crop_size, "Load size cannot be less than crop size!"
if cfg.profile:
if cfg.use_gpu:
with profiler.profiler('All', 'total', '/tmp/profile') as prof:
train(cfg)
else:
with profiler.profiler("CPU", sorted_key='total') as cpuprof:
train(cfg)
else:
train(cfg)
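Selecting a trainer by its `model_net` name is a table lookup at heart. The following is a hypothetical sketch with placeholder factories (the names `select_trainer` and `registry` are not from the repository) that fails loudly on an unknown name instead of continuing with no model.

```python
def select_trainer(model_net, registry):
    """Return the factory registered under model_net, or fail loudly."""
    try:
        return registry[model_net]
    except KeyError:
        raise NotImplementedError(
            "model_net [%s] is not supported; choose one of %s" %
            (model_net, sorted(registry)))


# Placeholder factories; a real script would import trainer classes here.
registry = {
    "CGAN": lambda: "CGAN trainer",
    "DCGAN": lambda: "DCGAN trainer",
    "CycleGAN": lambda: "CycleGAN trainer",
}

print(select_trainer("DCGAN", registry)())
try:
    select_trainer("Pix2pix", registry)
except NotImplementedError as err:
    print(err)
```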
#copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from network.CGAN_network import CGAN_model
from util import utility
import sys
import six
import os
import numpy as np
import time
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
import paddle.fluid as fluid
class GTrainer():
def __init__(self, input, conditions, cfg):
self.program = fluid.default_main_program().clone()
with fluid.program_guard(self.program):
model = CGAN_model()
self.fake = model.network_G(input, conditions, name="G")
self.infer_program = self.program.clone()
d_fake = model.network_D(self.fake, conditions, name="D")
fake_labels = fluid.layers.fill_constant_batch_size_like(
input=input, dtype='float32', shape=[-1, 1], value=1.0)
self.g_loss = fluid.layers.reduce_mean(
fluid.layers.sigmoid_cross_entropy_with_logits(
x=d_fake, label=fake_labels))
vars = []
for var in self.program.list_vars():
if fluid.io.is_parameter(var) and (var.name.startswith("G")):
vars.append(var.name)
optimizer = fluid.optimizer.Adam(
learning_rate=cfg.learning_rate, beta1=0.5, name="net_G")
optimizer.minimize(self.g_loss, parameter_list=vars)
class DTrainer():
def __init__(self, input, conditions, labels, cfg):
self.program = fluid.default_main_program().clone()
with fluid.program_guard(self.program):
model = CGAN_model()
d_logit = model.network_D(input, conditions, name="D")
self.d_loss = fluid.layers.reduce_mean(
fluid.layers.sigmoid_cross_entropy_with_logits(
x=d_logit, label=labels))
vars = []
for var in self.program.list_vars():
if fluid.io.is_parameter(var) and (var.name.startswith("D")):
vars.append(var.name)
optimizer = fluid.optimizer.Adam(
learning_rate=cfg.learning_rate, beta1=0.5, name="net_D")
optimizer.minimize(self.d_loss, parameter_list=vars)
class CGAN(object):
def add_special_args(self, parser):
parser.add_argument(
'--noise_size', type=int, default=100, help="the noise dimension")
return parser
def __init__(self, cfg=None, train_reader=None):
self.cfg = cfg
self.train_reader = train_reader
def build_model(self):
img = fluid.layers.data(name='img', shape=[784], dtype='float32')
condition = fluid.layers.data(
name='condition', shape=[1], dtype='float32')
noise = fluid.layers.data(
name='noise', shape=[self.cfg.noise_size], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='float32')
g_trainer = GTrainer(noise, condition, self.cfg)
d_trainer = DTrainer(img, condition, label, self.cfg)
# prepare environment
place = fluid.CUDAPlace(0) if self.cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
const_n = np.random.uniform(
low=-1.0, high=1.0,
size=[self.cfg.batch_size, self.cfg.noise_size]).astype('float32')
if self.cfg.init_model:
utility.init_checkpoints(self.cfg, exe, g_trainer, "net_G")
utility.init_checkpoints(self.cfg, exe, d_trainer, "net_D")
### memory optim
build_strategy = fluid.BuildStrategy()
build_strategy.enable_inplace = True
build_strategy.memory_optimize = False
g_trainer_program = fluid.CompiledProgram(
g_trainer.program).with_data_parallel(
loss_name=g_trainer.g_loss.name, build_strategy=build_strategy)
d_trainer_program = fluid.CompiledProgram(
d_trainer.program).with_data_parallel(
loss_name=d_trainer.d_loss.name, build_strategy=build_strategy)
t_time = 0
losses = [[], []]
for epoch_id in range(self.cfg.epoch):
for batch_id, data in enumerate(self.train_reader()):
if len(data) != self.cfg.batch_size:
continue
noise_data = np.random.uniform(
low=-1.0,
high=1.0,
size=[self.cfg.batch_size, self.cfg.noise_size]).astype(
'float32')
real_image = np.array(list(map(lambda x: x[0], data))).reshape(
[-1, 784]).astype('float32')
condition_data = np.array([x[1] for x in data]).reshape(
[-1, 1]).astype('float32')
real_label = np.ones(
shape=[real_image.shape[0], 1], dtype='float32')
fake_label = np.zeros(
shape=[real_image.shape[0], 1], dtype='float32')
s_time = time.time()
generate_image = exe.run(
g_trainer.infer_program,
feed={'noise': noise_data,
'condition': condition_data},
fetch_list=[g_trainer.fake])[0]
d_real_loss = exe.run(d_trainer_program,
feed={
'img': real_image,
'condition': condition_data,
'label': real_label
},
fetch_list=[d_trainer.d_loss])[0]
d_fake_loss = exe.run(d_trainer_program,
feed={
'img': generate_image,
'condition': condition_data,
'label': fake_label
},
fetch_list=[d_trainer.d_loss])[0]
d_loss = d_real_loss + d_fake_loss
losses[1].append(d_loss)
for _ in six.moves.xrange(self.cfg.num_generator_time):
g_loss = exe.run(g_trainer_program,
feed={
'noise': noise_data,
'condition': condition_data
},
fetch_list=[g_trainer.g_loss])[0]
losses[0].append(g_loss)
batch_time = time.time() - s_time
t_time += batch_time
if batch_id % self.cfg.print_freq == 0:
image_path = self.cfg.output + '/images'
if not os.path.exists(image_path):
os.makedirs(image_path)
generate_const_image = exe.run(
g_trainer.infer_program,
feed={'noise': const_n,
'condition': condition_data},
fetch_list=[g_trainer.fake])[0]
generate_image_reshape = np.reshape(generate_const_image, (
self.cfg.batch_size, -1))
total_images = np.concatenate(
[real_image, generate_image_reshape])
fig = utility.plot(total_images)
print(
'Epoch ID={} Batch ID={} D_loss={} G_loss={} Batch_time_cost={:.2f}'.
format(epoch_id, batch_id, d_loss[0], g_loss[0],
batch_time))
plt.title('Epoch ID={}, Batch ID={}'.format(epoch_id,
batch_id))
plt.savefig(
'{}/{:04d}_{:04d}.png'.format(image_path, epoch_id,
batch_id),
bbox_inches='tight')
plt.close(fig)
if self.cfg.save_checkpoints:
utility.checkpoints(epoch_id, self.cfg, exe, g_trainer, "net_G")
utility.checkpoints(epoch_id, self.cfg, exe, d_trainer, "net_D")
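Both trainers above build their losses from sigmoid cross-entropy on discriminator outputs: the discriminator sees real images labelled 1 and generated images labelled 0, while the generator is scored on generated images labelled 1. The sketch below reproduces that arithmetic in plain Python using the standard numerically stable form; the logit values are made up for illustration.

```python
import math


def sigmoid_cross_entropy_with_logits(logit, label):
    """Stable form of -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean(values):
    return sum(values) / len(values)


d_logits_real = [2.0, 1.5]   # hypothetical discriminator outputs on real images
d_logits_fake = [-1.0, 0.5]  # hypothetical outputs on generated images

# Discriminator: real images are labelled 1, generated images 0.
d_loss = (
    mean([sigmoid_cross_entropy_with_logits(x, 1.0) for x in d_logits_real]) +
    mean([sigmoid_cross_entropy_with_logits(x, 0.0) for x in d_logits_fake]))

# Generator: the same generated images, labelled 1 to fool the discriminator.
g_loss = mean([sigmoid_cross_entropy_with_logits(x, 1.0) for x in d_logits_fake])

print(d_loss, g_loss)
```

Summing the real and fake terms mirrors `d_loss = d_real_loss + d_fake_loss` in the training loop.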
#copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from network.CycleGAN_network import CycleGAN_model
from util import utility
import paddle.fluid as fluid
import sys
import time
lambda_A = 10.0
lambda_B = 10.0
lambda_identity = 0.5
class GTrainer():
def __init__(self, input_A, input_B, cfg, step_per_epoch):
self.program = fluid.default_main_program().clone()
with fluid.program_guard(self.program):
model = CycleGAN_model()
self.fake_B = model.network_G(input_A, name="GA", cfg=cfg)
self.fake_B.persistable = True
self.fake_A = model.network_G(input_B, name="GB", cfg=cfg)
self.fake_A.persistable = True
self.cyc_A = model.network_G(self.fake_B, name="GB", cfg=cfg)
self.cyc_B = model.network_G(self.fake_A, name="GA", cfg=cfg)
self.infer_program = self.program.clone()
# Cycle Loss
diff_A = fluid.layers.abs(
fluid.layers.elementwise_sub(
x=input_A, y=self.cyc_A))
diff_B = fluid.layers.abs(
fluid.layers.elementwise_sub(
x=input_B, y=self.cyc_B))
self.cyc_A_loss = fluid.layers.reduce_mean(diff_A) * lambda_A
self.cyc_B_loss = fluid.layers.reduce_mean(diff_B) * lambda_B
self.cyc_loss = self.cyc_A_loss + self.cyc_B_loss
# GAN Loss D_A(G_A(A))
self.fake_rec_A = model.network_D(self.fake_B, name="DA", cfg=cfg)
self.G_A = fluid.layers.reduce_mean(
fluid.layers.square(self.fake_rec_A - 1))
# GAN Loss D_B(G_B(B))
self.fake_rec_B = model.network_D(self.fake_A, name="DB", cfg=cfg)
self.G_B = fluid.layers.reduce_mean(
fluid.layers.square(self.fake_rec_B - 1))
self.G = self.G_A + self.G_B
# Identity Loss G_A
self.idt_A = model.network_G(input_B, name="GA", cfg=cfg)
self.idt_loss_A = fluid.layers.reduce_mean(
fluid.layers.abs(
fluid.layers.elementwise_sub(
x=input_B, y=self.idt_A))) * lambda_B * lambda_identity
# Identity Loss G_B
self.idt_B = model.network_G(input_A, name="GB", cfg=cfg)
self.idt_loss_B = fluid.layers.reduce_mean(
fluid.layers.abs(
fluid.layers.elementwise_sub(
x=input_A, y=self.idt_B))) * lambda_A * lambda_identity
self.idt_loss = fluid.layers.elementwise_add(self.idt_loss_A,
self.idt_loss_B)
self.g_loss = self.cyc_loss + self.G + self.idt_loss
vars = []
for var in self.program.list_vars():
if fluid.io.is_parameter(var) and (var.name.startswith("GA") or
var.name.startswith("GB")):
vars.append(var.name)
self.param = vars
lr = cfg.learning_rate
optimizer = fluid.optimizer.Adam(
learning_rate=fluid.layers.piecewise_decay(
boundaries=[99 * step_per_epoch] +
[x * step_per_epoch for x in range(100, cfg.epoch - 1)],
values=[lr] + [
lr * (1.0 - (x - 99.0) / 101.0)
for x in range(100, cfg.epoch)
]),
beta1=0.5,
beta2=0.999,
name="net_G")
optimizer.minimize(self.g_loss, parameter_list=vars)
class DATrainer():
def __init__(self, input_B, fake_pool_B, cfg, step_per_epoch):
self.program = fluid.default_main_program().clone()
with fluid.program_guard(self.program):
model = CycleGAN_model()
self.rec_B = model.network_D(input_B, name="DA", cfg=cfg)
self.fake_pool_rec_B = model.network_D(
fake_pool_B, name="DA", cfg=cfg)
self.d_loss_A = (fluid.layers.square(self.fake_pool_rec_B) +
fluid.layers.square(self.rec_B - 1)) / 2.0
self.d_loss_A = fluid.layers.reduce_mean(self.d_loss_A)
optimizer = fluid.optimizer.Adam(learning_rate=0.0002, beta1=0.5)
vars = []
for var in self.program.list_vars():
if fluid.io.is_parameter(var) and var.name.startswith("DA"):
vars.append(var.name)
self.param = vars
lr = cfg.learning_rate
optimizer = fluid.optimizer.Adam(
learning_rate=fluid.layers.piecewise_decay(
boundaries=[99 * step_per_epoch] +
[x * step_per_epoch for x in range(100, cfg.epoch - 1)],
values=[lr] + [
lr * (1.0 - (x - 99.0) / 101.0)
for x in range(100, cfg.epoch)
]),
beta1=0.5,
beta2=0.999,
name="net_DA")
optimizer.minimize(self.d_loss_A, parameter_list=vars)
class DBTrainer():
def __init__(self, input_A, fake_pool_A, cfg, step_per_epoch):
self.program = fluid.default_main_program().clone()
with fluid.program_guard(self.program):
model = CycleGAN_model()
self.rec_A = model.network_D(input_A, name="DB", cfg=cfg)
self.fake_pool_rec_A = model.network_D(
fake_pool_A, name="DB", cfg=cfg)
self.d_loss_B = (fluid.layers.square(self.fake_pool_rec_A) +
fluid.layers.square(self.rec_A - 1)) / 2.0
self.d_loss_B = fluid.layers.reduce_mean(self.d_loss_B)
optimizer = fluid.optimizer.Adam(learning_rate=0.0002, beta1=0.5)
vars = []
for var in self.program.list_vars():
if fluid.io.is_parameter(var) and var.name.startswith("DB"):
vars.append(var.name)
self.param = vars
lr = cfg.learning_rate
optimizer = fluid.optimizer.Adam(
learning_rate=fluid.layers.piecewise_decay(
boundaries=[99 * step_per_epoch] +
[x * step_per_epoch for x in range(100, cfg.epoch - 1)],
values=[lr] + [
lr * (1.0 - (x - 99.0) / 101.0)
for x in range(100, cfg.epoch)
]),
beta1=0.5,
beta2=0.999,
name="net_DB")
optimizer.minimize(self.d_loss_B, parameter_list=vars)
class CycleGAN(object):
def add_special_args(self, parser):
parser.add_argument(
'--net_G',
type=str,
default="resnet_9block",
help="Choose the CycleGAN generator's network, choose in [resnet_9block|resnet_6block|unet_128|unet_256]"
)
parser.add_argument(
'--net_D',
type=str,
default="basic",
help="Choose the CycleGAN discriminator's network, choose in [basic|nlayers|pixel]"
)
parser.add_argument(
'--d_nlayers',
type=int,
default=3,
help="only used when CycleGAN discriminator is nlayers")
return parser
def __init__(self,
cfg=None,
A_reader=None,
B_reader=None,
A_test_reader=None,
B_test_reader=None,
batch_num=1):
self.cfg = cfg
self.A_reader = A_reader
self.B_reader = B_reader
self.A_test_reader = A_test_reader
self.B_test_reader = B_test_reader
self.batch_num = batch_num
def build_model(self):
data_shape = [-1, 3, self.cfg.crop_size, self.cfg.crop_size]
input_A = fluid.layers.data(
name='input_A', shape=data_shape, dtype='float32')
input_B = fluid.layers.data(
name='input_B', shape=data_shape, dtype='float32')
fake_pool_A = fluid.layers.data(
name='fake_pool_A', shape=data_shape, dtype='float32')
fake_pool_B = fluid.layers.data(
name='fake_pool_B', shape=data_shape, dtype='float32')
gen_trainer = GTrainer(input_A, input_B, self.cfg, self.batch_num)
d_A_trainer = DATrainer(input_B, fake_pool_B, self.cfg, self.batch_num)
d_B_trainer = DBTrainer(input_A, fake_pool_A, self.cfg, self.batch_num)
# prepare environment
place = fluid.CUDAPlace(0) if self.cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
A_pool = utility.ImagePool()
B_pool = utility.ImagePool()
if self.cfg.init_model:
utility.init_checkpoints(self.cfg, exe, gen_trainer, "net_G")
utility.init_checkpoints(self.cfg, exe, d_A_trainer, "net_DA")
utility.init_checkpoints(self.cfg, exe, d_B_trainer, "net_DB")
### memory optim
build_strategy = fluid.BuildStrategy()
build_strategy.enable_inplace = False
build_strategy.memory_optimize = False
gen_trainer_program = fluid.CompiledProgram(
gen_trainer.program).with_data_parallel(
loss_name=gen_trainer.g_loss.name,
build_strategy=build_strategy)
d_A_trainer_program = fluid.CompiledProgram(
d_A_trainer.program).with_data_parallel(
loss_name=d_A_trainer.d_loss_A.name,
build_strategy=build_strategy)
d_B_trainer_program = fluid.CompiledProgram(
d_B_trainer.program).with_data_parallel(
loss_name=d_B_trainer.d_loss_B.name,
build_strategy=build_strategy)
losses = [[], []]
t_time = 0
for epoch_id in range(self.cfg.epoch):
batch_id = 0
for i in range(self.batch_num):
data_A = next(self.A_reader())
data_B = next(self.B_reader())
tensor_A = fluid.LoDTensor()
tensor_B = fluid.LoDTensor()
tensor_A.set(data_A, place)
tensor_B.set(data_B, place)
s_time = time.time()
# optimize the g_A network
g_A_loss, g_A_cyc_loss, g_A_idt_loss, g_B_loss, g_B_cyc_loss,\
g_B_idt_loss, fake_A_tmp, fake_B_tmp = exe.run(
gen_trainer_program,
fetch_list=[
gen_trainer.G_A, gen_trainer.cyc_A_loss,
gen_trainer.idt_loss_A, gen_trainer.G_B,
gen_trainer.cyc_B_loss, gen_trainer.idt_loss_B,
gen_trainer.fake_A, gen_trainer.fake_B
],
feed={"input_A": tensor_A,
"input_B": tensor_B})
fake_pool_B = B_pool.pool_image(fake_B_tmp)
fake_pool_A = A_pool.pool_image(fake_A_tmp)
# optimize the d_A network
d_A_loss = exe.run(
d_A_trainer_program,
fetch_list=[d_A_trainer.d_loss_A],
feed={"input_B": tensor_B,
"fake_pool_B": fake_pool_B})[0]
# optimize the d_B network
d_B_loss = exe.run(
d_B_trainer_program,
fetch_list=[d_B_trainer.d_loss_B],
feed={"input_A": tensor_A,
"fake_pool_A": fake_pool_A})[0]
batch_time = time.time() - s_time
t_time += batch_time
if batch_id % self.cfg.print_freq == 0:
print("epoch{}: batch{}: \n\
d_A_loss: {}; g_A_loss: {}; g_A_cyc_loss: {}; g_A_idt_loss: {}; \n\
d_B_loss: {}; g_B_loss: {}; g_B_cyc_loss: {}; g_B_idt_loss: {}; \n\
Batch_time_cost: {:.2f}".format(
epoch_id, batch_id, d_A_loss[0], g_A_loss[0],
g_A_cyc_loss[0], g_A_idt_loss[0], d_B_loss[0], g_B_loss[
0], g_B_cyc_loss[0], g_B_idt_loss[0], batch_time))
losses[0].append(g_A_loss[0])
losses[1].append(d_A_loss[0])
sys.stdout.flush()
batch_id += 1
if self.cfg.run_test:
test_program = gen_trainer.infer_program
utility.save_test_image(epoch_id, self.cfg, exe, place,
test_program, gen_trainer,
self.A_test_reader, self.B_test_reader)
if self.cfg.save_checkpoints:
utility.checkpoints(epoch_id, self.cfg, exe, gen_trainer,
"net_G")
utility.checkpoints(epoch_id, self.cfg, exe, d_A_trainer,
"net_DA")
utility.checkpoints(epoch_id, self.cfg, exe, d_B_trainer,
"net_DB")
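The three CycleGAN trainers share one learning-rate schedule: a constant rate for the first 99 epochs, then a per-epoch linear decay. The sketch below rebuilds the same boundary and value lists in plain Python and looks up the rate for a given step; `build_schedule` and `lr_at` are hypothetical helper names, and the lookup assumes the usual piecewise-constant rule that `values[i]` applies while the step is below `boundaries[i]`. The two lists only line up (one more value than boundaries) when the run has at least 101 epochs.

```python
import bisect


def build_schedule(lr, total_epochs, steps_per_epoch):
    """Boundaries (in steps) and values, built as in the trainers above."""
    boundaries = [99 * steps_per_epoch] + [
        x * steps_per_epoch for x in range(100, total_epochs - 1)
    ]
    values = [lr] + [
        lr * (1.0 - (x - 99.0) / 101.0) for x in range(100, total_epochs)
    ]
    return boundaries, values


def lr_at(step, boundaries, values):
    """Piecewise-constant lookup: values[i] applies until boundaries[i]."""
    return values[bisect.bisect_right(boundaries, step)]


boundaries, values = build_schedule(
    lr=0.0002, total_epochs=200, steps_per_epoch=10)
for epoch in (0, 98, 99, 150, 199):
    print(epoch, lr_at(epoch * 10, boundaries, values))
```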
#copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from network.DCGAN_network import DCGAN_model
from util import utility
import sys
import six
import os
import numpy as np
import time
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
import paddle.fluid as fluid
class GTrainer():
def __init__(self, input, label, cfg):
self.program = fluid.default_main_program().clone()
with fluid.program_guard(self.program):
model = DCGAN_model()
self.fake = model.network_G(input, name='G')
self.infer_program = self.program.clone()
d_fake = model.network_D(self.fake, name="D")
fake_labels = fluid.layers.fill_constant_batch_size_like(
input, dtype='float32', shape=[-1, 1], value=1.0)
self.g_loss = fluid.layers.reduce_mean(
fluid.layers.sigmoid_cross_entropy_with_logits(
x=d_fake, label=fake_labels))
vars = []
for var in self.program.list_vars():
if fluid.io.is_parameter(var) and (var.name.startswith("G")):
vars.append(var.name)
optimizer = fluid.optimizer.Adam(
learning_rate=cfg.learning_rate, beta1=0.5, name="net_G")
optimizer.minimize(self.g_loss, parameter_list=vars)
class DTrainer():
def __init__(self, input, labels, cfg):
self.program = fluid.default_main_program().clone()
with fluid.program_guard(self.program):
model = DCGAN_model()
d_logit = model.network_D(input, name="D")
self.d_loss = fluid.layers.reduce_mean(
fluid.layers.sigmoid_cross_entropy_with_logits(
x=d_logit, label=labels))
vars = []
for var in self.program.list_vars():
if fluid.io.is_parameter(var) and (var.name.startswith("D")):
vars.append(var.name)
optimizer = fluid.optimizer.Adam(
learning_rate=cfg.learning_rate, beta1=0.5, name="net_D")
optimizer.minimize(self.d_loss, parameter_list=vars)
class DCGAN(object):
def add_special_args(self, parser):
parser.add_argument(
'--noise_size', type=int, default=100, help="the noise dimension")
return parser
def __init__(self, cfg=None, train_reader=None):
self.cfg = cfg
self.train_reader = train_reader
def build_model(self):
img = fluid.layers.data(name='img', shape=[784], dtype='float32')
noise = fluid.layers.data(
name='noise', shape=[self.cfg.noise_size], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='float32')
g_trainer = GTrainer(noise, label, self.cfg)
d_trainer = DTrainer(img, label, self.cfg)
# prepare environment
place = fluid.CUDAPlace(0) if self.cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
const_n = np.random.uniform(
low=-1.0, high=1.0,
size=[self.cfg.batch_size, self.cfg.noise_size]).astype('float32')
if self.cfg.init_model:
utility.init_checkpoints(self.cfg, exe, g_trainer, "net_G")
utility.init_checkpoints(self.cfg, exe, d_trainer, "net_D")
### memory optim
build_strategy = fluid.BuildStrategy()
build_strategy.enable_inplace = True
build_strategy.memory_optimize = False
g_trainer_program = fluid.CompiledProgram(
g_trainer.program).with_data_parallel(
loss_name=g_trainer.g_loss.name, build_strategy=build_strategy)
d_trainer_program = fluid.CompiledProgram(
d_trainer.program).with_data_parallel(
loss_name=d_trainer.d_loss.name, build_strategy=build_strategy)
t_time = 0
losses = [[], []]
for epoch_id in range(self.cfg.epoch):
for batch_id, data in enumerate(self.train_reader()):
if len(data) != self.cfg.batch_size:
continue
noise_data = np.random.uniform(
low=-1.0,
high=1.0,
size=[self.cfg.batch_size, self.cfg.noise_size]).astype(
'float32')
real_image = np.array(list(map(lambda x: x[0], data))).reshape(
[-1, 784]).astype('float32')
real_label = np.ones(
shape=[real_image.shape[0], 1], dtype='float32')
fake_label = np.zeros(
shape=[real_image.shape[0], 1], dtype='float32')
s_time = time.time()
generate_image = exe.run(g_trainer.infer_program,
feed={'noise': noise_data},
fetch_list=[g_trainer.fake])[0]
d_real_loss = exe.run(
d_trainer_program,
feed={'img': real_image,
'label': real_label},
fetch_list=[d_trainer.d_loss])[0]
d_fake_loss = exe.run(
d_trainer_program,
feed={'img': generate_image,
'label': fake_label},
fetch_list=[d_trainer.d_loss])[0]
d_loss = d_real_loss + d_fake_loss
losses[1].append(d_loss)
for _ in six.moves.xrange(self.cfg.num_generator_time):
g_loss = exe.run(g_trainer_program,
feed={'noise': noise_data},
fetch_list=[g_trainer.g_loss])[0]
losses[0].append(g_loss)
batch_time = time.time() - s_time
t_time += batch_time
if batch_id % self.cfg.print_freq == 0:
image_path = self.cfg.output + '/images'
if not os.path.exists(image_path):
os.makedirs(image_path)
generate_const_image = exe.run(
g_trainer.infer_program,
feed={'noise': const_n},
fetch_list=[g_trainer.fake])[0]
generate_image_reshape = np.reshape(generate_const_image, (
self.cfg.batch_size, -1))
total_images = np.concatenate(
[real_image, generate_image_reshape])
fig = utility.plot(total_images)
print(
'Epoch ID={} Batch ID={} D_loss={} G_loss={} Batch_time_cost={:.2f}'.
format(epoch_id, batch_id, d_loss[0], g_loss[0],
batch_time))
plt.title('Epoch ID={}, Batch ID={}'.format(epoch_id,
batch_id))
plt.savefig(
'{}/{:04d}_{:04d}.png'.format(image_path, epoch_id,
batch_id),
bbox_inches='tight')
plt.close(fig)
if self.cfg.save_checkpoints:
utility.checkpoints(epoch_id, self.cfg, exe, g_trainer, "net_G")
utility.checkpoints(epoch_id, self.cfg, exe, d_trainer, "net_D")
#copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import importlib
def get_special_cfg(model_net):
model = "trainer." + model_net
modellib = importlib.import_module(model)
for name, cls in modellib.__dict__.items():
if name.lower() == model_net.lower():
model = cls()
return model.add_special_args
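The lookup above imports a module by name and then picks the class whose name matches the requested model, ignoring case. A minimal, self-contained sketch of that pattern follows. It uses the standard-library `collections` module as a stand-in for the `trainer` package, and `find_class` is a hypothetical helper name, not part of the repository. Unlike the original, it raises an explicit error when nothing matches.

```python
import importlib
import inspect


def find_class(module_name, class_name):
    """Return the class in `module_name` whose name matches `class_name`,
    ignoring case (hypothetical helper, for illustration only)."""
    module = importlib.import_module(module_name)
    for name, obj in vars(module).items():
        if inspect.isclass(obj) and name.lower() == class_name.lower():
            return obj
    # Fail loudly instead of falling through with a bare string.
    raise ValueError("no class named %r in module %r" %
                     (class_name, module_name))


# `collections` stands in for the trainer package here.
found = find_class("collections", "ordereddict")
print(found.__name__)
```

Raising on a miss avoids the situation where the string `"trainer.<name>"` is returned and the failure only surfaces later as an `AttributeError`.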
#copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import six
import argparse
import functools
import distutils.util
import trainer
def print_arguments(args):
''' Print argparse's argument
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
parser.add_argument("name", default="John", type=str, help="User name.")
args = parser.parse_args()
print_arguments(args)
:param args: Input argparse.Namespace for printing.
:type args: argparse.Namespace
'''
print("----------- Configuration Arguments -----------")
for arg, value in sorted(six.iteritems(vars(args))):
print("%s: %s" % (arg, value))
print("------------------------------------------------")
def add_arguments(argname, type, default, help, argparser, **kwargs):
"""Add argparse's argument.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
add_arguments("name", str, "John", "User name.", parser)
args = parser.parse_args()
"""
type = distutils.util.strtobool if type == bool else type
argparser.add_argument(
"--" + argname,
default=default,
type=type,
help=help + ' Default: %(default)s.',
**kwargs)
def base_parse_args(parser):
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('model_net', str, "cgan", "The model used.")
add_arg('dataset', str, "mnist", "The dataset used.")
add_arg('data_dir', str, "./data", "The dataset root directory")
add_arg('data_list', str, None, "The dataset list file name")
add_arg('batch_size', int, 1, "Minibatch size.")
add_arg('epoch', int, 200, "The number of epoch to be trained.")
add_arg('g_base_dims', int, 64, "Base channels in CycleGAN generator")
add_arg('d_base_dims', int, 64, "Base channels in CycleGAN discriminator")
add_arg('load_size', int, 286, "the image size when load the image")
add_arg('crop_type', str, 'Centor',
"the crop type, choose = ['Centor', 'Random']")
add_arg('crop_size', int, 256, "crop size when preprocess image")
add_arg('save_checkpoints', bool, True, "Whether to save checkpoints.")
add_arg('run_test', bool, True, "Whether to run test.")
add_arg('use_gpu', bool, True, "Whether to use GPU to train.")
add_arg('profile', bool, False, "Whether to profile.")
add_arg('dropout', bool, False, "Whether to use dropout.")
add_arg('use_dropout', bool, False, "Whether to use dropout")
add_arg('drop_last', bool, False,
"Whether to drop the last images that cannot form a batch")
add_arg('shuffle', bool, True, "Whether to shuffle data")
add_arg('output', str, "./output",
"The directory the model and the test result to be saved to.")
add_arg('init_model', str, None, "The init model file or directory.")
add_arg('norm_type', str, "batch_norm", "Which normalization to use")
add_arg('learning_rate', float, 0.0002, "the initial learning rate")
add_arg('num_generator_time', int, 1,
"the generator run times in training each epoch")
add_arg('print_freq', int, 10, "the frequency of print loss")
# yapf: enable
return parser
def parse_args():
parser = argparse.ArgumentParser(description=__doc__)
parser = base_parse_args(parser)
cfg, _ = parser.parse_known_args()
model_name = cfg.model_net
model_cfg = trainer.get_special_cfg(model_name)
parser = model_cfg(parser)
args = parser.parse_args()
return args
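`parse_args` above parses in two stages: a first pass with `parse_known_args` reads `model_net` while tolerating flags it does not know yet, then the model-specific hook extends the parser before the final strict parse. The sketch below reproduces that flow with the standard library only. `add_cgan_args`, `SPECIAL_ARGS`, `--noise_size` and `str2bool` are hypothetical names introduced for illustration; `str2bool` stands in for `distutils.util.strtobool`, since `distutils` is no longer shipped with recent Python releases.

```python
import argparse


def str2bool(text):
    """Parse common boolean spellings (illustrative strtobool stand-in)."""
    value = text.strip().lower()
    if value in ("1", "true", "yes", "y", "on"):
        return True
    if value in ("0", "false", "no", "n", "off"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % text)


def add_cgan_args(parser):
    # Hypothetical model-specific hook; the real hooks live in the trainers.
    parser.add_argument("--noise_size", type=int, default=100)
    return parser


SPECIAL_ARGS = {"cgan": add_cgan_args}


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_net", type=str, default="cgan")
    parser.add_argument("--use_gpu", type=str2bool, default=True)
    parser.add_argument("--learning_rate", type=float, default=0.0002)
    # Stage 1: learn which model is requested, ignoring unknown flags.
    cfg, _ = parser.parse_known_args(argv)
    # Stage 2: let the model register its own flags, then parse strictly.
    parser = SPECIAL_ARGS[cfg.model_net](parser)
    return parser.parse_args(argv)


args = parse_args(
    ["--model_net", "cgan", "--noise_size", "64", "--use_gpu", "false"])
print(args)
```

The strict second parse is what rejects misspelled flags; the first pass alone would silently drop them.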
#copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
import os
import sys
import math
import distutils.util
import numpy as np
import inspect
import matplotlib
import six
matplotlib.use('agg')
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from scipy.misc import imsave
img_dim = 28
def plot(gen_data):
pad_dim = 1
paded = pad_dim + img_dim
gen_data = gen_data.reshape(gen_data.shape[0], img_dim, img_dim)
n = int(math.ceil(math.sqrt(gen_data.shape[0])))
gen_data = (np.pad(
gen_data, [[0, n * n - gen_data.shape[0]], [pad_dim, 0], [pad_dim, 0]],
'constant').reshape((n, n, paded, paded)).transpose((0, 2, 1, 3))
.reshape((n * paded, n * paded)))
fig = plt.figure(figsize=(8, 8))
plt.axis('off')
plt.imshow(gen_data, cmap='Greys_r', vmin=-1, vmax=1)
return fig
def checkpoints(epoch, cfg, exe, trainer, name):
output_path = cfg.output + '/checkpoints/' + str(epoch)
if not os.path.exists(output_path):
os.makedirs(output_path)
fluid.io.save_persistables(
exe, os.path.join(output_path, name), main_program=trainer.program)
print('save checkpoints {} to {}'.format(name, output_path))
sys.stdout.flush()
def init_checkpoints(cfg, exe, trainer, name):
assert os.path.exists(cfg.init_model), "{} cannot be found.".format(
cfg.init_model)
fluid.io.load_persistables(
exe, os.path.join(cfg.init_model, name), main_program=trainer.program)
print('load checkpoints {} {} DONE'.format(cfg.init_model, name))
sys.stdout.flush()
def save_test_image(epoch, cfg, exe, place, test_program, g_trainer,
A_test_reader, B_test_reader):
out_path = cfg.output + '/test'
if not os.path.exists(out_path):
os.makedirs(out_path)
for data_A, data_B in zip(A_test_reader(), B_test_reader()):
A_name = data_A[0][1]
B_name = data_B[0][1]
tensor_A = fluid.LoDTensor()
tensor_B = fluid.LoDTensor()
tensor_A.set(data_A[0][0], place)
tensor_B.set(data_B[0][0], place)
fake_A_temp, fake_B_temp, cyc_A_temp, cyc_B_temp = exe.run(
test_program,
fetch_list=[
g_trainer.fake_A, g_trainer.fake_B, g_trainer.cyc_A,
g_trainer.cyc_B
],
feed={"input_A": tensor_A,
"input_B": tensor_B})
fake_A_temp = np.squeeze(fake_A_temp[0]).transpose([1, 2, 0])
fake_B_temp = np.squeeze(fake_B_temp[0]).transpose([1, 2, 0])
cyc_A_temp = np.squeeze(cyc_A_temp[0]).transpose([1, 2, 0])
cyc_B_temp = np.squeeze(cyc_B_temp[0]).transpose([1, 2, 0])
input_A_temp = np.squeeze(data_A[0][0]).transpose([1, 2, 0])
input_B_temp = np.squeeze(data_B[0][0]).transpose([1, 2, 0])
imsave(out_path + "/fakeB_" + str(epoch) + "_" + A_name, (
(fake_B_temp + 1) * 127.5).astype(np.uint8))
imsave(out_path + "/fakeA_" + str(epoch) + "_" + B_name, (
(fake_A_temp + 1) * 127.5).astype(np.uint8))
imsave(out_path + "/cycA_" + str(epoch) + "_" + A_name, (
(cyc_A_temp + 1) * 127.5).astype(np.uint8))
imsave(out_path + "/cycB_" + str(epoch) + "_" + B_name, (
(cyc_B_temp + 1) * 127.5).astype(np.uint8))
imsave(out_path + "/inputA_" + str(epoch) + "_" + A_name, (
(input_A_temp + 1) * 127.5).astype(np.uint8))
imsave(out_path + "/inputB_" + str(epoch) + "_" + B_name, (
(input_B_temp + 1) * 127.5).astype(np.uint8))
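`save_test_image` maps generator outputs from the range [-1, 1] to 8-bit pixel values with `(x + 1) * 127.5` before casting. A small pure-Python sketch of that mapping for a single value is shown below. The clamp is an addition of this sketch and is not in the code above: it guards against values slightly outside [-1, 1], which would otherwise wrap around when cast to `uint8`. `to_uint8` is a hypothetical name.

```python
def to_uint8(value):
    """Map a value in [-1, 1] to an integer pixel in [0, 255]."""
    scaled = (value + 1.0) * 127.5
    # Clamp first (assumption of this sketch), then truncate like a uint8 cast.
    clamped = min(max(scaled, 0.0), 255.0)
    return int(clamped)


pixels = [to_uint8(v) for v in (-1.0, 0.0, 1.0, 1.5)]
print(pixels)
```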
class ImagePool(object):
def __init__(self, pool_size=50):
self.pool = []
self.count = 0
self.pool_size = pool_size
def pool_image(self, image):
if self.count < self.pool_size:
self.pool.append(image)
self.count += 1
return image
else:
p = np.random.rand()
if p > 0.5:
random_id = np.random.randint(0, self.pool_size)
temp = self.pool[random_id]
self.pool[random_id] = image
return temp
else:
return image
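`ImagePool` keeps a history of generated images: until the pool is full every image is stored and returned as is; afterwards, with probability 0.5, the incoming image replaces a random stored one and the stored one is returned instead. The sketch below shows the same behaviour using only the standard library, with integers standing in for images. The `seed` parameter is an addition for reproducibility. Note that `np.random.randint(low, high)` excludes `high`, so this sketch draws the slot from all `pool_size` positions.

```python
import random


class ImagePool(object):
    """History buffer sketch; `seed` is an illustrative extra parameter."""

    def __init__(self, pool_size=50, seed=None):
        self.pool = []
        self.pool_size = pool_size
        self.rng = random.Random(seed)

    def pool_image(self, image):
        if len(self.pool) < self.pool_size:
            # Pool not full yet: store the image and pass it through.
            self.pool.append(image)
            return image
        if self.rng.random() > 0.5:
            # Swap: return an older image and keep the new one instead.
            idx = self.rng.randrange(self.pool_size)
            old, self.pool[idx] = self.pool[idx], image
            return old
        return image


pool = ImagePool(pool_size=3, seed=0)
outputs = [pool.pool_image(i) for i in range(10)]
print(outputs)
```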
......@@ -89,7 +89,7 @@ python val.py --dataset 'mpii' --checkpoint 'checkpoints/pose-resnet50-mpii-384x
### Model training
```bash
python train.py --dataset 'mpii' --data_root 'data/mpii'
python train.py --dataset 'mpii'
```
**Note** The detailed parameter settings are stored in `lib/mpii_reader.py` and `lib/coco_reader.py`; set `dataset` to select which configuration is used.
......
......@@ -41,7 +41,7 @@ def print_arguments(args):
:type args: argparse.Namespace
"""
print("----------- Configuration Arguments -----------")
for arg, value in sorted(vars(args).iteritems()):
for arg, value in sorted(vars(args).items()):
print("%s: %s" % (arg, value))
print("------------------------------------------------")
......
......@@ -115,7 +115,7 @@ def infer(args):
image_file, is_color=True).astype("float32")
image -= IMG_MEAN
img = paddle.dataset.image.to_chw(image)[np.newaxis, :]
image_t = fluid.core.LoDTensor()
image_t = fluid.LoDTensor()
image_t.set(img, place)
result = exe.run(inference_program,
feed={"image": image_t},
......
......@@ -18,7 +18,6 @@ from __future__ import division
from __future__ import print_function
import distutils.util
import numpy as np
from paddle.fluid import core
import six
......@@ -72,7 +71,7 @@ def to_lodtensor(data, place):
lod.append(cur_len)
flattened_data = np.concatenate(data, axis=0).astype("int32")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
res = core.LoDTensor()
res = fluid.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
......@@ -80,17 +79,17 @@ def to_lodtensor(data, place):
def get_feeder_data(data, place, for_test=False):
feed_dict = {}
image_t = core.LoDTensor()
image_t = fluid.LoDTensor()
image_t.set(data[0], place)
feed_dict["image"] = image_t
if not for_test:
labels_sub1_t = core.LoDTensor()
labels_sub2_t = core.LoDTensor()
labels_sub4_t = core.LoDTensor()
mask_sub1_t = core.LoDTensor()
mask_sub2_t = core.LoDTensor()
mask_sub4_t = core.LoDTensor()
labels_sub1_t = fluid.LoDTensor()
labels_sub2_t = fluid.LoDTensor()
labels_sub4_t = fluid.LoDTensor()
mask_sub1_t = fluid.LoDTensor()
mask_sub2_t = fluid.LoDTensor()
mask_sub4_t = fluid.LoDTensor()
labels_sub1_t.set(data[1], place)
labels_sub2_t.set(data[3], place)
......@@ -105,8 +104,8 @@ def get_feeder_data(data, place, for_test=False):
feed_dict["label_sub4"] = labels_sub4_t
feed_dict["mask_sub4"] = mask_sub4_t
else:
label_t = core.LoDTensor()
mask_t = core.LoDTensor()
label_t = fluid.LoDTensor()
mask_t = fluid.LoDTensor()
label_t.set(data[1], place)
mask_t.set(data[2], place)
feed_dict["label"] = label_t
......
......@@ -16,7 +16,7 @@ Only support Adam optimizer yet.
Short description of aforementioned steps:
## 1. Install PaddlePaddle
Follow PaddlePaddle [installation instruction](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification#installation) to install PaddlePaddle. If you [build from source](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/beginners_guide/install/compile/compile_Ubuntu_en.md), please use the following cmake arguments and ensure to set `-DWITH_NGRAPH=ON`.
Follow PaddlePaddle [installation instruction](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification#installation) to install PaddlePaddle. If you [build from source](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/beginners_guide/install/compile/compile_Ubuntu_en.md), please use the following cmake arguments and ensure to set `-DWITH_NGRAPH=ON`.
```
cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=OFF -DWITH_MKL=ON -DWITH_MKLDNN=ON -DWITH_NGRAPH=ON
```
......@@ -35,9 +35,8 @@ export KMP_AFFINITY=granularity=fine,compact,1,0
```
## 3. How the benchmark script might be run.
If everything built successfully, you can run command in ResNet50 nGraph session in script [run.sh](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/image_classification/run.sh) to start the benchmark job locally. You will need to uncomment the `#ResNet50 nGraph` part of script.
If everything built successfully, you can run command in ResNet50 nGraph session in script [run.sh](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/run.sh) to start the benchmark job locally. You will need to uncomment the `#ResNet50 nGraph` part of script.
Above is training job using the nGraph, to run the inference job using the nGraph:
Please download the pre-trained resnet50 model from [supported models](https://github.com/PaddlePaddle/models/tree/72dcc7c1a8d5de9d19fbd65b4143bd0d661eee2c/fluid/PaddleCV/image_classification#supported-models-and-performances) for inference script.
......@@ -108,3 +108,24 @@ The second figure shows speed-ups when using multiple GPUs according to the abov
Speed-ups of Multiple-GPU Training of Resnet50 on Imagenet
</p>
## Deep Gradient Compression([arXiv:1712.01887](https://arxiv.org/abs/1712.01887)) for resnet
#### Environment
- GPU: NVIDIA® Tesla® V100
- Machine number * Card number: 4 * 4
- System: Centos 6u3
- Cuda/Cudnn: 9.0/7.1
- Dataset: ImageNet
- Date: 2017.04
- PaddleVersion: 1.4
- Batch size: 32
#### Performance
<p align="center">
<img src="../images/resnet_dgc.png" width=528> <br />
Performance using DGC for resnet-fp32 under different bandwidth
</p>
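Deep Gradient Compression reduces communication by sending only the largest-magnitude gradient entries each step and accumulating the rest locally until they grow large enough to be sent. The sketch below illustrates that idea on plain Python lists. It is a simplified illustration under stated assumptions, not PaddlePaddle's implementation: `sparsify` is a hypothetical name, and momentum correction, gradient clipping and warm-up of the sparsity are left out.

```python
def sparsify(grad, residual, sparsity):
    """Return (entries to send, new local residual).

    grad, residual: lists of floats with the same length.
    sparsity: fraction of entries NOT sent, e.g. 0.999.
    """
    # Add the locally accumulated gradient from earlier steps.
    acc = [g + r for g, r in zip(grad, residual)]
    # Number of entries to send; always send at least one.
    k = max(1, int(round(len(acc) * (1.0 - sparsity))))
    order = sorted(range(len(acc)), key=lambda i: abs(acc[i]), reverse=True)
    top = order[:k]
    sent = {i: acc[i] for i in top}
    # Entries that were sent are cleared; the rest keep accumulating.
    new_residual = [0.0 if i in sent else v for i, v in enumerate(acc)]
    return sent, new_residual


residual = [0.0, 0.0, 0.0, 0.0]
sent1, residual = sparsify([0.1, -2.0, 0.3, 0.05], residual, sparsity=0.75)
sent2, residual = sparsify([0.1, 0.0, 0.3, 0.05], residual, sparsity=0.75)
print(sent1, sent2)
```

In the second call the entry at index 2 is sent because its accumulated value has become the largest, even though its per-step gradient never was.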
......@@ -66,8 +66,10 @@ def parse_args():
add_arg('split_var', bool, True, "Split params on pserver.")
add_arg('async_mode', bool, False, "Async distributed training, only for pserver mode.")
add_arg('reduce_strategy', str, "allreduce", "Choose from reduce or allreduce.")
add_arg('skip_unbalanced_data', bool, False, "Skip data not if data not balanced on nodes.")
add_arg('enable_sequential_execution', bool, False, "Whether to enable sequential execution.")
#for dgc
add_arg('enable_dgc', bool, False, "Whether to use Deep Gradient Compression (DGC).")
add_arg('rampup_begin_step', int, 5008, "The step at which DGC gradient compression begins.")
# yapf: enable
args = parser.parse_args()
return args
......@@ -82,9 +84,11 @@ def get_device_num():
device_num = subprocess.check_output(['nvidia-smi', '-L']).decode().count('\n')
return device_num
def prepare_reader(is_train, pyreader, args, pass_id=0):
def prepare_reader(is_train, pyreader, args, pass_id=1):
# NOTE: always use infinite reader for dist training
if is_train:
reader = train(data_dir=args.data_dir, pass_id_as_seed=pass_id)
reader = train(data_dir=args.data_dir, pass_id_as_seed=pass_id,
infinite=True)
else:
reader = val(data_dir=args.data_dir)
if is_train:
......@@ -135,7 +139,10 @@ def build_program(is_train, main_prog, startup_prog, args):
end_lr /= device_num_per_worker
total_images = args.total_images / trainer_count
step = int(total_images / (args.batch_size * args.multi_batch_repeat) + 1)
if os.getenv("FLAGS_selected_gpus"):
step = int(total_images / (args.batch_size / device_num_per_worker * args.multi_batch_repeat) + 1)
else:
step = int(total_images / (args.batch_size * args.multi_batch_repeat) + 1)
warmup_steps = step * 5 # warmup 5 passes
epochs = [30, 60, 80]
bd = [step * e for e in epochs]
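The hunk above changes how many steps make up one pass: when `FLAGS_selected_gpus` is set (one process per GPU), each process only sees `batch_size / device_num_per_worker` images per step, so a pass takes more steps and the learning-rate boundaries move accordingly. A stand-alone sketch of that arithmetic follows. The function name, the `one_process_per_gpu` flag and the sample numbers are assumptions for illustration.

```python
def lr_schedule_steps(total_images, trainer_count, batch_size,
                      devices_per_worker, multi_batch_repeat=1,
                      one_process_per_gpu=False, epochs=(30, 60, 80)):
    """Return (steps per pass, warm-up steps, decay boundaries)."""
    images_per_trainer = total_images / trainer_count
    if one_process_per_gpu:
        # Each process handles only its share of the batch.
        per_process_batch = batch_size / devices_per_worker
    else:
        per_process_batch = batch_size
    step = int(images_per_trainer /
               (per_process_batch * multi_batch_repeat) + 1)
    warmup_steps = step * 5  # warm up for 5 passes
    boundaries = [step * e for e in epochs]
    return step, warmup_steps, boundaries


single = lr_schedule_steps(1281167, 2, 32, 8)
multi = lr_schedule_steps(1281167, 2, 32, 8, one_process_per_gpu=True)
print(single)
print(multi)
```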
......@@ -157,6 +164,17 @@ def build_program(is_train, main_prog, startup_prog, args):
boundaries=bd, values=lr),
warmup_steps, start_lr, end_lr),
momentum=0.9)
if args.enable_dgc:
optimizer = fluid.optimizer.DGCMomentumOptimizer(
learning_rate=utils.learning_rate.lr_warmup(
fluid.layers.piecewise_decay(
boundaries=bd, values=lr),
warmup_steps, start_lr, end_lr),
momentum=0.9,
sparsity=[0.999, 0.999],
rampup_begin_step=args.rampup_begin_step)
if args.fp16:
params_grads = optimizer.backward(avg_cost)
master_params_grads = utils.create_master_params_grads(
......@@ -224,7 +242,7 @@ def train_parallel(args):
if args.update_method == "pserver":
train_prog, startup_prog = pserver_prepare(args, train_prog, startup_prog)
elif args.update_method == "nccl2":
nccl2_prepare(args, startup_prog)
nccl2_prepare(args, startup_prog, main_prog=train_prog)
if args.dist_env["training_role"] == "PSERVER":
run_pserver(train_prog, startup_prog)
......@@ -247,11 +265,16 @@ def train_parallel(args):
strategy = fluid.ExecutionStrategy()
strategy.num_threads = args.num_threads
# num_iteration_per_drop_scope sets how many iterations run between
# cleanups of the temporary variables generated during execution.
# A larger value may speed up execution, because the temporary
# variables keep the same shapes from one iteration to the next.
strategy.num_iteration_per_drop_scope = 30
build_strategy = fluid.BuildStrategy()
build_strategy.enable_inplace = False
build_strategy.memory_optimize = False
build_strategy.enable_sequential_execution = bool(args.enable_sequential_execution)
if args.reduce_strategy == "reduce":
build_strategy.reduce_strategy = fluid.BuildStrategy(
......@@ -298,14 +321,19 @@ def train_parallel(args):
over_all_start = time.time()
fetch_list = [train_cost.name, train_acc1.name, train_acc5.name]
steps_per_pass = args.total_images / args.batch_size / args.dist_env["num_trainers"]
# 1. MP mode, batch size for current process should be args.batch_size / GPUs
# 2. SP/PG mode, batch size for each process should be original args.batch_size
if os.getenv("FLAGS_selected_gpus"):
steps_per_pass = args.total_images / (args.batch_size / get_device_num()) / args.dist_env["num_trainers"]
else:
steps_per_pass = args.total_images / args.batch_size / args.dist_env["num_trainers"]
for pass_id in range(args.num_epochs):
num_samples = 0
start_time = time.time()
batch_id = 1
# use pass_id+1 as per pass global shuffle for distributed training
prepare_reader(True, train_pyreader, args, pass_id + 1)
train_pyreader.start()
if pass_id == 0:
train_pyreader.start()
while True:
try:
if batch_id % 30 == 0:
......@@ -323,11 +351,10 @@ def train_parallel(args):
break
num_samples += args.batch_size
batch_id += 1
if args.skip_unbalanced_data and batch_id >= steps_per_pass:
if batch_id >= steps_per_pass:
break
print_train_time(start_time, time.time(), num_samples)
train_pyreader.reset()
if pass_id >= args.start_test_pass:
if args.multi_batch_repeat > 1:
copyback_repeat_bn_params(train_prog)
......@@ -343,6 +370,7 @@ def train_parallel(args):
if not os.path.isdir(model_path):
os.makedirs(model_path)
fluid.io.save_persistables(startup_exe, model_path, main_program=train_prog)
train_pyreader.reset()
startup_exe.close()
print("total train time: ", time.time() - over_all_start)
......
......@@ -2,7 +2,7 @@ import os
import paddle.fluid as fluid
def nccl2_prepare(args, startup_prog):
def nccl2_prepare(args, startup_prog, main_prog):
config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
t = fluid.DistributeTranspiler(config=config)
......@@ -12,7 +12,8 @@ def nccl2_prepare(args, startup_prog):
t.transpile(envs["trainer_id"],
trainers=','.join(envs["trainer_endpoints"]),
current_endpoint=envs["current_endpoint"],
startup_program=startup_prog)
startup_program=startup_prog,
program=main_prog)
def pserver_prepare(args, train_prog, startup_prog):
......
......@@ -15,5 +15,7 @@ PADDLE_TRAINING_ROLE="TRAINER" \
PADDLE_CURRENT_ENDPOINT="127.0.0.1:716${i}" \
PADDLE_TRAINER_ID="${i}" \
FLAGS_selected_gpus="${i}" \
python dist_train.py --model $MODEL --update_method nccl2 --batch_size 32 --fp16 1 --scale_loss 8 &> logs/tr$i.log &
python -u dist_train.py --model $MODEL --update_method nccl2 \
--batch_size 32 \
--fp16 0 --scale_loss 1 &> logs/tr$i.log &
done
#!/bin/bash
set -e
enable_dgc=False
while true ; do
case "$1" in
-enable_dgc) enable_dgc="$2" ; shift 2 ;;
*)
if [[ ${#1} -gt 0 ]]; then
echo "unsupported argument: ${1}" ; exit 1 ;
else
break
fi
;;
esac
done
case "${enable_dgc}" in
True) ;;
False) ;;
*) echo "unsupported value for -enable_dgc: ${enable_dgc}" ; exit 1 ;;
esac
export MODEL="DistResNet"
export PADDLE_TRAINER_ENDPOINTS="127.0.0.1:7160,127.0.0.1:7161"
......@@ -9,16 +31,20 @@ mkdir -p logs
# NOTE: set NCCL_P2P_DISABLE so that nccl2 distributed training can run on one node.
# You can set GLOG_v to get more detailed logs.
# export GLOG_v=1
# export GLOG_logtostderr=1
PADDLE_TRAINING_ROLE="TRAINER" \
PADDLE_CURRENT_ENDPOINT="127.0.0.1:7160" \
PADDLE_TRAINER_ID="0" \
CUDA_VISIBLE_DEVICES="0" \
NCCL_P2P_DISABLE="1" \
python dist_train.py --model $MODEL --update_method nccl2 --batch_size 32 &> logs/tr0.log &
python -u dist_train.py --enable_dgc ${enable_dgc} --model $MODEL --update_method nccl2 --batch_size 32 &> logs/tr0.log &
PADDLE_TRAINING_ROLE="TRAINER" \
PADDLE_CURRENT_ENDPOINT="127.0.0.1:7161" \
PADDLE_TRAINER_ID="1" \
CUDA_VISIBLE_DEVICES="1" \
NCCL_P2P_DISABLE="1" \
python dist_train.py --model $MODEL --update_method nccl2 --batch_size 32 &> logs/tr1.log &
python -u dist_train.py --enable_dgc ${enable_dgc} --model $MODEL --update_method nccl2 --batch_size 32 &> logs/tr1.log &
......@@ -12,7 +12,7 @@ np.random.seed(0)
DATA_DIM = 224
THREAD = 8
BUF_SIZE = 102400
BUF_SIZE = 1024
DATA_DIR = 'data/ILSVRC2012'
......@@ -131,39 +131,45 @@ def _reader_creator(file_list,
color_jitter=False,
rotate=False,
data_dir=DATA_DIR,
pass_id_as_seed=0,
data_dim=224):
def reader():
data_dim=224,
pass_id_as_seed=1,
infinite=False):
def reader():
with open(file_list) as flist:
full_lines = [line.strip() for line in flist]
if shuffle:
if pass_id_as_seed:
np.random.seed(pass_id_as_seed)
np.random.shuffle(full_lines)
if mode == 'train' and os.getenv('PADDLE_TRAINING_ROLE'):
# distributed mode if the env var `PADDLE_TRAINING_ROLE` exists
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
trainer_count = int(os.getenv("PADDLE_TRAINERS_NUM", "1"))
per_node_lines = len(full_lines) // trainer_count
lines = full_lines[trainer_id * per_node_lines:(trainer_id + 1)
* per_node_lines]
print(
"read images from %d, length: %d, lines length: %d, total: %d"
% (trainer_id * per_node_lines, per_node_lines, len(lines),
len(full_lines)))
else:
lines = full_lines
for line in lines:
if mode == 'train' or mode == 'val':
img_path, label = line.split()
img_path = os.path.join(data_dir, img_path)
yield img_path, int(label)
elif mode == 'test':
img_path, label = line.split()
img_path = os.path.join(data_dir, img_path)
yield [img_path]
pass_id_as_seed_counter = pass_id_as_seed
while True:
if shuffle:
if pass_id_as_seed_counter:
np.random.seed(pass_id_as_seed_counter)
np.random.shuffle(full_lines)
if mode == 'train' and os.getenv('PADDLE_TRAINING_ROLE'):
# distributed mode if the env var `PADDLE_TRAINING_ROLE` exists
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
trainer_count = int(os.getenv("PADDLE_TRAINERS_NUM", "1"))
per_node_lines = len(full_lines) // trainer_count
lines = full_lines[trainer_id * per_node_lines:(trainer_id + 1)
* per_node_lines]
print(
"read images from %d, length: %d, lines length: %d, total: %d"
% (trainer_id * per_node_lines, per_node_lines, len(lines),
len(full_lines)))
else:
lines = full_lines
for line in lines:
if mode == 'train' or mode == 'val':
img_path, label = line.split()
img_path = os.path.join(data_dir, img_path)
yield img_path, int(label)
elif mode == 'test':
img_path, label = line.split()
img_path = os.path.join(data_dir, img_path)
yield [img_path]
if not infinite:
break
pass_id_as_seed_counter += 1
print("passid ++, current: ", pass_id_as_seed_counter)
mapper = functools.partial(
process_image, data_dim=data_dim, mode=mode, color_jitter=color_jitter, rotate=rotate)
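The rewritten reader above wraps one pass in `while True`, reshuffles with an incremented seed at the start of each pass, and gives each trainer a contiguous slice of the shuffled list. The sketch below reproduces that control flow with the standard library. `make_reader` and its parameters are hypothetical names, and `random.Random(seed)` replaces `np.random.seed` so the example carries no NumPy dependency.

```python
import itertools
import random


def make_reader(lines, seed=1, infinite=False, trainer_id=0, trainer_count=1):
    """Build a generator function that shuffles, shards and yields `lines`."""

    def reader():
        items = list(lines)
        current_seed = seed
        while True:
            # Same seed on every trainer gives the same permutation,
            # so the slices below never overlap.
            random.Random(current_seed).shuffle(items)
            per_node = len(items) // trainer_count
            shard = items[trainer_id * per_node:(trainer_id + 1) * per_node]
            for item in shard:
                yield item
            if not infinite:
                break
            current_seed += 1  # new shuffle order for the next pass

    return reader


finite = list(make_reader(range(6))())
endless = list(itertools.islice(make_reader(range(6), infinite=True)(), 14))
shard0 = list(make_reader(range(6), trainer_id=0, trainer_count=2)())
shard1 = list(make_reader(range(6), trainer_id=1, trainer_count=2)())
print(finite, shard0, shard1)
```

Because the infinite reader never raises `StopIteration`, the caller has to bound each pass itself, which is what the `batch_id >= steps_per_pass` check in the training loop does.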
......@@ -171,7 +177,7 @@ def _reader_creator(file_list,
return paddle.reader.xmap_readers(mapper, reader, THREAD, BUF_SIZE)
def train(data_dir=DATA_DIR, pass_id_as_seed=0):
def train(data_dir=DATA_DIR, pass_id_as_seed=1, infinite=False):
file_list = os.path.join(data_dir, 'train_list.txt')
return _reader_creator(
file_list,
......@@ -180,7 +186,8 @@ def train(data_dir=DATA_DIR, pass_id_as_seed=0):
color_jitter=False,
rotate=False,
data_dir=data_dir,
pass_id_as_seed=pass_id_as_seed)
pass_id_as_seed=pass_id_as_seed,
infinite=infinite)
def val(data_dir=DATA_DIR,image_shape="3,224,224"):
......
......@@ -323,6 +323,7 @@ def train(args):
train_py_reader.decorate_paddle_reader(train_reader)
test_py_reader.decorate_paddle_reader(test_reader)
# use_ngraph is for CPU only, please refer to README_ngraph.md for details
use_ngraph = os.getenv('FLAGS_use_ngraph')
if not use_ngraph:
train_exe = fluid.ParallelExecutor(
......
......@@ -103,8 +103,16 @@ def create_master_params_grads(params_grads, main_prog, startup_prog, scale_loss
def master_param_to_train_param(master_params_grads, params_grads, main_prog):
for idx, m_p_g in enumerate(master_params_grads):
train_p, _ = params_grads[idx]
if train_p.name.startswith("batch_norm"):
continue
with main_prog._optimized_guard([m_p_g[0], m_p_g[1]]):
train_p_name = m_p_g[0].name.replace(".master", "")
if train_p_name.startswith("batch_norm"):
continue
train_p = None
# find fp16 param in original params_grads list
for p, g in params_grads:
if p.name == train_p_name:
train_p = p
if not train_p:
print("can not find train param for: ", m_p_g[0].name)
continue
cast_fp32_to_fp16(m_p_g[0], train_p, main_prog)
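The new version of `master_param_to_train_param` no longer assumes that master parameters and training parameters share the same index; it derives the training parameter's name by stripping `.master` and searches for it. The sketch below shows the same matching on plain names, using a dictionary lookup in place of the inner loop. The function name and the sample parameter names are hypothetical.

```python
def pair_master_with_train(master_names, train_params):
    """Pair each fp32 master parameter with its fp16 training parameter.

    master_names: iterable of names such as "conv1_weights.master".
    train_params: dict mapping training parameter name to its object.
    """
    pairs = []
    for master_name in master_names:
        train_name = master_name.replace(".master", "")
        if train_name.startswith("batch_norm"):
            # Batch-norm parameters are skipped, as in the source above.
            continue
        train_p = train_params.get(train_name)
        if train_p is None:
            print("can not find train param for: ", master_name)
            continue
        pairs.append((master_name, train_p))
    return pairs


train_params = {"conv1_weights": "fp16:conv1", "fc_0.w_0": "fp16:fc"}
masters = ["fc_0.w_0.master", "batch_norm_0.w_0.master",
           "conv1_weights.master", "missing.master"]
pairs = pair_master_with_train(masters, train_params)
print(pairs)
```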
......@@ -3,111 +3,124 @@ from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
def conv_bn(input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
act='relu',
use_cudnn=True):
parameter_attr = ParamAttr(learning_rate=0.1, initializer=MSRA())
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=parameter_attr,
bias_attr=False)
return fluid.layers.batch_norm(input=conv, act=act)
class MobileNetSSD:
def __init__(self, img, num_classes, img_shape):
self.img = img
self.num_classes = num_classes
self.img_shape = img_shape
def ssd_net(self, scale=1.0):
# 300x300
tmp = self.conv_bn(self.img, 3, int(32 * scale), 2, 1, 3)
# 150x150
tmp = self.depthwise_separable(tmp, 32, 64, 32, 1, scale)
tmp = self.depthwise_separable(tmp, 64, 128, 64, 2, scale)
# 75x75
tmp = self.depthwise_separable(tmp, 128, 128, 128, 1, scale)
tmp = self.depthwise_separable(tmp, 128, 256, 128, 2, scale)
# 38x38
tmp = self.depthwise_separable(tmp, 256, 256, 256, 1, scale)
tmp = self.depthwise_separable(tmp, 256, 512, 256, 2, scale)
def depthwise_separable(input, num_filters1, num_filters2, num_groups, stride,
scale):
depthwise_conv = conv_bn(
input=input,
filter_size=3,
num_filters=int(num_filters1 * scale),
stride=stride,
padding=1,
num_groups=int(num_groups * scale),
use_cudnn=False)
# 19x19
for i in range(5):
tmp = self.depthwise_separable(tmp, 512, 512, 512, 1, scale)
module11 = tmp
tmp = self.depthwise_separable(tmp, 512, 1024, 512, 2, scale)
pointwise_conv = conv_bn(
input=depthwise_conv,
filter_size=1,
num_filters=int(num_filters2 * scale),
stride=1,
padding=0)
return pointwise_conv
# 10x10
module13 = self.depthwise_separable(tmp, 1024, 1024, 1024, 1, scale)
module14 = self.extra_block(module13, 256, 512, 1, 2, scale)
# 5x5
module15 = self.extra_block(module14, 128, 256, 1, 2, scale)
# 3x3
module16 = self.extra_block(module15, 128, 256, 1, 2, scale)
# 2x2
module17 = self.extra_block(module16, 64, 128, 1, 2, scale)
mbox_locs, mbox_confs, box, box_var = fluid.layers.multi_box_head(
inputs=[
module11, module13, module14, module15, module16, module17
],
image=self.img,
num_classes=self.num_classes,
min_ratio=20,
max_ratio=90,
min_sizes=[60.0, 105.0, 150.0, 195.0, 240.0, 285.0],
max_sizes=[[], 150.0, 195.0, 240.0, 285.0, 300.0],
aspect_ratios=[[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.],
[2., 3.]],
base_size=self.img_shape[2],
offset=0.5,
flip=True)
def extra_block(input, num_filters1, num_filters2, num_groups, stride, scale):
# 1x1 conv
pointwise_conv = conv_bn(
input=input,
filter_size=1,
num_filters=int(num_filters1 * scale),
stride=1,
num_groups=int(num_groups * scale),
padding=0)
return mbox_locs, mbox_confs, box, box_var
# 3x3 conv
normal_conv = conv_bn(
input=pointwise_conv,
filter_size=3,
num_filters=int(num_filters2 * scale),
stride=2,
num_groups=int(num_groups * scale),
padding=1)
return normal_conv
def conv_bn(self,
input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
act='relu',
use_cudnn=True):
parameter_attr = ParamAttr(learning_rate=0.1, initializer=MSRA())
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=parameter_attr,
bias_attr=False)
return fluid.layers.batch_norm(input=conv, act=act)
def depthwise_separable(self, input, num_filters1, num_filters2, num_groups,
stride, scale):
depthwise_conv = self.conv_bn(
input=input,
filter_size=3,
num_filters=int(num_filters1 * scale),
stride=stride,
padding=1,
num_groups=int(num_groups * scale),
use_cudnn=False)
def mobile_net(num_classes, img, img_shape, scale=1.0):
# 300x300
tmp = conv_bn(img, 3, int(32 * scale), 2, 1, 3)
# 150x150
tmp = depthwise_separable(tmp, 32, 64, 32, 1, scale)
tmp = depthwise_separable(tmp, 64, 128, 64, 2, scale)
# 75x75
tmp = depthwise_separable(tmp, 128, 128, 128, 1, scale)
tmp = depthwise_separable(tmp, 128, 256, 128, 2, scale)
# 38x38
tmp = depthwise_separable(tmp, 256, 256, 256, 1, scale)
tmp = depthwise_separable(tmp, 256, 512, 256, 2, scale)
pointwise_conv = self.conv_bn(
input=depthwise_conv,
filter_size=1,
num_filters=int(num_filters2 * scale),
stride=1,
padding=0)
return pointwise_conv
# 19x19
for i in range(5):
tmp = depthwise_separable(tmp, 512, 512, 512, 1, scale)
module11 = tmp
tmp = depthwise_separable(tmp, 512, 1024, 512, 2, scale)
def extra_block(self, input, num_filters1, num_filters2, num_groups, stride,
scale):
# 1x1 conv
pointwise_conv = self.conv_bn(
input=input,
filter_size=1,
num_filters=int(num_filters1 * scale),
stride=1,
num_groups=int(num_groups * scale),
padding=0)
# 10x10
module13 = depthwise_separable(tmp, 1024, 1024, 1024, 1, scale)
module14 = extra_block(module13, 256, 512, 1, 2, scale)
# 5x5
module15 = extra_block(module14, 128, 256, 1, 2, scale)
# 3x3
module16 = extra_block(module15, 128, 256, 1, 2, scale)
# 2x2
module17 = extra_block(module16, 64, 128, 1, 2, scale)
# 3x3 conv
normal_conv = self.conv_bn(
input=pointwise_conv,
filter_size=3,
num_filters=int(num_filters2 * scale),
stride=2,
num_groups=int(num_groups * scale),
padding=1)
return normal_conv
mbox_locs, mbox_confs, box, box_var = fluid.layers.multi_box_head(
inputs=[module11, module13, module14, module15, module16, module17],
image=img,
num_classes=num_classes,
min_ratio=20,
max_ratio=90,
min_sizes=[60.0, 105.0, 150.0, 195.0, 240.0, 285.0],
max_sizes=[[], 150.0, 195.0, 240.0, 285.0, 300.0],
aspect_ratios=[[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]],
base_size=img_shape[2],
offset=0.5,
flip=True)
return mbox_locs, mbox_confs, box, box_var
def build_mobilenet_ssd(img, num_classes, img_shape):
ssd_model = MobileNetSSD(img, num_classes, img_shape)
return ssd_model.ssd_net()
......@@ -293,6 +293,7 @@ def train(settings,
coco_api = COCO(file_path)
image_ids = coco_api.getImgIds()
images = coco_api.loadImgs(image_ids)
np.random.shuffle(images)
n = int(math.ceil(len(images) // num_workers))
image_lists = [images[i:i + n] for i in range(0, len(images), n)]
......@@ -307,11 +308,11 @@ def train(settings,
data_dir))
else:
images = [line.strip() for line in open(file_path)]
np.random.shuffle(images)
n = int(math.ceil(len(images) // num_workers))
image_lists = [images[i:i + n] for i in range(0, len(images), n)]
for l in image_lists:
readers.append(pascalvoc(settings, l, 'train', batch_size, shuffle))
return paddle.reader.multiprocess_reader(readers, False)
......@@ -341,7 +342,7 @@ def infer(settings, image_path):
"data path correctly." % image_path)
img = Image.open(image_path)
if img.mode == 'L':
img = im.convert('RGB')
img = img.convert('RGB')
im_width, im_height = img.size
img = img.resize((settings.resize_w, settings.resize_h),
Image.ANTIALIAS)
......
......@@ -10,7 +10,7 @@ import multiprocessing
import paddle
import paddle.fluid as fluid
import reader
from mobilenet_ssd import mobile_net
from mobilenet_ssd import build_mobilenet_ssd
from utility import add_arguments, print_arguments
parser = argparse.ArgumentParser(description=__doc__)
......@@ -92,7 +92,7 @@ def build_program(main_prog, startup_prog, train_params, is_train):
use_double_buffer=True)
with fluid.unique_name.guard():
image, gt_box, gt_label, difficult = fluid.layers.read_file(py_reader)
locs, confs, box, box_var = mobile_net(class_num, image, image_shape)
locs, confs, box, box_var = build_mobilenet_ssd(image, class_num, image_shape)
if is_train:
with fluid.unique_name.guard("train"):
loss = fluid.layers.ssd_loss(locs, confs, gt_box, gt_label, box,
......@@ -228,6 +228,13 @@ def train(args,
total_time = 0.0
for epoc_id in range(epoc_num):
train_reader = reader.train(data_args,
train_file_list,
batch_size_per_device,
shuffle=is_shuffle,
num_workers=num_workers,
enable_ce=enable_ce)
train_py_reader.decorate_paddle_reader(train_reader)
epoch_idx = epoc_id + 1
start_time = time.time()
prev_start_time = start_time
......@@ -255,9 +262,10 @@ def train(args,
end_time = time.time()
total_time += end_time - start_time
best_map, mean_map = test(epoc_id, best_map)
print("Best test map {0}".format(best_map))
if epoc_id % 10 == 0 or epoc_id == epoc_num - 1:
best_map, mean_map = test(epoc_id, best_map)
print("Best test map {0}".format(best_map))
# save model
save_model(str(epoc_id), train_prog)
if enable_ce:
......@@ -275,7 +283,7 @@ def train(args,
(devices_num, total_time / epoch_idx))
if __name__ == '__main__':
def main():
args = parser.parse_args()
print_arguments(args)
......@@ -318,3 +326,7 @@ if __name__ == '__main__':
train_parameters[dataset],
train_file_list=train_file_list,
val_file_list=val_file_list)
if __name__ == '__main__':
main()
\ No newline at end of file

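Running the sample programs in this directory requires the latest develop version of PaddlePaddle. If your installed PaddlePaddle version is lower than this, please update it by following the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).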
......@@ -156,12 +155,13 @@ env CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --parallel=True
Run the evaluation script with the following command to evaluate the model on the specified dataset:
```
env CUDA_VISIBLE_DEVICE=0 python eval.py \
env CUDA_VISIBLE_DEVICES=0 python eval.py \
--model_path="./models/model_0" \
--input_images_dir="./eval_data/images/" \
--input_images_list="./eval_data/eval_list\" \
--input_images_list="./eval_data/eval_list"
```
Run `python train.py --help` for a detailed description of the arguments.
......@@ -170,7 +170,7 @@ env CUDA_VISIBLE_DEVICE=0 python eval.py \
Read the path of a single image from standard input and run prediction on it:
```
env CUDA_VISIBLE_DEVICE=0 python infer.py \
env CUDA_VISIBLE_DEVICES=0 python infer.py \
--model_path="models/model_00044_15000"
```
......@@ -193,7 +193,7 @@ result: [2067 2067 8187 8477 5027 7191 2431 1462]
Read a batch of image paths from a file and run prediction on them:
```
env CUDA_VISIBLE_DEVICE=0 python infer.py \
env CUDA_VISIBLE_DEVICES=0 python infer.py \
--model_path="models/model_00044_15000" \
--input_images_list="data/test.list"
```
......@@ -204,3 +204,5 @@ env CUDA_VISIBLE_DEVICE=0 python infer.py \
|- |:-: |
|[ocr_ctc_params](https://paddle-ocr-models.bj.bcebos.com/ocr_ctc.zip) | 22.3% |
|[ocr_attention_params](https://paddle-ocr-models.bj.bcebos.com/ocr_attention.zip) | 15.8%|
>In all examples in this document, the GPU used by the current task can be changed by modifying `CUDA_VISIBLE_DEVICES`.
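
As a minimal sketch of what this variable carries, the snippet below parses a `CUDA_VISIBLE_DEVICES`-style string into a list of device ids. The helper name `visible_devices` is hypothetical and not part of this repository, and the sketch assumes plain integer ids (the variable can also hold GPU UUIDs, which this parser does not handle). The actual device masking is done by the CUDA runtime, not by Python code.

```python
import os


def visible_devices(value=None):
    """Parse a CUDA_VISIBLE_DEVICES-style string into a list of integer ids."""
    if value is None:
        # Fall back to the current environment; empty means "not set".
        value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [int(v) for v in value.split(",") if v.strip()]


print(visible_devices("0"))        # [0]
print(visible_devices("0,1,2,3"))  # [0, 1, 2, 3]
```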
......@@ -339,7 +339,7 @@ def attention_infer(images, num_classes, use_cudnn=True):
return ids
def attention_eval(data_shape, num_classes):
def attention_eval(data_shape, num_classes, use_cudnn=True):
images = fluid.layers.data(name='pixel', shape=data_shape, dtype='float32')
label_in = fluid.layers.data(
name='label_in', shape=[1], dtype='int32', lod_level=1)
......@@ -349,7 +349,7 @@ def attention_eval(data_shape, num_classes):
label_in = fluid.layers.cast(x=label_in, dtype='int64')
gru_backward, encoded_vector, encoded_proj = encoder_net(
images, is_test=True)
images, is_test=True, use_cudnn=use_cudnn)
backward_first = fluid.layers.sequence_pool(
input=gru_backward, pool_type='first')
......
......@@ -213,12 +213,12 @@ def ctc_train_net(args, data_shape, num_classes):
return sum_cost, error_evaluator, inference_program, model_average
def ctc_infer(images, num_classes, use_cudnn):
def ctc_infer(images, num_classes, use_cudnn=True):
fc_out = encoder_net(images, num_classes, is_test=True, use_cudnn=use_cudnn)
return fluid.layers.ctc_greedy_decoder(input=fc_out, blank=num_classes)
def ctc_eval(data_shape, num_classes, use_cudnn):
def ctc_eval(data_shape, num_classes, use_cudnn=True):
images = fluid.layers.data(name='pixel', shape=data_shape, dtype='float32')
label = fluid.layers.data(
name='label', shape=[1], dtype='int32', lod_level=1)
......
......@@ -10,6 +10,11 @@ from os import path
from paddle.dataset.image import load_image
import paddle
try:
input = raw_input
except NameError:
pass
SOS = 0
EOS = 1
NUM_CLASSES = 95
......@@ -175,7 +180,7 @@ class DataGenerator(object):
yield img, label
else:
while True:
img_path = raw_input("Please input the path of image: ")
img_path = input("Please input the path of image: ")
img = Image.open(img_path).convert('L')
img = np.array(img) - 127.5
img = img[np.newaxis, ...]
......
......@@ -31,7 +31,8 @@ def evaluate(args):
num_classes = data_reader.num_classes()
data_shape = data_reader.data_shape()
# define network
evaluator, cost = eval(data_shape, num_classes)
evaluator, cost = eval(
data_shape, num_classes, use_cudnn=True if args.use_gpu else False)
# data reader
test_reader = data_reader.test(
......@@ -62,8 +63,8 @@ def evaluate(args):
count += 1
exe.run(fluid.default_main_program(), feed=get_feeder_data(data, place))
avg_distance, avg_seq_error = evaluator.eval(exe)
print("Read %d samples; avg_distance: %s; avg_seq_error: %s" % (
count, avg_distance, avg_seq_error))
print("Read %d samples; avg_distance: %s; avg_seq_error: %s" %
(count, avg_distance, avg_seq_error))
def main():
......
......@@ -31,7 +31,7 @@ def inference(args):
"""OCR inference"""
if args.model == "crnn_ctc":
infer = ctc_infer
get_feeder_data = get_ctc_feeder_data
get_feeder_data = get_ctc_feeder_for_infer
else:
infer = attention_infer
get_feeder_data = get_attention_feeder_for_infer
......@@ -78,7 +78,7 @@ def inference(args):
batch_times = []
iters = 0
for data in infer_reader():
feed_dict = get_feeder_data(data, place, need_label=False)
feed_dict = get_feeder_data(data, place)
if args.iterations > 0 and iters == args.iterations + args.skip_batch_num:
break
if iters < args.skip_batch_num:
......
......@@ -18,7 +18,6 @@ from __future__ import division
from __future__ import print_function
import distutils.util
import numpy as np
from paddle.fluid import core
import paddle.fluid as fluid
import six
......@@ -73,17 +72,18 @@ def to_lodtensor(data, place):
lod.append(cur_len)
flattened_data = np.concatenate(data, axis=0).astype("int32")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
res = core.LoDTensor()
res = fluid.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
def get_ctc_feeder_data(data, place, need_label=True):
pixel_tensor = core.LoDTensor()
pixel_tensor = fluid.LoDTensor()
pixel_data = None
pixel_data = np.concatenate(
list(map(lambda x: x[0][np.newaxis, :], data)), axis=0).astype("float32")
list(map(lambda x: x[0][np.newaxis, :], data)),
axis=0).astype("float32")
pixel_tensor.set(pixel_data, place)
label_tensor = to_lodtensor(list(map(lambda x: x[1], data)), place)
if need_label:
......@@ -92,11 +92,16 @@ def get_ctc_feeder_data(data, place, need_label=True):
return {"pixel": pixel_tensor}
def get_ctc_feeder_for_infer(data, place):
return get_ctc_feeder_data(data, place, need_label=False)
def get_attention_feeder_data(data, place, need_label=True):
pixel_tensor = core.LoDTensor()
pixel_tensor = fluid.LoDTensor()
pixel_data = None
pixel_data = np.concatenate(
list(map(lambda x: x[0][np.newaxis, :], data)), axis=0).astype("float32")
list(map(lambda x: x[0][np.newaxis, :], data)),
axis=0).astype("float32")
pixel_tensor.set(pixel_data, place)
label_in_tensor = to_lodtensor(list(map(lambda x: x[1], data)), place)
label_out_tensor = to_lodtensor(list(map(lambda x: x[2], data)), place)
......@@ -124,10 +129,11 @@ def get_attention_feeder_for_infer(data, place):
init_scores = fluid.create_lod_tensor(init_scores_data,
init_recursive_seq_lens, place)
pixel_tensor = core.LoDTensor()
pixel_tensor = fluid.LoDTensor()
pixel_data = None
pixel_data = np.concatenate(
list(map(lambda x: x[0][np.newaxis, :], data)), axis=0).astype("float32")
list(map(lambda x: x[0][np.newaxis, :], data)),
axis=0).astype("float32")
pixel_tensor.set(pixel_data, place)
return {
"pixel": pixel_tensor,
......
......@@ -7,11 +7,11 @@ export OMP_NUM_THREADS=1
cudaid=${face_detection:=0} # use 0-th card as default
export CUDA_VISIBLE_DEVICES=$cudaid
FLAGS_benchmark=true python train.py --model_save_dir=output/ --data_dir=dataset/coco/ --max_iter=100 --enable_ce --pretrained_model=./imagenet_resnet50_fusebn | python _ce.py
FLAGS_benchmark=true python train.py --model_save_dir=output/ --data_dir=dataset/coco/ --max_iter=500 --enable_ce --pretrained_model=./imagenet_resnet50_fusebn --learning_rate=0.00125 | python _ce.py
cudaid=${face_detection_m:=0,1,2,3} # use 0,1,2,3 card as default
export CUDA_VISIBLE_DEVICES=$cudaid
FLAGS_benchmark=true python train.py --model_save_dir=output/ --data_dir=dataset/coco/ --max_iter=100 --enable_ce --pretrained_model=./imagenet_resnet50_fusebn | python _ce.py
FLAGS_benchmark=true python train.py --model_save_dir=output/ --data_dir=dataset/coco/ --max_iter=500 --enable_ce --pretrained_model=./imagenet_resnet50_fusebn --learning_rate=0.005 | python _ce.py
......@@ -332,7 +332,7 @@ def make_multi_reader(filelist, batch_size, sample_times, is_training, shuffle,
else:
yield sample
for i in range(len(p_list)):
p_list[i].terminate()
p_list[i].join()
if p_list[i].is_alive():
p_list[i].join()
return queue_reader
......@@ -13,7 +13,7 @@
## 模型简介
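Temporal Shift Module (TSM), proposed by Ji Lin, Chuang Gan, et al. from MIT and the IBM Watson AI Lab, is a module that improves a network's video understanding ability through temporal shifting. The principle of the shift operation is shown in the figure below.
Temporal Shift Module (TSM), proposed by Ji Lin, Chuang Gan, Song Han, et al. from MIT and the IBM Watson AI Lab, is a module that improves a network's video understanding ability through temporal shifting. The principle of the shift operation is shown in the figure below.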
<p align="center">
<img src="../../images/temporal_shift.png" height=250 width=800 hspace='10'/> <br />
......@@ -34,6 +34,9 @@ TSM的训练数据采用由DeepMind公布的Kinetics-400动作识别数据集。
After data preparation, training can be started in either of the following two ways:
export FLAGS_fast_eager_deletion_mode=1
export FLAGS_eager_delete_tensor_gb=0.0
export FLAGS_fraction_of_gpu_memory_to_use=0.98
python train.py --model_name=TSM
--config=./configs/tsm.txt
--save_dir=checkpoints
......
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# activate eager gc to reduce memory use
export FLAGS_fast_eager_deletion_mode=1
export FLAGS_eager_delete_tensor_gb=0.0
export FLAGS_fraction_of_gpu_memory_to_use=0.98
python train.py --model_name="TSM" --config=./configs/tsm.txt --epoch_num=65 \
--valid_interval=1 --log_interval=10
......@@ -9,11 +9,10 @@
- [Training](#training)
- [Evaluation](#evaluation)
- [Inference and Visualization](#inference-and-visualization)
- [Appendix](#appendix)
## Installation
Running the sample code in this directory requires PaddlePaddle Fluid v1.4 or later. If the PaddlePaddle version on your device is lower than this, please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.4/beginners_guide/install/install_doc.html#paddlepaddle) to update it.
Running the sample code in this directory requires PaddlePaddle Fluid v1.4 or later. If the PaddlePaddle version on your device is lower than this, please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/en/1.4/beginners_guide/install/index_en.html) to update it.
## Introduction
......@@ -45,34 +44,35 @@ Train the model on [MS-COCO dataset](http://cocodataset.org/#download), download
cd dataset/coco
./download.sh
The data directory structure is as follows:
```
dataset/coco/
├── annotations
│   ├── instances_train2014.json
│   ├── instances_train2017.json
│   ├── instances_val2014.json
│   ├── instances_val2017.json
| ...
├── train2017
│   ├── 000000000009.jpg
│   ├── 000000580008.jpg
| ...
├── val2017
│   ├── 000000000139.jpg
│   ├── 000000000285.jpg
| ...
```
## Training
After data preparation, one can start the training step by:
python train.py \
--model_save_dir=output/ \
--pretrain=${path_to_pretrain_model} \
--data_dir=${path_to_data}
- Set ```export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7``` to specify 8 GPUs for training.
- For more help on arguments:
python train.py --help
**download the pre-trained model:** This sample provides a Resnet-50 pre-trained model converted from Caffe. The model fuses the parameters of the batch normalization layers. The pre-trained model can be downloaded with:
sh ./weights/download.sh
Set `pretrain` to load the pre-trained model. This parameter is also used to load a trained model when fine-tuning.
Please make sure that the pre-trained model is downloaded and loaded correctly; otherwise, the loss may become NaN during training.
**Install the [cocoapi](https://github.com/cocodataset/cocoapi):**
To train the model, [cocoapi](https://github.com/cocodataset/cocoapi) is needed. Install the cocoapi:
git clone https://github.com/cocodataset/cocoapi.git
cd PythonAPI
cd cocoapi/PythonAPI
# if cython is not installed
pip install Cython
# Install into global site-packages
......@@ -81,6 +81,26 @@ To train the model, [cocoapi](https://github.com/cocodataset/cocoapi) is needed.
# not to install the COCO API into global site-packages
python2 setup.py install --user
**download the pre-trained model:** This sample provides a Resnet-50 pre-trained model converted from Caffe. The model fuses the parameters of the batch normalization layers. The pre-trained model can be downloaded with:
sh ./weights/download.sh
Set `pretrain` to load the pre-trained model. This parameter is also used to load a trained model when fine-tuning.
Please make sure that the pre-trained model is downloaded and loaded correctly; otherwise, the loss may become NaN during training.
**training:** After data preparation, one can start the training step by:
python train.py \
--model_save_dir=output/ \
--pretrain=${path_to_pretrain_model} \
--data_dir=${path_to_data}
- Set ```export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7``` to specify 8 GPUs for training.
- For more help on arguments:
python train.py --help
**data reader introduction:**
* Data reader is defined in `reader.py` .
......@@ -114,14 +134,24 @@ Evaluation is to evaluate the performance of a trained model. This sample provid
- Set ```export CUDA_VISIBLE_DEVICES=0``` to specify one GPU for evaluation.
Evaluation results are shown below:
If trained with `--syncbn=False`, the evaluation results are as follows:
| input size | mAP(IoU=0.50:0.95) | mAP(IoU=0.50) | mAP(IoU=0.75) |
| :------: | :------: | :------: | :------: |
| 608x608| 37.7 | 59.8 | 40.8 |
| 608x608 | 37.7 | 59.8 | 40.8 |
| 416x416 | 36.5 | 58.2 | 39.1 |
| 320x320 | 34.1 | 55.4 | 36.3 |
If trained with `--syncbn=True`, the evaluation results are as follows:
| input size | mAP(IoU=0.50:0.95) | mAP(IoU=0.50) | mAP(IoU=0.75) |
| :------: | :------: | :------: | :------: |
| 608x608 | 38.9 | 61.1 | 42.0 |
| 416x416 | 37.5 | 59.6 | 40.2 |
| 320x320 | 34.8 | 56.4 | 36.9 |
- **NOTE:** Evaluation is based on the `pycocotools` evaluator, and predicted bounding boxes with `score < 0.05` are not filtered out. Frameworks that filter out predicted bounding boxes with `score < 0.05` will see a drop in accuracy.
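
To make the note concrete, the sketch below applies the kind of score filter it refers to. The dictionaries mimic the result entries built in the evaluation script (`image_id`, `category_id`, `bbox`, `score`); the sample values and the helper `filter_by_score` are made up for illustration and are not part of this repository.

```python
def filter_by_score(results, threshold=0.05):
    """Keep only detections whose score is at least `threshold`."""
    return [r for r in results if r["score"] >= threshold]


# Two made-up detections for the same image.
results = [
    {"image_id": 1, "category_id": 18, "bbox": [10.0, 20.0, 50.0, 80.0], "score": 0.91},
    {"image_id": 1, "category_id": 18, "bbox": [12.0, 22.0, 48.0, 79.0], "score": 0.03},
]

kept = filter_by_score(results)
print(len(results), len(kept))  # 2 1
```

Dropping low-score boxes removes the low-confidence tail of the precision-recall curve, which is consistent with the accuracy drop mentioned in the note.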
## Inference and Visualization
Inference is used to get prediction scores or image features from trained models. `infer.py` is the main executor for inference; the inference step can be started with:
......@@ -133,12 +163,14 @@ Inference is used to get prediction score or image features based on trained mod
--image_name=000000000139.jpg \
--draw_threshold=0.5
Inference speed:
- Set ```export CUDA_VISIBLE_DEVICES=0``` to specify one GPU for inference.
Inference speed(Tesla P40):
| input size | 608x608 | 416x416 | 320x320 |
|:-------------:| :-----: | :-----: | :-----: |
| infer speed | 50 ms/frame | 29 ms/frame |24 ms/frame |
| infer speed | 48 ms/frame | 29 ms/frame |24 ms/frame |
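
The latencies in the table can be converted to throughput with FPS = 1000 / (ms per frame). The sketch below uses the Tesla P40 numbers from the table and assumes frames are processed one at a time, which the table does not state explicitly.

```python
# Per-frame latency in milliseconds, taken from the table above.
latency_ms = {"608x608": 48, "416x416": 29, "320x320": 24}


def to_fps(ms_per_frame):
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


for size, ms in latency_ms.items():
    print("%s: %.1f FPS" % (size, to_fps(ms)))
# 608x608: 20.8 FPS
# 416x416: 34.5 FPS
# 320x320: 41.7 FPS
```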
The visualization of the inference result is shown below:
......
......@@ -9,11 +9,10 @@
- [模型训练](#模型训练)
- [模型评估](#模型评估)
- [模型推断及可视化](#模型推断及可视化)
- [附录](#附录)
## 安装
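Running the sample code in this directory requires PaddlePaddle Fluid v1.4 or later. If the PaddlePaddle version in your environment is lower than this, please update PaddlePaddle by following the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.4/beginners_guide/install/install_doc.html#paddlepaddle).
Running the sample code in this directory requires PaddlePaddle Fluid v1.4 or later. If the PaddlePaddle version in your environment is lower than this, please update PaddlePaddle by following the instructions in the [installation document](http://paddlepaddle.org/documentation/docs/zh/1.4/beginners_guide/install/index_cn.html).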
## 简介
......@@ -47,34 +46,35 @@ YOLOv3 的网络结构由基础特征提取网络、multi-scale特征融合层
cd dataset/coco
./download.sh
The data directory structure is as follows:
```
dataset/coco/
├── annotations
│   ├── instances_train2014.json
│   ├── instances_train2017.json
│   ├── instances_val2014.json
│   ├── instances_val2017.json
| ...
├── train2017
│   ├── 000000000009.jpg
│   ├── 000000580008.jpg
| ...
├── val2017
│   ├── 000000000139.jpg
│   ├── 000000000285.jpg
| ...
```
## 模型训练
After data preparation, training can be started as follows:
python train.py \
--model_save_dir=output/ \
--pretrain=${path_to_pretrain_model} \
--data_dir=${path_to_data}
- Set export CUDA\_VISIBLE\_DEVICES=0,1,2,3,4,5,6,7 to specify 8 GPUs for training.
- For optional arguments, see:
python train.py --help
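**Download the pre-trained model:** This sample provides a darknet53 pre-trained model, converted from the ImageNet pre-trained darknet53 weights released by the author. Download the pre-trained model with the following command:
sh ./weights/download.sh
Set `pretrain` to load the pre-trained model. The same setting is used to load a trained model when fine-tuning.
Before training, please make sure that the pre-trained model is downloaded and loaded correctly; otherwise, the loss may become NaN during training.
**Install [cocoapi](https://github.com/cocodataset/cocoapi):**
Before training, [cocoapi](https://github.com/cocodataset/cocoapi) must be downloaded first: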
git clone https://github.com/cocodataset/cocoapi.git
cd PythonAPI
cd cocoapi/PythonAPI
# if cython is not installed
pip install Cython
# Install into global site-packages
......@@ -83,6 +83,25 @@ YOLOv3 的网络结构由基础特征提取网络、multi-scale特征融合层
# not to install the COCO API into global site-packages
python2 setup.py install --user
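**Download the pre-trained model:** This sample provides a darknet53 pre-trained model, converted from the ImageNet pre-trained darknet53 weights released by the author. Download the pre-trained model with the following command:
sh ./weights/download.sh
Set `pretrain` to load the pre-trained model. The same setting is used to load a trained model when fine-tuning.
Before training, please make sure that the pre-trained model is downloaded and loaded correctly; otherwise, the loss may become NaN during training.
**Start training:** After data preparation, training can be started as follows: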
python train.py \
--model_save_dir=output/ \
--pretrain=${path_to_pretrain_model} \
--data_dir=${path_to_data}
- Set export CUDA\_VISIBLE\_DEVICES=0,1,2,3,4,5,6,7 to specify 8 GPUs for training.
- For optional arguments, see:
python train.py --help
**Data reader description:**
* The data reader is defined in reader.py.
......@@ -115,14 +134,24 @@ Train Loss
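- Set export CUDA\_VISIBLE\_DEVICES=0 to specify a single GPU for evaluation.
Model evaluation results:
If `--syncbn=False` is specified during training, the evaluation accuracy is as follows: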
| input size | mAP(IoU=0.50:0.95) | mAP(IoU=0.50) | mAP(IoU=0.75) |
| :------: | :------: | :------: | :------: |
| 608x608| 37.7 | 59.8 | 40.8 |
| 608x608 | 37.7 | 59.8 | 40.8 |
| 416x416 | 36.5 | 58.2 | 39.1 |
| 320x320 | 34.1 | 55.4 | 36.3 |
If `--syncbn=True` is specified during training, the evaluation accuracy is as follows:
| input size | mAP(IoU=0.50:0.95) | mAP(IoU=0.50) | mAP(IoU=0.75) |
| :------: | :------: | :------: | :------: |
| 608x608 | 38.9 | 61.1 | 42.0 |
| 416x416 | 37.5 | 59.6 | 40.2 |
| 320x320 | 34.8 | 56.4 | 36.9 |
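- **Note:** The evaluation results are based on the `pycocotools` evaluator, and predicted boxes with `score < 0.05` are not filtered out. Other frameworks that apply this filtering will see a drop in accuracy.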
## 模型推断及可视化
......@@ -136,12 +165,14 @@ Train Loss
--image_name=000000000139.jpg \
--draw_threshold=0.5
Model inference speed:
- Set export CUDA\_VISIBLE\_DEVICES=0 to specify a single GPU for inference.
Model inference speed (Tesla P40):
| input size | 608x608 | 416x416 | 320x320 |
|:-------------:| :-----: | :-----: | :-----: |
| infer speed | 50 ms/frame | 29 ms/frame |24 ms/frame |
| infer speed | 48 ms/frame | 29 ms/frame |24 ms/frame |
The figure below shows the visualized prediction result:
<p align="center">
......
......@@ -68,7 +68,7 @@ def eval():
res = {
'image_id': im_id,
'category_id': label_ids[int(label)],
'bbox': map(float, bbox),
'bbox': list(map(float, bbox)),
'score': float(score)
}
result.append(res)
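The change to `list(map(float, bbox))` matters on Python 3, where `map` returns a lazy iterator rather than a list. The sketch below assumes the results are later serialized with the standard `json` module, as is usual for COCO-style evaluation; the `bbox` value is made up for illustration.

```python
import json

bbox = [10, 20, 30, 40]  # made-up box for illustration

# On Python 3, a bare map object cannot be serialized by json.
lazy = map(float, bbox)
try:
    json.dumps({"bbox": lazy})
    serialized_lazy = True
except TypeError:
    serialized_lazy = False

# Materializing the iterator into a list makes it serializable.
encoded = json.dumps({"bbox": list(map(float, bbox))})

print(serialized_lazy)  # False
print(encoded)          # {"bbox": [10.0, 20.0, 30.0, 40.0]}
```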
......
......@@ -51,7 +51,14 @@ def random_distort(img):
return img
def random_crop(img, boxes, labels, scores, scales=[0.3, 1.0], max_ratio=2.0, constraints=None, max_trial=50):
def random_crop(img,
boxes,
labels,
scores,
scales=[0.3, 1.0],
max_ratio=2.0,
constraints=None,
max_trial=50):
if len(boxes) == 0:
return img, boxes
......@@ -90,10 +97,12 @@ def random_crop(img, boxes, labels, scores, scales=[0.3, 1.0], max_ratio=2.0, co
while crops:
crop = crops.pop(np.random.randint(0, len(crops)))
crop_boxes, crop_labels, crop_scores, box_num = box_utils.box_crop(boxes, labels, scores, crop, (w, h))
crop_boxes, crop_labels, crop_scores, box_num = \
box_utils.box_crop(boxes, labels, scores, crop, (w, h))
if box_num < 1:
continue
img = img.crop((crop[0], crop[1], crop[0] + crop[2], crop[1] + crop[3])).resize(img.size, Image.LANCZOS)
img = img.crop((crop[0], crop[1], crop[0] + crop[2],
crop[1] + crop[3])).resize(img.size, Image.LANCZOS)
img = np.asarray(img)
return img, crop_boxes, crop_labels, crop_scores
img = np.asarray(img)
......@@ -118,10 +127,16 @@ def random_interp(img, size, interp=None):
h, w, _ = img.shape
im_scale_x = size / float(w)
im_scale_y = size / float(h)
img = cv2.resize(img, None, None, fx=im_scale_x, fy=im_scale_y, interpolation=interp)
img = cv2.resize(img, None, None, fx=im_scale_x, fy=im_scale_y,
interpolation=interp)
return img
def random_expand(img, gtboxes, max_ratio=4., fill=None, keep_ratio=True, thresh=0.5):
def random_expand(img,
gtboxes,
max_ratio=4.,
fill=None,
keep_ratio=True,
thresh=0.5):
if random.random() > thresh:
return img, gtboxes
......@@ -153,13 +168,21 @@ def random_expand(img, gtboxes, max_ratio=4., fill=None, keep_ratio=True, thresh
return out_img.astype('uint8'), gtboxes
def shuffle_gtbox(gtbox, gtlabel, gtscore):
gt = np.concatenate([gtbox, gtlabel[:, np.newaxis], gtscore[:, np.newaxis]], axis=1)
gt = np.concatenate([gtbox, gtlabel[:, np.newaxis],
gtscore[:, np.newaxis]], axis=1)
idx = np.arange(gt.shape[0])
np.random.shuffle(idx)
gt = gt[idx, :]
return gt[:, :4], gt[:, 4], gt[:, 5]
def image_mixup(img1, gtboxes1, gtlabels1, gtscores1, img2, gtboxes2, gtlabels2, gtscores2):
def image_mixup(img1,
gtboxes1,
gtlabels1,
gtscores1,
img2,
gtboxes2,
gtlabels2,
gtscores2):
factor = np.random.beta(1.5, 1.5)
factor = max(0.0, min(1.0, factor))
if factor >= 1.0:
......@@ -173,7 +196,8 @@ def image_mixup(img1, gtboxes1, gtlabels1, gtscores1, img2, gtboxes2, gtlabels2,
w = max(img1.shape[1], img2.shape[1])
img = np.zeros((h, w, img1.shape[2]), 'float32')
img[:img1.shape[0], :img1.shape[1], :] = img1.astype('float32') * factor
img[:img2.shape[0], :img2.shape[1], :] += img2.astype('float32') * (1.0 - factor)
img[:img2.shape[0], :img2.shape[1], :] += \
img2.astype('float32') * (1.0 - factor)
gtboxes = np.zeros_like(gtboxes1)
gtlabels = np.zeros_like(gtlabels1)
gtscores = np.zeros_like(gtscores1)
......@@ -208,7 +232,8 @@ def image_mixup(img1, gtboxes1, gtlabels1, gtscores1, img2, gtboxes2, gtlabels2,
def image_augment(img, gtboxes, gtlabels, gtscores, size, means=None):
img = random_distort(img)
img, gtboxes = random_expand(img, gtboxes, fill=means)
img, gtboxes, gtlabels, gtscores = random_crop(img, gtboxes, gtlabels, gtscores)
img, gtboxes, gtlabels, gtscores = \
random_crop(img, gtboxes, gtlabels, gtscores)
img = random_interp(img, size)
img, gtboxes = random_flip(img, gtboxes)
gtboxes, gtlabels, gtscores = shuffle_gtbox(gtboxes, gtlabels, gtscores)
......
......@@ -55,7 +55,13 @@ def conv_bn_layer(input,
out = fluid.layers.leaky_relu(x=out, alpha=0.1)
return out
def downsample(input, ch_out, filter_size=3, stride=2, padding=1, is_test=True, name=None):
def downsample(input,
ch_out,
filter_size=3,
stride=2,
padding=1,
is_test=True,
name=None):
return conv_bn_layer(input,
ch_out=ch_out,
filter_size=filter_size,
......@@ -65,15 +71,19 @@ def downsample(input, ch_out, filter_size=3, stride=2, padding=1, is_test=True,
name=name)
def basicblock(input, ch_out, is_test=True, name=None):
conv1 = conv_bn_layer(input, ch_out, 1, 1, 0, is_test=is_test, name=name+".0")
conv2 = conv_bn_layer(conv1, ch_out*2, 3, 1, 1, is_test=is_test, name=name+".1")
conv1 = conv_bn_layer(input, ch_out, 1, 1, 0,
is_test=is_test, name=name+".0")
conv2 = conv_bn_layer(conv1, ch_out*2, 3, 1, 1,
is_test=is_test, name=name+".1")
out = fluid.layers.elementwise_add(x=input, y=conv2, act=None)
return out
def layer_warp(block_func, input, ch_out, count, is_test=True, name=None):
res_out = block_func(input, ch_out, is_test=is_test, name='{}.0'.format(name))
res_out = block_func(input, ch_out, is_test=is_test,
name='{}.0'.format(name))
for j in range(1, count):
res_out = block_func(res_out, ch_out, is_test=is_test, name='{}.{}'.format(name, j))
res_out = block_func(res_out, ch_out, is_test=is_test,
name='{}.{}'.format(name, j))
return res_out
DarkNet_cfg = {
......@@ -83,14 +93,21 @@ DarkNet_cfg = {
def add_DarkNet53_conv_body(body_input, is_test=True):
stages, block_func = DarkNet_cfg[53]
stages = stages[0:5]
conv1 = conv_bn_layer(
body_input, ch_out=32, filter_size=3, stride=1, padding=1, is_test=is_test, name="yolo_input")
downsample_ = downsample(conv1, ch_out=conv1.shape[1]*2, is_test=is_test, name="yolo_input.downsample")
conv1 = conv_bn_layer(body_input, ch_out=32, filter_size=3,
stride=1, padding=1, is_test=is_test,
name="yolo_input")
downsample_ = downsample(conv1, ch_out=conv1.shape[1]*2,
is_test=is_test,
name="yolo_input.downsample")
blocks = []
for i, stage in enumerate(stages):
block = layer_warp(block_func, downsample_, 32 *(2**i), stage, is_test=is_test, name="stage.{}".format(i))
block = layer_warp(block_func, downsample_, 32 *(2**i),
stage, is_test=is_test,
name="stage.{}".format(i))
blocks.append(block)
if i < len(stages) - 1: # do not downsample in the last stage
downsample_ = downsample(block, ch_out=block.shape[1]*2, is_test=is_test, name="stage.{}.downsample".format(i))
downsample_ = downsample(block, ch_out=block.shape[1]*2,
is_test=is_test,
name="stage.{}.downsample".format(i))
return blocks[-1:-4:-1]
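The final slice `blocks[-1:-4:-1]` returns the last three stage outputs in reverse order, deepest first. A quick check with placeholder strings standing in for the five stage feature maps:

```python
# Placeholder names stand in for the feature maps of the five stages.
blocks = ["stage0", "stage1", "stage2", "stage3", "stage4"]

# Start at the last element, step backwards, stop before index -4.
selected = blocks[-1:-4:-1]
print(selected)  # ['stage4', 'stage3', 'stage2']
```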
......@@ -27,29 +27,28 @@ from .darknet import add_DarkNet53_conv_body
from .darknet import conv_bn_layer
def yolo_detection_block(input, channel, is_test=True, name=None):
assert channel % 2 == 0, "channel {} cannot be divided by 2".format(channel)
assert channel % 2 == 0, \
"channel {} cannot be divided by 2".format(channel)
conv = input
for j in range(2):
conv = conv_bn_layer(conv, channel, filter_size=1, stride=1, padding=0, is_test=is_test, name='{}.{}.0'.format(name, j))
conv = conv_bn_layer(conv, channel*2, filter_size=3, stride=1, padding=1, is_test=is_test, name='{}.{}.1'.format(name, j))
route = conv_bn_layer(conv, channel, filter_size=1, stride=1, padding=0, is_test=is_test, name='{}.2'.format(name))
tip = conv_bn_layer(route,channel*2, filter_size=3, stride=1, padding=1, is_test=is_test, name='{}.tip'.format(name))
conv = conv_bn_layer(conv, channel, filter_size=1,
stride=1, padding=0, is_test=is_test,
name='{}.{}.0'.format(name, j))
conv = conv_bn_layer(conv, channel*2, filter_size=3,
stride=1, padding=1, is_test=is_test,
name='{}.{}.1'.format(name, j))
route = conv_bn_layer(conv, channel, filter_size=1, stride=1,
padding=0, is_test=is_test,
name='{}.2'.format(name))
tip = conv_bn_layer(route,channel*2, filter_size=3, stride=1,
padding=1, is_test=is_test,
name='{}.tip'.format(name))
return route, tip
def upsample(input, scale=2,name=None):
# get dynamic upsample output shape
shape_nchw = fluid.layers.shape(input)
shape_hw = fluid.layers.slice(shape_nchw, axes=[0], starts=[2], ends=[4])
shape_hw.stop_gradient = True
in_shape = fluid.layers.cast(shape_hw, dtype='int32')
out_shape = in_shape * scale
out_shape.stop_gradient = True
# reisze by actual_shape
out = fluid.layers.resize_nearest(
input=input,
scale=scale,
actual_shape=out_shape,
scale=float(scale),
name=name)
return out
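For intuition, nearest-neighbor upsampling with an integer `scale` repeats every value `scale` times along both spatial axes. The pure-Python sketch below works on a single 2-D feature map stored as nested lists; it only illustrates the operation and makes no claim about how `fluid.layers.resize_nearest` is implemented. The helper name `upsample_nearest` is hypothetical.

```python
def upsample_nearest(feature_map, scale=2):
    """Repeat each element `scale` times along both rows and columns."""
    out = []
    for row in feature_map:
        # Widen the row by repeating each value `scale` times.
        wide = [v for v in row for _ in range(scale)]
        # Repeat the widened row `scale` times (copies, not aliases).
        out.extend([list(wide) for _ in range(scale)])
    return out


print(upsample_nearest([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```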
......@@ -68,11 +67,15 @@ class YOLOv3(object):
if self.is_train:
self.py_reader = fluid.layers.py_reader(
capacity=64,
shapes = [[-1] + self.image_shape, [-1, cfg.max_box_num, 4], [-1, cfg.max_box_num], [-1, cfg.max_box_num]],
shapes = [[-1] + self.image_shape,
[-1, cfg.max_box_num, 4],
[-1, cfg.max_box_num],
[-1, cfg.max_box_num]],
lod_levels=[0, 0, 0, 0],
dtypes=['float32'] * 2 + ['int32'] + ['float32'],
use_double_buffer=True)
self.image, self.gtbox, self.gtlabel, self.gtscore = fluid.layers.read_file(self.py_reader)
self.image, self.gtbox, self.gtlabel, self.gtscore = \
fluid.layers.read_file(self.py_reader)
else:
self.image = fluid.layers.data(
name='image', shape=self.image_shape, dtype='float32'
......@@ -139,9 +142,9 @@ class YOLOv3(object):
if self.is_train:
loss = fluid.layers.yolov3_loss(
x=out,
gtbox=self.gtbox,
gtlabel=self.gtlabel,
gtscore=self.gtscore,
gt_box=self.gtbox,
gt_label=self.gtlabel,
gt_score=self.gtscore,
anchors=cfg.anchors,
anchor_mask=anchor_mask,
class_num=cfg.class_num,
......
......@@ -53,13 +53,17 @@ class DataSetReader(object):
cfg.dataset))
if mode == 'train':
cfg.train_file_list = os.path.join(cfg.data_dir, cfg.train_file_list)
cfg.train_data_dir = os.path.join(cfg.data_dir, cfg.train_data_dir)
cfg.train_file_list = os.path.join(cfg.data_dir,
cfg.train_file_list)
cfg.train_data_dir = os.path.join(cfg.data_dir,
cfg.train_data_dir)
self.COCO = COCO(cfg.train_file_list)
self.img_dir = cfg.train_data_dir
elif mode == 'test' or mode == 'infer':
cfg.val_file_list = os.path.join(cfg.data_dir, cfg.val_file_list)
cfg.val_data_dir = os.path.join(cfg.data_dir, cfg.val_data_dir)
cfg.val_file_list = os.path.join(cfg.data_dir,
cfg.val_file_list)
cfg.val_data_dir = os.path.join(cfg.data_dir,
cfg.val_data_dir)
self.COCO = COCO(cfg.val_file_list)
self.img_dir = cfg.val_data_dir
......@@ -88,7 +92,8 @@ class DataSetReader(object):
def _parse_gt_annotations(self, img):
img_height = img['height']
img_width = img['width']
anno = self.COCO.loadAnns(self.COCO.getAnnIds(imgIds=img['id'], iscrowd=None))
anno = self.COCO.loadAnns(
self.COCO.getAnnIds(imgIds=img['id'], iscrowd=None))
gt_index = 0
for target in anno:
if target['area'] < cfg.gt_min_area:
......@@ -96,13 +101,15 @@ class DataSetReader(object):
if 'ignore' in target and target['ignore']:
continue
box = box_utils.coco_anno_box_to_center_relative(target['bbox'], img_height, img_width)
box = box_utils.coco_anno_box_to_center_relative(
target['bbox'], img_height, img_width)
if box[2] <= 0 and box[3] <= 0:
continue
img['gt_id'][gt_index] = np.int32(target['id'])
img['gt_boxes'][gt_index] = box
img['gt_labels'][gt_index] = self.category_to_id_map[target['category_id']]
img['gt_labels'][gt_index] = \
self.category_to_id_map[target['category_id']]
gt_index += 1
if gt_index >= cfg.max_box_num:
break
......@@ -136,10 +143,18 @@ class DataSetReader(object):
else:
return self._parse_images(is_train=(mode=='train'))
def get_reader(self, mode, size=416, batch_size=None, shuffle=False, mixup_iter=0, random_sizes=[], image=None):
def get_reader(self,
mode,
size=416,
batch_size=None,
shuffle=False,
mixup_iter=0,
random_sizes=[],
image=None):
assert mode in ['train', 'test', 'infer'], "Unknow mode type!"
if mode != 'infer':
assert batch_size is not None, "batch size connot be None in mode {}".format(mode)
assert batch_size is not None, \
"batch size connot be None in mode {}".format(mode)
self._parse_dataset_dir(mode)
self._parse_dataset_catagory()
......@@ -151,7 +166,9 @@ class DataSetReader(object):
h, w, _ = im.shape
im_scale_x = size / float(w)
im_scale_y = size / float(h)
out_img = cv2.resize(im, None, None, fx=im_scale_x, fy=im_scale_y, interpolation=cv2.INTER_CUBIC)
out_img = cv2.resize(im, None, None,
fx=im_scale_x, fy=im_scale_y,
interpolation=cv2.INTER_CUBIC)
mean = np.array(mean).reshape((1, 1, -1))
std = np.array(std).reshape((1, 1, -1))
out_img = (out_img / 255.0 - mean) / std
......@@ -173,11 +190,14 @@ class DataSetReader(object):
mixup_gt_boxes = np.array(mixup_img['gt_boxes']).copy()
mixup_gt_labels = np.array(mixup_img['gt_labels']).copy()
mixup_gt_scores = np.ones_like(mixup_gt_labels)
im, gt_boxes, gt_labels, gt_scores = image_utils.image_mixup(im, gt_boxes, \
gt_labels, gt_scores, mixup_im, mixup_gt_boxes, mixup_gt_labels, \
mixup_gt_scores)
im, gt_boxes, gt_labels, gt_scores = \
image_utils.image_mixup(im, gt_boxes, gt_labels,
gt_scores, mixup_im, mixup_gt_boxes,
mixup_gt_labels, mixup_gt_scores)
im, gt_boxes, gt_labels, gt_scores = image_utils.image_augment(im, gt_boxes, gt_labels, gt_scores, size, mean)
im, gt_boxes, gt_labels, gt_scores = \
image_utils.image_augment(im, gt_boxes, gt_labels,
gt_scores, size, mean)
mean = np.array(mean).reshape((1, 1, -1))
std = np.array(std).reshape((1, 1, -1))
......@@ -214,7 +234,9 @@ class DataSetReader(object):
read_cnt += 1
if read_cnt % len(imgs) == 0 and shuffle:
np.random.shuffle(imgs)
im, gt_boxes, gt_labels, gt_scores = img_reader_with_augment(img, img_size, cfg.pixel_means, cfg.pixel_stds, mixup_img)
im, gt_boxes, gt_labels, gt_scores = \
img_reader_with_augment(img, img_size, cfg.pixel_means,
cfg.pixel_stds, mixup_img)
batch_out.append([im, gt_boxes, gt_labels, gt_scores])
if len(batch_out) == batch_size:
......@@ -227,7 +249,9 @@ class DataSetReader(object):
imgs = self._parse_images_by_mode(mode)
batch_out = []
for img in imgs:
im, im_id, im_shape = img_reader(img, size, cfg.pixel_means, cfg.pixel_stds)
im, im_id, im_shape = img_reader(img, size,
cfg.pixel_means,
cfg.pixel_stds)
batch_out.append((im, im_id, im_shape))
if len(batch_out) == batch_size:
yield batch_out
......@@ -238,7 +262,9 @@ class DataSetReader(object):
img = {}
img['image'] = image
img['id'] = 0
im, im_id, im_shape = img_reader(img, size, cfg.pixel_means, cfg.pixel_stds)
im, im_id, im_shape = img_reader(img, size,
cfg.pixel_means,
cfg.pixel_stds)
batch_out = [(im, im_id, im_shape)]
yield batch_out
......@@ -256,7 +282,8 @@ def train(size=416,
num_workers=8,
max_queue=32,
use_multiprocessing=True):
generator = dsr.get_reader('train', size, batch_size, shuffle, int(mixup_iter/num_workers), random_sizes)
generator = dsr.get_reader('train', size, batch_size, shuffle,
int(mixup_iter/num_workers), random_sizes)
if not use_multiprocessing:
return generator
......
......@@ -80,9 +80,12 @@ def train():
return os.path.exists(os.path.join(cfg.pretrain, var.name))
fluid.io.load_vars(exe, cfg.pretrain, predicate=if_exist)
build_strategy= fluid.BuildStrategy()
build_strategy.memory_optimize = True
build_strategy.sync_batch_norm = cfg.syncbn
compile_program = fluid.compiler.CompiledProgram(
fluid.default_main_program()).with_data_parallel(
loss_name=loss.name)
loss_name=loss.name, build_strategy=build_strategy)
random_sizes = [cfg.input_size]
if cfg.random_shape:
......@@ -90,7 +93,13 @@ def train():
total_iter = cfg.max_iter - cfg.start_iter
mixup_iter = total_iter - cfg.no_mixup_iter
train_reader = reader.train(input_size, batch_size=cfg.batch_size, shuffle=True, total_iter=total_iter*devices_num, mixup_iter=mixup_iter*devices_num, random_sizes=random_sizes, use_multiprocessing=cfg.use_multiprocess)
train_reader = reader.train(input_size,
batch_size=cfg.batch_size,
shuffle=True,
total_iter=total_iter*devices_num,
mixup_iter=mixup_iter*devices_num,
random_sizes=random_sizes,
use_multiprocessing=cfg.use_multiprocess)
py_reader = model.py_reader
py_reader.decorate_paddle_reader(train_reader)
......@@ -112,21 +121,25 @@ def train():
for iter_id in range(cfg.start_iter, cfg.max_iter):
prev_start_time = start_time
start_time = time.time()
losses = exe.run(compile_program, fetch_list=[v.name for v in fetch_list])
losses = exe.run(compile_program,
fetch_list=[v.name for v in fetch_list])
smoothed_loss.add_value(np.mean(np.array(losses[0])))
snapshot_loss += np.mean(np.array(losses[0]))
snapshot_time += start_time - prev_start_time
lr = np.array(fluid.global_scope().find_var('learning_rate')
.get_tensor())
print("Iter {:d}, lr {:.6f}, loss {:.6f}, time {:.5f}".format(
iter_id, lr[0],
smoothed_loss.get_mean_value(), start_time - prev_start_time))
iter_id, lr[0],
smoothed_loss.get_mean_value(),
start_time - prev_start_time))
sys.stdout.flush()
if (iter_id + 1) % cfg.snapshot_iter == 0:
save_model("model_iter{}".format(iter_id))
print("Snapshot {} saved, average loss: {}, average time: {}".format(
iter_id + 1, snapshot_loss / float(cfg.snapshot_iter),
snapshot_time / float(cfg.snapshot_iter)))
print("Snapshot {} saved, average loss: {}, \
average time: {}".format(
iter_id + 1,
snapshot_loss / float(cfg.snapshot_iter),
snapshot_time / float(cfg.snapshot_iter)))
snapshot_loss = 0
snapshot_time = 0
except fluid.core.EOFException:
......
......@@ -101,27 +101,31 @@ def parse_args():
add_arg('dataset', str, 'coco2017', "Dataset: coco2014, coco2017.")
add_arg('class_num', int, 80, "Class number.")
add_arg('data_dir', str, 'dataset/coco', "The data root path.")
add_arg('start_iter', int, 0, "Start iteration.")
add_arg('use_multiprocess', bool, True, "add multiprocess.")
add_arg('start_iter', int, 0, "Start iteration.")
add_arg('use_multiprocess', bool, True, "add multiprocess.")
#SOLVER
add_arg('batch_size', int, 8, "Mini-batch size per device.")
add_arg('learning_rate', float, 0.001, "Learning rate.")
add_arg('max_iter', int, 500200, "Iter number.")
add_arg('snapshot_iter', int, 2000, "Save model every snapshot stride.")
add_arg('label_smooth', bool, True, "Use label smooth in class label.")
add_arg('no_mixup_iter', int, 40000, "Disable mixup in last N iter.")
add_arg('batch_size', int, 8, "Mini-batch size per device.")
add_arg('learning_rate', float, 0.001, "Learning rate.")
add_arg('max_iter', int, 500200, "Iter number.")
add_arg('snapshot_iter', int, 2000, "Save model every snapshot stride.")
add_arg('label_smooth', bool, True, "Use label smooth in class label.")
add_arg('no_mixup_iter', int, 40000, "Disable mixup in last N iter.")
# TRAIN TEST INFER
add_arg('input_size', int, 608, "Image input size of YOLOv3.")
add_arg('random_shape', bool, True, "Resize to random shape for train reader.")
add_arg('valid_thresh', float, 0.005, "Valid confidence score for NMS.")
add_arg('nms_thresh', float, 0.45, "NMS threshold.")
add_arg('syncbn', bool, True, "Whether to use synchronized batch normalization.")
add_arg('random_shape', bool, True, "Resize to random shape for train reader.")
add_arg('valid_thresh', float, 0.005, "Valid confidence score for NMS.")
add_arg('nms_thresh', float, 0.45, "NMS threshold.")
add_arg('nms_topk', int, 400, "The number of boxes to perform NMS.")
add_arg('nms_posk', int, 100, "The number of boxes of NMS output.")
add_arg('debug', bool, False, "Debug mode")
add_arg('debug', bool, False, "Debug mode")
# SINGLE EVAL AND DRAW
add_arg('image_path', str, 'image', "The image path used to inference and visualize.")
add_arg('image_name', str, None, "The single image used to inference and visualize. None to inference all images in image_path")
add_arg('draw_thresh', float, 0.5, "Confidence score threshold to draw prediction box in image in debug mode")
add_arg('image_path', str, 'image',
"The image path used to inference and visualize.")
add_arg('image_name', str, None,
"The single image used to inference and visualize. None to inference all images in image_path")
add_arg('draw_thresh', float, 0.5,
"Confidence score threshold to draw prediction box in image in debug mode")
# yapf: enable
args = parser.parse_args()
file_name = sys.argv[0]
......
<h1 align="center">ELMO</h1>
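## Introduction
ELMO (Embeddings from Language Models) is a deep contextualized word representation. It models both the complex characteristics of word use (such as syntax and semantics) and how word use varies across linguistic contexts (that is, polysemy). As a word vector, ELMO addresses two important problems: (1) the complex characteristics of word use, such as syntax and semantics; (2) how a word is used in a specific context, such as polysemy.
ELMO is trained on a large corpus with a language-model objective, producing a bidirectional LSTM model. The LSTM generates the word representations, which are then used to fine-tune downstream NLP tasks (such as question answering, classification, and named entity recognition).
Highlights of this release:
1. The complete code for the pre-trained model is released.
2. Multi-GPU training is supported; training is roughly twice as fast as mainstream implementations.
3. The [ELMO Chinese pre-trained model](https://dureader.gz.bcebos.com/elmo/elmo_chinese_checkpoint.tar.gz) is released,
trained on about 3.8 GB of Chinese encyclopedia data.
4. Fine-tuning steps and example code based on ELMO are released; on the Chinese lexical analysis task LAC, the F1 score improved by 0.68%.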
## Basic configuration and third-party packages
Python==2.7
PaddlePaddle latest version
numpy ==1.15.1
six==1.11.0
glob
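## Pre-training the model
1. Split the document files into sentences and tokenize the sentences using the vocabulary (see [`data/vocabulary_min5k.txt`](data/vocabulary_min5k.txt)). Then split the files into a training set (trainset) and a test set (testset). For sample training data see [`data/train`](data/train); for sample test data see [`data/dev`](data/dev).
The recommended ratio of training set to test set is 5:1.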
```
本 书 介绍 了 中国 经济 发展 的 内外 平衡 问题 、 亚洲 金融 危机 十 周年 回顾 与 反思 、 实践 中 的 城乡 统筹 发展 、 未来 十 年 中国 需要 研究 的 重大 课题 、 科学 发展 与 新型 工业 化 等 方面 。
```
```
吴 敬 琏 曾经 提出 中国 股市 “ 赌场 论 ” , 主张 维护 市场 规则 , 保护 草根 阶层 生计 , 被 誉 为 “ 中国 经济 学界 良心 ” , 是 媒体 和 公众 眼中 的 学术 明星
```
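The 5:1 split described in step 1 can be sketched as follows. This is only an illustration under assumptions: the helper name `split_sentences` and the toy sentences are hypothetical and are not part of this repository, which expects the split files to be placed under `data/train` and `data/dev`.

```python
import random


def split_sentences(sentences, train_parts=5, test_parts=1, seed=0):
    """Shuffle tokenized sentences and split them train:test (5:1 by default).

    Hypothetical helper for illustration; not part of the ELMO code base.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) * test_parts // (train_parts + test_parts)
    # The first n_test sentences become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Toy input: each entry stands for one whitespace-tokenized sentence.
sentences = ["sentence %d" % i for i in range(12)]
trainset, testset = split_sentences(sentences)
print("%d training sentences, %d test sentences" % (len(trainset), len(testset)))
```

Shuffling before splitting keeps the two sets drawn from the same distribution; a fixed seed makes the split reproducible.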
2. Train the model
```shell
sh run.sh
```
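3. Write the checkpoint results to a file.
## Single-machine multi-GPU training
The model supports single-machine multi-GPU training. Select the GPUs by exporting `CUDA_VISIBLE_DEVICES` in [`run.sh`](run.sh), as shown below: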
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
```
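## How to fine-tune with ELMO
In deep learning, for example when training an image recognition model, training from scratch every time consumes a great deal of time and resources. Moreover, when the dataset is small, the model is hard to fit. Transfer learning arose to address this: initializing the network to be trained with an already trained model speeds up convergence and can also improve accuracy. The model used for initialization is one trained on a large dataset to full convergence. Ideally, the model being trained and the pre-trained model share the same network, so that as many layers as possible can be initialized.
Fine-tuning with ELMO differs from the BERT approach: the ELMO part is plugged into the downstream NLP task as pre-trained word vectors.
The usage recommended in the original paper is to concatenate the ELMO output vectors directly with the embedding layer of the downstream NLP task's input. The ELMO part loads the pre-trained model parameters directly (in PaddlePaddle, through the `fluid.io.load_vars` interface), and these parameters stay fixed when fed into the downstream task (in PaddlePaddle, by setting `stop_gradient = True`).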
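As a framework-free illustration of the concatenation described above, the sketch below joins a frozen ELMO vector and a task embedding token by token. The function name `concat_features` and the toy dimensions are assumptions made for this example; in the actual network the same operation is a `layers.concat` call along the feature axis.

```python
def concat_features(elmo_vectors, word_vectors):
    """Concatenate, token by token, the ELMO vector with the task embedding.

    Hypothetical helper for illustration: both inputs are lists holding one
    feature vector (a list of floats) per token.
    """
    if len(elmo_vectors) != len(word_vectors):
        raise ValueError("both inputs must cover the same tokens")
    return [list(e) + list(w) for e, w in zip(elmo_vectors, word_vectors)]


# Two tokens; toy sizes: ELMO dimension 4, task embedding dimension 2.
elmo_vectors = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
word_vectors = [[1.0, 2.0], [3.0, 4.0]]
combined = concat_features(elmo_vectors, word_vectors)
print("%d tokens, %d features each" % (len(combined), len(combined[0])))
```

The number of tokens is unchanged; only the per-token feature size grows to the sum of the two input sizes, so the layers after the embedding must accept the wider input.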
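The key points of an ELMO fine-tuning task are:
1) Download the parameter files of the pre-trained model.
2) Load the ELMO network definition, bilm.py.
3) Load the pre-trained model when the network starts.
4) Tokenize the input using the ELMO vocabulary and convert the tokens to ids.
5) Concatenate the ELMO word vectors with the network's embedding layer.
The detailed steps are as follows:
1. Download the checkpoint file officially released for ELMO on Paddle. The checkpoint was pre-trained on about 3.8 GB of Chinese encyclopedia data.
[Download the checkpoint file officially released by PaddlePaddle](https://dureader.gz.bcebos.com/elmo/elmo_chinese_checkpoint.tar.gz)
2. Load the ELMO checkpoint file during network initialization. The parameter-loading call (`fluid.io.load_vars`) can be placed after the network parameters are initialized (`exe.run(fluid.default_startup_program())`).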
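```python
import paddle.fluid as fluid

# Define an executor that runs on GPU 0
place = fluid.CUDAPlace(0)
# place = fluid.CPUPlace()  # use this line instead to run on CPU
exe = fluid.Executor(place)
# Initialize the parameters
exe.run(fluid.default_startup_program())
```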
```python
import os

src_pretrain_model_path = '490001'  # 490001 is the ELMO checkpoint


def if_exist(var):
    # Load a variable only if a file with its name exists in the checkpoint.
    path = os.path.join(src_pretrain_model_path, var.name)
    exist = os.path.exists(path)
    if exist:
        print('Load model: %s' % path)
    return exist


fluid.io.load_vars(executor=exe, dirname=src_pretrain_model_path, predicate=if_exist, main_program=main_program)
```
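3. Add the [`bilm.py`](bilm.py) file to the downstream NLP task code; [`bilm.py`](bilm.py) contains the ELMO network definition.
4. Tokenize the input sentences or paragraphs using the ELMO vocabulary (see [`data/vocabulary_min5k.txt`](data/vocabulary_min5k.txt)), convert the resulting tokens to ids, and put them into feed_dict.
5. In the downstream NLP task code, add the ELMO network definition to the embedding part of the network definition: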
```python
# Import the embedding and encoder parts of bilm.py
from bilm import elmo_encoder
from bilm import emb

# word is the input to the ELMO part, tokenized with the ELMO vocabulary
elmo_embedding = emb(word)
elmo_enc = elmo_encoder(elmo_embedding)

# Concatenate with word_embedding, the word vectors produced by the NLP task
word_embedding = layers.concat(input=[elmo_enc, word_embedding], axis=1)
```
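The token-to-id conversion in step 4 can be sketched in plain Python as below. This is a minimal sketch under assumptions: the helper names, the toy vocabulary, and the `<UNK>` token for out-of-vocabulary words are hypothetical, and the actual layout of `data/vocabulary_min5k.txt` should be checked before reusing it.

```python
def build_vocab(words):
    """Map each vocabulary entry to its line index (assumes one token per line)."""
    return {word: idx for idx, word in enumerate(words)}


def sentence_to_ids(sentence, vocab, unk_token="<UNK>"):
    """Convert a whitespace-tokenized sentence to ids.

    Tokens missing from the vocabulary map to the id of unk_token, which is
    an assumed special token for this illustration.
    """
    unk_id = vocab[unk_token]
    return [vocab.get(token, unk_id) for token in sentence.split()]


# Toy vocabulary standing in for the real vocabulary file.
vocab = build_vocab(["<S>", "</S>", "<UNK>", "economy", "development"])
ids = sentence_to_ids("economy rapid development", vocab)
print(ids)
```

The resulting id list is what would be placed in feed_dict as the `word` input of the ELMO part.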
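## References
[Deep contextualized word representations](https://arxiv.org/abs/1802.05365)
## Contributors
This project was completed jointly by the PaddlePaddle team of Baidu's Deep Learning Technology Platform Department and Baidu's Natural Language Processing Department. Code contributions and issue reports are welcome.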
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is used to finetune.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy
import paddle.fluid.layers as layers
import paddle.fluid as fluid
import numpy as np
# If you use our released weight layers, do not use the args.
cell_clip = 3.0
proj_clip = 3.0
hidden_size = 4096
vocab_size = 52445
embed_size = 512
# According to the original paper, dropout needs to be modified for fine-tuning.
modify_dropout = 1
proj_size = 512
num_layers = 2
random_seed = 0
dropout_rate = 0.5


def dropout(input):
    return layers.dropout(
        input,
        dropout_prob=dropout_rate,
        dropout_implementation="upscale_in_train",
        seed=random_seed,
        is_test=False)


def lstmp_encoder(input_seq, gate_size, h_0, c_0, para_name):
    # A lstm encoder implementation with projection.
    # Linear transformation part for input gate, output gate, forget gate
    # and cell activation vectors need be done outside of dynamic_lstm.
    # So the output size is 4 times of gate_size.
    # initializer=None (the name `init` was never defined in this file);
    # the weights are expected to be loaded from the pre-trained checkpoint.
    input_proj = layers.fc(input=input_seq,
                           param_attr=fluid.ParamAttr(
                               name=para_name + '_gate_w', initializer=None),
                           size=gate_size * 4,
                           act=None,
                           bias_attr=False)
    hidden, cell = layers.dynamic_lstmp(
        input=input_proj,
        size=gate_size * 4,
        proj_size=proj_size,
        h_0=h_0,
        c_0=c_0,
        use_peepholes=False,
        proj_clip=proj_clip,
        cell_clip=cell_clip,
        proj_activation="identity",
        param_attr=fluid.ParamAttr(initializer=None),
        bias_attr=fluid.ParamAttr(initializer=None))
    return hidden, cell, input_proj


def encoder(x_emb,
            init_hidden=None,
            init_cell=None,
            para_name=''):
    rnn_input = x_emb
    rnn_outs = []
    rnn_outs_ori = []
    cells = []
    projs = []
    for i in range(num_layers):
        if init_hidden and init_cell:
            h0 = layers.squeeze(
                layers.slice(
                    init_hidden, axes=[0], starts=[i], ends=[i + 1]),
                axes=[0])
            c0 = layers.squeeze(
                layers.slice(
                    init_cell, axes=[0], starts=[i], ends=[i + 1]),
                axes=[0])
        else:
            h0 = c0 = None
        rnn_out, cell, input_proj = lstmp_encoder(
            rnn_input, hidden_size, h0, c0,
            para_name + 'layer{}'.format(i + 1))
        rnn_out_ori = rnn_out
        if i > 0:
            rnn_out = rnn_out + rnn_input
        rnn_out.stop_gradient = True
        rnn_outs.append(rnn_out)
        rnn_outs_ori.append(rnn_out_ori)
    # add weight layers for fine-tuning
    a1 = layers.create_parameter(
        [1], dtype="float32", name="gamma1")
    a2 = layers.create_parameter(
        [1], dtype="float32", name="gamma2")
    rnn_outs[0].stop_gradient = True
    rnn_outs[1].stop_gradient = True
    num_layer1 = rnn_outs[0] * a1
    num_layer2 = rnn_outs[1] * a2
    output_layer = num_layer1 * 0.5 + num_layer2 * 0.5
    return output_layer, rnn_outs_ori


def emb(x):
    x_emb = layers.embedding(
        input=x,
        size=[vocab_size, embed_size],
        dtype='float32',
        is_sparse=False,
        param_attr=fluid.ParamAttr(name='embedding_para'))
    return x_emb


def elmo_encoder(x_emb):
    x_emb_r = fluid.layers.sequence_reverse(x_emb, name=None)
    fw_hiddens, fw_hiddens_ori = encoder(
        x_emb,
        para_name='fw_')
    bw_hiddens, bw_hiddens_ori = encoder(
        x_emb_r,
        para_name='bw_')
    embedding = layers.concat(input=[fw_hiddens, bw_hiddens], axis=1)
    # add dropout on finetune
    embedding = dropout(embedding)
    a = layers.create_parameter(
        [1], dtype="float32", name="gamma")
    embedding = embedding * a
    return embedding
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid.layers as layers
import paddle.fluid as fluid
import numpy as np


def dropout(input, test_mode, args):
    if args.dropout and (not test_mode):
        return layers.dropout(
            input,
            dropout_prob=args.dropout,
            dropout_implementation="upscale_in_train",
            seed=args.random_seed,
            is_test=False)
    else:
        return input


def lstmp_encoder(input_seq, gate_size, h_0, c_0, para_name, proj_size,
                  test_mode, args):
    # A lstm encoder implementation with projection.
    # Linear transformation part for input gate, output gate, forget gate
    # and cell activation vectors need be done outside of dynamic_lstm.
    # So the output size is 4 times of gate_size.
    input_seq = dropout(input_seq, test_mode, args)
    input_proj = layers.fc(input=input_seq,
                           param_attr=fluid.ParamAttr(
                               name=para_name + '_gate_w', initializer=None),
                           size=gate_size * 4,
                           act=None,
                           bias_attr=False)
    hidden, cell = layers.dynamic_lstmp(
        input=input_proj,
        size=gate_size * 4,
        proj_size=proj_size,
        h_0=h_0,
        c_0=c_0,
        use_peepholes=False,
        proj_clip=args.proj_clip,
        cell_clip=args.cell_clip,
        proj_activation="identity",
        param_attr=fluid.ParamAttr(initializer=None),
        bias_attr=fluid.ParamAttr(initializer=None))
    return hidden, cell, input_proj


def encoder(x,
            y,
            vocab_size,
            emb_size,
            init_hidden=None,
            init_cell=None,
            para_name='',
            custom_samples=None,
            custom_probabilities=None,
            test_mode=False,
            args=None):
    x_emb = layers.embedding(
        input=x,
        size=[vocab_size, emb_size],
        dtype='float32',
        is_sparse=False,
        param_attr=fluid.ParamAttr(name='embedding_para'))
    rnn_input = x_emb
    rnn_outs = []
    rnn_outs_ori = []
    cells = []
    projs = []
    for i in range(args.num_layers):
        rnn_input = dropout(rnn_input, test_mode, args)
        if init_hidden and init_cell:
            h0 = layers.squeeze(
                layers.slice(
                    init_hidden, axes=[0], starts=[i], ends=[i + 1]),
                axes=[0])
            c0 = layers.squeeze(
                layers.slice(
                    init_cell, axes=[0], starts=[i], ends=[i + 1]),
                axes=[0])
        else:
            h0 = c0 = None
        rnn_out, cell, input_proj = lstmp_encoder(
            rnn_input, args.hidden_size, h0, c0,
            para_name + 'layer{}'.format(i + 1), emb_size, test_mode, args)
        rnn_out_ori = rnn_out
        if i > 0:
            rnn_out = rnn_out + rnn_input
        rnn_out = dropout(rnn_out, test_mode, args)
        cell = dropout(cell, test_mode, args)
        rnn_outs.append(rnn_out)
        rnn_outs_ori.append(rnn_out_ori)
        rnn_input = rnn_out
        cells.append(cell)
        projs.append(input_proj)
    softmax_weight = layers.create_parameter(
        [vocab_size, emb_size], dtype="float32", name="softmax_weight")
    softmax_bias = layers.create_parameter(
        [vocab_size], dtype="float32", name='softmax_bias')
    projection = layers.matmul(rnn_outs[-1], softmax_weight, transpose_y=True)
    projection = layers.elementwise_add(projection, softmax_bias)
    projection = layers.reshape(projection, shape=[-1, vocab_size])
    if args.sample_softmax and (not test_mode):
        loss = layers.sampled_softmax_with_cross_entropy(
            logits=projection,
            label=y,
            num_samples=args.n_negative_samples_batch,
            seed=args.random_seed)
    else:
        label = layers.one_hot(input=y, depth=vocab_size)
        loss = layers.softmax_with_cross_entropy(
            logits=projection, label=label, soft_label=True)
    return [x_emb, projection, loss], rnn_outs, rnn_outs_ori, cells, projs


class LanguageModel(object):
    def __init__(self, args, vocab_size, test_mode):
        self.args = args
        self.vocab_size = vocab_size
        self.test_mode = test_mode

    def build(self):
        args = self.args
        emb_size = args.embed_size
        proj_size = args.embed_size
        hidden_size = args.hidden_size
        batch_size = args.batch_size
        num_layers = args.num_layers
        num_steps = args.num_steps
        lstm_outputs = []
        x_f = layers.data(name="x", shape=[1], dtype='int64', lod_level=1)
        y_f = layers.data(name="y", shape=[1], dtype='int64', lod_level=1)
        x_b = layers.data(name="x_r", shape=[1], dtype='int64', lod_level=1)
        y_b = layers.data(name="y_r", shape=[1], dtype='int64', lod_level=1)
        init_hiddens_ = layers.data(
            name="init_hiddens", shape=[1], dtype='float32')
        init_cells_ = layers.data(
            name="init_cells", shape=[1], dtype='float32')
        init_hiddens = layers.reshape(
            init_hiddens_, shape=[2 * num_layers, -1, proj_size])
        init_cells = layers.reshape(
            init_cells_, shape=[2 * num_layers, -1, hidden_size])
        init_hidden = layers.slice(
            init_hiddens, axes=[0], starts=[0], ends=[num_layers])
        init_cell = layers.slice(
            init_cells, axes=[0], starts=[0], ends=[num_layers])
        init_hidden_r = layers.slice(
            init_hiddens, axes=[0], starts=[num_layers],
            ends=[2 * num_layers])
        init_cell_r = layers.slice(
            init_cells, axes=[0], starts=[num_layers], ends=[2 * num_layers])
        if args.use_custom_samples:
            custom_samples = layers.data(
                name="custom_samples",
                shape=[args.n_negative_samples_batch + 1],
                dtype='int64',
                lod_level=1)
            custom_samples_r = layers.data(
                name="custom_samples_r",
                shape=[args.n_negative_samples_batch + 1],
                dtype='int64',
                lod_level=1)
            custom_probabilities = layers.data(
                name="custom_probabilities",
                shape=[args.n_negative_samples_batch + 1],
                dtype='float32',
                lod_level=1)
        else:
            custom_samples = None
            custom_samples_r = None
            custom_probabilities = None
        forward, fw_hiddens, fw_hiddens_ori, fw_cells, fw_projs = encoder(
            x_f,
            y_f,
            self.vocab_size,
            emb_size,
            init_hidden,
            init_cell,
            para_name='fw_',
            custom_samples=custom_samples,
            custom_probabilities=custom_probabilities,
            test_mode=self.test_mode,
            args=args)
        backward, bw_hiddens, bw_hiddens_ori, bw_cells, bw_projs = encoder(
            x_b,
            y_b,
            self.vocab_size,
            emb_size,
            init_hidden_r,
            init_cell_r,
            para_name='bw_',
            custom_samples=custom_samples_r,
            custom_probabilities=custom_probabilities,
            test_mode=self.test_mode,
            args=args)
        losses = layers.concat([forward[-1], backward[-1]])
        self.loss = layers.reduce_mean(losses)
        self.loss.persistable = True
        self.grad_vars = [x_f, y_f, x_b, y_b, self.loss]
        self.grad_vars_name = ['x', 'y', 'x_r', 'y_r', 'final_loss']
        fw_vars_name = ['x_emb', 'proj', 'loss'] + [
            'init_hidden', 'init_cell'
        ] + ['rnn_out', 'rnn_out2', 'cell', 'cell2', 'xproj', 'xproj2']
        bw_vars_name = ['x_emb_r', 'proj_r', 'loss_r'] + [
            'init_hidden_r', 'init_cell_r'
        ] + [
            'rnn_out_r', 'rnn_out2_r', 'cell_r', 'cell2_r', 'xproj_r',
            'xproj2_r'
        ]
        fw_vars = forward + [init_hidden, init_cell
                             ] + fw_hiddens + fw_cells + fw_projs
        bw_vars = backward + [init_hidden_r, init_cell_r
                              ] + bw_hiddens + bw_cells + bw_projs
        for i in range(len(fw_vars_name)):
            self.grad_vars.append(fw_vars[i])
            self.grad_vars.append(bw_vars[i])
            self.grad_vars_name.append(fw_vars_name[i])
            self.grad_vars_name.append(bw_vars_name[i])
        if args.use_custom_samples:
            self.feed_order = [
                'x', 'y', 'x_r', 'y_r', 'custom_samples', 'custom_samples_r',
                'custom_probabilities'
            ]
        else:
            self.feed_order = ['x', 'y', 'x_r', 'y_r']
        self.last_hidden = [
            fluid.layers.sequence_last_step(input=x)
            for x in fw_hiddens_ori + bw_hiddens_ori
        ]
        self.last_cell = [
            fluid.layers.sequence_last_step(input=x)
            for x in fw_cells + bw_cells
        ]
        self.last_hidden = layers.concat(self.last_hidden, axis=0)
        self.last_hidden.persistable = True
        self.last_cell = layers.concat(self.last_cell, axis=0)
        self.last_cell.persistable = True
export CUDA_VISIBLE_DEVICES=0
python train.py \
--train_path='data/train/sentence_file_*' \
--test_path='data/dev/sentence_file_*' \
--vocab_path data/vocabulary_min5k.txt \
--learning_rate 0.2 \
--use_gpu True \
--local True $@
Subproject commit a4eb73b2fb64d8aab8499a1184edf4fc386f8268
Subproject commit 77ab80a7061024c4b28f0b41fdd6ba42d5e6d9e1
PaddleNLP
=========
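Machine Translation
-------------------
Machine translation converts one natural language (the source language) into another (the target language), and is a fundamental and important research direction in natural language processing. Amid globalization, the role machine translation plays in promoting exchange across languages and cultures is self-evident. Its development has gone through stages including statistical machine translation and neural-network-based Neural Machine Translation (NMT). Only after NMT matured did machine translation come into truly large-scale use. Early NMT was mainly based on recurrent neural networks (RNNs), in which the current time step depends on the computation of the previous one during training, so time steps are hard to parallelize to speed up training. Non-RNN NMT architectures therefore emerged, such as architectures based on convolutional neural networks (CNNs) and architectures based on self-attention.
The Transformer implemented in this example is a machine translation model based on self-attention. It contains no RNN or CNN structure and relies entirely on attention to learn contextual dependencies in language. Compared with RNNs and CNNs, this architecture has lower per-layer computational complexity, is easier to parallelize, models long-range dependencies more easily, and ultimately achieved the best translation quality across multiple language pairs.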
- [Transformer](https://github.com/PaddlePaddle/models/blob/develop/PaddleNLP/neural_machine_translation/transformer/README_cn.md)
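Chinese Lexical Analysis
------------------------
Chinese word segmentation is the process of splitting continuous natural language text into a sequence of words that are semantically reasonable and complete. Because the word is the most basic unit carrying meaning in Chinese, segmentation is the foundation of many natural language processing tasks such as text classification, sentiment analysis, and information retrieval. Part-of-speech tagging assigns a part of speech to every word in a natural language text; the parts of speech include nouns, verbs, adjectives, adverbs, and so on. Named Entity Recognition (NER), also called "proper name recognition", identifies entities with specific meaning in natural language text, mainly person names, place names, organization names, and other proper nouns. We unify these three tasks into one joint task, called lexical analysis, and provide an end-to-end solution based on deep neural networks trained on massive annotated corpora.
We named this joint Chinese lexical analysis solution LAC. LAC can be read either as the acronym of Lexical Analysis of Chinese or as the recursive acronym of LAC Analyzes Chinese.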
- [LAC](https://github.com/baidu/lac/blob/master/README.md)
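Sentiment Analysis
------------------
Sentiment analysis takes Chinese text with subjective descriptions, automatically determines the sentiment polarity of the text, and gives the corresponding confidence. The sentiment types are positive, negative, and neutral. Sentiment analysis helps companies understand user consumption habits, analyze trending topics, and monitor public opinion in a crisis, giving them strong support for decisions. In this release we open the [model](http://ai.baidu.com/tech/nlp/sentiment_classify) used for sentiment analysis on the AI open platform and make it available to users.
PaddleNLP is an industrial-grade collection of NLP tools and pre-trained models open-sourced by Baidu. It covers a comprehensive range of NLP tasks, lets developers flexibly plug in and try different network structures, and helps applications reach industrial-grade results as quickly as possible.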
- [Senta](https://github.com/baidu/Senta/blob/master/README.md)
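PaddleNLP is built entirely on [PaddlePaddle Fluid](http://www.paddlepaddle.org/) and provides pre-trained models backed by Baidu's data at the scale of tens of billions of samples, which greatly helps NLP researchers and engineers build applications quickly. With PaddleNLP, users can quickly implement network construction, modeling, and deployment for NLP tasks such as text classification, text matching, sequence labeling, reading comprehension, and intelligent dialogue, and can directly use Baidu's open-source industrial-grade pre-trained models. Users greatly reduce research and development costs while also obtaining better results grounded in industrial practice.
Semantic Matching
Features and Advantages
-----------------------
- A comprehensive set of Chinese NLP application tasks;
- Tasks decoupled from networks, with flexible, pluggable networks;
- Powerful industrial-grade pre-trained models for excellent application results.
Directory Structure
-------------------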
```text
.
├── dialogue_model_toolkit            # Dialogue model toolkit
├── emotion_detection                 # Dialogue emotion detection
├── knowledge_driven_dialogue         # Knowledge-driven dialogue
├── language_model                    # Language model
├── language_representations_kit      # Language representations kit
├── lexical_analysis                  # Lexical analysis
├── models                            # Shared networks
│ ├── __init__.py
│ ├── classification
│ ├── dialogue_model_toolkit
│ ├── language_model
│ ├── matching
│ ├── neural_machine_translation
│ ├── reading_comprehension
│ ├── representation
│ ├── sequence_labeling
│ └── transformer_encoder.py
├── neural_machine_translation        # Machine translation
├── preprocess                        # Shared text preprocessing tools
│ ├── __init__.py
│ ├── ernie
│ ├── padding.py
│ └── tokenizer
├── reading_comprehension             # Reading comprehension
├── sentiment_classification          # Text sentiment analysis
├── similarity_net                    # Short-text semantic matching
```
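Apart from `models` and `preprocess`, which are the shared model collection and the shared data preprocessing pipeline respectively, all other directories contain mutually independent tasks; you can enter any of them directly to run its task.
Quick Installation
------------------
### Dependencies
This project depends on Python 2.7 and Paddle Fluid 1.3.1 or later. See the [installation guide](http://www.paddlepaddle.org/#quick-start) to install PaddlePaddle.
### Steps
- Clone the repository locally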
```shell
git clone https://github.com/PaddlePaddle/models.git
```
- Enter a specific subdirectory to view the code and run the task (for example, sentiment analysis)
```shell
cd models/PaddleNLP/sentiment_classification
```
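Supported NLP Tasks
-------------------
### Text classification
- [Text sentiment analysis](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/sentiment_classification)
- [Dialogue emotion detection](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/emotion_detection)
### Text matching
- [Short-text semantic matching](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/similarity_net)
### Sequence labeling
- [Lexical analysis](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis)
### Text generation
- [Machine translation](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/neural_machine_translation/transformer)
### Semantic representation and language models
- [Language representations kit](https://github.com/PaddlePaddle/LARK/tree/develop)
- [Language model](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/language_model)
### Complex tasks
- [Dialogue model toolkit](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/dialogue_model_toolkit)
- [Knowledge-driven dialogue](https://github.com/baidu/knowledge-driven-dialogue/tree/master)
- [Reading comprehension](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/reading_comprehension)
Many natural language processing scenarios need to measure how similar two texts are in meaning; such tasks are usually called semantic matching. Examples include ranking search results by the similarity between the query and candidate documents, computing text-to-text similarity for deduplication, and matching candidate answers to questions in automatic question answering.
The DAM (Deep Attention Matching Network) opened in this example is work by Baidu's Natural Language Processing Department published at ACL 2018, used to select responses in multi-turn dialogue for retrieval-based chatbots. Inspired by the Transformer, DAM's network structure is based entirely on attention: it uses stacked self-attention to learn semantic representations of the response and the context at different granularities, then uses cross-attention to capture the relevance between response and context. It outperformed other models on two large-scale multi-turn dialogue datasets.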
- [Deep Attention Matching Network](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/deep_attention_matching_net)
AnyQ
----
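The open-source [AnyQ](https://github.com/baidu/AnyQ) (ANswer Your Questions) project mainly comprises a question answering framework for FAQ collections and the text semantic matching tool SimNet. The framework uses a configurable, plugin-based design in which every feature is added as a plugin; more than 20 plugins are currently available. Developers can use AnyQ to quickly build and customize FAQ question answering systems for specific business scenarios, and to speed up iteration and upgrades.
SimNet is a semantic matching framework developed in-house by Baidu's Natural Language Processing Department in 2013 and widely used across Baidu products. It mainly includes core network structures such as BOW, CNN, RNN, and MM-DNN, and also integrates mainstream semantic matching models from academia, such as MatchPyramid, MV-LSTM, and K-NRM. Models built with SimNet can be added to AnyQ easily to strengthen its semantic matching capability.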
- [SimNet in PaddlePaddle Fluid](https://github.com/baidu/AnyQ/blob/master/tools/simnet/train/paddle/README.md)
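Machine Reading Comprehension
-----------------------------
Machine reading comprehension (MRC) is one of the core tasks in natural language processing (NLP). Its ultimate goal is for machines to read text as humans do, distill the information in it, and answer related questions. Deep learning has been widely used in NLP in recent years and has greatly improved machine reading comprehension. However, current MRC research relies on artificially constructed datasets and answers relatively simple questions, which differs markedly from the data humans handle, so large-scale real-world training data is urgently needed to advance MRC further.
The Baidu reading comprehension dataset is a real-world dataset open-sourced by Baidu's Natural Language Processing Department. All questions and source documents come from real data (Baidu search engine data and the Baidu Zhidao question answering community), and the answers were written by humans. Each question corresponds to multiple answers; the dataset contains 200k questions, 1000k source documents, and 420k answers, making it the largest Chinese MRC dataset to date. Baidu also open-sourced the corresponding reading comprehension model, called DuReader. It uses the now-common layered network structure, captures the interaction between question and document through a bidirectional attention mechanism to produce a query-aware document representation, and finally predicts the answer span from that representation with a pointer network.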
- [DuReader in PaddlePaddle Fluid](https://github.com/PaddlePaddle/models/blob/develop/PaddleNLP/machine_reading_comprehension/README.md)
Subproject commit dc1af6a83dd1372055158ac6d17f6d14b3a0f0f8
Subproject commit b3e096b92f26720f6e3b020b374e11aa0748c032