Commit 5c47580a authored by Dun, committed by qingqing01

add Deeplabv3+ model (#1252)

* add Deeplabv3+ model
Parent 6c9d59a3
# DeepLab
Running the example programs in this directory requires the latest PaddlePaddle develop version. If your installed PaddlePaddle version is lower than this requirement, please update it following the instructions in the [installation documentation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
## Code structure
```
├── models.py # Network architecture definition
├── train.py  # Training script
├── eval.py   # Evaluation script
└── reader.py # Data reader and common preprocessing utilities
```
## Introduction
DeepLabv3+ is the latest model in the DeepLab family of semantic segmentation networks, following DeepLabv1, DeepLabv2, and DeepLabv3.
In this latest version, the authors fuse multi-scale information with an encoder-decoder structure while keeping the original atrous convolutions and the ASPP module,
and adopt an Xception model as the backbone, improving both the robustness and the speed of semantic segmentation. It sets a new state-of-the-art result of 89.0% mIoU on the PASCAL VOC 2012 dataset.
![](./imgs/model.png)
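The sketch below shows how these pieces are composed, mirroring the `deeplabv3p` function defined in `models.py` in this commit (the helper names come from that file; variable names are illustrative only):
```
# A minimal sketch of the DeepLabv3+ forward pass as wired up in models.py.
# `img` is an NCHW image tensor; entry_flow/middle_flow/exit_flow/encoder/decoder
# and conv are the helpers defined in models.py.
def deeplabv3p_sketch(img):
    data, low_level_feat = entry_flow(img)   # Xception entry flow; keeps a low-level feature for the decoder
    data = middle_flow(data)                 # 16 repeated Xception blocks
    data = exit_flow(data)                   # dilated exit flow
    data = encoder(data)                     # ASPP: image pooling + 1x1 conv + dilated separable convs (rates 6/12/18)
    data = decoder(data, low_level_feat)     # fuse with the low-level feature and refine
    return conv(data, label_number, 1)       # per-pixel class logits, upsampled back to the input size by the caller
```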
## Data preparation
This example uses the Cityscapes dataset. Please register at the [Cityscapes website](https://www.cityscapes-dataset.com) and download it.
After downloading, the data directory structure is as follows:
```
data/cityscape/
|-- gtFine
|   |-- test
|   |-- train
|   `-- val
`-- leftImg8bit
    |-- test
    |-- train
    `-- val
```
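The reader in `reader.py` locates ground-truth files by the `labelTrainIds` pattern, which is generated by the official Cityscapes preparation scripts rather than shipped in the raw download. A quick, hedged sanity check (the `data/cityscape/` root below is an assumption matching the layout above):
```
# Count label/image files for the training split (adjust paths to your setup).
import glob

label_files = glob.glob('data/cityscape/gtFine/train/*/*_gtFine_labelTrainIds.png')
image_files = glob.glob('data/cityscape/leftImg8bit/train/*/*_leftImg8bit.png')
print("label files: %d, image files: %d" % (len(label_files), len(image_files)))
```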
## Pretrained model preparation
If you want to train the model from scratch, download our initialization weights:
```
wget http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus_xception65_initialize.tar.gz
```
If you want to fine-tune from the final trained model, or use it directly for prediction, download our final model:
```
wget http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus.tar.gz
```
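Both archives are regular `tar.gz` files. A hedged way to unpack one from Python (the file name and destination below are assumptions):
```
import tarfile

# Unpack the downloaded initialization weights next to the training scripts.
tarfile.open('deeplabv3plus_xception65_initialize.tar.gz').extractall('.')
```
Note that `load_model()` in `train.py` treats an `--init_weights_path` ending in `/` as a directory of parameter files and any other value as a single parameter file, so pass the path in whichever form matches how the archive unpacks.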
## Model training and prediction
### Training
Run the following command to start training, specifying the path for saving weights, the initialization weights, and the dataset location:
```
python ./train.py \
--batch_size=1 \
--train_crop_size=769 \
--total_step=50 \
--init_weights_path=$INIT_WEIGHTS_PATH \
--save_weights_path=$SAVE_WEIGHTS_PATH \
--dataset_path=$DATASET_PATH
```
Run the following command to see more usage information:
```
python train.py --help
```
The command above only checks that training runs correctly; it iterates for just 50 steps with a batch size of 1. To reproduce the experiments in the original paper, use the following settings (the implied learning-rate schedule is sketched after the command):
```
python ./train.py \
--batch_size=8 \
    --parallel=true \
--train_crop_size=769 \
--total_step=90000 \
--init_weights_path=$INIT_WEIGHTS_PATH \
--save_weights_path=$SAVE_WEIGHTS_PATH \
--dataset_path=$DATASET_PATH
```
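`train.py` drives the learning rate with `fluid.layers.polynomial_decay(base_lr, total_step, end_learning_rate=0, power=0.9)`. As a plain-Python sketch (not part of the training code), the schedule for the default `base_lr` of 0.0001 over 90000 steps is:
```
def poly_lr(step, base_lr=0.0001, total_step=90000, power=0.9):
    # Polynomial decay: the learning rate falls from base_lr at step 0 to 0 at total_step.
    return base_lr * (1.0 - float(step) / total_step) ** power

# poly_lr(0) == 1e-4, poly_lr(45000) ~= 5.4e-5, poly_lr(90000) == 0.0
```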
### Evaluation
Run the following command to evaluate on the `Cityscapes` validation set:
```
python ./eval.py \
--init_weights_path=$INIT_WEIGHTS_PATH \
--dataset_path=$DATASET_PATH
```
The model weights to evaluate are specified with the `--init_weights_path` option.
The metric reported by the evaluation script is mean IoU.
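`eval.py` accumulates per-class correct and wrong pixel counts across images and reports a running mean IoU. Conceptually, the final number is computed as in this sketch (hypothetical count arrays; it mirrors the accumulation loop in `eval.py`):
```
import numpy as np

def mean_iou_from_counts(correct, wrong):
    # correct[c]: pixels correctly assigned to class c;
    # wrong[c]: pixels wrongly assigned to, or missed from, class c.
    # IoU(c) = correct[c] / (correct[c] + wrong[c]); classes never seen are skipped.
    correct = np.asarray(correct, dtype=np.float64)
    wrong = np.asarray(wrong, dtype=np.float64)
    seen = (correct + wrong) > 0
    return np.mean(correct[seen] / (correct[seen] + wrong[seen]))
```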
## Experimental results
After training completes, run `eval.py` on the validation set; it produces the following results:
```
load from: ../models/deeplabv3p
total number 500
step: 500, mIoU: 0.7873
```
## Other information
|Dataset | Pretrained model | Trained model | mean IoU |
|---|---|---|---|
|CityScape | [deeplabv3plus_xception65_initialize.tar.gz](http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus_xception65_initialize.tar.gz) | [deeplabv3plus.tar.gz](http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus.tar.gz) | 0.7873 |
## References
- [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611)
import os
os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98'
import paddle
import paddle.fluid as fluid
import numpy as np
import argparse
from reader import CityscapeDataset
import reader
import models
import sys
def add_argument(name, type, default, help):
parser.add_argument('--' + name, default=default, type=type, help=help)
def add_arguments():
add_argument('total_step', int, -1,
"Number of the step to be evaluated, -1 for full evaluation.")
add_argument('init_weights_path', str, None,
"Path of the weights to evaluate.")
add_argument('dataset_path', str, None, "Cityscape dataset path.")
add_argument('verbose', bool, False, "Print mIoU for each step if verbose.")
add_argument('use_gpu', bool, True, "Whether use GPU or CPU.")
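# mean_iou: labels >= num_classes (the Cityscapes ignore label, clipped below to a
# dummy class `num_classes`) force the prediction to that same dummy class, so ignored
# pixels never count as errors; the dummy class itself is dropped when the per-class
# counts are aggregated in the evaluation loop below (the `[:-1]` slices).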
def mean_iou(pred, label):
label = fluid.layers.elementwise_min(
label, fluid.layers.assign(np.array(
[num_classes], dtype=np.int32)))
label_ignore = (label == num_classes).astype('int32')
label_nignore = (label != num_classes).astype('int32')
pred = pred * label_nignore + label_ignore * num_classes
miou, wrong, correct = fluid.layers.mean_iou(pred, label, num_classes + 1)
return miou, wrong, correct
def load_model():
if args.init_weights_path.endswith('/'):
fluid.io.load_params(
exe, dirname=args.init_weights_path, main_program=tp)
else:
fluid.io.load_params(
exe, dirname="", filename=args.init_weights_path, main_program=tp)
CityscapeDataset = reader.CityscapeDataset
parser = argparse.ArgumentParser()
add_arguments()
args = parser.parse_args()
models.clean()
models.is_train = False
deeplabv3p = models.deeplabv3p
image_shape = [1025, 2049]
eval_shape = [1024, 2048]
sp = fluid.Program()
tp = fluid.Program()
batch_size = 1
reader.default_config['crop_size'] = -1
reader.default_config['shuffle'] = False
num_classes = 19
with fluid.program_guard(tp, sp):
img = fluid.layers.data(name='img', shape=[3, 0, 0], dtype='float32')
label = fluid.layers.data(name='label', shape=eval_shape, dtype='int32')
img = fluid.layers.resize_bilinear(img, image_shape)
logit = deeplabv3p(img)
logit = fluid.layers.resize_bilinear(logit, eval_shape)
pred = fluid.layers.argmax(logit, axis=1).astype('int32')
miou, out_wrong, out_correct = mean_iou(pred, label)
tp = tp.clone(True)
fluid.memory_optimize(
tp,
print_log=False,
    skip_opt_set=[pred.name, miou.name, out_wrong.name, out_correct.name],
level=1)
place = fluid.CPUPlace()
if args.use_gpu:
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(sp)
if args.init_weights_path:
print "load from:", args.init_weights_path
load_model()
dataset = CityscapeDataset(args.dataset_path, 'val')
if args.total_step == -1:
total_step = len(dataset.label_files)
else:
total_step = args.total_step
batches = dataset.get_batch_generator(batch_size, total_step)
sum_iou = 0
all_correct = np.array([0], dtype=np.int64)
all_wrong = np.array([0], dtype=np.int64)
for i, imgs, labels, names in batches:
result = exe.run(tp,
feed={'img': imgs,
'label': labels},
fetch_list=[pred, miou, out_wrong, out_correct])
wrong = result[2][:-1] + all_wrong
right = result[3][:-1] + all_correct
all_wrong = wrong.copy()
all_correct = right.copy()
mp = (wrong + right) != 0
miou2 = np.mean((right[mp] * 1.0 / (right[mp] + wrong[mp])))
if args.verbose:
print 'step: %s, mIoU: %s' % (i + 1, miou2)
else:
print '\rstep: %s, mIoU: %s' % (i + 1, miou2),
sys.stdout.flush()
import paddle
import paddle.fluid as fluid
import contextlib
name_scope = ""
decode_channel = 48
encode_channel = 256
label_number = 19
bn_momentum = 0.99
dropout_keep_prop = 0.9
is_train = True
op_results = {}
default_epsilon = 1e-3
default_norm_type = 'bn'
default_group_number = 32
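# `scope` maintains a hierarchical name prefix (e.g. "xception_65/entry_flow/conv1/")
# so that every parameter created by conv/bn gets a stable, descriptive name that can
# be matched against the released pretrained weights when loading by name.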
@contextlib.contextmanager
def scope(name):
global name_scope
bk = name_scope
name_scope = name_scope + name + '/'
yield
name_scope = bk
def check(data, number):
if type(data) == int:
return [data] * number
assert len(data) == number
return data
def clean():
global op_results
op_results = {}
def append_op_result(result, name):
global op_results
op_index = len(op_results)
name = name_scope + name + str(op_index)
op_results[name] = result
return result
def conv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights'
    if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = name_scope + 'biases'
else:
kargs['bias_attr'] = False
return append_op_result(fluid.layers.conv2d(*args, **kargs), 'conv')
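# Manual group normalization: reshape to [N, G, C // G * H * W], normalize each group
# to zero mean and unit variance, then apply a learned per-group scale and bias of
# shape (G,). If G does not divide the channel count, a nearby divisor is searched for.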
def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None):
helper = fluid.layer_helper.LayerHelper('group_norm', **locals())
N, C, H, W = input.shape
if C % G != 0:
print "group can not divide channle:", C, G
for d in range(10):
for t in [d, -d]:
if G + t <= 0: continue
if C % (G + t) == 0:
G = G + t
break
if C % G == 0:
print "use group size:", G
break
assert C % G == 0
param_shape = (G, )
x = input
x = fluid.layers.reshape(x, [N, G, C // G * H * W])
mean = fluid.layers.reduce_mean(x, dim=2, keep_dim=True)
x = x - mean
var = fluid.layers.reduce_mean(fluid.layers.square(x), dim=2, keep_dim=True)
x = x / fluid.layers.sqrt(var + eps)
scale = helper.create_parameter(
attr=helper.param_attr,
shape=param_shape,
dtype='float32',
default_initializer=fluid.initializer.Constant(1.0))
bias = helper.create_parameter(
attr=helper.bias_attr, shape=param_shape, dtype='float32', is_bias=True)
x = fluid.layers.elementwise_add(
fluid.layers.elementwise_mul(
x, scale, axis=1), bias, axis=1)
return fluid.layers.reshape(x, input.shape)
def bn(*args, **kargs):
if default_norm_type == 'bn':
with scope('BatchNorm'):
return append_op_result(
fluid.layers.batch_norm(
*args,
epsilon=default_epsilon,
momentum=bn_momentum,
param_attr=name_scope + 'gamma',
bias_attr=name_scope + 'beta',
moving_mean_name=name_scope + 'moving_mean',
moving_variance_name=name_scope + 'moving_variance',
**kargs),
'bn')
elif default_norm_type == 'gn':
with scope('GroupNorm'):
return append_op_result(
group_norm(
args[0],
default_group_number,
eps=default_epsilon,
param_attr=name_scope + 'gamma',
bias_attr=name_scope + 'beta'),
'gn')
else:
raise "Unsupport norm type:" + default_norm_type
def bn_relu(data):
return append_op_result(fluid.layers.relu(bn(data)), 'relu')
def relu(data):
return append_op_result(fluid.layers.relu(data), 'relu')
def seq_conv(input, channel, stride, filter, dilation=1, act=None):
with scope('depthwise'):
input = conv(
input,
input.shape[1],
filter,
stride,
groups=input.shape[1],
padding=(filter / 2) * dilation,
dilation=dilation)
input = bn(input)
if act: input = act(input)
with scope('pointwise'):
input = conv(input, channel, 1, 1, groups=1, padding=0)
input = bn(input)
if act: input = act(input)
return input
def xception_block(input,
channels,
strides=1,
filters=3,
dilation=1,
skip_conv=True,
has_skip=True,
activation_fn_in_separable_conv=False):
repeat_number = 3
channels = check(channels, repeat_number)
filters = check(filters, repeat_number)
strides = check(strides, repeat_number)
data = input
datum = []
for i in range(repeat_number):
with scope('separable_conv' + str(i + 1)):
if not activation_fn_in_separable_conv:
data = relu(data)
data = seq_conv(
data,
channels[i],
strides[i],
filters[i],
dilation=dilation)
else:
data = seq_conv(
data,
channels[i],
strides[i],
filters[i],
dilation=dilation,
act=relu)
datum.append(data)
if not has_skip:
return append_op_result(data, 'xception_block'), datum
if skip_conv:
with scope('shortcut'):
skip = bn(
conv(
input, channels[-1], 1, strides[-1], groups=1, padding=0))
else:
skip = input
return append_op_result(data + skip, 'xception_block'), datum
def entry_flow(data):
with scope("entry_flow"):
with scope("conv1"):
data = conv(data, 32, 3, stride=2, padding=1)
data = bn_relu(data)
with scope("conv2"):
data = conv(data, 64, 3, stride=1, padding=1)
data = bn_relu(data)
with scope("block1"):
data, _ = xception_block(data, 128, [1, 1, 2])
with scope("block2"):
data, datum = xception_block(data, 256, [1, 1, 2])
with scope("block3"):
data, _ = xception_block(data, 728, [1, 1, 2])
return data, datum[1]
def middle_flow(data):
with scope("middle_flow"):
for i in range(16):
with scope("block" + str(i + 1)):
data, _ = xception_block(data, 728, [1, 1, 1], skip_conv=False)
return data
def exit_flow(data):
with scope("exit_flow"):
with scope('block1'):
data, _ = xception_block(data, [728, 1024, 1024], [1, 1, 1])
with scope('block2'):
data, _ = xception_block(
data, [1536, 1536, 2048], [1, 1, 1],
dilation=2,
has_skip=False,
activation_fn_in_separable_conv=True)
return data
def dropout(x, keep_rate):
if is_train:
return fluid.layers.dropout(x, 1 - keep_rate) / keep_rate
else:
return x
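# encoder: the ASPP module. Concatenates image-level pooled features, a 1x1 convolution,
# and three 3x3 separable convolutions with dilation rates 6, 12 and 18, then fuses them
# with a 1x1 convolution followed by dropout.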
def encoder(input):
with scope('encoder'):
channel = 256
with scope("image_pool"):
image_avg = fluid.layers.reduce_mean(input, [2, 3], keep_dim=True)
append_op_result(image_avg, 'reduce_mean')
image_avg = bn_relu(
conv(
image_avg, channel, 1, 1, groups=1, padding=0))
image_avg = fluid.layers.resize_bilinear(image_avg, input.shape[2:])
with scope("aspp0"):
aspp0 = bn_relu(conv(input, channel, 1, 1, groups=1, padding=0))
with scope("aspp1"):
aspp1 = seq_conv(input, channel, 1, 3, dilation=6, act=relu)
with scope("aspp2"):
aspp2 = seq_conv(input, channel, 1, 3, dilation=12, act=relu)
with scope("aspp3"):
aspp3 = seq_conv(input, channel, 1, 3, dilation=18, act=relu)
with scope("concat"):
data = append_op_result(
fluid.layers.concat(
[image_avg, aspp0, aspp1, aspp2, aspp3], axis=1),
'concat')
data = bn_relu(conv(data, channel, 1, 1, groups=1, padding=0))
data = dropout(data, dropout_keep_prop)
return data
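# decoder: project the low-level entry-flow feature to `decode_channel` channels,
# bilinearly upsample the encoder output to the same spatial size, concatenate the two,
# and refine with two 3x3 separable convolutions.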
def decoder(encode_data, decode_shortcut):
with scope('decoder'):
with scope('concat'):
decode_shortcut = bn_relu(
conv(
decode_shortcut, decode_channel, 1, 1, groups=1, padding=0))
encode_data = fluid.layers.resize_bilinear(
encode_data, decode_shortcut.shape[2:])
encode_data = fluid.layers.concat(
[encode_data, decode_shortcut], axis=1)
append_op_result(encode_data, 'concat')
with scope("separable_conv1"):
encode_data = seq_conv(
encode_data, encode_channel, 1, 3, dilation=1, act=relu)
with scope("separable_conv2"):
encode_data = seq_conv(
encode_data, encode_channel, 1, 3, dilation=1, act=relu)
return encode_data
def deeplabv3p(img):
global default_epsilon
append_op_result(img, 'img')
with scope('xception_65'):
default_epsilon = 1e-3
# Entry flow
data, decode_shortcut = entry_flow(img)
# Middle flow
data = middle_flow(data)
# Exit flow
data = exit_flow(data)
default_epsilon = 1e-5
encode_data = encoder(data)
encode_data = decoder(encode_data, decode_shortcut)
with scope('logit'):
logit = conv(
encode_data, label_number, 1, stride=1, padding=0, bias_attr=True)
logit = fluid.layers.resize_bilinear(logit, img.shape[2:])
return logit
import cv2
import numpy as np
default_config = {
"shuffle": True,
"min_resize": 0.5,
"max_resize": 2,
"crop_size": 769,
}
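# default_config: `crop_size` is the square training crop (eval.py sets it to -1 to use
# full images), and `min_resize`/`max_resize` bound the random scale augmentation applied
# before cropping.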
def slice_with_pad(a, s, value=0):
pads = []
slices = []
for i in range(len(a.shape)):
if i >= len(s):
pads.append([0, 0])
slices.append([0, a.shape[i]])
else:
l, r = s[i]
if l < 0:
pl = -l
l = 0
else:
pl = 0
if r > a.shape[i]:
pr = r - a.shape[i]
r = a.shape[i]
else:
pr = 0
pads.append([pl, pr])
slices.append([l, r])
    slices = [slice(lo, hi, 1) for lo, hi in slices]
    a = a[tuple(slices)]
a = np.pad(a, pad_width=pads, mode='constant', constant_values=value)
return a
class CityscapeDataset:
def __init__(self, dataset_dir, subset='train', config=default_config):
import commands
label_dirname = dataset_dir + 'gtFine/' + subset
label_files = commands.getoutput(
"find %s -type f | grep labelTrainIds | sort" %
label_dirname).splitlines()
self.label_files = label_files
self.label_dirname = label_dirname
self.index = 0
self.subset = subset
self.dataset_dir = dataset_dir
self.config = config
self.reset()
print "total number", len(label_files)
def reset(self, shuffle=False):
self.index = 0
if self.config["shuffle"]:
np.random.shuffle(self.label_files)
def next_img(self):
self.index += 1
if self.index >= len(self.label_files):
self.reset()
def get_img(self):
shape = self.config["crop_size"]
while True:
ln = self.label_files[self.index]
img_name = self.dataset_dir + 'leftImg8bit/' + self.subset + ln[len(
self.label_dirname):]
img_name = img_name.replace('gtFine_labelTrainIds', 'leftImg8bit')
label = cv2.imread(ln)
img = cv2.imread(img_name)
if img is None:
print "load img failed:", img_name
self.next_img()
else:
break
if shape == -1:
return img, label, ln
random_scale = np.random.rand(1) * (self.config['max_resize'] -
self.config['min_resize']
) + self.config['min_resize']
crop_size = int(shape / random_scale)
bb = crop_size // 2
def _randint(low, high):
return int(np.random.rand(1) * (high - low) + low)
offset_x = np.random.randint(bb, max(bb + 1, img.shape[0] -
bb)) - crop_size // 2
offset_y = np.random.randint(bb, max(bb + 1, img.shape[1] -
bb)) - crop_size // 2
img_crop = slice_with_pad(img, [[offset_x, offset_x + crop_size],
[offset_y, offset_y + crop_size]], 128)
img = cv2.resize(img_crop, (shape, shape))
label_crop = slice_with_pad(label, [[offset_x, offset_x + crop_size],
[offset_y, offset_y + crop_size]],
255)
label = cv2.resize(
label_crop, (shape, shape), interpolation=cv2.INTER_NEAREST)
return img, label, ln + str(
(offset_x, offset_y, crop_size, random_scale))
def get_batch(self, batch_size=1):
imgs = []
labels = []
names = []
while len(imgs) < batch_size:
img, label, ln = self.get_img()
imgs.append(img)
labels.append(label)
names.append(ln)
self.next_img()
return np.array(imgs), np.array(labels), names
def get_batch_generator(self, batch_size, total_step):
def do_get_batch():
for i in range(total_step):
imgs, labels, names = self.get_batch(batch_size)
labels = labels.astype(np.int32)[:, :, :, 0]
imgs = imgs[:, :, :, ::-1].transpose(
0, 3, 1, 2).astype(np.float32) / (255.0 / 2) - 1
yield i, imgs, labels, names
batches = do_get_batch()
try:
from prefetch_generator import BackgroundGenerator
batches = BackgroundGenerator(batches, 100)
        except ImportError:
            print "Install 'prefetch_generator' to accelerate data reading."
return batches
import os
os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98'
import paddle
import paddle.fluid as fluid
import numpy as np
import argparse
from reader import CityscapeDataset
import reader
import models
def add_argument(name, type, default, help):
parser.add_argument('--' + name, default=default, type=type, help=help)
def add_arguments():
add_argument('batch_size', int, 2,
"The number of images in each batch during training.")
add_argument('train_crop_size', int, 769,
"'Image crop size during training.")
add_argument('base_lr', float, 0.0001,
"The base learning rate for model training.")
add_argument('total_step', int, 90000, "Number of the training step.")
add_argument('init_weights_path', str, None,
"Path of the initial weights in paddlepaddle format.")
add_argument('save_weights_path', str, None,
"Path of the saved weights during training.")
add_argument('dataset_path', str, None, "Cityscape dataset path.")
    add_argument('parallel', bool, False, "Whether to use ParallelExecutor.")
add_argument('use_gpu', bool, True, "Whether use GPU or CPU.")
def load_model():
if args.init_weights_path.endswith('/'):
fluid.io.load_params(
exe, dirname=args.init_weights_path, main_program=tp)
else:
fluid.io.load_params(
exe, dirname="", filename=args.init_weights_path, main_program=tp)
def save_model():
if args.save_weights_path.endswith('/'):
fluid.io.save_params(
exe, dirname=args.save_weights_path, main_program=tp)
else:
fluid.io.save_params(
exe, dirname="", filename=args.save_weights_path, main_program=tp)
def loss(logit, label):
label_nignore = (label < num_classes).astype('float32')
label = fluid.layers.elementwise_min(
label,
fluid.layers.assign(np.array(
[num_classes - 1], dtype=np.int32)))
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.reshape(logit, [-1, num_classes])
label = fluid.layers.reshape(label, [-1, 1])
label = fluid.layers.cast(label, 'int64')
label_nignore = fluid.layers.reshape(label_nignore, [-1, 1])
loss = fluid.layers.softmax_with_cross_entropy(logit, label)
loss = loss * label_nignore
no_grad_set.add(label_nignore.name)
no_grad_set.add(label.name)
return loss, label_nignore
CityscapeDataset = reader.CityscapeDataset
parser = argparse.ArgumentParser()
add_arguments()
args = parser.parse_args()
models.clean()
models.bn_momentum = 0.9997
models.dropout_keep_prop = 0.9
deeplabv3p = models.deeplabv3p
sp = fluid.Program()
tp = fluid.Program()
crop_size = args.train_crop_size
batch_size = args.batch_size
image_shape = [crop_size, crop_size]
reader.default_config['crop_size'] = crop_size
reader.default_config['shuffle'] = True
num_classes = 19
weight_decay = 0.00004
base_lr = args.base_lr
total_step = args.total_step
no_grad_set = set()
with fluid.program_guard(tp, sp):
img = fluid.layers.data(
name='img', shape=[3] + image_shape, dtype='float32')
label = fluid.layers.data(name='label', shape=image_shape, dtype='int32')
logit = deeplabv3p(img)
pred = fluid.layers.argmax(logit, axis=1).astype('int32')
loss, mask = loss(logit, label)
lr = fluid.layers.polynomial_decay(
base_lr, total_step, end_learning_rate=0, power=0.9)
area = fluid.layers.elementwise_max(
fluid.layers.reduce_mean(mask),
fluid.layers.assign(np.array(
[0.1], dtype=np.float32)))
loss_mean = fluid.layers.reduce_mean(loss) / area
opt = fluid.optimizer.Momentum(
lr,
momentum=0.9,
regularization=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=weight_decay), )
retv = opt.minimize(loss_mean, startup_program=sp, no_grad_set=no_grad_set)
fluid.memory_optimize(
tp, print_log=False, skip_opt_set=[pred.name, loss_mean.name], level=1)
place = fluid.CPUPlace()
if args.use_gpu:
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(sp)
if args.init_weights_path:
print "load from:", args.init_weights_path
load_model()
dataset = CityscapeDataset(args.dataset_path, 'train')
if args.parallel:
print "Using ParallelExecutor."
exe_p = fluid.ParallelExecutor(
use_cuda=True, loss_name=loss_mean.name, main_program=tp)
batches = dataset.get_batch_generator(batch_size, total_step)
for i, imgs, labels, names in batches:
if args.parallel:
retv = exe_p.run(fetch_list=[pred.name, loss_mean.name],
feed={'img': imgs,
'label': labels})
else:
retv = exe.run(tp,
feed={'img': imgs,
'label': labels},
fetch_list=[pred, loss_mean])
if i % 100 == 0:
print "Model is saved to", args.save_weights_path
save_model()
print "step %s, loss: %s" % (i, np.mean(retv[1]))
print "Training done. Model is saved to", args.save_weights_path
save_model()