Commit 11a101ca authored by Evan Shelhamer

add SIFT Flow FCNs

These nets are jointly trained for segmentation of semantic and
geometric classes since this dataset includes annotations for both.

- FCN-32s SIFT Flow
- FCN-16s SIFT Flow
- FCN-8s SIFT Flow

TODO: fix semantic class evaluation for this dataset, which requires
special care since there are missing classes in the test set.
Parent e1a0612a
@@ -29,17 +29,26 @@ Unlike the FCN-32/16/8s models, this network is trained with gradient accumulation
To reproduce the validation scores, use the [seg11valid](https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/data/pascal/seg11valid.txt) split defined by the paper in footnote 7. Since SBD train and PASCAL VOC 2011 segval intersect, we only evaluate on the non-intersecting set for validation purposes.
**NYUDv2 models**: trained online with high momentum on color, depth, and HHA features (from Gupta et al. https://github.com/s-gupta/rcnn-depth).
These models demonstrate FCNs for multi-modal input.
* [FCN-32s NYUDv2 Color](nyud-fcn32s-color): single stream, 32 pixel prediction stride net on color/BGR input
* [FCN-32s NYUDv2 HHA](nyud-fcn32s-hha): single stream, 32 pixel prediction stride net on HHA input
* [FCN-32s NYUDv2 Early Color-Depth](nyud-fcn32s-color-d): single stream, 32 pixel prediction stride net on early fusion of color and (log) depth for 4-channel input
* [FCN-32s NYUDv2 Late Color-HHA](nyud-fcn32s-color-hha): single stream, 32 pixel prediction stride net by late fusion of FCN-32s NYUDv2 Color and FCN-32s NYUDv2 HHA
**SIFT Flow models**: trained online with high momentum for joint semantic class and geometric class segmentation.
These models demonstrate FCNs for multi-task output.
* [FCN-32s SIFT Flow](siftflow-fcn32s): single stream, 32 pixel prediction stride net
* [FCN-16s SIFT Flow](siftflow-fcn16s): two stream, 16 pixel prediction stride net
* [FCN-8s SIFT Flow](siftflow-fcn8s): three stream, 8 pixel prediction stride net
*Note*: in this release, the evaluation of the semantic classes is not quite right due to an issue with classes missing from the test set.
This will be corrected soon.
The evaluation of the geometric classes is unaffected.
**The following models have not yet been ported to master and trained with the latest settings. Check back soon.**
PASCAL-Context models including architecture definition, solver configuration, and bare-bones solving script (fine-tuned from the ILSVRC-trained VGG-16 model):
* [FCN-32s PASCAL-Context](https://gist.github.com/shelhamer/80667189b218ad570e82#file-readme-md): single stream, 32 pixel prediction stride version
# SIFT Flow
SIFT Flow is a semantic segmentation dataset with two labelings:
- semantic classes, such as "cat" or "dog"
- geometric classes, consisting of "horizontal, vertical, and sky"
Refer to `classes.txt` for the listing of classes in model output order.
Refer to `../siftflow_layers.py` for the Python data layer for this dataset.
Note that the dataset has a number of issues, including unannotated images and missing classes from the test set.
The provided splits exclude the unannotated images.
As noted in the paper, care must be taken for proper evaluation by excluding the missing classes.
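As a minimal sketch of such an evaluation (a hypothetical helper, not the repo's own `score.py`), mean IU can be computed from the confusion histogram while skipping classes that never appear in the ground truth:

```python
import numpy as np

def mean_iu(hist):
    """Mean intersection-over-union from a confusion histogram
    (rows: ground truth, cols: prediction), skipping classes
    absent from the ground truth."""
    gt = hist.sum(1)  # ground-truth pixels per class
    with np.errstate(invalid='ignore', divide='ignore'):
        iu = np.diag(hist) / (gt + hist.sum(0) - np.diag(hist))
    return np.nanmean(iu[gt > 0])  # exclude missing classes

# toy 4-class histogram where class 3 is absent from the test set
hist = np.array([[5., 1., 0., 0.],
                 [0., 4., 0., 0.],
                 [1., 0., 3., 0.],
                 [0., 0., 0., 0.]])
```

Averaging only over classes with `gt > 0` is what keeps cow, desert, and moon from dragging the semantic mean IU down to an artificial zero.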
Download the dataset:
http://www.cs.unc.edu/~jtighe/Papers/ECCV10/siftflow/SiftFlowDataset.zip
Semantic and geometric segmentation classes for scenes.
Semantic: 0 is void and 1–33 are classes.
01 awning
02 balcony
03 bird
04 boat
05 bridge
06 building
07 bus
08 car
09 cow
10 crosswalk
11 desert
12 door
13 fence
14 field
15 grass
16 moon
17 mountain
18 person
19 plant
20 pole
21 river
22 road
23 rock
24 sand
25 sea
26 sidewalk
27 sign
28 sky
29 staircase
30 streetlight
31 sun
32 tree
33 window
Geometric: -1 is void and 1–3 are classes.
01 sky
02 horizontal
03 vertical
N.B. Three classes (cow, desert, and moon) are absent from the test set, so
they are excluded from evaluation. The highway_bost181 and street_urb506 images
are missing annotations, so they are likewise excluded from evaluation.
coast_natu975
insidecity_art947
insidecity_urb781
highway_bost374
coast_n203085
insidecity_a223049
mountain_nat116
street_art861
mountain_land188
street_par177
opencountry_natu524
forest_natu29
highway_gre37
street_bost77
insidecity_art1125
street_urb521
highway_bost178
street_art760
street_urb885
insidecity_art829
coast_natu804
mountain_sharp44
coast_natu649
opencountry_land691
insidecity_hous35
tallbuilding_art1719
mountain_n736026
mountain_moun41
insidecity_urban992
opencountry_land295
tallbuilding_art527
highway_art238
forest_for114
coast_land296
tallbuilding_sky7
mountain_n44009
tallbuilding_art1316
forest_nat717
highway_bost164
street_par29
forest_natc52
tallbuilding_art1004
coast_sun14
opencountry_land206
opencountry_land364
mountain_n219015
highway_a836030
forest_nat324
opencountry_land493
insidecity_art1598
street_street27
insidecity_a48009
coast_cdmc889
street_gre295
tallbuilding_a538076
street_boston378
highway_urb759
street_par151
tallbuilding_urban1003
tallbuilding_urban16
highway_bost151
opencountry_nat965
highway_gre661
forest_for42
opencountry_n18002
insidecity_art646
highway_gre55
coast_n295051
forest_bost103
highway_n480036
mountain_land4
forest_nat130
coast_nat643
insidecity_urb250
street_gre11
street_boston271
opencountry_n490003
mountain_nat762
street_par86
coast_arnat59
mountain_land787
highway_gre472
opencountry_tell67
mountain_sharp66
opencountry_land534
insidecity_gre290
highway_bost307
opencountry_n213059
forest_nat220
forest_cdmc348
tallbuilding_art900
insidecity_art569
street_urb200
coast_natu468
coast_n672069
insidecity_hous109
forest_land862
opencountry_natu65
tallbuilding_a805096
opencountry_n291058
forest_natu439
coast_nat799
tallbuilding_urban991
tallbuilding_sky17
opencountry_land638
opencountry_natu563
tallbuilding_urb733
forest_cdmc451
mountain_n371066
mountain_n213081
mountain_nat57
tallbuilding_a463068
forest_natu848
tallbuilding_art306
insidecity_boston92
insidecity_urb584
tallbuilding_urban1126
coast_n286045
street_gre179
coast_nat1091
opencountry_nat615
coast_nat901
forest_cdmc291
mountain_natu568
mountain_n18070
street_bost136
tallbuilding_art425
coast_bea3
tallbuilding_art1616
insidecity_art690
highway_gre492
highway_bost320
forest_nat400
highway_par23
tallbuilding_a212033
forest_natu994
tallbuilding_archi296
highway_gre413
tallbuilding_a279033
insidecity_art1277
coast_cdmc948
forest_for15
street_par68
mountain_natu786
opencountry_open61
opencountry_nat423
mountain_land143
tallbuilding_a487066
tallbuilding_art1751
insidecity_hous79
street_par118
highway_bost293
mountain_n213021
opencountry_nat802
coast_n384099
opencountry_natu998
mountain_n344042
coast_nat1265
forest_text44
forest_for84
insidecity_a807066
opencountry_nat1117
coast_sun42
insidecity_par180
opencountry_land923
highway_art580
street_art1328
coast_cdmc838
opencountry_land660
opencountry_cdmc354
coast_natu825
opencountry_natu38
mountain_nat30
coast_n199066
forest_text124
forest_land222
tallbuilding_city56
tallbuilding_city22
opencountry_fie36
mountain_ski24
coast_cdmc997
insidecity_boston232
opencountry_land575
opencountry_land797
insidecity_urb362
forest_nat1033
mountain_nat891
street_hexp3
tallbuilding_art1474
tallbuilding_urban73
opencountry_natu852
mountain_nat1008
coast_nat294
mountain_sharp20
opencountry_fie14
mountain_land275
forest_land760
coast_land374
mountain_nat426
highway_gre141
http://dl.caffe.berkeleyvision.org/siftflow-fcn16s-heavy.caffemodel
import caffe
from caffe import layers as L, params as P
from caffe.coord_map import crop
def conv_relu(bottom, nout, ks=3, stride=1, pad=1):
conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
num_output=nout, pad=pad,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
return conv, L.ReLU(conv, in_place=True)
def max_pool(bottom, ks=2, stride=2):
return L.Pooling(bottom, pool=P.Pooling.MAX, kernel_size=ks, stride=stride)
def fcn(split):
n = caffe.NetSpec()
n.data, n.sem, n.geo = L.Python(module='siftflow_layers',
layer='SIFTFlowSegDataLayer', ntop=3,
param_str=str(dict(siftflow_dir='../data/sift-flow',
split=split, seed=1337)))
# the base net
n.conv1_1, n.relu1_1 = conv_relu(n.data, 64, pad=100)
n.conv1_2, n.relu1_2 = conv_relu(n.relu1_1, 64)
n.pool1 = max_pool(n.relu1_2)
n.conv2_1, n.relu2_1 = conv_relu(n.pool1, 128)
n.conv2_2, n.relu2_2 = conv_relu(n.relu2_1, 128)
n.pool2 = max_pool(n.relu2_2)
n.conv3_1, n.relu3_1 = conv_relu(n.pool2, 256)
n.conv3_2, n.relu3_2 = conv_relu(n.relu3_1, 256)
n.conv3_3, n.relu3_3 = conv_relu(n.relu3_2, 256)
n.pool3 = max_pool(n.relu3_3)
n.conv4_1, n.relu4_1 = conv_relu(n.pool3, 512)
n.conv4_2, n.relu4_2 = conv_relu(n.relu4_1, 512)
n.conv4_3, n.relu4_3 = conv_relu(n.relu4_2, 512)
n.pool4 = max_pool(n.relu4_3)
n.conv5_1, n.relu5_1 = conv_relu(n.pool4, 512)
n.conv5_2, n.relu5_2 = conv_relu(n.relu5_1, 512)
n.conv5_3, n.relu5_3 = conv_relu(n.relu5_2, 512)
n.pool5 = max_pool(n.relu5_3)
# fully conv
n.fc6, n.relu6 = conv_relu(n.pool5, 4096, ks=7, pad=0)
n.drop6 = L.Dropout(n.relu6, dropout_ratio=0.5, in_place=True)
n.fc7, n.relu7 = conv_relu(n.drop6, 4096, ks=1, pad=0)
n.drop7 = L.Dropout(n.relu7, dropout_ratio=0.5, in_place=True)
n.score_fr_sem = L.Convolution(n.drop7, num_output=33, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.upscore2_sem = L.Deconvolution(n.score_fr_sem,
convolution_param=dict(num_output=33, kernel_size=4, stride=2,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_pool4_sem = L.Convolution(n.pool4, num_output=33, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.score_pool4_semc = crop(n.score_pool4_sem, n.upscore2_sem)
n.fuse_pool4_sem = L.Eltwise(n.upscore2_sem, n.score_pool4_semc,
operation=P.Eltwise.SUM)
n.upscore16_sem = L.Deconvolution(n.fuse_pool4_sem,
convolution_param=dict(num_output=33, kernel_size=32, stride=16,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_sem = crop(n.upscore16_sem, n.data)
    # top named 'loss' so generic scoring code finds it (o.w. loss_sem)
n.loss = L.SoftmaxWithLoss(n.score_sem, n.sem,
loss_param=dict(normalize=False, ignore_label=255))
n.score_fr_geo = L.Convolution(n.drop7, num_output=3, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.upscore2_geo = L.Deconvolution(n.score_fr_geo,
convolution_param=dict(num_output=3, kernel_size=4, stride=2,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_pool4_geo = L.Convolution(n.pool4, num_output=3, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.score_pool4_geoc = crop(n.score_pool4_geo, n.upscore2_geo)
n.fuse_pool4_geo = L.Eltwise(n.upscore2_geo, n.score_pool4_geoc,
operation=P.Eltwise.SUM)
n.upscore16_geo = L.Deconvolution(n.fuse_pool4_geo,
convolution_param=dict(num_output=3, kernel_size=32, stride=16,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_geo = crop(n.upscore16_geo, n.data)
n.loss_geo = L.SoftmaxWithLoss(n.score_geo, n.geo,
loss_param=dict(normalize=False, ignore_label=255))
return n.to_proto()
def make_net():
with open('trainval.prototxt', 'w') as f:
f.write(str(fcn('trainval')))
with open('test.prototxt', 'w') as f:
f.write(str(fcn('test')))
if __name__ == '__main__':
make_net()
import caffe
import surgery, score
import numpy as np
import os
import sys
import setproctitle
setproctitle.setproctitle(os.path.basename(os.getcwd()))
weights = '../siftflow-fcn32s/siftflow-fcn32s.caffemodel'
# init
caffe.set_device(int(sys.argv[1]))
caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')
solver.net.copy_from(weights)
# surgeries
interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
surgery.interp(solver.net, interp_layers)
# scoring
test = np.loadtxt('../data/sift-flow/test.txt', dtype=str)
for _ in range(50):
solver.step(2000)
# N.B. metrics on the semantic labels are off b.c. of missing classes;
# score manually from the histogram instead for proper evaluation
score.seg_tests(solver, False, test, layer='score_sem', gt='sem')
score.seg_tests(solver, False, test, layer='score_geo', gt='geo')
train_net: "trainval.prototxt"
test_net: "test.prototxt"
test_iter: 1111
# make test net, but don't invoke it from the solver itself
test_interval: 999999999
display: 20
average_loss: 20
lr_policy: "fixed"
# lr for unnormalized softmax
base_lr: 1e-12
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 300000
weight_decay: 0.0005
test_initialization: false
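The tiny fixed `base_lr` only makes sense together with the heavy momentum: with momentum m, SGD's asymptotic step scale is roughly `base_lr / (1 - m)`, so momentum 0.99 amplifies the nominal rate about 100x. A back-of-the-envelope check (my arithmetic, not repo code):

```python
# heavy-momentum SGD: asymptotic step scale is base_lr / (1 - momentum)
base_lr, momentum = 1e-12, 0.99
effective = base_lr / (1 - momentum)  # ~1e-10, about 100x the nominal rate
```

This, plus the unnormalized softmax loss (which sums rather than averages over pixels), is why `base_lr` looks so many orders of magnitude smaller than typical settings.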
layer {
name: "data"
type: "Python"
top: "data"
top: "sem"
top: "geo"
python_param {
module: "siftflow_layers"
layer: "SIFTFlowSegDataLayer"
param_str: "{\'siftflow_dir\': \'../../data/sift-flow\', \'seed\': 1337, \'split\': \'test\'}"
}
}
layer {
name: "conv1_1"
type: "Convolution"
bottom: "data"
top: "conv1_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 100
kernel_size: 3
stride: 1
}
}
layer {
name: "relu1_1"
type: "ReLU"
bottom: "conv1_1"
top: "conv1_1"
}
layer {
name: "conv1_2"
type: "Convolution"
bottom: "conv1_1"
top: "conv1_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu1_2"
type: "ReLU"
bottom: "conv1_2"
top: "conv1_2"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1_2"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2_1"
type: "Convolution"
bottom: "pool1"
top: "conv2_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu2_1"
type: "ReLU"
bottom: "conv2_1"
top: "conv2_1"
}
layer {
name: "conv2_2"
type: "Convolution"
bottom: "conv2_1"
top: "conv2_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu2_2"
type: "ReLU"
bottom: "conv2_2"
top: "conv2_2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2_2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv3_1"
type: "Convolution"
bottom: "pool2"
top: "conv3_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3_1"
type: "ReLU"
bottom: "conv3_1"
top: "conv3_1"
}
layer {
name: "conv3_2"
type: "Convolution"
bottom: "conv3_1"
top: "conv3_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3_2"
type: "ReLU"
bottom: "conv3_2"
top: "conv3_2"
}
layer {
name: "conv3_3"
type: "Convolution"
bottom: "conv3_2"
top: "conv3_3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3_3"
type: "ReLU"
bottom: "conv3_3"
top: "conv3_3"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv3_3"
top: "pool3"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv4_1"
type: "Convolution"
bottom: "pool3"
top: "conv4_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu4_1"
type: "ReLU"
bottom: "conv4_1"
top: "conv4_1"
}
layer {
name: "conv4_2"
type: "Convolution"
bottom: "conv4_1"
top: "conv4_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu4_2"
type: "ReLU"
bottom: "conv4_2"
top: "conv4_2"
}
layer {
name: "conv4_3"
type: "Convolution"
bottom: "conv4_2"
top: "conv4_3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu4_3"
type: "ReLU"
bottom: "conv4_3"
top: "conv4_3"
}
layer {
name: "pool4"
type: "Pooling"
bottom: "conv4_3"
top: "pool4"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv5_1"
type: "Convolution"
bottom: "pool4"
top: "conv5_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu5_1"
type: "ReLU"
bottom: "conv5_1"
top: "conv5_1"
}
layer {
name: "conv5_2"
type: "Convolution"
bottom: "conv5_1"
top: "conv5_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu5_2"
type: "ReLU"
bottom: "conv5_2"
top: "conv5_2"
}
layer {
name: "conv5_3"
type: "Convolution"
bottom: "conv5_2"
top: "conv5_3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu5_3"
type: "ReLU"
bottom: "conv5_3"
top: "conv5_3"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5_3"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "fc6"
type: "Convolution"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 4096
pad: 0
kernel_size: 7
stride: 1
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "Convolution"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 4096
pad: 0
kernel_size: 1
stride: 1
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "score_fr_sem"
type: "Convolution"
bottom: "fc7"
top: "score_fr_sem"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 33
pad: 0
kernel_size: 1
}
}
layer {
name: "upscore2_sem"
type: "Deconvolution"
bottom: "score_fr_sem"
top: "upscore2_sem"
param {
lr_mult: 0
}
convolution_param {
num_output: 33
bias_term: false
kernel_size: 4
stride: 2
}
}
layer {
name: "score_pool4_sem"
type: "Convolution"
bottom: "pool4"
top: "score_pool4_sem"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 33
pad: 0
kernel_size: 1
}
}
layer {
name: "score_pool4_semc"
type: "Crop"
bottom: "score_pool4_sem"
bottom: "upscore2_sem"
top: "score_pool4_semc"
crop_param {
axis: 2
offset: 5
}
}
layer {
name: "fuse_pool4_sem"
type: "Eltwise"
bottom: "upscore2_sem"
bottom: "score_pool4_semc"
top: "fuse_pool4_sem"
eltwise_param {
operation: SUM
}
}
layer {
name: "upscore16_sem"
type: "Deconvolution"
bottom: "fuse_pool4_sem"
top: "upscore16_sem"
param {
lr_mult: 0
}
convolution_param {
num_output: 33
bias_term: false
kernel_size: 32
stride: 16
}
}
layer {
name: "score_sem"
type: "Crop"
bottom: "upscore16_sem"
bottom: "data"
top: "score_sem"
crop_param {
axis: 2
offset: 27
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "score_sem"
bottom: "sem"
top: "loss"
loss_param {
ignore_label: 255
normalize: false
}
}
layer {
name: "score_fr_geo"
type: "Convolution"
bottom: "fc7"
top: "score_fr_geo"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 3
pad: 0
kernel_size: 1
}
}
layer {
name: "upscore2_geo"
type: "Deconvolution"
bottom: "score_fr_geo"
top: "upscore2_geo"
param {
lr_mult: 0
}
convolution_param {
num_output: 3
bias_term: false
kernel_size: 4
stride: 2
}
}
layer {
name: "score_pool4_geo"
type: "Convolution"
bottom: "pool4"
top: "score_pool4_geo"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 3
pad: 0
kernel_size: 1
}
}
layer {
name: "score_pool4_geoc"
type: "Crop"
bottom: "score_pool4_geo"
bottom: "upscore2_geo"
top: "score_pool4_geoc"
crop_param {
axis: 2
offset: 5
}
}
layer {
name: "fuse_pool4_geo"
type: "Eltwise"
bottom: "upscore2_geo"
bottom: "score_pool4_geoc"
top: "fuse_pool4_geo"
eltwise_param {
operation: SUM
}
}
layer {
name: "upscore16_geo"
type: "Deconvolution"
bottom: "fuse_pool4_geo"
top: "upscore16_geo"
param {
lr_mult: 0
}
convolution_param {
num_output: 3
bias_term: false
kernel_size: 32
stride: 16
}
}
layer {
name: "score_geo"
type: "Crop"
bottom: "upscore16_geo"
bottom: "data"
top: "score_geo"
crop_param {
axis: 2
offset: 27
}
}
layer {
name: "loss_geo"
type: "SoftmaxWithLoss"
bottom: "score_geo"
bottom: "geo"
top: "loss_geo"
loss_param {
ignore_label: 255
normalize: false
}
}
layer {
name: "data"
type: "Python"
top: "data"
top: "sem"
top: "geo"
python_param {
module: "siftflow_layers"
layer: "SIFTFlowSegDataLayer"
param_str: "{\'siftflow_dir\': \'../../data/sift-flow\', \'seed\': 1337, \'split\': \'trainval\'}"
}
}
layer {
name: "conv1_1"
type: "Convolution"
bottom: "data"
top: "conv1_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 100
kernel_size: 3
stride: 1
}
}
layer {
name: "relu1_1"
type: "ReLU"
bottom: "conv1_1"
top: "conv1_1"
}
layer {
name: "conv1_2"
type: "Convolution"
bottom: "conv1_1"
top: "conv1_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu1_2"
type: "ReLU"
bottom: "conv1_2"
top: "conv1_2"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1_2"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2_1"
type: "Convolution"
bottom: "pool1"
top: "conv2_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu2_1"
type: "ReLU"
bottom: "conv2_1"
top: "conv2_1"
}
layer {
name: "conv2_2"
type: "Convolution"
bottom: "conv2_1"
top: "conv2_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu2_2"
type: "ReLU"
bottom: "conv2_2"
top: "conv2_2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2_2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv3_1"
type: "Convolution"
bottom: "pool2"
top: "conv3_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3_1"
type: "ReLU"
bottom: "conv3_1"
top: "conv3_1"
}
layer {
name: "conv3_2"
type: "Convolution"
bottom: "conv3_1"
top: "conv3_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3_2"
type: "ReLU"
bottom: "conv3_2"
top: "conv3_2"
}
layer {
name: "conv3_3"
type: "Convolution"
bottom: "conv3_2"
top: "conv3_3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3_3"
type: "ReLU"
bottom: "conv3_3"
top: "conv3_3"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv3_3"
top: "pool3"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv4_1"
type: "Convolution"
bottom: "pool3"
top: "conv4_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu4_1"
type: "ReLU"
bottom: "conv4_1"
top: "conv4_1"
}
layer {
name: "conv4_2"
type: "Convolution"
bottom: "conv4_1"
top: "conv4_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu4_2"
type: "ReLU"
bottom: "conv4_2"
top: "conv4_2"
}
layer {
name: "conv4_3"
type: "Convolution"
bottom: "conv4_2"
top: "conv4_3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu4_3"
type: "ReLU"
bottom: "conv4_3"
top: "conv4_3"
}
layer {
name: "pool4"
type: "Pooling"
bottom: "conv4_3"
top: "pool4"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv5_1"
type: "Convolution"
bottom: "pool4"
top: "conv5_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu5_1"
type: "ReLU"
bottom: "conv5_1"
top: "conv5_1"
}
layer {
name: "conv5_2"
type: "Convolution"
bottom: "conv5_1"
top: "conv5_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu5_2"
type: "ReLU"
bottom: "conv5_2"
top: "conv5_2"
}
layer {
name: "conv5_3"
type: "Convolution"
bottom: "conv5_2"
top: "conv5_3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu5_3"
type: "ReLU"
bottom: "conv5_3"
top: "conv5_3"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5_3"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "fc6"
type: "Convolution"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 4096
pad: 0
kernel_size: 7
stride: 1
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "Convolution"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 4096
pad: 0
kernel_size: 1
stride: 1
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "score_fr_sem"
type: "Convolution"
bottom: "fc7"
top: "score_fr_sem"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 33
pad: 0
kernel_size: 1
}
}
layer {
name: "upscore2_sem"
type: "Deconvolution"
bottom: "score_fr_sem"
top: "upscore2_sem"
param {
lr_mult: 0
}
convolution_param {
num_output: 33
bias_term: false
kernel_size: 4
stride: 2
}
}
layer {
name: "score_pool4_sem"
type: "Convolution"
bottom: "pool4"
top: "score_pool4_sem"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 33
pad: 0
kernel_size: 1
}
}
layer {
name: "score_pool4_semc"
type: "Crop"
bottom: "score_pool4_sem"
bottom: "upscore2_sem"
top: "score_pool4_semc"
crop_param {
axis: 2
offset: 5
}
}
layer {
name: "fuse_pool4_sem"
type: "Eltwise"
bottom: "upscore2_sem"
bottom: "score_pool4_semc"
top: "fuse_pool4_sem"
eltwise_param {
operation: SUM
}
}
layer {
name: "upscore16_sem"
type: "Deconvolution"
bottom: "fuse_pool4_sem"
top: "upscore16_sem"
param {
lr_mult: 0
}
convolution_param {
num_output: 33
bias_term: false
kernel_size: 32
stride: 16
}
}
layer {
name: "score_sem"
type: "Crop"
bottom: "upscore16_sem"
bottom: "data"
top: "score_sem"
crop_param {
axis: 2
offset: 27
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "score_sem"
bottom: "sem"
top: "loss"
loss_param {
ignore_label: 255
normalize: false
}
}
layer {
name: "score_fr_geo"
type: "Convolution"
bottom: "fc7"
top: "score_fr_geo"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 3
pad: 0
kernel_size: 1
}
}
layer {
name: "upscore2_geo"
type: "Deconvolution"
bottom: "score_fr_geo"
top: "upscore2_geo"
param {
lr_mult: 0
}
convolution_param {
num_output: 3
bias_term: false
kernel_size: 4
stride: 2
}
}
layer {
name: "score_pool4_geo"
type: "Convolution"
bottom: "pool4"
top: "score_pool4_geo"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 3
pad: 0
kernel_size: 1
}
}
layer {
name: "score_pool4_geoc"
type: "Crop"
bottom: "score_pool4_geo"
bottom: "upscore2_geo"
top: "score_pool4_geoc"
crop_param {
axis: 2
offset: 5
}
}
layer {
name: "fuse_pool4_geo"
type: "Eltwise"
bottom: "upscore2_geo"
bottom: "score_pool4_geoc"
top: "fuse_pool4_geo"
eltwise_param {
operation: SUM
}
}
layer {
name: "upscore16_geo"
type: "Deconvolution"
bottom: "fuse_pool4_geo"
top: "upscore16_geo"
param {
lr_mult: 0
}
convolution_param {
num_output: 3
bias_term: false
kernel_size: 32
stride: 16
}
}
layer {
name: "score_geo"
type: "Crop"
bottom: "upscore16_geo"
bottom: "data"
top: "score_geo"
crop_param {
axis: 2
offset: 27
}
}
layer {
name: "loss_geo"
type: "SoftmaxWithLoss"
bottom: "score_geo"
bottom: "geo"
top: "loss_geo"
loss_param {
ignore_label: 255
normalize: false
}
}
http://dl.caffe.berkeleyvision.org/siftflow-fcn32s-heavy.caffemodel
import caffe
from caffe import layers as L, params as P
from caffe.coord_map import crop
def conv_relu(bottom, nout, ks=3, stride=1, pad=1):
conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
num_output=nout, pad=pad,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
return conv, L.ReLU(conv, in_place=True)
def max_pool(bottom, ks=2, stride=2):
return L.Pooling(bottom, pool=P.Pooling.MAX, kernel_size=ks, stride=stride)
def fcn(split):
n = caffe.NetSpec()
n.data, n.sem, n.geo = L.Python(module='siftflow_layers',
layer='SIFTFlowSegDataLayer', ntop=3,
param_str=str(dict(siftflow_dir='../data/sift-flow',
split=split, seed=1337)))
# the base net
n.conv1_1, n.relu1_1 = conv_relu(n.data, 64, pad=100)
n.conv1_2, n.relu1_2 = conv_relu(n.relu1_1, 64)
n.pool1 = max_pool(n.relu1_2)
n.conv2_1, n.relu2_1 = conv_relu(n.pool1, 128)
n.conv2_2, n.relu2_2 = conv_relu(n.relu2_1, 128)
n.pool2 = max_pool(n.relu2_2)
n.conv3_1, n.relu3_1 = conv_relu(n.pool2, 256)
n.conv3_2, n.relu3_2 = conv_relu(n.relu3_1, 256)
n.conv3_3, n.relu3_3 = conv_relu(n.relu3_2, 256)
n.pool3 = max_pool(n.relu3_3)
n.conv4_1, n.relu4_1 = conv_relu(n.pool3, 512)
n.conv4_2, n.relu4_2 = conv_relu(n.relu4_1, 512)
n.conv4_3, n.relu4_3 = conv_relu(n.relu4_2, 512)
n.pool4 = max_pool(n.relu4_3)
n.conv5_1, n.relu5_1 = conv_relu(n.pool4, 512)
n.conv5_2, n.relu5_2 = conv_relu(n.relu5_1, 512)
n.conv5_3, n.relu5_3 = conv_relu(n.relu5_2, 512)
n.pool5 = max_pool(n.relu5_3)
# fully conv
n.fc6, n.relu6 = conv_relu(n.pool5, 4096, ks=7, pad=0)
n.drop6 = L.Dropout(n.relu6, dropout_ratio=0.5, in_place=True)
n.fc7, n.relu7 = conv_relu(n.drop6, 4096, ks=1, pad=0)
n.drop7 = L.Dropout(n.relu7, dropout_ratio=0.5, in_place=True)
n.score_fr_sem = L.Convolution(n.drop7, num_output=33, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.upscore_sem = L.Deconvolution(n.score_fr_sem,
convolution_param=dict(num_output=33, kernel_size=64, stride=32,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_sem = crop(n.upscore_sem, n.data)
    # top named 'loss' so generic scoring code finds it (o.w. loss_sem)
n.loss = L.SoftmaxWithLoss(n.score_sem, n.sem,
loss_param=dict(normalize=False, ignore_label=255))
n.score_fr_geo = L.Convolution(n.drop7, num_output=3, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.upscore_geo = L.Deconvolution(n.score_fr_geo,
convolution_param=dict(num_output=3, kernel_size=64, stride=32,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_geo = crop(n.upscore_geo, n.data)
n.loss_geo = L.SoftmaxWithLoss(n.score_geo, n.geo,
loss_param=dict(normalize=False, ignore_label=255))
return n.to_proto()
def make_net():
with open('trainval.prototxt', 'w') as f:
f.write(str(fcn('trainval')))
with open('test.prototxt', 'w') as f:
f.write(str(fcn('test')))
if __name__ == '__main__':
make_net()
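The `pad=100` on `conv1_1` looks arbitrary at first; a quick size calculation (my own sketch, not from the repo — the helper names are made up) shows why it is needed: without the extra padding, the 32x-upsampled score map would come out smaller than the input and `crop()` could not align the two.

```python
import math

def conv_out(n, ks, stride=1, pad=0):
    """Spatial output size of a Caffe convolution layer."""
    return (n + 2 * pad - ks) // stride + 1

def pool_out(n, ks=2, stride=2):
    """Spatial output size of a Caffe pooling layer (rounds up)."""
    return int(math.ceil((n - ks) / stride)) + 1

def deconv_out(n, ks, stride):
    """Spatial output size of a Caffe deconvolution layer."""
    return (n - 1) * stride + ks

def fcn32_score_size(h):
    h = conv_out(h, 3, pad=100)   # conv1_1 carries the large pad
    h = conv_out(h, 3, pad=1)     # conv1_2 (3x3/pad 1 preserves size)
    for _ in range(5):            # pool1..pool5 each roughly halve
        h = pool_out(h)
    h = conv_out(h, 7)            # fc6: ks=7, pad=0
    return deconv_out(h, 64, 32)  # upscore: ks=64, stride=32
```

For a 256x256 SIFT Flow image this gives a 320x320 score map, which `crop()` then trims back to 256x256; without the pad the map would undershoot the input.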
import caffe
import surgery, score
import numpy as np
import os
import sys
import setproctitle
setproctitle.setproctitle(os.path.basename(os.getcwd()))
weights = '../vgg16fc.caffemodel'
# init
caffe.set_device(int(sys.argv[1]))
caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')
solver.net.copy_from(weights)
# surgeries
interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
surgery.interp(solver.net, interp_layers)
# scoring
test = np.loadtxt('../data/sift-flow/test.txt', dtype=str)
for _ in range(50):
solver.step(2000)
# N.B. metrics on the semantic labels are off b.c. of missing classes;
# score manually from the histogram instead for proper evaluation
score.seg_tests(solver, False, test, layer='score_sem', gt='sem')
score.seg_tests(solver, False, test, layer='score_geo', gt='geo')
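The `surgery.interp` call above fills the deconvolution ('up') layers with fixed bilinear kernels, which stay frozen since their `lr_mult` is 0. A sketch of such a bilinear filler (assumed to mirror what `surgery.interp` does; not copied from it):

```python
import numpy as np

def bilinear_kernel(ks):
    """Weights of a 2-D bilinear interpolation filter of size ks x ks."""
    factor = (ks + 1) // 2
    center = factor - 1 if ks % 2 == 1 else factor - 0.5
    og = np.ogrid[:ks, :ks]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

k = bilinear_kernel(64)  # matches the ks=64, stride=32 upscore deconv
```

Each deconv filter gets this same kernel on its own channel, so upsampling starts out as plain bilinear interpolation of the score maps.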
train_net: "trainval.prototxt"
test_net: "test.prototxt"
test_iter: 1111
# make test net, but don't invoke it from the solver itself
test_interval: 999999999
display: 20
average_loss: 20
lr_policy: "fixed"
# lr for unnormalized softmax
base_lr: 1e-10
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 300000
weight_decay: 0.0005
test_initialization: false
http://dl.caffe.berkeleyvision.org/siftflow-fcn8s-heavy.caffemodel
import caffe
from caffe import layers as L, params as P
from caffe.coord_map import crop
def conv_relu(bottom, nout, ks=3, stride=1, pad=1):
conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
num_output=nout, pad=pad,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
return conv, L.ReLU(conv, in_place=True)
def max_pool(bottom, ks=2, stride=2):
return L.Pooling(bottom, pool=P.Pooling.MAX, kernel_size=ks, stride=stride)
def fcn(split):
n = caffe.NetSpec()
n.data, n.sem, n.geo = L.Python(module='siftflow_layers',
layer='SIFTFlowSegDataLayer', ntop=3,
param_str=str(dict(siftflow_dir='../data/sift-flow',
split=split, seed=1337)))
# the base net
n.conv1_1, n.relu1_1 = conv_relu(n.data, 64, pad=100)
n.conv1_2, n.relu1_2 = conv_relu(n.relu1_1, 64)
n.pool1 = max_pool(n.relu1_2)
n.conv2_1, n.relu2_1 = conv_relu(n.pool1, 128)
n.conv2_2, n.relu2_2 = conv_relu(n.relu2_1, 128)
n.pool2 = max_pool(n.relu2_2)
n.conv3_1, n.relu3_1 = conv_relu(n.pool2, 256)
n.conv3_2, n.relu3_2 = conv_relu(n.relu3_1, 256)
n.conv3_3, n.relu3_3 = conv_relu(n.relu3_2, 256)
n.pool3 = max_pool(n.relu3_3)
n.conv4_1, n.relu4_1 = conv_relu(n.pool3, 512)
n.conv4_2, n.relu4_2 = conv_relu(n.relu4_1, 512)
n.conv4_3, n.relu4_3 = conv_relu(n.relu4_2, 512)
n.pool4 = max_pool(n.relu4_3)
n.conv5_1, n.relu5_1 = conv_relu(n.pool4, 512)
n.conv5_2, n.relu5_2 = conv_relu(n.relu5_1, 512)
n.conv5_3, n.relu5_3 = conv_relu(n.relu5_2, 512)
n.pool5 = max_pool(n.relu5_3)
# fully conv
n.fc6, n.relu6 = conv_relu(n.pool5, 4096, ks=7, pad=0)
n.drop6 = L.Dropout(n.relu6, dropout_ratio=0.5, in_place=True)
n.fc7, n.relu7 = conv_relu(n.drop6, 4096, ks=1, pad=0)
n.drop7 = L.Dropout(n.relu7, dropout_ratio=0.5, in_place=True)
n.score_fr_sem = L.Convolution(n.drop7, num_output=33, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.upscore2_sem = L.Deconvolution(n.score_fr_sem,
convolution_param=dict(num_output=33, kernel_size=4, stride=2,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_pool4_sem = L.Convolution(n.pool4, num_output=33, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.score_pool4_semc = crop(n.score_pool4_sem, n.upscore2_sem)
n.fuse_pool4_sem = L.Eltwise(n.upscore2_sem, n.score_pool4_semc,
operation=P.Eltwise.SUM)
n.upscore_pool4_sem = L.Deconvolution(n.fuse_pool4_sem,
convolution_param=dict(num_output=33, kernel_size=4, stride=2,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_pool3_sem = L.Convolution(n.pool3, num_output=33, kernel_size=1,
pad=0, param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2,
decay_mult=0)])
n.score_pool3_semc = crop(n.score_pool3_sem, n.upscore_pool4_sem)
n.fuse_pool3_sem = L.Eltwise(n.upscore_pool4_sem, n.score_pool3_semc,
operation=P.Eltwise.SUM)
n.upscore8_sem = L.Deconvolution(n.fuse_pool3_sem,
convolution_param=dict(num_output=33, kernel_size=16, stride=8,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_sem = crop(n.upscore8_sem, n.data)
    # name the semantic loss 'loss' so the scoring script picks it up (otherwise loss_sem)
n.loss = L.SoftmaxWithLoss(n.score_sem, n.sem,
loss_param=dict(normalize=False, ignore_label=255))
n.score_fr_geo = L.Convolution(n.drop7, num_output=3, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.upscore2_geo = L.Deconvolution(n.score_fr_geo,
convolution_param=dict(num_output=3, kernel_size=4, stride=2,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_pool4_geo = L.Convolution(n.pool4, num_output=3, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.score_pool4_geoc = crop(n.score_pool4_geo, n.upscore2_geo)
n.fuse_pool4_geo = L.Eltwise(n.upscore2_geo, n.score_pool4_geoc,
operation=P.Eltwise.SUM)
n.upscore_pool4_geo = L.Deconvolution(n.fuse_pool4_geo,
convolution_param=dict(num_output=3, kernel_size=4, stride=2,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_pool3_geo = L.Convolution(n.pool3, num_output=3, kernel_size=1,
pad=0, param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2,
decay_mult=0)])
n.score_pool3_geoc = crop(n.score_pool3_geo, n.upscore_pool4_geo)
n.fuse_pool3_geo = L.Eltwise(n.upscore_pool4_geo, n.score_pool3_geoc,
operation=P.Eltwise.SUM)
n.upscore8_geo = L.Deconvolution(n.fuse_pool3_geo,
convolution_param=dict(num_output=3, kernel_size=16, stride=8,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_geo = crop(n.upscore8_geo, n.data)
n.loss_geo = L.SoftmaxWithLoss(n.score_geo, n.geo,
loss_param=dict(normalize=False, ignore_label=255))
return n.to_proto()
def make_net():
with open('trainval.prototxt', 'w') as f:
f.write(str(fcn('trainval')))
with open('test.prototxt', 'w') as f:
f.write(str(fcn('test')))
if __name__ == '__main__':
make_net()
import caffe
import surgery, score
import numpy as np
import os
import sys
import setproctitle
setproctitle.setproctitle(os.path.basename(os.getcwd()))
weights = '../siftflow-fcn16s/siftflow-fcn16s.caffemodel'
# init
caffe.set_device(int(sys.argv[1]))
caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')
solver.net.copy_from(weights)
# surgeries
interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
surgery.interp(solver.net, interp_layers)
# scoring
test = np.loadtxt('../data/sift-flow/test.txt', dtype=str)
for _ in range(50):
solver.step(2000)
# N.B. metrics on the semantic labels are off b.c. of missing classes;
# score manually from the histogram instead for proper evaluation
score.seg_tests(solver, False, test, layer='score_sem', gt='sem')
score.seg_tests(solver, False, test, layer='score_geo', gt='geo')
train_net: "trainval.prototxt"
test_net: "test.prototxt"
test_iter: 1111
# make test net, but don't invoke it from the solver itself
test_interval: 999999999
display: 20
average_loss: 20
lr_policy: "fixed"
# lr for unnormalized softmax
base_lr: 1e-12
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 300000
weight_decay: 0.0005
test_initialization: false
import caffe
import numpy as np
from PIL import Image
import scipy.io
import random
class SIFTFlowSegDataLayer(caffe.Layer):
"""
Load (input image, label image) pairs from SIFT Flow
one-at-a-time while reshaping the net to preserve dimensions.
This data layer has three tops:
1. the data, pre-processed
2. the semantic labels 0-32 and void 255
3. the geometric labels 0-2 and void 255
Use this to feed data to a fully convolutional network.
"""
def setup(self, bottom, top):
"""
Setup data layer according to parameters:
- siftflow_dir: path to SIFT Flow dir
- split: train / val / test
- randomize: load in random order (default: True)
- seed: seed for randomization (default: None / current time)
for semantic segmentation of object and geometric classes.
example: params = dict(siftflow_dir="/path/to/siftflow", split="val")
"""
# config
params = eval(self.param_str)
self.siftflow_dir = params['siftflow_dir']
self.split = params['split']
self.mean = np.array((114.578, 115.294, 108.353), dtype=np.float32)
self.random = params.get('randomize', True)
self.seed = params.get('seed', None)
# three tops: data, semantic, geometric
if len(top) != 3:
raise Exception("Need to define three tops: data, semantic label, and geometric label.")
# data layers have no bottoms
if len(bottom) != 0:
raise Exception("Do not define a bottom.")
# load indices for images and labels
split_f = '{}/{}.txt'.format(self.siftflow_dir, self.split)
self.indices = open(split_f, 'r').read().splitlines()
self.idx = 0
# make eval deterministic
if 'train' not in self.split:
self.random = False
# randomization: seed and pick
if self.random:
random.seed(self.seed)
self.idx = random.randint(0, len(self.indices)-1)
def reshape(self, bottom, top):
# load image + label image pair
self.data = self.load_image(self.indices[self.idx])
self.label_semantic = self.load_label(self.indices[self.idx], label_type='semantic')
self.label_geometric = self.load_label(self.indices[self.idx], label_type='geometric')
# reshape tops to fit (leading 1 is for batch dimension)
top[0].reshape(1, *self.data.shape)
top[1].reshape(1, *self.label_semantic.shape)
top[2].reshape(1, *self.label_geometric.shape)
def forward(self, bottom, top):
# assign output
top[0].data[...] = self.data
top[1].data[...] = self.label_semantic
top[2].data[...] = self.label_geometric
# pick next input
if self.random:
self.idx = random.randint(0, len(self.indices)-1)
else:
self.idx += 1
if self.idx == len(self.indices):
self.idx = 0
def backward(self, top, propagate_down, bottom):
pass
def load_image(self, idx):
"""
Load input image and preprocess for Caffe:
- cast to float
- switch channels RGB -> BGR
- subtract mean
- transpose to channel x height x width order
"""
im = Image.open('{}/Images/spatial_envelope_256x256_static_8outdoorcategories/{}.jpg'.format(self.siftflow_dir, idx))
in_ = np.array(im, dtype=np.float32)
in_ = in_[:,:,::-1]
in_ -= self.mean
in_ = in_.transpose((2,0,1))
return in_
def load_label(self, idx, label_type=None):
"""
Load label image as 1 x height x width integer array of label indices.
The leading singleton dimension is required by the loss.
"""
if label_type == 'semantic':
label = scipy.io.loadmat('{}/SemanticLabels/spatial_envelope_256x256_static_8outdoorcategories/{}.mat'.format(self.siftflow_dir, idx))['S']
elif label_type == 'geometric':
label = scipy.io.loadmat('{}/GeoLabels/spatial_envelope_256x256_static_8outdoorcategories/{}.mat'.format(self.siftflow_dir, idx))['S']
label[label == -1] = 0
else:
raise Exception("Unknown label type: {}. Pick semantic or geometric.".format(label_type))
        label = label.astype(np.uint8)
        label -= 1  # shift labels so classes start at 0; unlabeled 0 wraps around to void 255
label = label[np.newaxis, ...]
return label.copy()
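The in-place `label -= 1` above relies on uint8 wraparound to produce the void label. A minimal check (my own example with hypothetical values, not repo code): SIFT Flow stores semantic classes as 1-33 with 0 for unlabeled pixels, so after the cast and subtraction the classes land on 0-32 and unlabeled pixels wrap to 255, matching the `ignore_label=255` used by both losses.

```python
import numpy as np

raw = np.array([[0, 1, 33]])       # unlabeled, first class, last class
label = raw.astype(np.uint8) - 1   # uint8 wraparound: 0 -> 255, 1..33 -> 0..32
```

The geometric labels take the same path after their `-1` entries are first mapped to 0, so they end up as 0-2 with void 255 as well.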