Commit 11a101ca authored by Evan Shelhamer

add SIFT Flow FCNs

These nets are jointly trained for segmentation of semantic and
geometric classes since this dataset includes annotations for both.

- FCN-32s SIFT Flow
- FCN-16s SIFT Flow
- FCN-8s SIFT Flow

TODO: fix semantic class evaluation for this dataset, which requires
special care since there are missing classes in the test set.
Parent e1a0612a
@@ -29,17 +29,26 @@ Unlike the FCN-32/16/8s models, this network is trained with gradient accumulation
To reproduce the validation scores, use the [seg11valid](https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/data/pascal/seg11valid.txt) split defined by the paper in footnote 7. Since SBD train and PASCAL VOC 2011 segval intersect, we only evaluate on the non-intersecting set for validation purposes.
**NYUDv2 models**: trained online with high momentum on color, depth, and HHA features (from Gupta et al. https://github.com/s-gupta/rcnn-depth).
These models demonstrate FCNs for multi-modal input.
* [FCN-32s NYUDv2 Color](nyud-fcn32s-color): single stream, 32 pixel prediction stride net on color/BGR input
* [FCN-32s NYUDv2 HHA](nyud-fcn32s-hha): single stream, 32 pixel prediction stride net on HHA input
* [FCN-32s NYUDv2 Early Color-Depth](nyud-fcn32s-color-d): single stream, 32 pixel prediction stride net on early fusion of color and (log) depth for 4-channel input
* [FCN-32s NYUDv2 Late Color-HHA](nyud-fcn32s-color-hha): single stream, 32 pixel prediction stride net by late fusion of FCN-32s NYUDv2 Color and FCN-32s NYUDv2 HHA
**SIFT Flow models**: trained online with high momentum for joint semantic class and geometric class segmentation.
These models demonstrate FCNs for multi-task output.
* [FCN-32s SIFT Flow](siftflow-fcn32s): single stream, 32 pixel prediction stride net
* [FCN-16s SIFT Flow](siftflow-fcn16s): two stream, 16 pixel prediction stride net
* [FCN-8s SIFT Flow](siftflow-fcn8s): three stream, 8 pixel prediction stride net
*Note*: in this release, the evaluation of the semantic classes is not quite right due to an issue with classes missing from the test set.
This will be corrected soon.
The evaluation of the geometric classes is unaffected.
**The following models have not yet been ported to master and trained with the latest settings. Check back soon.**
PASCAL-Context models including architecture definition, solver configuration, and bare-bones solving script (fine-tuned from the ILSVRC-trained VGG-16 model):
* [FCN-32s PASCAL-Context](https://gist.github.com/shelhamer/80667189b218ad570e82#file-readme-md): single stream, 32 pixel prediction stride version
# SIFT Flow
SIFT Flow is a semantic segmentation dataset with two labelings:
- semantic classes, such as "cat" or "dog"
- geometric classes, consisting of "horizontal, vertical, and sky"
Refer to `classes.txt` for the listing of classes in model output order.
Refer to `../siftflow_layers.py` for the Python data layer for this dataset.
Note that the dataset has a number of issues, including unannotated images and missing classes from the test set.
The provided splits exclude the unannotated images.
As noted in the paper, care must be taken for proper evaluation by excluding the missing classes.
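As a minimal sketch of such an evaluation (a hypothetical helper, not the repo's own `score.py`), mean IU can be computed from the confusion histogram while skipping classes that never appear in the ground truth:

```python
import numpy as np

def mean_iu(hist):
    """Mean intersection-over-union from a confusion histogram
    (rows: ground truth, cols: prediction), skipping classes
    absent from the ground truth."""
    gt = hist.sum(1)  # ground-truth pixels per class
    with np.errstate(invalid='ignore', divide='ignore'):
        iu = np.diag(hist) / (gt + hist.sum(0) - np.diag(hist))
    return np.nanmean(iu[gt > 0])  # exclude missing classes

# toy 4-class histogram where class 3 is absent from the test set
hist = np.array([[5., 1., 0., 0.],
                 [0., 4., 0., 0.],
                 [1., 0., 3., 0.],
                 [0., 0., 0., 0.]])
```

Averaging only over classes with `gt > 0` is what keeps cow, desert, and moon from dragging the semantic mean IU down to an artificial zero.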
Download the dataset:
http://www.cs.unc.edu/~jtighe/Papers/ECCV10/siftflow/SiftFlowDataset.zip
Semantic and geometric segmentation classes for scenes.
Semantic: 0 is void and 1–33 are classes.
01 awning
02 balcony
03 bird
04 boat
05 bridge
06 building
07 bus
08 car
09 cow
10 crosswalk
11 desert
12 door
13 fence
14 field
15 grass
16 moon
17 mountain
18 person
19 plant
20 pole
21 river
22 road
23 rock
24 sand
25 sea
26 sidewalk
27 sign
28 sky
29 staircase
30 streetlight
31 sun
32 tree
33 window
Geometric: -1 is void and 1–3 are classes.
01 sky
02 horizontal
03 vertical
N.B. Three classes (cow, desert, and moon) are absent from the test set, so
they are excluded from evaluation. The highway_bost181 and street_urb506 images
are missing annotations, so they are likewise excluded from evaluation.
coast_natu975
insidecity_art947
insidecity_urb781
highway_bost374
coast_n203085
insidecity_a223049
mountain_nat116
street_art861
mountain_land188
street_par177
opencountry_natu524
forest_natu29
highway_gre37
street_bost77
insidecity_art1125
street_urb521
highway_bost178
street_art760
street_urb885
insidecity_art829
coast_natu804
mountain_sharp44
coast_natu649
opencountry_land691
insidecity_hous35
tallbuilding_art1719
mountain_n736026
mountain_moun41
insidecity_urban992
opencountry_land295
tallbuilding_art527
highway_art238
forest_for114
coast_land296
tallbuilding_sky7
mountain_n44009
tallbuilding_art1316
forest_nat717
highway_bost164
street_par29
forest_natc52
tallbuilding_art1004
coast_sun14
opencountry_land206
opencountry_land364
mountain_n219015
highway_a836030
forest_nat324
opencountry_land493
insidecity_art1598
street_street27
insidecity_a48009
coast_cdmc889
street_gre295
tallbuilding_a538076
street_boston378
highway_urb759
street_par151
tallbuilding_urban1003
tallbuilding_urban16
highway_bost151
opencountry_nat965
highway_gre661
forest_for42
opencountry_n18002
insidecity_art646
highway_gre55
coast_n295051
forest_bost103
highway_n480036
mountain_land4
forest_nat130
coast_nat643
insidecity_urb250
street_gre11
street_boston271
opencountry_n490003
mountain_nat762
street_par86
coast_arnat59
mountain_land787
highway_gre472
opencountry_tell67
mountain_sharp66
opencountry_land534
insidecity_gre290
highway_bost307
opencountry_n213059
forest_nat220
forest_cdmc348
tallbuilding_art900
insidecity_art569
street_urb200
coast_natu468
coast_n672069
insidecity_hous109
forest_land862
opencountry_natu65
tallbuilding_a805096
opencountry_n291058
forest_natu439
coast_nat799
tallbuilding_urban991
tallbuilding_sky17
opencountry_land638
opencountry_natu563
tallbuilding_urb733
forest_cdmc451
mountain_n371066
mountain_n213081
mountain_nat57
tallbuilding_a463068
forest_natu848
tallbuilding_art306
insidecity_boston92
insidecity_urb584
tallbuilding_urban1126
coast_n286045
street_gre179
coast_nat1091
opencountry_nat615
coast_nat901
forest_cdmc291
mountain_natu568
mountain_n18070
street_bost136
tallbuilding_art425
coast_bea3
tallbuilding_art1616
insidecity_art690
highway_gre492
highway_bost320
forest_nat400
highway_par23
tallbuilding_a212033
forest_natu994
tallbuilding_archi296
highway_gre413
tallbuilding_a279033
insidecity_art1277
coast_cdmc948
forest_for15
street_par68
mountain_natu786
opencountry_open61
opencountry_nat423
mountain_land143
tallbuilding_a487066
tallbuilding_art1751
insidecity_hous79
street_par118
highway_bost293
mountain_n213021
opencountry_nat802
coast_n384099
opencountry_natu998
mountain_n344042
coast_nat1265
forest_text44
forest_for84
insidecity_a807066
opencountry_nat1117
coast_sun42
insidecity_par180
opencountry_land923
highway_art580
street_art1328
coast_cdmc838
opencountry_land660
opencountry_cdmc354
coast_natu825
opencountry_natu38
mountain_nat30
coast_n199066
forest_text124
forest_land222
tallbuilding_city56
tallbuilding_city22
opencountry_fie36
mountain_ski24
coast_cdmc997
insidecity_boston232
opencountry_land575
opencountry_land797
insidecity_urb362
forest_nat1033
mountain_nat891
street_hexp3
tallbuilding_art1474
tallbuilding_urban73
opencountry_natu852
mountain_nat1008
coast_nat294
mountain_sharp20
opencountry_fie14
mountain_land275
forest_land760
coast_land374
mountain_nat426
highway_gre141
http://dl.caffe.berkeleyvision.org/siftflow-fcn16s-heavy.caffemodel
import caffe
from caffe import layers as L, params as P
from caffe.coord_map import crop
def conv_relu(bottom, nout, ks=3, stride=1, pad=1):
conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
num_output=nout, pad=pad,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
return conv, L.ReLU(conv, in_place=True)
def max_pool(bottom, ks=2, stride=2):
return L.Pooling(bottom, pool=P.Pooling.MAX, kernel_size=ks, stride=stride)
def fcn(split):
n = caffe.NetSpec()
n.data, n.sem, n.geo = L.Python(module='siftflow_layers',
layer='SIFTFlowSegDataLayer', ntop=3,
param_str=str(dict(siftflow_dir='../data/sift-flow',
split=split, seed=1337)))
# the base net
n.conv1_1, n.relu1_1 = conv_relu(n.data, 64, pad=100)
n.conv1_2, n.relu1_2 = conv_relu(n.relu1_1, 64)
n.pool1 = max_pool(n.relu1_2)
n.conv2_1, n.relu2_1 = conv_relu(n.pool1, 128)
n.conv2_2, n.relu2_2 = conv_relu(n.relu2_1, 128)
n.pool2 = max_pool(n.relu2_2)
n.conv3_1, n.relu3_1 = conv_relu(n.pool2, 256)
n.conv3_2, n.relu3_2 = conv_relu(n.relu3_1, 256)
n.conv3_3, n.relu3_3 = conv_relu(n.relu3_2, 256)
n.pool3 = max_pool(n.relu3_3)
n.conv4_1, n.relu4_1 = conv_relu(n.pool3, 512)
n.conv4_2, n.relu4_2 = conv_relu(n.relu4_1, 512)
n.conv4_3, n.relu4_3 = conv_relu(n.relu4_2, 512)
n.pool4 = max_pool(n.relu4_3)
n.conv5_1, n.relu5_1 = conv_relu(n.pool4, 512)
n.conv5_2, n.relu5_2 = conv_relu(n.relu5_1, 512)
n.conv5_3, n.relu5_3 = conv_relu(n.relu5_2, 512)
n.pool5 = max_pool(n.relu5_3)
# fully conv
n.fc6, n.relu6 = conv_relu(n.pool5, 4096, ks=7, pad=0)
n.drop6 = L.Dropout(n.relu6, dropout_ratio=0.5, in_place=True)
n.fc7, n.relu7 = conv_relu(n.drop6, 4096, ks=1, pad=0)
n.drop7 = L.Dropout(n.relu7, dropout_ratio=0.5, in_place=True)
n.score_fr_sem = L.Convolution(n.drop7, num_output=33, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.upscore2_sem = L.Deconvolution(n.score_fr_sem,
convolution_param=dict(num_output=33, kernel_size=4, stride=2,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_pool4_sem = L.Convolution(n.pool4, num_output=33, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.score_pool4_semc = crop(n.score_pool4_sem, n.upscore2_sem)
n.fuse_pool4_sem = L.Eltwise(n.upscore2_sem, n.score_pool4_semc,
operation=P.Eltwise.SUM)
n.upscore16_sem = L.Deconvolution(n.fuse_pool4_sem,
convolution_param=dict(num_output=33, kernel_size=32, stride=16,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_sem = crop(n.upscore16_sem, n.data)
    # top named 'loss' so generic scoring code finds it (o.w. loss_sem)
n.loss = L.SoftmaxWithLoss(n.score_sem, n.sem,
loss_param=dict(normalize=False, ignore_label=255))
n.score_fr_geo = L.Convolution(n.drop7, num_output=3, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.upscore2_geo = L.Deconvolution(n.score_fr_geo,
convolution_param=dict(num_output=3, kernel_size=4, stride=2,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_pool4_geo = L.Convolution(n.pool4, num_output=3, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.score_pool4_geoc = crop(n.score_pool4_geo, n.upscore2_geo)
n.fuse_pool4_geo = L.Eltwise(n.upscore2_geo, n.score_pool4_geoc,
operation=P.Eltwise.SUM)
n.upscore16_geo = L.Deconvolution(n.fuse_pool4_geo,
convolution_param=dict(num_output=3, kernel_size=32, stride=16,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_geo = crop(n.upscore16_geo, n.data)
n.loss_geo = L.SoftmaxWithLoss(n.score_geo, n.geo,
loss_param=dict(normalize=False, ignore_label=255))
return n.to_proto()
def make_net():
with open('trainval.prototxt', 'w') as f:
f.write(str(fcn('trainval')))
with open('test.prototxt', 'w') as f:
f.write(str(fcn('test')))
if __name__ == '__main__':
make_net()
import caffe
import surgery, score
import numpy as np
import os
import sys
import setproctitle
setproctitle.setproctitle(os.path.basename(os.getcwd()))
weights = '../siftflow-fcn32s/siftflow-fcn32s.caffemodel'
# init
caffe.set_device(int(sys.argv[1]))
caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')
solver.net.copy_from(weights)
# surgeries
interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
surgery.interp(solver.net, interp_layers)
# scoring
test = np.loadtxt('../data/sift-flow/test.txt', dtype=str)
for _ in range(50):
solver.step(2000)
# N.B. metrics on the semantic labels are off b.c. of missing classes;
# score manually from the histogram instead for proper evaluation
score.seg_tests(solver, False, test, layer='score_sem', gt='sem')
score.seg_tests(solver, False, test, layer='score_geo', gt='geo')
train_net: "trainval.prototxt"
test_net: "test.prototxt"
test_iter: 1111
# make test net, but don't invoke it from the solver itself
test_interval: 999999999
display: 20
average_loss: 20
lr_policy: "fixed"
# lr for unnormalized softmax
base_lr: 1e-12
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 300000
weight_decay: 0.0005
test_initialization: false
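The tiny fixed `base_lr` only makes sense together with the heavy momentum: with momentum m, SGD's asymptotic step scale is roughly `base_lr / (1 - m)`, so momentum 0.99 amplifies the nominal rate about 100x. A back-of-the-envelope check (my arithmetic, not repo code):

```python
# heavy-momentum SGD: asymptotic step scale is base_lr / (1 - momentum)
base_lr, momentum = 1e-12, 0.99
effective = base_lr / (1 - momentum)  # ~1e-10, about 100x the nominal rate
```

This, plus the unnormalized softmax loss (which sums rather than averages over pixels), is why `base_lr` looks so many orders of magnitude smaller than typical settings.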
layer {
name: "data"
type: "Python"
top: "data"
top: "sem"
top: "geo"
python_param {
module: "siftflow_layers"
layer: "SIFTFlowSegDataLayer"
param_str: "{\'siftflow_dir\': \'../../data/sift-flow\', \'seed\': 1337, \'split\': \'test\'}"
}
}
layer {
name: "conv1_1"
type: "Convolution"
bottom: "data"
top: "conv1_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 100
kernel_size: 3
stride: 1
}
}
layer {
name: "relu1_1"
type: "ReLU"
bottom: "conv1_1"
top: "conv1_1"
}
layer {
name: "conv1_2"
type: "Convolution"
bottom: "conv1_1"
top: "conv1_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu1_2"
type: "ReLU"
bottom: "conv1_2"
top: "conv1_2"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1_2"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2_1"
type: "Convolution"
bottom: "pool1"
top: "conv2_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu2_1"
type: "ReLU"
bottom: "conv2_1"
top: "conv2_1"
}
layer {
name: "conv2_2"
type: "Convolution"
bottom: "conv2_1"
top: "conv2_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu2_2"
type: "ReLU"
bottom: "conv2_2"
top: "conv2_2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2_2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv3_1"
type: "Convolution"
bottom: "pool2"
top: "conv3_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3_1"
type: "ReLU"
bottom: "conv3_1"
top: "conv3_1"
}
layer {
name: "conv3_2"
type: "Convolution"
bottom: "conv3_1"
top: "conv3_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3_2"
type: "ReLU"
bottom: "conv3_2"
top: "conv3_2"
}
layer {
name: "conv3_3"
type: "Convolution"
bottom: "conv3_2"
top: "conv3_3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3_3"
type: "ReLU"
bottom: "conv3_3"
top: "conv3_3"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv3_3"
top: "pool3"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv4_1"
type: "Convolution"
bottom: "pool3"
top: "conv4_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu4_1"
type: "ReLU"
bottom: "conv4_1"
top: "conv4_1"
}
layer {
name: "conv4_2"
type: "Convolution"
bottom: "conv4_1"
top: "conv4_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu4_2"
type: "ReLU"
bottom: "conv4_2"
top: "conv4_2"
}
layer {
name: "conv4_3"
type: "Convolution"
bottom: "conv4_2"
top: "conv4_3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu4_3"
type: "ReLU"
bottom: "conv4_3"
top: "conv4_3"
}
layer {
name: "pool4"
type: "Pooling"
bottom: "conv4_3"
top: "pool4"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv5_1"
type: "Convolution"
bottom: "pool4"
top: "conv5_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu5_1"
type: "ReLU"
bottom: "conv5_1"
top: "conv5_1"
}
layer {
name: "conv5_2"
type: "Convolution"
bottom: "conv5_1"
top: "conv5_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu5_2"
type: "ReLU"
bottom: "conv5_2"
top: "conv5_2"
}
layer {
name: "conv5_3"
type: "Convolution"
bottom: "conv5_2"
top: "conv5_3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu5_3"
type: "ReLU"
bottom: "conv5_3"
top: "conv5_3"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5_3"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "fc6"
type: "Convolution"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 4096
pad: 0
kernel_size: 7
stride: 1
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "Convolution"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 4096
pad: 0
kernel_size: 1
stride: 1
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "score_fr_sem"
type: "Convolution"
bottom: "fc7"
top: "score_fr_sem"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 33
pad: 0
kernel_size: 1
}
}
layer {
name: "upscore2_sem"
type: "Deconvolution"
bottom: "score_fr_sem"
top: "upscore2_sem"
param {
lr_mult: 0
}
convolution_param {
num_output: 33
bias_term: false
kernel_size: 4
stride: 2
}
}
layer {
name: "score_pool4_sem"
type: "Convolution"
bottom: "pool4"
top: "score_pool4_sem"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 33
pad: 0
kernel_size: 1
}
}
layer {
name: "score_pool4_semc"
type: "Crop"
bottom: "score_pool4_sem"
bottom: "upscore2_sem"
top: "score_pool4_semc"
crop_param {
axis: 2
offset: 5
}
}
layer {
name: "fuse_pool4_sem"
type: "Eltwise"
bottom: "upscore2_sem"
bottom: "score_pool4_semc"
top: "fuse_pool4_sem"
eltwise_param {
operation: SUM
}
}
layer {
name: "upscore16_sem"
type: "Deconvolution"
bottom: "fuse_pool4_sem"
top: "upscore16_sem"
param {
lr_mult: 0
}
convolution_param {
num_output: 33
bias_term: false
kernel_size: 32
stride: 16
}
}
layer {
name: "score_sem"
type: "Crop"
bottom: "upscore16_sem"
bottom: "data"
top: "score_sem"
crop_param {
axis: 2
offset: 27
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "score_sem"
bottom: "sem"
top: "loss"
loss_param {
ignore_label: 255
normalize: false
}
}
layer {
name: "score_fr_geo"
type: "Convolution"
bottom: "fc7"
top: "score_fr_geo"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 3
pad: 0
kernel_size: 1
}
}
layer {
name: "upscore2_geo"
type: "Deconvolution"
bottom: "score_fr_geo"
top: "upscore2_geo"
param {
lr_mult: 0
}
convolution_param {
num_output: 3
bias_term: false
kernel_size: 4
stride: 2
}
}
layer {
name: "score_pool4_geo"
type: "Convolution"
bottom: "pool4"
top: "score_pool4_geo"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 3
pad: 0
kernel_size: 1
}
}
layer {
name: "score_pool4_geoc"
type: "Crop"
bottom: "score_pool4_geo"
bottom: "upscore2_geo"
top: "score_pool4_geoc"
crop_param {
axis: 2
offset: 5
}
}
layer {
name: "fuse_pool4_geo"
type: "Eltwise"
bottom: "upscore2_geo"
bottom: "score_pool4_geoc"
top: "fuse_pool4_geo"
eltwise_param {
operation: SUM
}
}
layer {
name: "upscore16_geo"
type: "Deconvolution"
bottom: "fuse_pool4_geo"
top: "upscore16_geo"
param {
lr_mult: 0
}
convolution_param {
num_output: 3
bias_term: false
kernel_size: 32
stride: 16
}
}
layer {
name: "score_geo"
type: "Crop"
bottom: "upscore16_geo"
bottom: "data"
top: "score_geo"
crop_param {
axis: 2
offset: 27
}
}
layer {
name: "loss_geo"
type: "SoftmaxWithLoss"
bottom: "score_geo"
bottom: "geo"
top: "loss_geo"
loss_param {
ignore_label: 255
normalize: false
}
}
layer {
name: "data"
type: "Python"
top: "data"
top: "sem"
top: "geo"
python_param {
module: "siftflow_layers"
layer: "SIFTFlowSegDataLayer"
param_str: "{\'siftflow_dir\': \'../../data/sift-flow\', \'seed\': 1337, \'split\': \'trainval\'}"
}
}
layer {
name: "conv1_1"
type: "Convolution"
bottom: "data"
top: "conv1_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 100
kernel_size: 3
stride: 1
}
}
layer {
name: "relu1_1"
type: "ReLU"
bottom: "conv1_1"
top: "conv1_1"
}
layer {
name: "conv1_2"
type: "Convolution"
bottom: "conv1_1"
top: "conv1_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu1_2"
type: "ReLU"
bottom: "conv1_2"
top: "conv1_2"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1_2"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2_1"
type: "Convolution"
bottom: "pool1"
top: "conv2_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu2_1"
type: "ReLU"
bottom: "conv2_1"
top: "conv2_1"
}
layer {
name: "conv2_2"
type: "Convolution"
bottom: "conv2_1"
top: "conv2_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu2_2"
type: "ReLU"
bottom: "conv2_2"
top: "conv2_2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2_2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv3_1"
type: "Convolution"
bottom: "pool2"
top: "conv3_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3_1"
type: "ReLU"
bottom: "conv3_1"
top: "conv3_1"
}
layer {
name: "conv3_2"
type: "Convolution"
bottom: "conv3_1"
top: "conv3_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3_2"
type: "ReLU"
bottom: "conv3_2"
top: "conv3_2"
}
layer {
name: "conv3_3"
type: "Convolution"
bottom: "conv3_2"
top: "conv3_3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3_3"
type: "ReLU"
bottom: "conv3_3"
top: "conv3_3"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv3_3"
top: "pool3"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv4_1"
type: "Convolution"
bottom: "pool3"
top: "conv4_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu4_1"
type: "ReLU"
bottom: "conv4_1"
top: "conv4_1"
}
layer {
name: "conv4_2"
type: "Convolution"
bottom: "conv4_1"
top: "conv4_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu4_2"
type: "ReLU"
bottom: "conv4_2"
top: "conv4_2"
}
layer {
name: "conv4_3"
type: "Convolution"
bottom: "conv4_2"
top: "conv4_3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu4_3"
type: "ReLU"
bottom: "conv4_3"
top: "conv4_3"
}
layer {
name: "pool4"
type: "Pooling"
bottom: "conv4_3"
top: "pool4"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv5_1"
type: "Convolution"
bottom: "pool4"
top: "conv5_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu5_1"
type: "ReLU"
bottom: "conv5_1"
top: "conv5_1"
}
layer {
name: "conv5_2"
type: "Convolution"
bottom: "conv5_1"
top: "conv5_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu5_2"
type: "ReLU"
bottom: "conv5_2"
top: "conv5_2"
}
layer {
name: "conv5_3"
type: "Convolution"
bottom: "conv5_2"
top: "conv5_3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu5_3"
type: "ReLU"
bottom: "conv5_3"
top: "conv5_3"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5_3"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "fc6"
type: "Convolution"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 4096
pad: 0
kernel_size: 7
stride: 1
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "Convolution"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 4096
pad: 0
kernel_size: 1
stride: 1
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "score_fr_sem"
type: "Convolution"
bottom: "fc7"
top: "score_fr_sem"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 33
pad: 0
kernel_size: 1
}
}
layer {
name: "upscore2_sem"
type: "Deconvolution"
bottom: "score_fr_sem"
top: "upscore2_sem"
param {
lr_mult: 0
}
convolution_param {
num_output: 33
bias_term: false
kernel_size: 4
stride: 2
}
}
layer {
name: "score_pool4_sem"
type: "Convolution"
bottom: "pool4"
top: "score_pool4_sem"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 33
pad: 0
kernel_size: 1
}
}
layer {
name: "score_pool4_semc"
type: "Crop"
bottom: "score_pool4_sem"
bottom: "upscore2_sem"
top: "score_pool4_semc"
crop_param {
axis: 2
offset: 5
}
}
layer {
name: "fuse_pool4_sem"
type: "Eltwise"
bottom: "upscore2_sem"
bottom: "score_pool4_semc"
top: "fuse_pool4_sem"
eltwise_param {
operation: SUM
}
}
layer {
name: "upscore16_sem"
type: "Deconvolution"
bottom: "fuse_pool4_sem"
top: "upscore16_sem"
param {
lr_mult: 0
}
convolution_param {
num_output: 33
bias_term: false
kernel_size: 32
stride: 16
}
}
layer {
name: "score_sem"
type: "Crop"
bottom: "upscore16_sem"
bottom: "data"
top: "score_sem"
crop_param {
axis: 2
offset: 27
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "score_sem"
bottom: "sem"
top: "loss"
loss_param {
ignore_label: 255
normalize: false
}
}
layer {
name: "score_fr_geo"
type: "Convolution"
bottom: "fc7"
top: "score_fr_geo"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 3
pad: 0
kernel_size: 1
}
}
layer {
name: "upscore2_geo"
type: "Deconvolution"
bottom: "score_fr_geo"
top: "upscore2_geo"
param {
lr_mult: 0
}
convolution_param {
num_output: 3
bias_term: false
kernel_size: 4
stride: 2
}
}
layer {
name: "score_pool4_geo"
type: "Convolution"
bottom: "pool4"
top: "score_pool4_geo"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 3
pad: 0
kernel_size: 1
}
}
layer {
name: "score_pool4_geoc"
type: "Crop"
bottom: "score_pool4_geo"
bottom: "upscore2_geo"
top: "score_pool4_geoc"
crop_param {
axis: 2
offset: 5
}
}
layer {
name: "fuse_pool4_geo"
type: "Eltwise"
bottom: "upscore2_geo"
bottom: "score_pool4_geoc"
top: "fuse_pool4_geo"
eltwise_param {
operation: SUM
}
}
layer {
name: "upscore16_geo"
type: "Deconvolution"
bottom: "fuse_pool4_geo"
top: "upscore16_geo"
param {
lr_mult: 0
}
convolution_param {
num_output: 3
bias_term: false
kernel_size: 32
stride: 16
}
}
layer {
name: "score_geo"
type: "Crop"
bottom: "upscore16_geo"
bottom: "data"
top: "score_geo"
crop_param {
axis: 2
offset: 27
}
}
layer {
name: "loss_geo"
type: "SoftmaxWithLoss"
bottom: "score_geo"
bottom: "geo"
top: "loss_geo"
loss_param {
ignore_label: 255
normalize: false
}
}
http://dl.caffe.berkeleyvision.org/siftflow-fcn32s-heavy.caffemodel
import caffe
from caffe import layers as L, params as P
from caffe.coord_map import crop
def conv_relu(bottom, nout, ks=3, stride=1, pad=1):
conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
num_output=nout, pad=pad,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
return conv, L.ReLU(conv, in_place=True)
def max_pool(bottom, ks=2, stride=2):
return L.Pooling(bottom, pool=P.Pooling.MAX, kernel_size=ks, stride=stride)
def fcn(split):
n = caffe.NetSpec()
n.data, n.sem, n.geo = L.Python(module='siftflow_layers',
layer='SIFTFlowSegDataLayer', ntop=3,
param_str=str(dict(siftflow_dir='../data/sift-flow',
split=split, seed=1337)))
# the base net
n.conv1_1, n.relu1_1 = conv_relu(n.data, 64, pad=100)
n.conv1_2, n.relu1_2 = conv_relu(n.relu1_1, 64)
n.pool1 = max_pool(n.relu1_2)
n.conv2_1, n.relu2_1 = conv_relu(n.pool1, 128)
n.conv2_2, n.relu2_2 = conv_relu(n.relu2_1, 128)
n.pool2 = max_pool(n.relu2_2)
n.conv3_1, n.relu3_1 = conv_relu(n.pool2, 256)
n.conv3_2, n.relu3_2 = conv_relu(n.relu3_1, 256)
n.conv3_3, n.relu3_3 = conv_relu(n.relu3_2, 256)
n.pool3 = max_pool(n.relu3_3)
n.conv4_1, n.relu4_1 = conv_relu(n.pool3, 512)
n.conv4_2, n.relu4_2 = conv_relu(n.relu4_1, 512)
n.conv4_3, n.relu4_3 = conv_relu(n.relu4_2, 512)
n.pool4 = max_pool(n.relu4_3)
n.conv5_1, n.relu5_1 = conv_relu(n.pool4, 512)
n.conv5_2, n.relu5_2 = conv_relu(n.relu5_1, 512)
n.conv5_3, n.relu5_3 = conv_relu(n.relu5_2, 512)
n.pool5 = max_pool(n.relu5_3)
# fully conv
n.fc6, n.relu6 = conv_relu(n.pool5, 4096, ks=7, pad=0)
n.drop6 = L.Dropout(n.relu6, dropout_ratio=0.5, in_place=True)
n.fc7, n.relu7 = conv_relu(n.drop6, 4096, ks=1, pad=0)
n.drop7 = L.Dropout(n.relu7, dropout_ratio=0.5, in_place=True)
n.score_fr_sem = L.Convolution(n.drop7, num_output=33, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.upscore_sem = L.Deconvolution(n.score_fr_sem,
convolution_param=dict(num_output=33, kernel_size=64, stride=32,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_sem = crop(n.upscore_sem, n.data)
    # top named 'loss' so generic scoring code finds it (o.w. loss_sem)
n.loss = L.SoftmaxWithLoss(n.score_sem, n.sem,
loss_param=dict(normalize=False, ignore_label=255))
n.score_fr_geo = L.Convolution(n.drop7, num_output=3, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.upscore_geo = L.Deconvolution(n.score_fr_geo,
convolution_param=dict(num_output=3, kernel_size=64, stride=32,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_geo = crop(n.upscore_geo, n.data)
n.loss_geo = L.SoftmaxWithLoss(n.score_geo, n.geo,
loss_param=dict(normalize=False, ignore_label=255))
return n.to_proto()
def make_net():
with open('trainval.prototxt', 'w') as f:
f.write(str(fcn('trainval')))
with open('test.prototxt', 'w') as f:
f.write(str(fcn('test')))
if __name__ == '__main__':
make_net()
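The `pad=100` on `conv1_1` looks arbitrary at first; a quick size calculation (my own sketch, not from the repo — the helper names are made up) shows why it is needed: without the extra padding, the 32x-upsampled score map would come out smaller than the input and `crop()` could not align the two.

```python
import math

def conv_out(n, ks, stride=1, pad=0):
    """Spatial output size of a Caffe convolution layer."""
    return (n + 2 * pad - ks) // stride + 1

def pool_out(n, ks=2, stride=2):
    """Spatial output size of a Caffe pooling layer (rounds up)."""
    return int(math.ceil((n - ks) / stride)) + 1

def deconv_out(n, ks, stride):
    """Spatial output size of a Caffe deconvolution layer."""
    return (n - 1) * stride + ks

def fcn32_score_size(h):
    h = conv_out(h, 3, pad=100)   # conv1_1 carries the large pad
    h = conv_out(h, 3, pad=1)     # conv1_2 (3x3/pad 1 preserves size)
    for _ in range(5):            # pool1..pool5 each roughly halve
        h = pool_out(h)
    h = conv_out(h, 7)            # fc6: ks=7, pad=0
    return deconv_out(h, 64, 32)  # upscore: ks=64, stride=32
```

For a 256x256 SIFT Flow image this gives a 320x320 score map, which `crop()` then trims back to 256x256; without the pad the map would undershoot the input.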
import caffe
import surgery, score
import numpy as np
import os
import sys
import setproctitle
setproctitle.setproctitle(os.path.basename(os.getcwd()))
weights = '../vgg16fc.caffemodel'
# init
caffe.set_device(int(sys.argv[1]))
caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')
solver.net.copy_from(weights)
# surgeries
interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
surgery.interp(solver.net, interp_layers)
# scoring
test = np.loadtxt('../data/sift-flow/test.txt', dtype=str)
for _ in range(50):
solver.step(2000)
# N.B. metrics on the semantic labels are off b.c. of missing classes;
# score manually from the histogram instead for proper evaluation
score.seg_tests(solver, False, test, layer='score_sem', gt='sem')
score.seg_tests(solver, False, test, layer='score_geo', gt='geo')
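The `surgery.interp` call above fills the deconvolution ('up') layers with fixed bilinear kernels, which stay frozen since their `lr_mult` is 0. A sketch of such a bilinear filler (assumed to mirror what `surgery.interp` does; not copied from it):

```python
import numpy as np

def bilinear_kernel(ks):
    """Weights of a 2-D bilinear interpolation filter of size ks x ks."""
    factor = (ks + 1) // 2
    center = factor - 1 if ks % 2 == 1 else factor - 0.5
    og = np.ogrid[:ks, :ks]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

k = bilinear_kernel(64)  # matches the ks=64, stride=32 upscore deconv
```

Each deconv filter gets this same kernel on its own channel, so upsampling starts out as plain bilinear interpolation of the score maps.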
train_net: "trainval.prototxt"
test_net: "test.prototxt"
test_iter: 1111
# make test net, but don't invoke it from the solver itself
test_interval: 999999999
display: 20
average_loss: 20
lr_policy: "fixed"
# lr for unnormalized softmax
base_lr: 1e-10
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 300000
weight_decay: 0.0005
test_initialization: false
http://dl.caffe.berkeleyvision.org/siftflow-fcn8s-heavy.caffemodel
import caffe
from caffe import layers as L, params as P
from caffe.coord_map import crop
def conv_relu(bottom, nout, ks=3, stride=1, pad=1):
conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
num_output=nout, pad=pad,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
return conv, L.ReLU(conv, in_place=True)
def max_pool(bottom, ks=2, stride=2):
return L.Pooling(bottom, pool=P.Pooling.MAX, kernel_size=ks, stride=stride)
def fcn(split):
n = caffe.NetSpec()
n.data, n.sem, n.geo = L.Python(module='siftflow_layers',
layer='SIFTFlowSegDataLayer', ntop=3,
param_str=str(dict(siftflow_dir='../data/sift-flow',
split=split, seed=1337)))
# the base net
n.conv1_1, n.relu1_1 = conv_relu(n.data, 64, pad=100)
n.conv1_2, n.relu1_2 = conv_relu(n.relu1_1, 64)
n.pool1 = max_pool(n.relu1_2)
n.conv2_1, n.relu2_1 = conv_relu(n.pool1, 128)
n.conv2_2, n.relu2_2 = conv_relu(n.relu2_1, 128)
n.pool2 = max_pool(n.relu2_2)
n.conv3_1, n.relu3_1 = conv_relu(n.pool2, 256)
n.conv3_2, n.relu3_2 = conv_relu(n.relu3_1, 256)
n.conv3_3, n.relu3_3 = conv_relu(n.relu3_2, 256)
n.pool3 = max_pool(n.relu3_3)
n.conv4_1, n.relu4_1 = conv_relu(n.pool3, 512)
n.conv4_2, n.relu4_2 = conv_relu(n.relu4_1, 512)
n.conv4_3, n.relu4_3 = conv_relu(n.relu4_2, 512)
n.pool4 = max_pool(n.relu4_3)
n.conv5_1, n.relu5_1 = conv_relu(n.pool4, 512)
n.conv5_2, n.relu5_2 = conv_relu(n.relu5_1, 512)
n.conv5_3, n.relu5_3 = conv_relu(n.relu5_2, 512)
n.pool5 = max_pool(n.relu5_3)
# fully conv
n.fc6, n.relu6 = conv_relu(n.pool5, 4096, ks=7, pad=0)
n.drop6 = L.Dropout(n.relu6, dropout_ratio=0.5, in_place=True)
n.fc7, n.relu7 = conv_relu(n.drop6, 4096, ks=1, pad=0)
n.drop7 = L.Dropout(n.relu7, dropout_ratio=0.5, in_place=True)
n.score_fr_sem = L.Convolution(n.drop7, num_output=33, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.upscore2_sem = L.Deconvolution(n.score_fr_sem,
convolution_param=dict(num_output=33, kernel_size=4, stride=2,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_pool4_sem = L.Convolution(n.pool4, num_output=33, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.score_pool4_semc = crop(n.score_pool4_sem, n.upscore2_sem)
n.fuse_pool4_sem = L.Eltwise(n.upscore2_sem, n.score_pool4_semc,
operation=P.Eltwise.SUM)
n.upscore_pool4_sem = L.Deconvolution(n.fuse_pool4_sem,
convolution_param=dict(num_output=33, kernel_size=4, stride=2,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_pool3_sem = L.Convolution(n.pool3, num_output=33, kernel_size=1,
pad=0, param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2,
decay_mult=0)])
n.score_pool3_semc = crop(n.score_pool3_sem, n.upscore_pool4_sem)
n.fuse_pool3_sem = L.Eltwise(n.upscore_pool4_sem, n.score_pool3_semc,
operation=P.Eltwise.SUM)
n.upscore8_sem = L.Deconvolution(n.fuse_pool3_sem,
convolution_param=dict(num_output=33, kernel_size=16, stride=8,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_sem = crop(n.upscore8_sem, n.data)
    # name the semantic loss 'loss' so the scoring script picks it up (otherwise loss_sem)
n.loss = L.SoftmaxWithLoss(n.score_sem, n.sem,
loss_param=dict(normalize=False, ignore_label=255))
n.score_fr_geo = L.Convolution(n.drop7, num_output=3, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.upscore2_geo = L.Deconvolution(n.score_fr_geo,
convolution_param=dict(num_output=3, kernel_size=4, stride=2,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_pool4_geo = L.Convolution(n.pool4, num_output=3, kernel_size=1, pad=0,
param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
n.score_pool4_geoc = crop(n.score_pool4_geo, n.upscore2_geo)
n.fuse_pool4_geo = L.Eltwise(n.upscore2_geo, n.score_pool4_geoc,
operation=P.Eltwise.SUM)
n.upscore_pool4_geo = L.Deconvolution(n.fuse_pool4_geo,
convolution_param=dict(num_output=3, kernel_size=4, stride=2,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_pool3_geo = L.Convolution(n.pool3, num_output=3, kernel_size=1,
pad=0, param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2,
decay_mult=0)])
n.score_pool3_geoc = crop(n.score_pool3_geo, n.upscore_pool4_geo)
n.fuse_pool3_geo = L.Eltwise(n.upscore_pool4_geo, n.score_pool3_geoc,
operation=P.Eltwise.SUM)
n.upscore8_geo = L.Deconvolution(n.fuse_pool3_geo,
convolution_param=dict(num_output=3, kernel_size=16, stride=8,
bias_term=False),
param=[dict(lr_mult=0)])
n.score_geo = crop(n.upscore8_geo, n.data)
n.loss_geo = L.SoftmaxWithLoss(n.score_geo, n.geo,
loss_param=dict(normalize=False, ignore_label=255))
return n.to_proto()
def make_net():
with open('trainval.prototxt', 'w') as f:
f.write(str(fcn('trainval')))
with open('test.prototxt', 'w') as f:
f.write(str(fcn('test')))
if __name__ == '__main__':
make_net()
import caffe
import surgery, score
import numpy as np
import os
import sys
import setproctitle
setproctitle.setproctitle(os.path.basename(os.getcwd()))
weights = '../siftflow-fcn16s/siftflow-fcn16s.caffemodel'
# init
caffe.set_device(int(sys.argv[1]))
caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')
solver.net.copy_from(weights)
# surgeries
interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
surgery.interp(solver.net, interp_layers)
# scoring
test = np.loadtxt('../data/sift-flow/test.txt', dtype=str)
for _ in range(50):
solver.step(2000)
# N.B. metrics on the semantic labels are off b.c. of missing classes;
# score manually from the histogram instead for proper evaluation
score.seg_tests(solver, False, test, layer='score_sem', gt='sem')
score.seg_tests(solver, False, test, layer='score_geo', gt='geo')
train_net: "trainval.prototxt"
test_net: "test.prototxt"
test_iter: 1111
# make test net, but don't invoke it from the solver itself
test_interval: 999999999
display: 20
average_loss: 20
lr_policy: "fixed"
# lr for unnormalized softmax
base_lr: 1e-12
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 300000
weight_decay: 0.0005
test_initialization: false
import caffe
import numpy as np
from PIL import Image
import scipy.io
import random
class SIFTFlowSegDataLayer(caffe.Layer):
"""
Load (input image, label image) pairs from SIFT Flow
one-at-a-time while reshaping the net to preserve dimensions.
This data layer has three tops:
1. the data, pre-processed
2. the semantic labels 0-32 and void 255
3. the geometric labels 0-2 and void 255
Use this to feed data to a fully convolutional network.
"""
def setup(self, bottom, top):
"""
Setup data layer according to parameters:
- siftflow_dir: path to SIFT Flow dir
- split: train / val / test
- randomize: load in random order (default: True)
- seed: seed for randomization (default: None / current time)
for semantic segmentation of object and geometric classes.
example: params = dict(siftflow_dir="/path/to/siftflow", split="val")
"""
# config
params = eval(self.param_str)
self.siftflow_dir = params['siftflow_dir']
self.split = params['split']
self.mean = np.array((114.578, 115.294, 108.353), dtype=np.float32)
self.random = params.get('randomize', True)
self.seed = params.get('seed', None)
# three tops: data, semantic, geometric
if len(top) != 3:
raise Exception("Need to define three tops: data, semantic label, and geometric label.")
# data layers have no bottoms
if len(bottom) != 0:
raise Exception("Do not define a bottom.")
# load indices for images and labels
split_f = '{}/{}.txt'.format(self.siftflow_dir, self.split)
self.indices = open(split_f, 'r').read().splitlines()
self.idx = 0
# make eval deterministic
if 'train' not in self.split:
self.random = False
# randomization: seed and pick
if self.random:
random.seed(self.seed)
self.idx = random.randint(0, len(self.indices)-1)
def reshape(self, bottom, top):
# load image + label image pair
self.data = self.load_image(self.indices[self.idx])
self.label_semantic = self.load_label(self.indices[self.idx], label_type='semantic')
self.label_geometric = self.load_label(self.indices[self.idx], label_type='geometric')
# reshape tops to fit (leading 1 is for batch dimension)
top[0].reshape(1, *self.data.shape)
top[1].reshape(1, *self.label_semantic.shape)
top[2].reshape(1, *self.label_geometric.shape)
def forward(self, bottom, top):
# assign output
top[0].data[...] = self.data
top[1].data[...] = self.label_semantic
top[2].data[...] = self.label_geometric
# pick next input
if self.random:
self.idx = random.randint(0, len(self.indices)-1)
else:
self.idx += 1
if self.idx == len(self.indices):
self.idx = 0
def backward(self, top, propagate_down, bottom):
pass
def load_image(self, idx):
"""
Load input image and preprocess for Caffe:
- cast to float
- switch channels RGB -> BGR
- subtract mean
- transpose to channel x height x width order
"""
im = Image.open('{}/Images/spatial_envelope_256x256_static_8outdoorcategories/{}.jpg'.format(self.siftflow_dir, idx))
in_ = np.array(im, dtype=np.float32)
in_ = in_[:,:,::-1]
in_ -= self.mean
in_ = in_.transpose((2,0,1))
return in_
def load_label(self, idx, label_type=None):
"""
Load label image as 1 x height x width integer array of label indices.
The leading singleton dimension is required by the loss.
"""
if label_type == 'semantic':
label = scipy.io.loadmat('{}/SemanticLabels/spatial_envelope_256x256_static_8outdoorcategories/{}.mat'.format(self.siftflow_dir, idx))['S']
elif label_type == 'geometric':
label = scipy.io.loadmat('{}/GeoLabels/spatial_envelope_256x256_static_8outdoorcategories/{}.mat'.format(self.siftflow_dir, idx))['S']
label[label == -1] = 0
else:
raise Exception("Unknown label type: {}. Pick semantic or geometric.".format(label_type))
        label = label.astype(np.uint8)
        label -= 1  # shift labels so classes start at 0; unlabeled 0 wraps around to void 255
label = label[np.newaxis, ...]
return label.copy()
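The in-place `label -= 1` above relies on uint8 wraparound to produce the void label. A minimal check (my own example with hypothetical values, not repo code): SIFT Flow stores semantic classes as 1-33 with 0 for unlabeled pixels, so after the cast and subtraction the classes land on 0-32 and unlabeled pixels wrap to 255, matching the `ignore_label=255` used by both losses.

```python
import numpy as np

raw = np.array([[0, 1, 33]])       # unlabeled, first class, last class
label = raw.astype(np.uint8) - 1   # uint8 wraparound: 0 -> 255, 1..33 -> 0..32
```

The geometric labels take the same path after their `-1` entries are first mapped to 0, so they end up as 0-2 with void 255 as well.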