Unverified commit a616f82b authored by 陈云, committed by GitHub

Merge pull request #1 from chenyuntc/caffe

Merge code using the caffe pretrained model
# A Pythonic, Extensible and Minimal Implementation of Faster R-CNN Without Harming Performance
## Introduction
This project is a **simplified** Faster R-CNN implementation, mostly based on [chainercv](https://github.com/chainer/chainercv) and other [projects](#acknowledgement). It aims to:
- Simplify the code (*Simple is better than complex*)
- Make the code more straightforward (*Flat is better than nested*)
- Match the performance reported in the [original paper](https://arxiv.org/abs/1506.01497) (*Speed Counts and mAP Matters*)
## Performance
- mAP
VGG16, trained on trainval and tested on test. Note that training shows great randomness; you may need to train for more epochs to reach the highest mAP, but it should be easy to reach the lower bound. Training for more epochs has also been reported to improve performance further.
| Implementation | mAP |
| :--------------------------------------: | :---------: |
| [original paper](https://arxiv.org/abs/1506.01497) | 0.699 |
| using the caffe pretrained model (enabled with `--caffe-pretrain`) | 0.700-0.708 |
| using the torchvision pretrained model | 0.690-0.701 |
| model converted from [chainercv](https://github.com/chainer/chainercv/tree/master/examples/faster_rcnn) (reported 0.706) | 0.7053 |
| the best I've seen so far (ruotianluo's) | 0.711 |
- Speed
| Implementation | GPU | Inference | Training |
| ---------------------------------------- | -------- | --------- | ---------- |
| [original paper](https://arxiv.org/abs/1506.01497) | K40 | 5 fps | NA |
| This | TITAN Xp | 12 fps^*^ | 5-6 fps |
| [pytorch-faster-rcnn](https://github.com/ruotianluo/pytorch-faster-rcnn) | TITAN Xp | NA | 5-6 fps^**^ |
\* includes reading images from disk, preprocessing, etc. See `eval` in `train.py` for more detail.
\*\* depends on the environment.
**NOTE** that you should make sure cupy is installed correctly to reach these benchmark speeds.
## Install Prerequisites
- install PyTorch >= 0.3 with GPU support (the code is GPU-only); refer to the [official website](http://pytorch.org)
- install cupy; you can install it via `pip install`, but it's better to read the [docs](https://docs-cupy.chainer.org/en/latest/install.html#install-cupy-with-cudnn-and-nccl) and make sure the environment is set up correctly (see the sanity check after the next code block)
- install other dependencies: `pip install -r requirements.txt`
- optional but recommended: build `nms_gpu_post`: `cd model/utils/nms; python3 build.py build_ext --inplace`
- start visdom for visualization:
```
nohup python3 -m visdom.server &
```
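Before training, it is worth confirming that both cupy and the visdom server respond. A minimal sanity-check sketch (assumes the server started above is on the default port; `check_connection` is part of the visdom Python client):
```
import cupy as cp
import visdom

# a tiny GPU computation; fails loudly if cupy/CUDA is misconfigured
x = cp.arange(6, dtype=cp.float32).reshape(2, 3)
assert float(x.sum()) == 15.0

# confirm the visdom server started above is reachable
vis = visdom.Visdom()
assert vis.check_connection(), 'visdom server is not reachable'
print('cupy and visdom look good')
```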
If you're in China and encounter problems with visdom (e.g. timeouts or a blank screen), you may refer to this [visdom issue](https://github.com/facebookresearch/visdom/issues/111#issuecomment-321743890) and a temporary solution provided by me.
## Demo
Download the pretrained model from [..............................................]
See `demo.ipynb` for details.
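If you just want the gist of the notebook, it boils down to a few lines. A minimal sketch (the checkpoint path and the `read_image` helper are placeholders, not exact names from the repo; `read_image` stands for any helper returning a CHW, RGB, [0, 255] float array):
```
import torch as t
from model import FasterRCNNVGG16
from data.util import read_image  # assumed helper: CHW, RGB, values in [0, 255]

faster_rcnn = FasterRCNNVGG16().cuda()
faster_rcnn.load_state_dict(t.load('checkpoints/fasterrcnn.pth'))  # hypothetical path

img = read_image('misc/demo.jpg')  # hypothetical demo image
# visualize=True makes predict() preprocess the raw image internally
bboxes, labels, scores = faster_rcnn.predict([img], visualize=True)
print(bboxes[0], labels[0], scores[0])
```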
## Train
### Data
#### Pascal VOC2007
1. Download the training, validation, test data and VOCdevkit
```
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
```
2. Extract all of these tars into one directory named `VOCdevkit`
```
tar xvf VOCtrainval_06-Nov-2007.tar
tar xvf VOCtest_06-Nov-2007.tar
tar xvf VOCdevkit_08-Jun-2007.tar
```
3. It should have this basic structure
```
$VOCdevkit/ # development kit
$VOCdevkit/VOCcode/ # VOC utility code
$VOCdevkit/VOC2007 # image sets, annotations, etc.
# ... and several other directories ...
```
4. Specify the `voc_data_dir` in `config.py`, or pass it to the program via an argument like `--voc-data-dir=/path/to/VOCdevkit/VOC2007/` (see the sanity check below).
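A quick way to sanity-check the path (a sketch assuming the chainercv-style `VOCBboxDataset` interface this repo's data pipeline imports):
```
from data.voc_dataset import VOCBboxDataset

db = VOCBboxDataset('/path/to/VOCdevkit/VOC2007/')
print(len(db))  # the VOC2007 trainval split has 5011 images
img, bbox, label, difficult = db[0]  # CHW image plus per-object annotations
print(img.shape, bbox.shape, label.shape)
```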
#### COCO
TBD
### Prepare the caffe-pretrained VGG16
If you want to use the caffe-pretrained model, run:
```
python misc/convert_caffe_pretrain.py
```
Then specify in `config.py` where the caffe-pretrained model `vgg16_caffe.pth` is stored.
If you want to use the torchvision pretrained model instead, you may skip this step.
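For reference, here is roughly how the weights are then picked up at model-construction time (a condensed sketch of the `decom_vgg16` logic that appears in the diff further down):
```
import torch as t
from torchvision.models import vgg16
from config import opt

if opt.caffe_pretrain:
    # start from an uninitialized VGG16 and load the converted caffe weights
    model = vgg16(pretrained=False)
    if not opt.load_path:  # skipped when a full Faster R-CNN checkpoint is loaded anyway
        model.load_state_dict(t.load(opt.caffe_pretrain_path))
else:
    # torchvision-pretrained VGG16 (likewise skipped when load_path is set)
    model = vgg16(pretrained=not opt.load_path)
```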
### Begin training
```Bash
mkdir checkpoints/ # create a directory for storing snapshots
```
```
python3 train.py train --env='fasterrcnn-caffe' --plot-every=100 --caffe-pretrain
```
You may refer to `config.py` for more arguments.
Some key arguments:
- `--caffe-pretrain=True`: use the caffe-pretrained model instead of the torchvision one (default: torchvision)
- `--plot-every=n`: visualize predictions, loss, etc. every n batches
- `--env`: visdom env for visualization
- `--voc-data-dir`: where the VOC data is stored
- `--use-drop`: use dropout in the RoI head (default: no dropout)
- `--use-adam`: use Adam instead of SGD (default: SGD)
- `--load-path`: pretrained model path (default: `None`); if specified, the pretrained model is loaded
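These flags are plain attribute overrides on the `Config` object. A sketch of the equivalent Python, based on the `_parse` method shown in the `config.py` diff below (the attribute names are assumed to follow `config.py`):
```
from config import opt

# what `python3 train.py train --caffe-pretrain --plot-every=100` amounts to:
opt._parse({'caffe_pretrain': True, 'plot_every': 100})
print(opt.caffe_pretrain, opt.plot_every)
```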
## Troubleshooting
- visdom
- dataloader/ulimit
- cupy
- vgg
## TODO
- [ ] training on COCO
- [ ] ResNet
- [ ] replace cupy with THTensor + cffi?
## Acknowledgement
This work builds on many excellent works, including:
- [Yusuke Niitani's ChainerCV](https://github.com/chainer/chainercv)
- [Ruotian Luo's pytorch-faster-rcnn](https://github.com/ruotianluo/pytorch-faster-rcnn), which is based on [Xinlei Chen's tf-faster-rcnn](https://github.com/endernewton/tf-faster-rcnn)
- [faster-rcnn.pytorch by Jianwei Yang and Jiasen Lu](https://github.com/jwyang/faster-rcnn.pytorch), which is mainly based on [longcw's faster_rcnn_pytorch](https://github.com/longcw/faster_rcnn_pytorch)
- All of the above repositories refer to [py-faster-rcnn by Ross Girshick and Sean Bell](https://github.com/rbgirshick/py-faster-rcnn) either directly or indirectly.
@@ -50,6 +50,9 @@ class Config:
    # model
    load_path = None  # '/mnt/3/rpn.pth'
    caffe_pretrain = False
    caffe_pretrain_path = 'checkpoints/vgg16-caffe.pth'

    def _parse(self, kwargs):
        state_dict = self._state_dict()
        for k, v in kwargs.items():
@@ -3,9 +3,38 @@ from .voc_dataset import VOCBboxDataset
from skimage import transform as sktsf
from torchvision import transforms as tvtsf
from . import util
import numpy as np
from config import opt
from util import array_tool as at


def inverse_normalize(img):
    if opt.caffe_pretrain:
        img = img + (np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1))
        return img[::-1, :, :]  # BGR -> RGB
    # approximate un-normalization for visualization
    return (img * 0.225 + 0.45).clip(min=0, max=1) * 255


def pytorch_normalze(img):
    """
    https://github.com/pytorch/vision/issues/223
    return appr -1~1 RGB
    """
    normalize = tvtsf.Normalize(mean=[0.485, 0.456, 0.406],
                                std=[0.229, 0.224, 0.225])
    img = normalize(t.from_numpy(img))
    return img.numpy()


def caffe_normalize(img):
    """
    return appr -125~125 BGR
    """
    img = img[[2, 1, 0], :, :]  # RGB -> BGR
    img = img * 255
    mean = np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1)
    img = (img - mean).astype(np.float32, copy=True)
    return img


def preprocess(img, min_size=600, max_size=1000):
    """Preprocess an image for feature extraction.
@@ -30,22 +59,15 @@ def preprocess(img, min_size=600, max_size=1000):
    scale1 = min_size / min(H, W)
    scale2 = max_size / max(H, W)
    scale = min(scale1, scale2)
    img = img / 255.
    img = sktsf.resize(img, (C, H * scale, W * scale), mode='reflect')
    # both the longer and shorter side should be less than
    # max_size and min_size
    img = img / 256.
    img = sktsf.resize(img, (C, H * scale, W * scale), mode='reflect')
    normalize = tvtsf.Normalize(mean=[0.485, 0.456, 0.406],
                                std=[0.229, 0.224, 0.225])
    img = normalize(t.from_numpy(img))
    return img.numpy()
    # unNOTE: original implementation in chainer:
    # mean = np.array([122.7717, 115.9465, 102.9801])
    # img = (img - self.mean).astype(np.float32, copy=False)
    # Answer: https://github.com/pytorch/vision/issues/223
    # the input of vgg16 in pytorch:
    # rgb 0 to 1, instead of bgr 0 to 255
    if opt.caffe_pretrain:
        normalize = caffe_normalize
    else:
        normalize = pytorch_normalze
    return normalize(img)
class Transform(object):

@@ -82,7 +104,7 @@ class Dataset():
        img, bbox, label, scale = self.tsf((ori_img, bbox, label))
        # TODO: check whose stride is negative and fix this instead of copying the whole array
        # some of the strides of a given numpy array are negative.
        return img.copy(), bbox.copy(), label.copy(), scale, ori_img
        return img.copy(), bbox.copy(), label.copy(), scale

    def __len__(self):
        return len(self.db)
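Aside: the normalization helpers added in this file pair with `inverse_normalize` for display. A quick round-trip sketch (assumes a CHW float32 RGB array in [0, 1], as `preprocess` produces, and the default `opt.caffe_pretrain = False`):
```
import numpy as np
from data.dataset import pytorch_normalze, inverse_normalize

img = np.random.rand(3, 600, 800).astype(np.float32)  # CHW, RGB, [0, 1]
normed = pytorch_normalze(img)        # roughly [-1, 1], still RGB
restored = inverse_normalize(normed)  # back to roughly [0, 255] for display
assert restored.shape == (3, 600, 800)
```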
@@ -120,7 +120,8 @@ def test_roi_module():
    neq = (cc != variable.data.cpu().numpy())
    assert neq.sum() == 0, 'test failed: %s' % info

    # chainer version
    # chainer version; if you're going to run this:
    # pip install chainer
    import chainer.functions as F
    from chainer import Variable
    x_cn = Variable(t2c(x))
@@ -22,7 +22,6 @@ import torch as t
import numpy as np
import cupy as cp
from util import array_tool as at
# from chainercv.transforms.image.resize import resize
from model.utils.bbox_tools import loc2bbox
from model.utils.nms import non_maximum_suppression
@@ -195,7 +194,7 @@ class FasterRCNN(nn.Module):
        score = np.concatenate(score, axis=0).astype(np.float32)
        return bbox, label, score

    def predict(self, imgs, visualize=False):
    def predict(self, imgs, sizes=None, visualize=False):
        """Detect objects from images.

        This method predicts objects for each image.
@@ -226,14 +225,15 @@
        self.eval()
        if visualize:
            self.use_preset('visualize')
        prepared_imgs = list()
        sizes = list()
        for img in imgs:
            size = img.shape[1:]
            img = preprocess(img.numpy())
            prepared_imgs.append(img)
            sizes.append(size)
            prepared_imgs = list()
            sizes = list()
            for img in imgs:
                size = img.shape[1:]
                img = preprocess(at.tonumpy(img))
                prepared_imgs.append(img)
                sizes.append(size)
        else:
            prepared_imgs = imgs
        bboxes = list()
        labels = list()
        scores = list()
@@ -278,81 +278,6 @@ class FasterRCNN(nn.Module):
        self.use_preset('evaluate')
        self.train()
        return bboxes, labels, scores

    def predict2(self, prepared_imgs, sizes):
        """Detect objects from images.

        This method predicts objects for each image.

        Args:
            imgs (iterable of numpy.ndarray): Arrays holding images.
                All images are in CHW and RGB format
                and the range of their value is :math:`[0, 255]`.

        Returns:
            tuple of lists:
            This method returns a tuple of three lists,
            :obj:`(bboxes, labels, scores)`.

            * **bboxes**: A list of float arrays of shape :math:`(R, 4)`, \
                where :math:`R` is the number of bounding boxes in an image. \
                Each bounding box is organized by \
                :math:`(y_{min}, x_{min}, y_{max}, x_{max})` \
                in the second axis.
            * **labels**: A list of integer arrays of shape :math:`(R,)`. \
                Each value indicates the class of the bounding box. \
                Values are in range :math:`[0, L - 1]`, where :math:`L` is the \
                number of the foreground classes.
            * **scores**: A list of float arrays of shape :math:`(R,)`. \
                Each value indicates how confident the prediction is.
        """
        self.eval()
        # self.use_preset('visualize')
        self.use_preset('evaluate')
        bboxes = list()
        labels = list()
        scores = list()
        for img, size in zip(prepared_imgs, sizes):
            img = t.autograd.Variable(at.totensor(img).float()[None], volatile=True)
            scale = img.shape[3] / size[1]
            roi_cls_loc, roi_scores, rois, _ = self(img, scale=scale)
            # We are assuming that batch size is 1.
            # roi_cls_loc = at.tonumpy(roi_cls_locs)  # .data.numpy()
            roi_score = roi_scores.data
            roi_cls_loc = roi_cls_loc.data
            roi = at.totensor(rois) / scale

            # Convert predictions to bounding boxes in image coordinates.
            # Bounding boxes are scaled to the scale of the input images.
            mean = t.Tensor(self.loc_normalize_mean).cuda(). \
                repeat(self.n_class)[None]
            std = t.Tensor(self.loc_normalize_std).cuda(). \
                repeat(self.n_class)[None]

            roi_cls_loc = (roi_cls_loc * std + mean)
            roi_cls_loc = roi_cls_loc.view(-1, self.n_class, 4)
            roi = roi.view(-1, 1, 4).expand_as(roi_cls_loc)
            cls_bbox = loc2bbox(at.tonumpy(roi).reshape((-1, 4)),
                                at.tonumpy(roi_cls_loc).reshape((-1, 4)))
            cls_bbox = at.totensor(cls_bbox)
            cls_bbox = cls_bbox.view(-1, self.n_class * 4)
            # clip bounding boxes
            cls_bbox[:, 0::2] = (cls_bbox[:, 0::2]).clamp(min=0, max=size[0])
            cls_bbox[:, 1::2] = (cls_bbox[:, 1::2]).clamp(min=0, max=size[1])

            prob = at.tonumpy(F.softmax(at.tovariable(roi_score), dim=1))

            raw_cls_bbox = at.tonumpy(cls_bbox)
            raw_prob = at.tonumpy(prob)

            bbox, label, score = self._suppress(raw_cls_bbox, raw_prob)
            bboxes.append(bbox)
            labels.append(label)
            scores.append(score)

        # self.use_preset('evaluate')
        self.train()
        return bboxes, labels, scores

    def get_optimizer_group(self):
        self.lr1, self.lr2, self.lr3 = opt.lr1, opt.lr2, opt.lr3
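Note the effect of the change above: the separate `predict2` path is folded into a single `predict` that serves both evaluation and visualization. Excerpted usage from the `train.py` hunks further down (context fragments, not a standalone script):
```
# evaluation: images from the test dataloader are already preprocessed,
# so their original sizes are passed explicitly
pred_bboxes_, pred_labels_, pred_scores_ = faster_rcnn.predict(imgs, [sizes])

# visualization: a raw [0, 255] RGB image; preprocessing happens inside predict()
_bboxes, _labels, _scores = trainer.faster_rcnn.predict([ori_img_], visualize=True)
```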
@@ -12,7 +12,14 @@ from config import opt

def decom_vgg16(pretrained=True):
    # the 30th layer of features is relu of conv5_3
    model = vgg16(pretrained)
    if opt.caffe_pretrain:
        model = vgg16(pretrained=False)
        if not opt.load_path:
            model.load_state_dict(t.load(opt.caffe_pretrain_path))
    else:
        model = vgg16(not opt.load_path)
    features = list(model.features)[:30]
    classifier = model.classifier
@@ -138,8 +145,10 @@ class FasterRCNNVGG16(FasterRCNN):
                 ratios=[0.5, 1, 2], anchor_scales=[8, 16, 32]
                 ):
        if opt.use_chainer: decom = decom_vgg16_chainer
        else: decom = decom_vgg16
        if opt.use_chainer:
            decom = decom_vgg16_chainer
        else:
            decom = decom_vgg16
        extractor, classifier = decom(not opt.load_path)

        rpn = RegionProposalNetwork(
@@ -6,7 +6,7 @@ from tqdm import tqdm
import torch as t
from config import opt
from data.dataset import Dataset, TestDataset
from data.dataset import Dataset, TestDataset, inverse_normalize
from model import FasterRCNNVGG16
from torch.autograd import Variable
from torch.utils import data as data_
@@ -22,7 +22,7 @@ def eval(dataloader, faster_rcnn, test_num=10000):
    gt_bboxes, gt_labels, gt_difficults = list(), list(), list()
    for ii, (imgs, sizes, gt_bboxes_, gt_labels_, gt_difficults_) in tqdm(enumerate(dataloader)):
        sizes = [sizes[0][0], sizes[1][0]]
        pred_bboxes_, pred_labels_, pred_scores_ = faster_rcnn.predict2(imgs, [sizes])
        pred_bboxes_, pred_labels_, pred_scores_ = faster_rcnn.predict(imgs, [sizes])
        gt_bboxes += list(gt_bboxes_.numpy())
        gt_labels += list(gt_labels_.numpy())
        gt_difficults += list(gt_difficults_.numpy())
@@ -67,7 +67,7 @@ def train(**kwargs):
    best_map = 0
    for epoch in range(opt.epoch):
        trainer.reset_meters()
        for ii, (img, bbox_, label_, scale, ori_img) in tqdm(enumerate(dataloader)):
        for ii, (img, bbox_, label_, scale) in tqdm(enumerate(dataloader)):
            scale = at.scalar(scale)
            img, bbox, label = img.cuda().float(), bbox_.cuda(), label_.cuda()
            img, bbox, label = Variable(img), Variable(bbox), Variable(label)
@@ -81,15 +81,15 @@
                trainer.vis.plot_many(trainer.get_meter_data())

                # plot ground truth bboxes
                ori_img_ = (img * 0.225 + 0.45).clamp(min=0, max=1) * 255
                gt_img = visdom_bbox(at.tonumpy(ori_img_)[0],
                                     at.tonumpy(bbox_)[0],
                                     label_[0].numpy())
                ori_img_ = inverse_normalize(at.tonumpy(img[0]))
                gt_img = visdom_bbox(ori_img_,
                                     at.tonumpy(bbox_[0]),
                                     at.tonumpy(label_[0]))
                trainer.vis.img('gt_img', gt_img)

                # plot predicted bboxes
                _bboxes, _labels, _scores = trainer.faster_rcnn.predict(ori_img, visualize=True)
                pred_img = visdom_bbox(at.tonumpy(ori_img[0]),
                _bboxes, _labels, _scores = trainer.faster_rcnn.predict([ori_img_], visualize=True)
                pred_img = visdom_bbox(ori_img_,
                                       at.tonumpy(_bboxes[0]),
                                       at.tonumpy(_labels[0]).reshape(-1),
                                       at.tonumpy(_scores[0]))
@@ -107,8 +107,8 @@ def train(**kwargs):
        if eval_result['map'] > best_map:
            best_map = eval_result['map']
            best_path = trainer.save(best_map=best_map)
        if epoch == 9:
            # trainer.load(best_path)
        if epoch == 10:
            trainer.load(best_path)
            trainer.faster_rcnn.scale_lr(opt.lr_decay)

        # if epoch == 0:
        #     trainer.optimizer = trainer.faster_rcnn.get_optimizer()
@@ -23,7 +23,7 @@ def eval(dataloader, faster_rcnn, test_num=10000):
    gt_bboxes, gt_labels, gt_difficults = list(), list(), list()
    for ii, (imgs, sizes, gt_bboxes_, gt_labels_, gt_difficults_) in tqdm(enumerate(dataloader)):
        sizes = [sizes[0][0], sizes[1][0]]
        pred_bboxes_, pred_labels_, pred_scores_ = faster_rcnn.predict2(imgs, [sizes])
        pred_bboxes_, pred_labels_, pred_scores_ = faster_rcnn.predict(imgs, [sizes])
        gt_bboxes += list(gt_bboxes_.numpy())
        gt_labels += list(gt_labels_.numpy())
        gt_difficults += list(gt_difficults_.numpy())
@@ -101,13 +101,13 @@ def train(**kwargs):
                trainer.vis.text(str(trainer.rpn_cm.value().tolist()), win='rpn_cm')
                # roi confusion matrix
                trainer.vis.img('roi_cm', at.totensor(trainer.roi_cm.conf, False).float())
        if epoch < 9:
            continue
        # if epoch < 9:
        #     continue
        eval_result = eval(test_dataloader, faster_rcnn, test_num=opt.test_num)
        best_map = eval_result['map']
        best_path = trainer.save(best_map=best_map)
        if eval_result['map'] > best_map:
            best_map = eval_result['map']
            best_path = trainer.save(best_map=best_map)
        if epoch == 9:
            trainer.faster_rcnn.scale_lr(opt.lr_decay)

@@ -118,7 +118,7 @@ def train(**kwargs):
                                              str(eval_result['map']),
                                              str(trainer.get_meter_data()))
        trainer.vis.log(log_info)
        if epoch == 14: break
        if epoch == 19: break
        # t.save(trainer.state_dict(), 'checkpoints/fasterrcnn_%s.pth' % epoch)
        # t.vis.save([opt.env])
@@ -22,7 +22,7 @@ def eval(dataloader, faster_rcnn, test_num=10000):
    gt_bboxes, gt_labels, gt_difficults = list(), list(), list()
    for ii, (imgs, sizes, gt_bboxes_, gt_labels_, gt_difficults_) in tqdm(enumerate(dataloader)):
        sizes = [sizes[0][0], sizes[1][0]]
        pred_bboxes_, pred_labels_, pred_scores_ = faster_rcnn.predict2(imgs, [sizes])
        pred_bboxes_, pred_labels_, pred_scores_ = faster_rcnn.predict(imgs, [sizes])
        gt_bboxes += list(gt_bboxes_.numpy())
        gt_labels += list(gt_labels_.numpy())
        gt_difficults += list(gt_difficults_.numpy())
@@ -59,18 +59,6 @@ def vis_image(img, ax=None):

def vis_bbox(img, bbox, label=None, score=None, ax=None):
    """Visualize bounding boxes inside image.

    Example:

        >>> from chainercv.datasets import VOCDetectionDataset
        >>> from chainercv.datasets import voc_bbox_label_names
        >>> from chainercv.visualizations import vis_bbox
        >>> import matplotlib.pyplot as plot
        >>> dataset = VOCDetectionDataset()
        >>> img, bbox, label = dataset[60]
        >>> vis_bbox(img, bbox, label,
        ...          label_names=voc_bbox_label_names)
        >>> plot.show()

    Args:
        img (~numpy.ndarray): An array of shape :math:`(3, height, width)`.
            This is in RGB format and the range of its value is