Commit 2368d206 authored by Waleed Abdulla

Initial commit
Mask R-CNN
The MIT License (MIT)
Copyright (c) 2017 Matterport, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
# Mask R-CNN for Object Detection and Segmentation
This is an implementation of [Mask R-CNN](https://arxiv.org/abs/1703.06870) on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone.
![Instance Segmentation Sample](assets/street.png)
The repository includes:
* Source code of Mask R-CNN built on FPN and ResNet101.
* Training code for MS COCO
* Pre-trained weights for MS COCO
* Jupyter notebooks to visualize the detection pipeline at every step
* ParallelModel class for multi-GPU training
* Evaluation on MS COCO metrics (AP)
* Example of training on your own dataset
The code is documented and designed to be easy to extend. If you use it in your research, please consider referencing this repository. If you work on 3D vision, you might find our recently released [Matterport3D](https://matterport.com/blog/2017/09/20/announcing-matterport3d-research-dataset/) dataset useful as well.
This dataset was created from 3D-reconstructed spaces captured by our customers who agreed to make them publicly available for academic use. You can see more examples [here](https://matterport.com/gallery/).
# Getting Started
* [demo.ipynb](/demo.ipynb) is the easiest way to start. It shows an example of using a model pre-trained on MS COCO to segment objects in your own images.
It includes code to run object detection and instance segmentation on arbitrary images.
* [train_shapes.ipynb](train_shapes.ipynb) shows how to train Mask R-CNN on your own dataset. This notebook introduces a toy dataset (Shapes) to demonstrate training on a new dataset.
* ([model.py](model.py), [utils.py](utils.py), [config.py](config.py)): These files contain the main Mask RCNN implementation.
* [inspect_data.ipynb](/inspect_data.ipynb). This notebook visualizes the different pre-processing steps
to prepare the training data.
* [inspect_model.ipynb](/inspect_model.ipynb) This notebook goes in depth into the steps performed to detect and segment objects. It provides visualizations of every step of the pipeline.
* [inspect_weights.ipynb](/inspect_weights.ipynb)
This notebook inspects the weights of a trained model and looks for anomalies and odd patterns.
# Step by Step Detection
To help with debugging and understanding the model, there are 3 notebooks
([inspect_data.ipynb](inspect_data.ipynb), [inspect_model.ipynb](inspect_model.ipynb),
[inspect_weights.ipynb](inspect_weights.ipynb)) that provide a lot of visualizations and allow running the model step by step to inspect the output at each point. Here are a few examples:
## 1. Anchor sorting and filtering
Visualizes every step of the first stage Region Proposal Network and displays positive and negative anchors along with anchor box refinement.
![](assets/detection_anchors.png)
## 2. Bounding Box Refinement
This is an example of final detection boxes (dotted lines) and the refinement applied to them (solid lines) in the second stage.
![](assets/detection_refinement.png)
## 3. Mask Generation
Examples of generated masks. These then get scaled and placed on the image in the right location.
![](assets/detection_masks.png)
## 4. Layer Activations
Often it's useful to inspect the activations at different layers to look for signs of trouble (all zeros or random noise).
![](assets/detection_activations.png)
## 5. Weight Histograms
Another useful debugging tool is to inspect the weight histograms. These are included in the inspect_weights.ipynb notebook.
![](assets/detection_histograms.png)
## 6. Logging to TensorBoard
TensorBoard is another great debugging and visualization tool. The model is configured to log losses and save weights at the end of every epoch.
![](assets/detection_tensorboard.png)
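Assuming the default directory layout in this repo (logs are written to a `logs` directory under the project root; see `MODEL_DIR` in coco.py), you can point TensorBoard at it to follow the losses:

```
# Launch TensorBoard, then open http://localhost:6006 in a browser
tensorboard --logdir=logs
```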
## 7. Composing the different pieces into a final result
![](assets/detection_final.png)
# Training on MS COCO
We're providing pre-trained weights for MS COCO to make it easier to start. You can
use those weights as a starting point to train your own variation on the network.
Training and evaluation code is in coco.py. You can import this
module in Jupyter notebook (see the provided notebooks for examples) or you
can run it directly from the command line as such:
```
# Train a new model starting from pre-trained COCO weights
python3 coco.py train --dataset=/path/to/coco/ --model=coco
# Train a new model starting from ImageNet weights
python3 coco.py train --dataset=/path/to/coco/ --model=imagenet
# Continue training a model that you had trained earlier
python3 coco.py train --dataset=/path/to/coco/ --model=/path/to/weights.h5
# Continue training the last model you trained. This will find
# the last trained weights in the model directory.
python3 coco.py train --dataset=/path/to/coco/ --model=last
```
You can also run the COCO evaluation code with:
```
# Run COCO evaluation on the last trained model
python3 coco.py evaluate --dataset=/path/to/coco/ --model=last
```
The training schedule, learning rate, and other parameters should be set in coco.py.
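If you'd rather not edit coco.py itself, one option is to subclass `CocoConfig` in your own script or notebook and override parameters there. This is a minimal sketch, assuming coco.py is importable; the attribute names come from config.py, shown later in this commit:

```
import coco

class TrainingConfig(coco.CocoConfig):
    # Hypothetical overrides; the defaults are defined in config.py
    LEARNING_RATE = 0.001   # default is 0.002
    STEPS_PER_EPOCH = 500   # default is 1000

config = TrainingConfig()
config.print()  # display all configuration values
```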
# Training on Your Own Dataset
To train the model on your own dataset you'll need to sub-class two classes:
```Config```
This class contains the default configuration. Subclass it and modify the attributes you need to change.
```Dataset```
This class provides a consistent way to work with any dataset.
It allows you to use new datasets for training without having to change
the code of the model. It also supports loading multiple datasets at the
same time, which is useful if the objects you want to detect are not
all available in one dataset.
The ```Dataset``` class itself is the base class. To use it, create a new
class that inherits from it and adds functions specific to your dataset.
See the base `Dataset` class in utils.py and examples of extending it in train_shapes.ipynb and coco.py.
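As a rough sketch of what that looks like (the `balloon` class and dataset here are hypothetical; train_shapes.ipynb and the ShapesDataset code later in this commit show a complete working example):

```
from config import Config
import utils

class BalloonConfig(Config):
    # Hypothetical single-class configuration
    NAME = "balloon"
    NUM_CLASSES = 1 + 1  # background + balloon

class BalloonDataset(utils.Dataset):
    def load_balloon(self, annotations):
        # annotations: hypothetical list of (path, width, height) tuples
        self.add_class("balloon", 1, "balloon")
        for i, (path, width, height) in enumerate(annotations):
            self.add_image("balloon", image_id=i, path=path,
                           width=width, height=height)

    def load_mask(self, image_id):
        # Return masks as a [height, width, instance_count] array plus
        # an array of class IDs. A real implementation reads your
        # annotation format here (compare CocoDataset.load_mask in coco.py).
        raise NotImplementedError
```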
## Differences from the Official Paper
This implementation follows the Mask RCNN paper for the most part, but there are a few cases where we deviated in favor of code simplicity and generalization. These are some of the differences we're aware of. If you encounter other differences, please do let us know.
* **Image Resizing:** To support training multiple images per batch we resize all images to the same size. For example, 1024x1024px on MS COCO. We preserve the aspect ratio, so if an image is not square we pad it with zeros. In the paper the resizing is done such that the smallest side is 800px and the largest is trimmed at 1000px.
* **Bounding Boxes**: Some datasets provide bounding boxes and some provide masks only. To support training on multiple datasets we opted to ignore the bounding boxes that come with the dataset and generate them on the fly instead. We pick the smallest box that encapsulates all the pixels of the mask as the bounding box. This simplifies the implementation and also makes it easy to apply certain image augmentations that would otherwise be really hard to apply to bounding boxes, such as image rotation.
To validate this approach, we compared our computed bounding boxes to those provided by the COCO dataset.
We found that ~2% of bounding boxes differed by 1px or more, ~0.05% differed by 5px or more,
and only 0.01% differed by 10px or more. A sketch of this box computation follows the list below.
* **Learning Rate:** The paper uses a learning rate of 0.02, but we found that to be
too high; it often causes the weights to explode, especially when using a small batch
size. It might be related to differences between how Caffe and TensorFlow compute
gradients (sum vs mean across batches and GPUs). Or, maybe the official model uses gradient
clipping to avoid this issue. We do use gradient clipping, but don't set it too aggressively.
We found that smaller learning rates converge faster anyway so we go with that.
* **Anchor Strides:** The lowest level of the pyramid has a stride of 4px relative to the image, so anchors are created at intervals of 4 pixels. To reduce computation and memory load we adopt an anchor stride of 2, which cuts the number of anchors by a factor of 4 and doesn't have a significant effect on accuracy.
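For reference, computing a box from a mask takes only a few lines of NumPy. This is a sketch of the idea; the hypothetical `bbox_from_mask` below assumes the mask has at least one positive pixel, and boxes are [y1, x1, y2, x2] as elsewhere in this code:

```
import numpy as np

def bbox_from_mask(mask):
    # mask: [height, width] boolean array for one instance
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    y1, y2 = np.where(rows)[0][[0, -1]]
    x1, x2 = np.where(cols)[0][[0, -1]]
    # +1 so that [y2, x2] lies just past the last positive pixel
    return np.array([y1, x1, y2 + 1, x2 + 1])
```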
## Contributing
Contributions to this repository are welcome. Examples of things you can contribute:
* Speed improvements, such as rewriting some Python code in TensorFlow or Cython.
* Training on other datasets.
* Accuracy Improvements.
* Visualizations and examples.
You can also [join our team](https://matterport.com/careers/) and help us build even more projects like this one.
## Requirements
* Python 3.4+
* TensorFlow 1.3+
* Keras 2.0.8+
* Jupyter Notebook
* Numpy, skimage, scipy
If you use Docker, the model has been verified to work on
[this Docker container](https://hub.docker.com/r/waleedka/modern-deep-learning/).
## Installation
1. Clone this repository
2. Download pre-trained COCO weights from the releases section of this repository.
## More Examples
![Sheep](assets/sheep.png)
![Donuts](assets/donuts.png)
"""
Mask R-CNN
Configurations and data loading code for MS COCO.
Copyright (c) 2017 Matterport, Inc.
Licensed under the MIT License (see LICENSE for details)
Written by Waleed Abdulla
------------------------------------------------------------
Usage: import the module (see Jupyter notebooks for examples), or run from
the command line as such:
# Train a new model starting from pre-trained COCO weights
python3 coco.py train --dataset=/path/to/coco/ --model=coco
# Train a new model starting from ImageNet weights
python3 coco.py train --dataset=/path/to/coco/ --model=imagenet
# Continue training a model that you had trained earlier
python3 coco.py train --dataset=/path/to/coco/ --model=/path/to/weights.h5
# Continue training the last model you trained
python3 coco.py train --dataset=/path/to/coco/ --model=last
# Run COCO evaluation on the last model you trained
python3 coco.py evaluate --dataset=/path/to/coco/ --model=last
"""
import os
import time
import numpy as np
# Download and install the Python COCO tools from https://github.com/waleedka/coco
# That's a fork from the original https://github.com/pdollar/coco with a bug
# fix for Python 3.
# I submitted a pull request https://github.com/cocodataset/cocoapi/pull/50
# If the PR is merged then use the original repo.
# Note: Edit PythonAPI/Makefile and replace "python" with "python3".
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
from pycocotools import mask as maskUtils
from config import Config
import utils
import model as modellib
# Root directory of the project
ROOT_DIR = os.getcwd()
# Path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
############################################################
# Configurations
############################################################
class CocoConfig(Config):
"""Configuration for training on MS COCO.
Derives from the base Config class and overrides values specific
to the COCO dataset.
"""
# Give the configuration a recognizable name
NAME = "coco"
# We use a GPU with 12GB memory, which can fit two images.
# Adjust down if you use a smaller GPU.
IMAGES_PER_GPU = 2
# Uncomment to train on 8 GPUs (default is 1)
# GPU_COUNT = 8
# Number of classes (including background)
NUM_CLASSES = 1 + 80 # COCO has 80 classes
############################################################
# Dataset
############################################################
class CocoDataset(utils.Dataset):
def load_coco(self, dataset_dir, subset, class_ids=None,
class_map=None, return_coco=False):
"""Load a subset of the COCO dataset.
dataset_dir: The root directory of the COCO dataset.
subset: What to load (train, val, minival, val35k)
class_ids: If provided, only loads images that have the given classes.
class_map: TODO: Not implemented yet. Supports mapping classes from
different datasets to the same class ID.
return_coco: If True, returns the COCO object.
"""
# Path
image_dir = os.path.join(dataset_dir, "train2014" if subset == "train"
else "val2014")
# Create COCO object
json_path_dict = {
"train": "annotations/instances_train2014.json",
"val": "annotations/instances_val2014.json",
"minival": "annotations/instances_minival2014.json",
"val35k": "annotations/instances_valminusminival2014.json",
}
coco = COCO(os.path.join(dataset_dir, json_path_dict[subset]))
# Load all classes or a subset?
if not class_ids:
# All classes
class_ids = sorted(coco.getCatIds())
# All images or a subset?
if class_ids:
image_ids = []
for id in class_ids:
image_ids.extend(list(coco.getImgIds(catIds=[id])))
# Remove duplicates
image_ids = list(set(image_ids))
else:
# All images
image_ids = list(coco.imgs.keys())
# Add classes
for i in class_ids:
self.add_class("coco", i, coco.loadCats(i)[0]["name"])
# Add images
for i in image_ids:
self.add_image(
"coco", image_id=i,
path=os.path.join(image_dir, coco.imgs[i]['file_name']),
width=coco.imgs[i]["width"],
height=coco.imgs[i]["height"],
annotations=coco.loadAnns(coco.getAnnIds(imgIds=[i], iscrowd=False)))
if return_coco:
return coco
def load_mask(self, image_id):
"""Load instance masks for the given image.
Different datasets use different ways to store masks. This
function converts the different mask formats to one format
in the form of a bitmap [height, width, instances].
Returns:
masks: A bool array of shape [height, width, instance count] with
one mask per instance.
class_ids: a 1D array of class IDs of the instance masks.
"""
# If not a COCO image, delegate to parent class.
image_info = self.image_info[image_id]
if image_info["source"] != "coco":
return super(CocoDataset, self).load_mask(image_id)
instance_masks = []
class_ids = []
annotations = self.image_info[image_id]["annotations"]
# Build mask of shape [height, width, instance_count] and list
# of class IDs that correspond to each channel of the mask.
for annotation in annotations:
class_id = self.map_source_class_id(
"coco.{}".format(annotation['category_id']))
if class_id:
m = self.annToMask(annotation, image_info["height"],
image_info["width"])
# Some objects are so small that their area is less than 1 pixel
# and they end up rounded out. Skip those objects.
if m.max() < 1:
continue
instance_masks.append(m)
class_ids.append(class_id)
# Pack instance masks into an array
if class_ids:
mask = np.stack(instance_masks, axis=2)
class_ids = np.array(class_ids, dtype=np.int32)
return mask, class_ids
else:
# Call super class to return an empty mask
return super(CocoDataset, self).load_mask(image_id)
def image_reference(self, image_id):
"""Return a link to the image in the COCO Website."""
info = self.image_info[image_id]
if info["source"] == "coco":
return "http://cocodataset.org/#explore?id={}".format(info["id"])
else:
return super(CocoDataset, self).image_reference(image_id)
# The following two functions are from pycocotools with a few changes.
def annToRLE(self, ann, height, width):
"""
Convert an annotation, which can be polygons or uncompressed RLE, to RLE.
:return: RLE (run-length encoding) of the segmentation
"""
segm = ann['segmentation']
if isinstance(segm, list):
# polygon -- a single object might consist of multiple parts
# we merge all parts into one mask rle code
rles = maskUtils.frPyObjects(segm, height, width)
rle = maskUtils.merge(rles)
elif isinstance(segm['counts'], list):
# uncompressed RLE
rle = maskUtils.frPyObjects(segm, height, width)
else:
# rle
rle = ann['segmentation']
return rle
def annToMask(self, ann, height, width):
"""
Convert annotation which can be polygons, uncompressed RLE, or RLE to binary mask.
:return: binary mask (numpy 2D array)
"""
rle = self.annToRLE(ann, height, width)
m = maskUtils.decode(rle)
return m
############################################################
# COCO Evaluation
############################################################
def build_coco_results(dataset, image_ids, rois, class_ids, scores, masks):
"""Arrange resutls to match COCO specs in http://cocodataset.org/#format
"""
# If no results, return an empty list
if rois is None:
return []
results = []
for image_id in image_ids:
# Loop through detections
for i in range(rois.shape[0]):
class_id = class_ids[i]
score = scores[i]
bbox = np.around(rois[i], 1)
mask = masks[:, :, i]
result = {
"image_id": image_id,
"category_id": dataset.get_source_class_id(class_id, "coco"),
"bbox": [bbox[1], bbox[0], bbox[3]-bbox[1], bbox[2]-bbox[0]],
"score": score,
"segmentation": maskUtils.encode(np.asfortranarray(mask))
}
results.append(result)
return results
def evaluate_coco(model, dataset, coco, eval_type="bbox", limit=0):
"""Runs official COCO evaluation.
model: A Mask R-CNN model in inference mode
dataset: A Dataset object with validation data
eval_type: "bbox" or "segm" for bounding box or segmentation evaluation
limit: if not 0, it's the number of images to use for evaluation
"""
# Pick COCO images from the dataset
image_ids = dataset.image_ids
# Limit to a subset
if limit:
image_ids = image_ids[:limit]
# Get corresponding COCO image IDs.
coco_image_ids = [dataset.image_info[id]["id"] for id in image_ids]
t_prediction = 0
t_start = time.time()
results = []
for i, image_id in enumerate(image_ids):
# Load image
image = dataset.load_image(image_id)
# Run detection
t = time.time()
r = model.detect([image], verbose=0)[0]
t_prediction += (time.time() - t)
# Convert results to COCO format
image_results = build_coco_results(dataset, coco_image_ids[i:i+1],
r["rois"], r["class_ids"],
r["scores"], r["masks"])
results.extend(image_results)
# Load results. This modifies results with additional attributes.
coco_results = coco.loadRes(results)
# Evaluate
cocoEval = COCOeval(coco, coco_results, eval_type)
cocoEval.params.imgIds = coco_image_ids
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()
print("Prediction time: {}. Average {}/image".format(
t_prediction, t_prediction/len(image_ids)))
print("Total time: ", time.time() - t_start)
############################################################
# Training
############################################################
if __name__ == '__main__':
import argparse
# Parse command line arguments
parser = argparse.ArgumentParser(
description='Train Mask R-CNN on MS COCO.')
parser.add_argument("command",
metavar="<command>",
help="'train' or 'evaluate' on MS COCO")
parser.add_argument('--dataset', required=True,
metavar="/path/to/coco/",
help='Directory of the MS-COCO dataset')
parser.add_argument('--model', required=True,
metavar="/path/to/weights.h5",
help="Path to weights .h5 file or 'coco'")
args = parser.parse_args()
print("Command: ", args.command)
print("Model: ", args.model)
print("Dataset: ", args.dataset)
# Configurations
if args.command == "train":
config = CocoConfig()
else:
class InferenceConfig(CocoConfig):
# Set batch size to 1 since we'll be running inference on
# one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
GPU_COUNT = 1
IMAGES_PER_GPU = 1
config = InferenceConfig()
config.print()
# Create model
if args.command == "train":
model = modellib.MaskRCNN(mode="training", config=config,
model_dir=MODEL_DIR)
else:
model = modellib.MaskRCNN(mode="inference", config=config,
model_dir=MODEL_DIR)
# Select weights file to load
if args.model.lower() == "coco":
model_path = COCO_MODEL_PATH
elif args.model.lower() == "last":
# Find last trained weights
model_path = model.find_last()[1]
elif args.model.lower() == "imagenet":
# Start from ImageNet trained weights
model_path = model.get_imagenet_weights()
else:
model_path = args.model
# Load weights
print("Loading weights ", model_path)
model.load_weights(model_path, by_name=True)
# Train or evaluate
if args.command == "train":
# Training dataset. Use the training set and 35K from the
# validation set, as in the Mask R-CNN paper.
dataset_train = CocoDataset()
dataset_train.load_coco(args.dataset, "train")
dataset_train.load_coco(args.dataset, "val35k")
dataset_train.prepare()
# Validation dataset
dataset_val = CocoDataset()
dataset_val.load_coco(args.dataset, "minival")
dataset_val.prepare()
# This training schedule is an example. Update to fit your needs.
# Training - Stage 1
# Adjust epochs and layers as needed
print("Training network heads")
model.train(dataset_train, dataset_val,
learning_rate=config.LEARNING_RATE,
epochs=40,
layers='heads')
# Training - Stage 2
# Finetune layers from ResNet stage 4 and up
print("Training Resnet layer 4+")
model.train(dataset_train, dataset_val,
learning_rate=config.LEARNING_RATE / 10,
epochs=100,
layers='4+')
# Training - Stage 3
# Fine-tune all layers
print("Training all layers")
model.train(dataset_train, dataset_val,
learning_rate=config.LEARNING_RATE / 100,
epochs=200,
layers='all')
elif args.command == "evaluate":
# Validation dataset
dataset_val = CocoDataset()
coco = dataset_val.load_coco(args.dataset, "minival", return_coco=True)
dataset_val.prepare()
# TODO: evaluating on 500 images. Set to 0 to evaluate on all images.
evaluate_coco(model, dataset_val, coco, "bbox", limit=500)
else:
print("'{}' is not recognized. "
"Use 'train' or 'evaluate'".format(args.command))
"""
Mask R-CNN
Base Configurations class.
Copyright (c) 2017 Matterport, Inc.
Licensed under the MIT License (see LICENSE for details)
Written by Waleed Abdulla
"""
import math
import numpy as np
# Base Configuration Class
# Don't use this class directly. Instead, sub-class it and override
# the configurations you need to change.
class Config(object):
"""Base configuration class. For custom configurations, create a
sub-class that inherits from this one and override properties
that need to be changed.
"""
# Name the configurations. For example, 'COCO', 'Experiment 3', ...etc.
# Useful if your code needs to do things differently depending on which
# experiment is running.
NAME = None # Override in sub-classes
# NUMBER OF GPUs to use. For CPU training, use 1
GPU_COUNT = 1
# Number of images to train with on each GPU. A 12GB GPU can typically
# handle 2 images of 1024x1024px.
# Adjust based on your GPU memory and image sizes. Use the highest
# number that your GPU can handle for best performance.
IMAGES_PER_GPU = 2
# Number of training steps per epoch
# This doesn't need to match the size of the training set. Tensorboard
# updates are saved at the end of each epoch, so setting this to a
# smaller number means getting more frequent TensorBoard updates.
# Validation stats are also calculated at each epoch end and they
# might take a while, so don't set this too small to avoid spending
# a lot of time on validation stats.
STEPS_PER_EPOCH = 1000
# Number of validation steps to run at the end of every training epoch.
# A bigger number improves accuracy of validation stats, but slows
# down the training.
VALIDATION_STEPS = 50
# The strides of each layer of the FPN Pyramid. These values
# are based on a Resnet101 backbone.
BACKBONE_STRIDES = [4, 8, 16, 32, 64]
# Number of classification classes (including background)
NUM_CLASSES = 1 # Override in sub-classes
###### RPN ######
# Length of square anchor side in pixels
RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)
# Ratios of anchors at each cell (width/height)
# A value of 1 represents a square anchor, and 0.5 is a tall, narrow anchor (width is half the height)
RPN_ANCHOR_RATIOS = [0.5, 1, 2]
# Anchor stride
# If 1 then anchors are created for each cell in the backbone feature map.
# If 2, then anchors are created for every other cell, and so on.
RPN_ANCHOR_STRIDE = 2
# How many anchors per image to use for RPN training
RPN_TRAIN_ANCHORS_PER_IMAGE = 256
# ROIs kept after non-maximum suppression (training and inference)
POST_NMS_ROIS_TRAINING = 2000
POST_NMS_ROIS_INFERENCE = 1000
# If enabled, resizes instance masks to a smaller size to reduce
# memory load. Recommended when using high-resolution images.
USE_MINI_MASK = True
MINI_MASK_SHAPE = (56, 56) # (height, width) of the mini-mask
# Input image resizing
# Images are resized such that the smallest side is >= IMAGE_MIN_DIM and
# the longest side is <= IMAGE_MAX_DIM. If both conditions can't be
# satisfied together, IMAGE_MAX_DIM is enforced.
IMAGE_MIN_DIM = 800
IMAGE_MAX_DIM = 1024
# If True, pad images with zeros such that they're (max_dim by max_dim)
IMAGE_PADDING = True # currently, the False option is not supported
# Image mean (RGB)
MEAN_PIXEL = np.array([123.7, 116.8, 103.9])
# Number of ROIs per image to feed to classifier/mask heads
TRAIN_ROIS_PER_IMAGE = 128 # TODO: paper uses 512
# Percent of positive ROIs used to train classifier/mask heads
ROI_POSITIVE_RATIO = 0.33
# Pooled ROIs
POOL_SIZE = 7
MASK_POOL_SIZE = 14
MASK_SHAPE = [28, 28]
# Maximum number of ground truth instances to use in one image
MAX_GT_INSTANCES = 100
# Bounding box refinement standard deviation for RPN and final detections.
RPN_BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])
BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])
# Max number of final detections
DETECTION_MAX_INSTANCES = 100
# Minimum probability value to accept a detected instance
# ROIs below this threshold are skipped
DETECTION_MIN_CONFIDENCE = 0.7
# Non-maximum suppression threshold for detection
DETECTION_NMS_THRESHOLD = 0.3
# Learning rate and momentum
# The paper uses lr=0.02, but we found that it often causes weights to explode
LEARNING_RATE = 0.002
LEARNING_MOMENTUM = 0.9
# Weight decay regularization
WEIGHT_DECAY = 0.0001
# Use RPN ROIs or externally generated ROIs for training
# Keep this True for most situations. Set to False if you want to train
# the head branches on ROI generated by code rather than the ROIs from
# the RPN. For example, to debug the classifier head without having to
# train the RPN.
USE_RPN_ROIS = True
def __init__(self):
"""Set values of computed attributes."""
# Effective batch size
self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT
# Input image size
self.IMAGE_SHAPE = np.array([self.IMAGE_MAX_DIM, self.IMAGE_MAX_DIM, 3])
# Compute backbone size from input image size
self.BACKBONE_SHAPES = np.array(
[[int(math.ceil(self.IMAGE_SHAPE[0] / stride)),
int(math.ceil(self.IMAGE_SHAPE[1] / stride))]
for stride in self.BACKBONE_STRIDES])
def print(self):
"""Display Configuration values."""
print("\nConfigurations:")
for a in dir(self):
if not a.startswith("__") and not callable(getattr(self, a)):
print("{:30} {}".format(a, getattr(self, a)))
print("\n")
"""
Mask R-CNN
Multi-GPU Support for Keras.
Copyright (c) 2017 Matterport, Inc.
Licensed under the MIT License (see LICENSE for details)
Written by Waleed Abdulla
Ideas and small code snippets from these sources:
https://github.com/fchollet/keras/issues/2436
https://medium.com/@kuza55/transparent-multi-gpu-training-on-tensorflow-with-keras-8b0016fd9012
https://github.com/avolkov1/keras_experiments/blob/master/keras_exp/multigpu/
https://github.com/fchollet/keras/blob/master/keras/utils/training_utils.py
"""
import tensorflow as tf
import keras.backend as K
import keras.layers as KL
import keras.models as KM
class ParallelModel(KM.Model):
"""Subclasses the standard Keras Model and adds multi-GPU support.
It works by creating a copy of the model on each GPU. Then it slices
the inputs and sends a slice to each copy of the model, and then
merges the outputs together and applies the loss on the combined
outputs.
"""
def __init__(self, keras_model, gpu_count):
"""Class constructor.
keras_model: The Keras model to parallelize
gpu_count: Number of GPUs. Must be > 1
"""
self.inner_model = keras_model
self.gpu_count = gpu_count
merged_outputs = self.make_parallel()
super(ParallelModel, self).__init__(inputs=self.inner_model.inputs,
outputs=merged_outputs)
def __getattribute__(self, attrname):
"""Redirect loading and saving methods to the inner model. That's where
the weights are stored."""
if 'load' in attrname or 'save' in attrname:
return getattr(self.inner_model, attrname)
return super(ParallelModel, self).__getattribute__(attrname)
def summary(self, *args, **kwargs):
"""Override summary() to display summaries of both, the wrapper
and inner models."""
super(ParallelModel, self).summary(*args, **kwargs)
self.inner_model.summary(*args, **kwargs)
def make_parallel(self):
"""Creates a new wrapper model that consists of multiple replicas of
the original model placed on different GPUs.
"""
# Slice inputs. Slice inputs on the CPU to avoid sending a copy
# of the full inputs to all GPUs. Saves on bandwidth and memory.
input_slices = {name: tf.split(x, self.gpu_count)
for name, x in zip(self.inner_model.input_names,
self.inner_model.inputs)}
output_names = self.inner_model.output_names
outputs_all = []
for i in range(len(self.inner_model.outputs)):
outputs_all.append([])
# Run the model call() on each GPU to place the ops there
for i in range(self.gpu_count):
with tf.device('/gpu:%d' % i):
with tf.name_scope('tower_%d' % i):
# Run a slice of inputs through this replica
zipped_inputs = zip(self.inner_model.input_names,
self.inner_model.inputs)
inputs = [
KL.Lambda(lambda s: input_slices[name][i],
output_shape=lambda s: (None,)+s[1:])(tensor)
for name, tensor in zipped_inputs]
# Create the model replica and get the outputs
outputs = self.inner_model(inputs)
if not isinstance(outputs, list):
outputs = [outputs]
# Save the outputs for merging back together later
for l, o in enumerate(outputs):
outputs_all[l].append(o)
# Merge outputs on CPU
with tf.device('/cpu:0'):
merged = []
for outputs, name in zip(outputs_all, output_names):
# If outputs are numbers without dimensions, add a batch dim.
def add_dim(tensor):
"""Add a dimension to tensors that don't have any."""
if K.int_shape(tensor) == ():
return KL.Lambda(lambda t: K.reshape(t, [1, 1]))(tensor)
return tensor
outputs = list(map(add_dim, outputs))
# Concatenate
merged.append(KL.Concatenate(axis=0, name=name)(outputs))
return merged
if __name__ == "__main__":
# Testing code below. It creates a simple model to train on MNIST and
# tries to run it on 2 GPUs. It saves the graph so it can be viewed
# in TensorBoard. Run it as:
#
# python3 parallel_model.py
import os
import numpy as np
import keras.optimizers
from keras.datasets import mnist
from keras.preprocessing.image import ImageDataGenerator
GPU_COUNT = 2
# Root directory of the project
ROOT_DIR = os.getcwd()
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs/parallel")
def build_model(x_train, num_classes):
# Reset default graph. Keras leaves old ops in the graph,
# which are ignored for execution but clutter graph
# visualization in TensorBoard.
tf.reset_default_graph()
inputs = KL.Input(shape=x_train.shape[1:], name="input_image")
x = KL.Conv2D(32, (3, 3), activation='relu', padding="same",
name="conv1")(inputs)
x = KL.Conv2D(64, (3, 3), activation='relu', padding="same",
name="conv2")(x)
x = KL.MaxPooling2D(pool_size=(2, 2), name="pool1")(x)
x = KL.Flatten(name="flat1")(x)
x = KL.Dense(128, activation='relu', name="dense1")(x)
x = KL.Dense(num_classes, activation='softmax', name="dense2")(x)
return KM.Model(inputs, x, "digit_classifier_model")
# Load MNIST Data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = np.expand_dims(x_train, -1).astype('float32') / 255
x_test = np.expand_dims(x_test, -1).astype('float32') / 255
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
# Build data generator and model
datagen = ImageDataGenerator()
model = build_model(x_train, 10)
# Add multi-GPU support.
model = ParallelModel(model, GPU_COUNT)
optimizer = keras.optimizers.SGD(lr=0.01, momentum=0.9, clipnorm=5.0)
model.compile(loss='sparse_categorical_crossentropy',
optimizer=optimizer, metrics=['accuracy'])
model.summary()
# Train
model.fit_generator(
datagen.flow(x_train, y_train, batch_size=64),
steps_per_epoch=50, epochs=10, verbose=1,
validation_data=(x_test, y_test),
callbacks=[keras.callbacks.TensorBoard(log_dir=MODEL_DIR,
write_graph=True)]
)
"""
Mask R-CNN
Configurations and data loading code for the synthetic Shapes dataset.
This is a duplicate of the code in the notebook train_shapes.ipynb for easy
import into other notebooks, such as inspect_model.ipynb.
Copyright (c) 2017 Matterport, Inc.
Licensed under the MIT License (see LICENSE for details)
Written by Waleed Abdulla
"""
import math
import random
import numpy as np
import cv2
from config import Config
import utils
class ShapesConfig(Config):
"""Configuration for training on the toy shapes dataset.
Derives from the base Config class and overrides values specific
to the toy shapes dataset.
"""
# Give the configuration a recognizable name
NAME = "shapes"
# Train on 1 GPU and 8 images per GPU. We can put multiple images on each
# GPU because the images are small. Batch size is 8 (GPUs * images/GPU).
GPU_COUNT = 1
IMAGES_PER_GPU = 8
# Number of classes (including background)
NUM_CLASSES = 1 + 3 # background + 3 shapes
# Use small images for faster training. Set the limits of the small side
# and the large side, and that determines the image shape.
IMAGE_MIN_DIM = 128
IMAGE_MAX_DIM = 128
# Use smaller anchors because our image and objects are small
RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128) # anchor side in pixels
# Reduce training ROIs per image because the images are small and have
# few objects. Aim to allow ROI sampling to pick 33% positive ROIs.
TRAIN_ROIS_PER_IMAGE = 32
# Use a small epoch since the data is simple
STEPS_PER_EPOCH = 100
# use small validation steps since the epoch is small
VALIDATION_STEPS = 5
class ShapesDataset(utils.Dataset):
"""Generates the shapes synthetic dataset. The dataset consists of simple
shapes (triangles, squares, circles) placed randomly on a blank surface.
The images are generated on the fly. No file access required.
"""
def load_shapes(self, count, height, width):
"""Generate the requested number of synthetic images.
count: number of images to generate.
height, width: the size of the generated images.
"""
# Add classes
self.add_class("shapes", 1, "square")
self.add_class("shapes", 2, "circle")
self.add_class("shapes", 3, "triangle")
# Add images
# Generate random specifications of images (i.e. color and a
# list of shape sizes and locations). This is more compact than
# actual images. Images are generated on the fly in load_image().
for i in range(count):
bg_color, shapes = self.random_image(height, width)
self.add_image("shapes", image_id=i, path=None,
width=width, height=height,
bg_color=bg_color, shapes=shapes)
def load_image(self, image_id):
"""Generate an image from the specs of the given image ID.
Typically this function loads the image from a file, but
in this case it generates the image on the fly from the
specs in image_info.
"""
info = self.image_info[image_id]
bg_color = np.array(info['bg_color']).reshape([1, 1, 3])
image = np.ones([info['height'], info['width'], 3], dtype=np.uint8)
image = image * bg_color.astype(np.uint8)
for shape, color, dims in info['shapes']:
image = self.draw_shape(image, shape, dims, color)
return image
def image_reference(self, image_id):
"""Return the shapes data of the image."""
info = self.image_info[image_id]
if info["source"] == "shapes":
return info["shapes"]
else:
return super(ShapesDataset, self).image_reference(image_id)
def load_mask(self, image_id):
"""Generate instance masks for shapes of the given image ID.
"""
info = self.image_info[image_id]
shapes = info['shapes']
count = len(shapes)
mask = np.zeros([info['height'], info['width'], count], dtype=np.uint8)
for i, (shape, _, dims) in enumerate(info['shapes']):
mask[:, :, i:i+1] = self.draw_shape(mask[:, :, i:i+1].copy(),
shape, dims, 1)
# Handle occlusions
occlusion = np.logical_not(mask[:, :, -1]).astype(np.uint8)
for i in range(count-2, -1, -1):
mask[:, :, i] = mask[:, :, i] * occlusion
occlusion = np.logical_and(occlusion, np.logical_not(mask[:, :, i]))
# Map class names to class IDs.
class_ids = np.array([self.class_names.index(s[0]) for s in shapes])
return mask, class_ids.astype(np.int32)
def draw_shape(self, image, shape, dims, color):
"""Draws a shape from the given specs."""
# Get the center x, y and the size s
x, y, s = dims
if shape == 'square':
image = cv2.rectangle(image, (x-s, y-s), (x+s, y+s), color, -1)
elif shape == "circle":
image = cv2.circle(image, (x, y), s, color, -1)
elif shape == "triangle":
points = np.array([[(x, y-s),
(x-s/math.sin(math.radians(60)), y+s),
(x+s/math.sin(math.radians(60)), y+s),
]], dtype=np.int32)
image = cv2.fillPoly(image, points, color)
return image
def random_shape(self, height, width):
"""Generates specifications of a random shape that lies within
the given height and width boundaries.
Returns a tuple of three values:
* The shape name (square, circle, ...)
* Shape color: a tuple of 3 values, RGB.
* Shape dimensions: A tuple of values that define the shape size
and location. Differs per shape type.
"""
# Shape
shape = random.choice(["square", "circle", "triangle"])
# Color
color = tuple([random.randint(0, 255) for _ in range(3)])
# Center x, y
buffer = 20
y = random.randint(buffer, height - buffer - 1)
x = random.randint(buffer, width - buffer - 1)
# Size
s = random.randint(buffer, height//4)
return shape, color, (x, y, s)
def random_image(self, height, width):
"""Creates random specifications of an image with multiple shapes.
Returns the background color of the image and a list of shape
specifications that can be used to draw the image.
"""
# Pick random background color
bg_color = np.array([random.randint(0, 255) for _ in range(3)])
# Generate a few random shapes and record their
# bounding boxes
shapes = []
boxes = []
N = random.randint(1, 4)
for _ in range(N):
shape, color, dims = self.random_shape(height, width)
shapes.append((shape, color, dims))
x, y, s = dims
boxes.append([y-s, x-s, y+s, x+s])
# Apply non-max suppression with a 0.3 threshold to avoid
# shapes covering each other
keep_ixs = utils.non_max_suppression(np.array(boxes), np.arange(N), 0.3)
shapes = [s for i, s in enumerate(shapes) if i in keep_ixs]
return bg_color, shapes
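if __name__ == '__main__':
    # Illustrative usage sketch: generate a small synthetic dataset and
    # inspect one image and its masks. prepare() comes from the base
    # Dataset class in utils.py.
    config = ShapesConfig()
    dataset = ShapesDataset()
    dataset.load_shapes(10, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
    dataset.prepare()
    image = dataset.load_image(0)
    mask, class_ids = dataset.load_mask(0)
    print("image shape:", image.shape)   # (128, 128, 3)
    print("mask shape:", mask.shape)     # (128, 128, number of instances)
    print("class ids:", class_ids)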