GPU-Support: Mask-RCNN + Minor GPU fixes (#2714)

* fixed cpu mask rcnn+preparation for gpu
* fix-limit gpu memory to 30% of total memory per worker

Co-authored-by: Nikita Manovich <nikita.manovich@intel.com>

59c3b281 · Ali Jahani · GitHub · daedff42
9 changed files
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added

 - CVAT-3D: support lidar data on the server side (<https://github.com/openvinotoolkit/cvat/pull/2534>)
+- GPU support for Mask-RCNN and improvement in its deployment time (<https://github.com/openvinotoolkit/cvat/pull/2714>)
 - CVAT-3D: Load all frames corresponding to the job instance
  (<https://github.com/openvinotoolkit/cvat/pull/2645>)
 - Intelligent scissors with OpenCV javascript (<https://github.com/openvinotoolkit/cvat/pull/2689>)
@@ -23,7 +24,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 - Updated HTTPS install README section (cleanup and described more robust deploy)
 - Logstash is improved for using with configurable elasticsearch outputs (<https://github.com/openvinotoolkit/cvat/pull/2531>)
-- Bumped nuclio version to 1.5.16
+- Bumped nuclio version to 1.5.16 (<https://github.com/openvinotoolkit/cvat/pull/2578>)
 - All methods for interactive segmentation accept negative points as well
 - Persistent queue added to logstash (<https://github.com/openvinotoolkit/cvat/pull/2744>)

@@ -36,7 +37,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 -

 ### Fixed
-
+- More robust execution of nuclio GPU functions by limiting the GPU memory consumption per worker (<https://github.com/openvinotoolkit/cvat/pull/2714>)
 - Kibana startup initialization (<https://github.com/openvinotoolkit/cvat/pull/2659>)
 - The cursor jumps to the end of the line when renaming a task (<https://github.com/openvinotoolkit/cvat/pull/2669>)
 - SSLCertVerificationError when remote source is used (<https://github.com/openvinotoolkit/cvat/pull/2683>)

--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -122,10 +122,10 @@ You develop CVAT under WSL (Windows subsystem for Linux) following next steps.

 ### DL models as serverless functions

-Install [nuclio platform](https://github.com/nuclio/nuclio):
+Follow this [guide](/cvat/apps/documentation/installation_automatic_annotation.md) to install Nuclio:

 - You have to install `nuctl` command line tool to build and deploy serverless
-  functions. Download [the latest release](https://github.com/nuclio/nuclio/blob/development/docs/reference/nuctl/nuctl.md#download).
+  functions.
 - The simplest way to explore Nuclio is to run its graphical user interface (GUI)
  of the Nuclio dashboard. All you need in order to run the dashboard is Docker. See
  [nuclio documentation](https://github.com/nuclio/nuclio#quick-start-steps)

--- a/README.md
+++ b/README.md
@@ -80,7 +80,7 @@ For more information about supported formats look at the
 | [f-BRS](/serverless/pytorch/saic-vul/fbrs/nuclio)                                                       | interactor | PyTorch    | X   |     |
 | [Inside-Outside Guidance](/serverless/pytorch/shiyinzhang/iog/nuclio)                                   | interactor | PyTorch    | X   |     |
 | [Faster RCNN](/serverless/tensorflow/faster_rcnn_inception_v2_coco/nuclio)                              | detector   | TensorFlow | X   | X   |
-| [Mask RCNN](/serverless/tensorflow/matterport/mask_rcnn/nuclio)                                         | detector   | TensorFlow | X   |     |
+| [Mask RCNN](/serverless/tensorflow/matterport/mask_rcnn/nuclio)                                         | detector   | TensorFlow | X   | X   |

 <!--lint enable maximum-line-length-->


--- a/cvat/apps/documentation/installation.md
+++ b/cvat/apps/documentation/installation.md
@@ -290,7 +290,7 @@ docker-compose -f docker-compose.yml \

 ### Semi-automatic and automatic annotation

-Please follow [instructions](/cvat/apps/documentation/installation_automatic_annotation.md)
+Please follow this [guide](/cvat/apps/documentation/installation_automatic_annotation.md).

 ### Stop all containers


--- a/cvat/apps/documentation/installation_automatic_annotation.md
+++ b/cvat/apps/documentation/installation_automatic_annotation.md
@@ -53,47 +53,80 @@
  - See [deploy_cpu.sh](/serverless/deploy_cpu.sh) for more examples.

  #### GPU Support
-
-  You will need to install Nvidia Container Toolkit and make sure your docker supports GPU. Follow [Nvidia docker instructions](https://www.tensorflow.org/install/docker#gpu_support).
-  Also you will need to add `--resource-limit nvidia.com/gpu=1` to the nuclio deployment command.
+  You will need to install [Nvidia Container Toolkit](https://www.tensorflow.org/install/docker#gpu_support).
+  Also you will need to add `--resource-limit nvidia.com/gpu=1 --triggers '{"myHttpTrigger": {"maxWorkers": 1}}'` to
+  the nuclio deployment command. You can increase `maxWorkers` if you have enough GPU memory.
  As an example, below will run on the GPU:

  ```bash
-  nuctl deploy tf-faster-rcnn-inception-v2-coco-gpu \
-    --project-name cvat --path "serverless/tensorflow/faster_rcnn_inception_v2_coco/nuclio" --platform local \
-    --base-image tensorflow/tensorflow:2.1.1-gpu \
-    --desc "Faster RCNN from Tensorflow Object Detection GPU API" \
-    --image cvat/tf.faster_rcnn_inception_v2_coco_gpu \
+  nuctl deploy --project-name cvat \
+    --path `pwd`/tensorflow/matterport/mask_rcnn/nuclio \
+    --platform local --base-image tensorflow/tensorflow:1.15.5-gpu-py3 \
+    --desc "GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow." \
+    --image cvat/tf.matterport.mask_rcnn_gpu \
+    --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
    --resource-limit nvidia.com/gpu=1
  ```

  **Note:**
-
-  - Since the model is loaded during deployment, the number of GPU functions you can deploy will be limited to your GPU memory.
-
+  - The number of GPU functions you can deploy is limited by your GPU memory.
  - See [deploy_gpu.sh](/serverless/deploy_gpu.sh) script for more examples.

-####Debugging Nuclio Functions:
+**Troubleshooting Nuclio Functions:**

 - You can open nuclio dashboard at [localhost:8070](http://localhost:8070). Make sure status of your functions are up and running without any error.
+- Test your deployed DL model as a serverless function. The command below should work on Linux and Mac OS.
+
+  ```bash
+  image=$(curl https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png --output - | base64 | tr -d '\n')
+  cat << EOF > /tmp/input.json
+  {"image": "$image"}
+  EOF
+  cat /tmp/input.json | nuctl invoke openvino.omz.public.yolo-v3-tf -c 'application/json'
+  ```

-- To check for internal server errors, run `docker ps -a` to see the list of containers. Find the container that you are interested, e.g. `nuclio-nuclio-tf-faster-rcnn-inception-v2-coco-gpu`. Then check its logs by
+  <details>

  ```bash
-  docker logs <name of your container>
+  20.07.17 12:07:44.519    nuctl.platform.invoker (I) Executing function {"method": "POST", "url": "http://:57308", "headers": {"Content-Type":["application/json"],"X-Nuclio-Log-Level":["info"],"X-Nuclio-Target":["openvino.omz.public.yolo-v3-tf"]}}
+  20.07.17 12:07:45.275    nuctl.platform.invoker (I) Got response {"status": "200 OK"}
+  20.07.17 12:07:45.275                     nuctl (I) >>> Start of function logs
+  20.07.17 12:07:45.275 ino.omz.public.yolo-v3-tf (I) Run yolo-v3-tf model {"worker_id": "0", "time": 1594976864570.9353}
+  20.07.17 12:07:45.275                     nuctl (I) <<< End of function logs
+
+  > Response headers:
+  Date = Fri, 17 Jul 2020 09:07:45 GMT
+  Content-Type = application/json
+  Content-Length = 100
+  Server = nuclio
+
+  > Response body:
+  [
+      {
+          "confidence": "0.9992254",
+          "label": "person",
+          "points": [
+              39,
+              124,
+              408,
+              512
+          ],
+          "type": "rectangle"
+      }
+  ]
  ```
+  </details>

+- To check for internal server errors, run `docker ps -a` to see the list of containers.
+  Find the container that you are interested in, e.g., `nuclio-nuclio-tf-faster-rcnn-inception-v2-coco-gpu`.
+  Then check its logs by `docker logs <name of your container>`
  e.g.,
-
  ```bash
  docker logs nuclio-nuclio-tf-faster-rcnn-inception-v2-coco-gpu
  ```

-- If you would like to debug a code inside a container, you can use vscode to directly attach to a container [instructions](https://code.visualstudio.com/docs/remote/attach-container). To apply your changes, make sure to restart the container.
-
+- To debug code inside a container, you can use VS Code to attach to it ([instructions](https://code.visualstudio.com/docs/remote/attach-container)).
+  To apply your changes, make sure to restart the container.
  ```bash
  docker restart <name_of_the_container>
  ```
-
-  > **⚠ WARNING:**
-  > Do not use nuclio dashboard to stop the container because with any modifications, it rebuilds the container and you will lose your changes.
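
The troubleshooting snippet in the guide builds its request body with `curl`, `base64`, and a heredoc. The same body can be produced from Python, which may help on systems where those shell tools behave differently. This is a minimal sketch, assuming the function accepts a JSON object with a base64-encoded `image` field as in the shell example; `build_payload` is a hypothetical helper name and is not part of CVAT or Nuclio.

```python
import base64
import json


def build_payload(image_bytes: bytes) -> str:
    """Return a JSON request body with the image base64-encoded (hypothetical helper)."""
    # b64encode emits no line breaks, so no equivalent of `tr -d '\n'` is needed.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded})


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read in binary mode.
    body = build_payload(b"\x89PNG placeholder")
    print(body)
```

The printed string could be written to a file such as `/tmp/input.json` and piped to `nuctl invoke` in the same way as the shell version.
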
--- a/serverless/deploy_gpu.sh
+++ b/serverless/deploy_gpu.sh
@@ -8,8 +8,18 @@ nuctl create project cvat
 nuctl deploy --project-name cvat \
    --path "$SCRIPT_DIR/tensorflow/faster_rcnn_inception_v2_coco/nuclio" \
    --platform local --base-image tensorflow/tensorflow:2.1.1-gpu \
-    --desc "Faster RCNN from Tensorflow Object Detection GPU API" \
+    --desc "GPU based Faster RCNN from Tensorflow Object Detection API" \
    --image cvat/tf.faster_rcnn_inception_v2_coco_gpu \
-    --resource-limit nvidia.com/gpu=1
+    --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
+    --resource-limit nvidia.com/gpu=1 --verbose
+
+nuctl deploy --project-name cvat \
+    --path "$SCRIPT_DIR/tensorflow/matterport/mask_rcnn/nuclio" \
+    --platform local --base-image tensorflow/tensorflow:1.15.5-gpu-py3 \
+    --desc "GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow." \
+    --image cvat/tf.matterport.mask_rcnn_gpu \
+    --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
+    --resource-limit nvidia.com/gpu=1 --verbose
+

 nuctl get function
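
Both deploy commands above pass the worker limit as a JSON string inside single quotes. When that argument is generated by a script, quoting mistakes are easy to make. The sketch below shows one way to build it; `triggers_flag` is a hypothetical helper, and the trigger name `myHttpTrigger` is simply the one used in this script.

```python
import json
import shlex


def triggers_flag(max_workers: int, trigger_name: str = "myHttpTrigger") -> str:
    """Build the `--triggers` argument used in the deploy commands (hypothetical helper)."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    spec = {trigger_name: {"maxWorkers": max_workers}}
    # shlex.quote wraps the JSON in single quotes so the shell passes it as one argument.
    return "--triggers " + shlex.quote(json.dumps(spec))


if __name__ == "__main__":
    print(triggers_flag(1))
```

With `max_workers=1` this reproduces the flag as written in `deploy_gpu.sh`.
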
--- a/serverless/tensorflow/faster_rcnn_inception_v2_coco/nuclio/model_loader.py
+++ b/serverless/tensorflow/faster_rcnn_inception_v2_coco/nuclio/model_loader.py
@@ -15,9 +15,10 @@ class ModelLoader:
                serialized_graph = fid.read()
                od_graph_def.ParseFromString(serialized_graph)
                tf.import_graph_def(od_graph_def, name='')
-
-            config = tf.ConfigProto()
-            config.gpu_options.allow_growth = True
+            gpu_fraction = 0.333
+            gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=gpu_fraction,
+                                        allow_growth=True)
+            config = tf.ConfigProto(gpu_options=gpu_options)
            self.session = tf.Session(graph=detection_graph, config=config)

            self.image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
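
The hunk above caps each process at `gpu_fraction = 0.333` of GPU memory. As a rough guide to what that implies for `maxWorkers` and for the number of GPU functions sharing one device, the arithmetic can be sketched as follows. This is an upper bound under the assumption that the memory fraction is the only limit; it ignores driver and CUDA context overhead, and `max_processes_for_fraction` is a hypothetical name.

```python
import math


def max_processes_for_fraction(gpu_fraction: float) -> int:
    """Upper bound on processes per GPU if each may claim `gpu_fraction` of its memory."""
    if not 0.0 < gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be in (0, 1]")
    # Small epsilon guards against floating-point error in the division.
    return math.floor(1.0 / gpu_fraction + 1e-9)


if __name__ == "__main__":
    print(max_processes_for_fraction(0.333))
```

With the default of 0.333, at most three such processes fit on one GPU, so raising the fraction leaves room for fewer workers or functions.
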

--- a/serverless/tensorflow/matterport/mask_rcnn/nuclio/function.yaml
+++ b/serverless/tensorflow/matterport/mask_rcnn/nuclio/function.yaml
@@ -102,22 +102,19 @@ spec:
      value: /opt/nuclio/Mask_RCNN
  build:
    image: cvat/tf.matterport.mask_rcnn
-    baseImage: tensorflow/tensorflow:2.1.0-py3
+    baseImage: tensorflow/tensorflow:1.13.1-py3
    directives:
      postCopy:
        - kind: WORKDIR
          value: /opt/nuclio
        - kind: RUN
-          value: apt update && apt install --no-install-recommends -y git curl libsm6 libxext6 libgl1-mesa-glx
+          value: apt update && apt install --no-install-recommends -y git curl
        - kind: RUN
-          value: git clone https://github.com/matterport/Mask_RCNN.git
+          value: git clone --depth 1 https://github.com/matterport/Mask_RCNN.git
        - kind: RUN
          value: curl -L https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 -o Mask_RCNN/mask_rcnn_coco.h5
        - kind: RUN
-          value: pip3 install scipy cython matplotlib scikit-image opencv-python-headless h5py \
-            imgaug IPython[all] tensorflow==1.13.1 keras==2.1.0 pillow pyyaml
-        - kind: RUN
-          value: pip3 install pycocotools
+          value: pip3 install numpy cython pyyaml keras==2.1.0 scikit-image Pillow

  triggers:
    myHttpTrigger:

--- a/serverless/tensorflow/matterport/mask_rcnn/nuclio/model_loader.py
+++ b/serverless/tensorflow/matterport/mask_rcnn/nuclio/model_loader.py
-# Copyright (C) 2018-2020 Intel Corporation
+# Copyright (C) 2020-2021 Intel Corporation
 #
 # SPDX-License-Identifier: MIT

@@ -6,24 +6,13 @@ import os
 import numpy as np
 import sys
 from skimage.measure import find_contours, approximate_polygon
-
-# workaround for tf.placeholder() is not compatible with eager execution
-# https://github.com/tensorflow/tensorflow/issues/18165
 import tensorflow as tf
-tf.compat.v1.disable_eager_execution()
-#import tensorflow.compat.v1 as tf
-#   tf.disable_v2_behavior()
-
-# The directory should contain a clone of
-# https://github.com/matterport/Mask_RCNN repository and
-# downloaded mask_rcnn_coco.h5 model.
 MASK_RCNN_DIR = os.path.abspath(os.environ.get('MASK_RCNN_DIR'))
 if MASK_RCNN_DIR:
    sys.path.append(MASK_RCNN_DIR)  # To find local version of the library
-    sys.path.append(os.path.join(MASK_RCNN_DIR, 'samples/coco'))
-
 from mrcnn import model as modellib
-import coco
+from mrcnn.config import Config
+

 class ModelLoader:
    def __init__(self, labels):
@@ -31,12 +20,21 @@ class ModelLoader:
        if COCO_MODEL_PATH is None:
            raise OSError('Model path env not found in the system.')

-        class InferenceConfig(coco.CocoConfig):
-            # Set batch size to 1 since we'll be running inference on
-            # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
+        class InferenceConfig(Config):
+            NAME = "coco"
+            NUM_CLASSES = 1 + 80  # COCO has 80 classes
            GPU_COUNT = 1
            IMAGES_PER_GPU = 1

+        # Limit gpu memory to 30% to allow for other nuclio gpu functions. Increase fraction as you like
+        import keras.backend.tensorflow_backend as ktf
+        def get_session(gpu_fraction=0.333):
+            gpu_options = tf.GPUOptions(
+                per_process_gpu_memory_fraction=gpu_fraction,
+                allow_growth=True)
+            return tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
+
+        ktf.set_session(get_session())
        # Print config details
        self.config = InferenceConfig()
        self.config.display()
@@ -54,7 +52,7 @@ class ModelLoader:
        for i in range(len(output["rois"])):
            score = output["scores"][i]
            class_id = output["class_ids"][i]
-            mask = output["masks"][:,:,i]
+            mask = output["masks"][:, :, i]
            if score >= threshold:
                mask = mask.astype(np.uint8)
                contours = find_contours(mask, MASK_THRESHOLD)
@@ -74,6 +72,4 @@ class ModelLoader:
                    "type": "polygon",
                })

-        return results
-
-
+        return results
\ No newline at end of file