Versions scans wrapped in try-except (#80)

* try-except on versions scan
* exit if no models could be accessed
* fixing bug for wrong list of available versions
* exit if no models for config file option
* documentation update

00494373 · Miłosz Żeglarski · Trawinski, Dariusz · 6649085f · 00494373 · 00494373

Showing 3 changed files with 89 additions and 14 deletions

README.md +47 -6

ie_serving/main.py +13 -2

ie_serving/models/model.py +29 -6

--- a/README.md
+++ b/README.md
@@ -141,13 +141,54 @@ docker logs ie-serving


 ### Model import issues
-OpenVINO&trade; model server will fail to start when any of the defined model cannot be loaded successfully. The root cause of
-the failure can be determined based on the collected logs on the console or in the log file.
+OpenVINO&trade; Model Server loads all defined model versions according
+to the configured [version policy](docs/docker_container.md#model-version-policy).
+A model version is represented by a numerical directory in the model path,
+containing OpenVINO model files with .bin and .xml extensions.
+
+Below is an example of an incorrect structure:
+```bash
+models/
+├── model1
+│   ├── 1
+│   │   ├── ir_model.bin
+│   │   └── ir_model.xml
+│   └── 2
+│       ├── somefile.bin
+│       └── anotherfile.txt
+└── model2
+    ├── ir_model.bin
+    ├── ir_model.xml
+    └── mapping_config.json
+```
+
+In the above scenario, the server will detect only version `1` of `model1`.
+Directory `2` does not contain valid OpenVINO model files, so it won't
+be detected as a valid model version.
+For `model2`, the files are correct, but they are not in a numerical directory.
+The server will not detect any version in `model2`.
+
+When a new model version is detected, the server loads the model files
+and starts serving the new model version. This operation might fail for the following reasons:
+- there is a problem with accessing model files (e.g. due to network connectivity issues
+with the remote storage or insufficient permissions)
+- model files are malformed and cannot be imported by the Inference Engine
+- the model requires a custom CPU extension
+
+In all those situations, the root cause is reported in the server logs or in the response from a call
+to the GetModelStatus function.
+
+A model version that is detected but not loaded will not be served and will report status
+`LOADING` with the error message: `Error occurred while loading version`.
+When the model files become accessible or are fixed, the server will try to
+load them again on the next [version update](docs/docker_container.md#updating-model-versions)
+attempt.
+
+At startup, the server enables the gRPC and REST API endpoints after all configured models and detected model versions
+are loaded successfully (in AVAILABLE state).
+
+The server will fail to start if it cannot list the content of the configured model paths.

-The following problem might occur during model server initialization and model loading:
-* Missing model files in the location specified in the configuration file.
-* Missing version sub-folders in the model folder.
-* Model files require custom CPU extension.

 ### Client request issues
 When the model server starts successfully and all the models are imported, there could be a couple of reasons for errors 

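The version detection rule described in the README change above can be illustrated with a small standalone sketch. It is not the code from `ie_serving`; the helper names `detect_versions` and `make_example_tree` are hypothetical, and the exact check (a numeric directory name plus at least one `.bin` and one `.xml` file) is an assumption based only on the README wording.

```python
import os
import re
import tempfile


def detect_versions(model_path):
    """Return the sorted version numbers found directly under model_path.

    Assumed rule: a sub-directory is a version only if its name is purely
    numeric and it contains at least one .bin and one .xml file.
    """
    versions = []
    for entry in os.listdir(model_path):
        version_dir = os.path.join(model_path, entry)
        if not (re.fullmatch(r"[0-9]+", entry) and os.path.isdir(version_dir)):
            continue
        extensions = {os.path.splitext(name)[1]
                      for name in os.listdir(version_dir)}
        if {".bin", ".xml"} <= extensions:
            versions.append(int(entry))
    return sorted(versions)


def make_example_tree(root):
    """Recreate the incorrect layout from the README example under root."""
    layout = {
        "model1/1": ["ir_model.bin", "ir_model.xml"],
        "model1/2": ["somefile.bin", "anotherfile.txt"],
        "model2": ["ir_model.bin", "ir_model.xml", "mapping_config.json"],
    }
    for directory, files in layout.items():
        path = os.path.join(root, *directory.split("/"))
        os.makedirs(path, exist_ok=True)
        for name in files:
            open(os.path.join(path, name), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_example_tree(root)
        for model in ("model1", "model2"):
            print(model, detect_versions(os.path.join(root, model)))
```

Under these assumptions only version `1` of `model1` is reported, and `model2` yields no versions, which matches the behaviour the README text describes.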
--- a/ie_serving/main.py
+++ b/ie_serving/main.py
@@ -76,7 +76,8 @@ def parse_config(args):
                                           'base_path'],
                                       batch_size=batch_size,
                                       model_version_policy=model_ver_policy)
-            models[config['config']['name']] = model
+            if model is not None:
+                models[config['config']['name']] = model
        except ValidationError as e_val:
            logger.warning("Model version policy for model {} is invalid. "
                           "Exception: {}".format(config['config']['name'],
@@ -85,6 +86,10 @@ def parse_config(args):
            logger.warning("Unexpected error occurred in {} model. "
                           "Exception: {}".format(config['config']['name'],
                                                  e))
+    if not models:
+        logger.info("Could not access any of provided models. Server will "
+                    "exit now.")
+        sys.exit()
    if args.rest_port > 0:
        process_thread = threading.Thread(target=start_web_rest_server,
                                          args=[models, args.rest_port])
@@ -112,7 +117,13 @@ def parse_one_model(args):
        logger.error("Unexpected error occurred. "
                     "Exception: {}".format(e))
        sys.exit()
-    models = {args.model_name: model}
+    models = {}
+    if model is not None:
+        models[args.model_name] = model
+    else:
+        logger.info("Could not access provided model. Server will exit now.")
+        sys.exit()
+
    if args.rest_port > 0:
        process_thread = threading.Thread(target=start_web_rest_server,
                                          args=[models, args.rest_port])

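The control flow added to `parse_config` and `parse_one_model` above (skip models whose build returned `None`, then exit when nothing is left) can be sketched in isolation. The names `collect_models`, `exit_if_no_models` and the `build` callback are hypothetical stand-ins for illustration. One deliberate difference from the diff: this sketch exits with status 1, whereas the diff calls `sys.exit()` with no argument, which exits with status 0.

```python
import logging
import sys

logger = logging.getLogger(__name__)


def collect_models(configs, build):
    """Build every configured model and keep only the ones that succeeded.

    `build` is a hypothetical callable that returns a model object, returns
    None when the versions scan failed, or raises on unexpected errors.
    """
    models = {}
    for config in configs:
        name = config["name"]
        try:
            model = build(config)
        except Exception as error:
            logger.warning("Unexpected error occurred in %s model. "
                           "Exception: %s", name, error)
            continue
        if model is not None:
            models[name] = model
    return models


def exit_if_no_models(models):
    """Stop the process when no model could be accessed."""
    if not models:
        logger.error("Could not access any of provided models. "
                     "Server will exit now.")
        sys.exit(1)  # non-zero status is a choice made in this sketch only
```

Returning a non-zero status lets a supervisor such as Docker or systemd tell a failed start apart from a clean shutdown; whether that is desirable for this server is a judgment call outside the scope of the diff.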
--- a/ie_serving/models/model.py
+++ b/ie_serving/models/model.py
@@ -58,13 +58,20 @@ class Model(ABC):
        logger.info("Server start loading model: {}".format(model_name))
        version_policy_filter = cls.get_model_version_policy_filter(
            model_version_policy)
-        versions_attributes, available_versions = cls.get_version_metadata(
-            model_directory, batch_size, version_policy_filter)
+
+        try:
+            versions_attributes, available_versions = cls.get_version_metadata(
+                model_directory, batch_size, version_policy_filter)
+        except Exception as error:
+            logger.error("Error occurred while getting versions "
+                         "of the model {}".format(model_name))
+            logger.error("Failed reading model versions from path: {} "
+                         "with error {}".format(model_directory, str(error)))
+            return None
+
        versions_attributes = [version for version in versions_attributes
                               if version['version_number']
                               in available_versions]
-        available_versions = [version_attributes['version_number'] for
-                              version_attributes in versions_attributes]
        versions_statuses = dict()
        for version in available_versions:
            versions_statuses[version] = ModelVersionStatus(model_name,
@@ -73,6 +80,9 @@ class Model(ABC):
        engines = cls.get_engines_for_model(versions_attributes,
                                            versions_statuses)

+        available_versions = [version_attributes['version_number'] for
+                              version_attributes in versions_attributes]
+
        model = cls(model_name=model_name, model_directory=model_directory,
                    available_versions=available_versions, engines=engines,
                    batch_size=batch_size,
@@ -81,10 +91,23 @@ class Model(ABC):
        return model

    def update(self):
-        versions_attributes, available_versions = self.get_version_metadata(
-            self.model_directory, self.batch_size, self.version_policy_filter)
+        try:
+            versions_attributes, available_versions = \
+                self.get_version_metadata(
+                    self.model_directory,
+                    self.batch_size,
+                    self.version_policy_filter)
+        except Exception as error:
+            logger.error("Error occurred while getting versions "
+                         "of the model {}".format(self.model_name))
+            logger.error("Failed reading model versions from path: {} "
+                         "with error {}".format(self.model_directory,
+                                                str(error)))
+            return
+
        if available_versions == self.versions:
            return
+
        logger.info("Server start updating model: {}".format(self.model_name))
        to_create, to_delete = self._mark_differences(available_versions)
        logger.debug("Server will try to add {} versions".format(to_create))
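The effect of the new `try`/`except` in `update` is that a failed versions scan leaves the currently served versions untouched until the next attempt. A minimal sketch of that behaviour follows, assuming a hypothetical `VersionTracker` class and an injected `scan` callable in place of `get_version_metadata`; neither exists in `ie_serving`.

```python
import logging

logger = logging.getLogger(__name__)


class VersionTracker:
    """Tracks served versions and tolerates failing scans (illustrative)."""

    def __init__(self, scan, versions=()):
        self._scan = scan              # hypothetical callable returning a list
        self.versions = list(versions)

    def update(self):
        """Refresh versions; return True only when the list changed."""
        try:
            available = self._scan()
        except Exception as error:
            # A failed scan is logged and the previous state is kept.
            logger.error("Failed reading model versions: %s", error)
            return False
        if available == self.versions:
            return False
        self.versions = available
        return True
```

For example, a tracker created with `versions=[1]` and a `scan` that raises `OSError` still reports `[1]` after `update()`, and picks up new versions once the scan succeeds again.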