Unverified · Commit 3258438f · Author: JieguangZhou · Committer: GitHub

[Feature] [MLOps] support mlflow deploy with docker compose (#10217)

* [Feature] [MLOps] support mlflow deploy with docker compose

fix doc

Update docs/docs/en/guide/task/mlflow.md

fix doc
Co-authored-by: Jiajie Zhong <zhongjiajie955@gmail.com>

revert cancel modification

fix ENV name and docker compose command

* fix doc image link

* fix testModelsDeployDockerCompose

* add docker compose container health check and fix mlflow bug

* update docker compose healthcheck timeout
Parent 9782fe4e
......@@ -5,13 +5,13 @@
[MLflow](https://mlflow.org) is an excellent open source platform to manage the ML lifecycle, including experimentation,
reproducibility, deployment, and a central model registry.
MLflow task plugin used to execute MLflow tasks,Currently contains Mlflow Projects and MLflow Models.(Model Registry will soon be rewarded for support)
The MLflow task plugin is used to execute MLflow tasks. It currently contains MLflow Projects and MLflow Models. (Model Registry support is coming soon.)
- Mlflow Projects: Package data science code in a format to reproduce runs on any platform.
- MLflow Projects: Package data science code in a format to reproduce runs on any platform.
- MLflow Models: Deploy machine learning models in diverse serving environments.
- Model Registry: Store, annotate, discover, and manage models in a central repository.
The Mlflow plugin currently supports and will support the following:
The MLflow plugin currently supports and will support the following:
- [x] MLflow Projects
- [x] BasicAlgorithm: contains LogisticRegression, svm, lightgbm, xgboost
......@@ -20,10 +20,10 @@ The Mlflow plugin currently supports and will support the following:
- [ ] MLflow Models
- [x] MLFLOW: Use `MLflow models serve` to deploy a model service
- [x] Docker: Run the container after packaging the docker image
- [ ] Docker Compose: Use docker compose to run the container, Will replace the docker run above
- [x] Docker Compose: Use docker compose to run the container, it will replace the docker run above
- [ ] Seldon core: Use Seldon Core to deploy the model to a k8s cluster
- [ ] k8s: Deploy containers directly to K8S
- [ ] mlflow deployments: Built-in deployment modules, such as built-in deployment to SageMaker, etc
- [ ] k8s: Deploy containers directly to K8S
- [ ] MLflow deployments: Built-in deployment modules, such as built-in deployment to SageMaker, etc
- [ ] Model Registry
- [ ] Register Model: Allows artifacts (Including model and related parameters, indicators) to be registered directly into the model center
......@@ -37,7 +37,7 @@ The Mlflow plugin currently supports and will support the following:
## Task Example
First, introduce some general parameters of DolphinScheduler
First, introduce some general parameters of DolphinScheduler:
- **Node name**: The node name in a workflow definition is unique.
- **Run flag**: Identifies whether this node schedules normally, if it does not need to execute, select
......@@ -56,6 +56,11 @@ First, introduce some general parameters of DolphinScheduler
- **Predecessor task**: Selecting a predecessor task for the current task, will set the selected predecessor task as
upstream of the current task.
Here are some specific parameters for the MLflow component:
- **MLflow Tracking Server URI**: MLflow Tracking Server URI, default http://localhost:5000.
- **Experiment Name**: Create the experiment where the task is running, if the experiment does not exist. If the name is empty, it is set to ` Default `, the same as MLflow.
### MLflow Projects
#### BasicAlgorithm
......@@ -64,24 +69,22 @@ First, introduce some general parameters of DolphinScheduler
**Task Parameter**
- **mlflow server tracking uri** :MLflow server uri, default http://localhost:5000.
- **experiment name** :Create the experiment where the task is running, if the experiment does not exist. If the name is empty, it is set to ` Default `, the same as MLflow.
- **register model** :Register the model or not. If register is selected, the following parameters are expanded.
- **model name** : The registered model name is added to the original model version and registered as
- **Register Model**: Register the model or not. If register is selected, the following parameters are expanded.
- **Model Name**: The registered model name is added to the original model version and registered as
- **data path** : The absolute path of the file or folder. Ends with .csv for file or contain train.csv and
test.csv for folder(In the suggested way, users should build their own test sets for model evaluation)
- **parameters** : Parameter when initializing the algorithm/AutoML model, which can be empty. For example
parameters `"time_budget=30;estimator_list=['lgbm']"` for flaml 。The convention will be passed with '; 'shards
- **Data Path**: The absolute path of the file or folder. Ends with .csv for file or contain train.csv and
test.csv for folder(In the suggested way, users should build their own test sets for model evaluation).
- **Parameters**: Parameters used when initializing the algorithm/AutoML model; can be empty. For example,
`"time_budget=30;estimator_list=['lgbm']"` for flaml. By convention, parameters are separated by `;`. The text
before the equal sign is the parameter name, and the text after the equal sign is evaluated with `python eval()`
to obtain the corresponding parameter value.
- [Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)
- [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html?highlight=svc#sklearn.svm.SVC)
- [lightgbm](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier)
- [xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier)
- **algorithm** :The selected algorithm currently supports `LR`, `SVM`, `LightGBM` and `XGboost` based
- **Algorithm**: The selected algorithm currently supports `LR`, `SVM`, `LightGBM` and `XGBoost` based
on [scikit-learn](https://scikit-learn.org/) form.
- **Parameter search space** : Parameter search space when running the corresponding algorithm, which can be
- **Parameter Search Space**: Parameter search space when running the corresponding algorithm, which can be
empty. For example, `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm. By convention, parameters are
separated by `;`. The text before the equal sign is the parameter name, and the text after the equal sign is
evaluated with `python eval()` to obtain the corresponding parameter value.
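The `name=value;name=value` convention described above can be sketched in Python roughly as follows. This is only an illustration, not the plugin's actual code: the helper name `parse_params` is hypothetical, and `ast.literal_eval` is swapped in for the `eval()` the text mentions, so only plain Python literals are accepted.

```python
import ast


def parse_params(text: str) -> dict:
    """Split 'a=1;b=[2, 3]' into {'a': 1, 'b': [2, 3]}.

    Hypothetical helper: uses ast.literal_eval instead of eval(),
    so values must be plain Python literals.
    """
    params = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and a trailing ';'
        # Split on the first '=' only, so values may contain '='.
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {item!r}")
        params[name.strip()] = ast.literal_eval(value.strip())
    return params


print(parse_params("time_budget=30;estimator_list=['lgbm']"))
print(parse_params("max_depth=[5, 10];n_estimators=[100, 200]"))
```

Because the string is split on `;` first, a value that itself contains a semicolon would not survive this sketch.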
......@@ -92,63 +95,56 @@ First, introduce some general parameters of DolphinScheduler
**Task Parameter**
- **mlflow server tracking uri** :MLflow server uri, default http://localhost:5000.
- **experiment name** :Create the experiment where the task is running, if the experiment does not exist. If the name is empty, it is set to ` Default `, the same as MLflow.
- **register model** :Register the model or not. If register is selected, the following parameters are expanded.
- **model name** : The registered model name is added to the original model version and registered as
- **Register Model**: Register the model or not. If register is selected, the following parameters are expanded.
- **model name**: The registered model name is added to the original model version and registered as
- **data path** : The absolute path of the file or folder. Ends with .csv for file or contain train.csv and
test.csv for folder(In the suggested way, users should build their own test sets for model evaluation)。
- **parameters** : Parameter when initializing the algorithm/AutoML model, which can be empty. For example
parameters `n_estimators=200;learning_rate=0.2` for flamlThe convention will be passed with '; 'shards
- **Data Path**: The absolute path of the file or folder. Ends with .csv for file or contain train.csv and
test.csv for folder(In the suggested way, users should build their own test sets for model evaluation).
- **Parameters**: Parameters used when initializing the algorithm/AutoML model; can be empty. For example,
`n_estimators=200;learning_rate=0.2` for flaml. By convention, parameters are separated by `;`. The text
before the equal sign is the parameter name, and the text after the equal sign is evaluated with `python eval()`
to obtain the corresponding parameter value. The detailed parameter list is as follows:
- [flaml](https://microsoft.github.io/FLAML/docs/reference/automl#automl-objects)
- [autosklearn](https://automl.github.io/auto-sklearn/master/api.html)
- **AutoML tool** : The AutoML tool used, currently
- **AutoML tool**: The AutoML tool used, currently
supports [autosklearn](https://github.com/automl/auto-sklearn)
and [flaml](https://github.com/microsoft/FLAML)
and [flaml](https://github.com/microsoft/FLAML).
#### Custom projects
**Task Parameter**
- **mlflow server tracking uri** :MLflow server uri, default http://localhost:5000.
- **experiment name** :Create the experiment where the task is running, if the experiment does not exist. If the name is empty, it is set to ` Default `, the same as MLflow.
- **parameters** : `--param-list` in `mlflow run`. For example `-P learning_rate=0.2 -P colsample_bytree=0.8 -P subsample=0.9`
- **Repository** : Repository url of MLflow Project,Support git address and directory on worker. If it's in a subdirectory,We add `#` to support this (same as `mlflow run`) , for example `https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native`
- **Project Version** : Version of the project,default master
- **Parameters**: `--param-list` in `mlflow run`. For example `-P learning_rate=0.2 -P colsample_bytree=0.8 -P subsample=0.9`.
- **Repository**: Repository URL of the MLflow Project; supports a git address or a directory on the worker. If the project is in a subdirectory, append `#` followed by the subdirectory (same as `mlflow run`), for example `https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native`.
- **Project Version**: Version of the project, default master.
You can now use this feature to run all mlFlow projects on Github (For example [MLflow examples](https://github.com/mlflow/mlflow/tree/master/examples) )了。You can also create your own machine learning library to reuse your work, and then use DolphinScheduler to use your library with one click.
You can now use this feature to run all MLflow projects on GitHub (for example, [MLflow examples](https://github.com/mlflow/mlflow/tree/master/examples)). You can also create your own machine learning library to reuse your work, and then use DolphinScheduler to run your library with one click.
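As a rough sketch of how the Custom projects fields could map onto an `mlflow run` command line: the helper name `build_mlflow_run` is hypothetical, and the plugin's own command template is not shown in this diff, so treat the argument order as an assumption. Only the `-P`, `--experiment-name`, and `--version` options of `mlflow run` are used.

```python
import shlex


def build_mlflow_run(repository: str, params: str,
                     experiment_name: str, version: str = "master") -> str:
    """Compose an `mlflow run` command line from the task fields.

    Hypothetical helper, not the plugin's real template.
    """
    argv = ["mlflow", "run", repository]
    # The Parameters field is already written as '-P name=value ...'.
    argv += shlex.split(params)
    argv += [f"--experiment-name={experiment_name}", f"--version={version}"]
    # Quote each argument so '#' in the repository URI is shell-safe.
    return " ".join(shlex.quote(arg) for arg in argv)


print(build_mlflow_run(
    "https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native",
    "-P learning_rate=0.2 -P colsample_bytree=0.8 -P subsample=0.9",
    "Default",
))
```

The tracking server URI is not passed here; how the plugin hands it to MLflow is outside what this section shows.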
The actual interface is as follows
### MLflow Models
General Parameters:
- **Model-URI**: Model URI of MLflow; supports the `models:/<model_name>/suffix` format and the `runs:/` format. See https://mlflow.org/docs/latest/tracking.html#artifact-stores.
- **Port**: The port to listen on.
**Task Parameter**
- **mlflow server tracking uri** :MLflow server uri, default http://localhost:5000.
- **model-uri** :Model-uri of mlflow , support `models:/<model_name>/suffix` format and `runs:/` format. See https://mlflow.org/docs/latest/tracking.html#artifact-stores
- **Port** :The port to listen on
#### Docker
**Task Parameter**
- **mlflow server tracking uri** :MLflow server uri, default http://localhost:5000.
- **model-uri** :Model-uri of mlflow , support `models:/<model_name>/suffix` format and `runs:/` format. See https://mlflow.org/docs/latest/tracking.html#artifact-stores
- **Port** :The port to listen on
- **Max Cpu Limit**: For example `1.0` or `0.5`, the same as docker compose.
- **Max Memory Limit**: For example `1G` or `500M`, the same as docker compose.
## Environment to prepare
......@@ -156,7 +152,7 @@ The actual interface is as follows
You need to enter the admin account to configure a conda environment variable(Please
install [anaconda](https://docs.continuum.io/anaconda/install/)
or [miniconda](https://docs.conda.io/en/latest/miniconda.html#installing ) in advance )
or [miniconda](https://docs.conda.io/en/latest/miniconda.html#installing ) in advance).
......@@ -167,9 +163,9 @@ Conda environment.
### Start the mlflow service
Make sure you have installed MLflow, using 'PIP Install MLFlow'.
Make sure you have installed MLflow, using 'pip install mlflow'.
Create a folder where you want to save your experiments and models and start mlFlow service.
Create a folder where you want to save your experiments and models, then start the MLflow service.
mkdir mlflow
......@@ -177,8 +173,8 @@ cd mlflow
mlflow server -h -p 5000 --serve-artifacts --backend-store-uri sqlite:///mlflow.db
After running, an MLflow service is started
After running, an MLflow service is started.
After this, you can visit the MLFlow service (`http://localhost:5000`) page to view the experiments and models.
After this, you can visit the MLflow service (`http://localhost:5000`) page to view the experiments and models.
\ No newline at end of file
......@@ -4,25 +4,25 @@
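[MLflow](https://mlflow.org) is an excellent open source project in the MLOps field for managing the machine learning lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
The MLflow component is used to execute MLflow tasks and currently includes Mlflow Projects and MLflow Models. (Model Registry will be supported in the near future.)
The MLflow component is used to execute MLflow tasks and currently includes Mlflow Projects and MLflow Models. (Model Registry will be supported in the near future.)
- Mlflow Projects: Package code so that it can run on any platform.
- MLflow Projects: Package code so that it can run on any platform.
- MLflow Models: Deploy machine learning models in different serving environments.
- Model Registry: Store, annotate, discover, and manage models in a central repository (you can also register models yourself inside your mlflow project).
- Model Registry: Store, annotate, discover, and manage models in a central repository (you can also register models yourself inside your MLflow project).
The Mlflow component currently supports, and will support, the following:
- [x] MLflow Projects
- [x] BasicAlgorithm: Basic algorithms, including LogisticRegression, svm, lightgbm, xgboost
- [x] AutoML: AutoML tools, including autosklearn, flaml
- [x] BasicAlgorithm: Basic algorithms, including LogisticRegression, svm, lightgbm, xgboost
- [x] AutoML: AutoML tools, including autosklearn, flaml
- [x] Custom projects: Supports running your own MLflow Projects
- [ ] MLflow Models
- [x] MLFLOW: Deploy the model directly with `MLflow models serve`
- [x] Docker: Deploy the model after packaging a Docker image
- [ ] Docker Compose: Deploy the model with Docker Compose; this will replace the Docker deployment above
- [ ] Seldon core: After building the image, deploy it to a k8s cluster with Seldon Core, making use of Seldon Core's model management capabilities
- [ ] k8s: After building the image, deploy it to a k8s cluster
- [ ] mlflow deployments: Built-in MLflow deployment modules, such as built-in deployment to SageMaker
- [x] MLFLOW: Deploy the model directly with `mlflow models serve`.
- [x] Docker: Deploy the model after packaging a Docker image
- [x] Docker Compose: Deploy the model with Docker Compose; this will replace the Docker deployment above.
- [ ] Seldon core: After building the image, deploy it to a k8s cluster with Seldon Core, making use of Seldon Core's model management capabilities
- [ ] k8s: After building the image, deploy it to a k8s cluster
- [ ] MLflow deployments: Built-in MLflow deployment modules, such as built-in deployment to SageMaker.
- [ ] Model Registry
- [ ] Register Model: Register related artifacts (the model and its related parameters and metrics) to the model center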
......@@ -48,6 +48,12 @@ MLflow 组件用于执行 MLflow 任务,目前包含Mlflow Projects, 和MLflow
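- **Timeout alert**: Check timeout alert and timeout failure. When the task exceeds the "timeout duration", an alert email is sent and the task execution fails.
- **Predecessor task**: Select the predecessor task of the current task; the selected predecessor task is set as upstream of the current task.
The following are some common parameters of the MLflow component:
- **MLflow Tracking Server URI**: The URI of the MLflow Tracking Server, default http://localhost:5000.
- **Experiment Name**: The experiment in which the task runs; it is created if it does not exist. If the name is empty, it is set to `Default`, the same as MLflow.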
### MLflow Projects
#### BasicAlgorithm
......@@ -56,8 +62,6 @@ MLflow 组件用于执行 MLflow 任务,目前包含Mlflow Projects, 和MLflow
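- **mlflow server tracking uri**: The MLflow server URI, default http://localhost:5000.
- **Experiment name**: The experiment in which the task runs; it is created if it does not exist. If the name is empty, it is set to `Default`, the same as MLflow.
- **Register model**: Whether to register the model. If selected, the following parameters are expanded.
- **Registered model name**: The registered model name; a model version is added on top of the existing one and registered as Production.
- **Data path**: The absolute path of a file or folder. A file must end with .csv (it is automatically split into training and test sets); a folder must contain train.csv and test.csv (the recommended way; users should build their own test set for model evaluation).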
......@@ -66,7 +70,7 @@ MLflow 组件用于执行 MLflow 任务,目前包含Mlflow Projects, 和MLflow
- [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html?highlight=svc#sklearn.svm.SVC)
- [lightgbm](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier)
- [xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier)
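- **Algorithm**: The selected algorithm; `lr`, `svm`, `lightgbm`, and `xgboost` are currently supported in [scikit-learn](https://scikit-learn.org/) form.
- **Algorithm**: The selected algorithm; `lr`, `svm`, `lightgbm`, and `xgboost` are currently supported in [scikit-learn](https://scikit-learn.org/) form
- **Parameter search space**: The parameter search space for the chosen algorithm; can be empty. For example, `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm triggers the corresponding search. By convention, the input is split into parameters on `;`, the name before the equal sign is the parameter name, and the text after the equal sign is evaluated with python eval to obtain the parameter value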
#### AutoML
......@@ -75,8 +79,6 @@ MLflow 组件用于执行 MLflow 任务,目前包含Mlflow Projects, 和MLflow
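- **mlflow server tracking uri**: The MLflow server URI, default http://localhost:5000.
- **Experiment name**: The experiment in which the task runs; it is created if it does not exist. If the name is empty, it is set to `Default`, the same as MLflow.
- **Register model**: Whether to register the model. If selected, the following parameters are expanded.
- **Registered model name**: The registered model name; a model version is added on top of the existing one and registered as Production.
- **Data path**: The absolute path of a file or folder. A file must end with .csv (it is automatically split into training and test sets); a folder must contain train.csv and test.csv (the recommended way; users should build their own test set for model evaluation).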
......@@ -84,65 +86,61 @@ MLflow 组件用于执行 MLflow 任务,目前包含Mlflow Projects, 和MLflow
- [flaml](https://microsoft.github.io/FLAML/docs/reference/automl#automl-objects)
- [autosklearn](https://automl.github.io/auto-sklearn/master/api.html)
- **AutoML tool**: The AutoML tool to use; currently supports [autosklearn](https://github.com/automl/auto-sklearn)
, [flaml](https://github.com/microsoft/FLAML)
, [flaml](https://github.com/microsoft/FLAML)
#### Custom projects
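- **mlflow server tracking uri**: The MLflow server URI, default http://localhost:5000.
- **Experiment name**: The experiment in which the task runs; it is created if it does not exist. If the name is empty, it is set to `Default`, the same as MLflow.
- **Parameters**: `--param-list` in `mlflow run`, for example `-P learning_rate=0.2 -P colsample_bytree=0.8 -P subsample=0.9`
- **Repository**: The repository address of the MLflow Project; can be a github address or a directory on the worker. If the Mlflow project is in a subdirectory, add `#` as a separator, for example `https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native`
- **Repository**: The repository address of the MLflow Project; can be a github address or a directory on the worker. If the MLflow project is in a subdirectory, add `#` as a separator, for example `https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native`
- **Project version**: The version in the project's git version control, default master
Now you can use this feature to run all MLflow Projects on github (such as [MLflow examples](https://github.com/mlflow/mlflow/tree/master/examples) ). You can also create your own machine learning library to reuse your research results, and then use DolphinScheduler to run your algorithm library with one click.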
### MLflow Models
### MLflow Models
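- **Model URI to deploy**: The URI of the model in the MLflow service; supports the `models:/<model_name>/suffix` format and the `runs:/` format.
- **Listening port**: The port used when deploying the service.
- **mlflow server tracking uri**: The MLflow server URI, default http://localhost:5000.
- **Model uri to deploy**: The uri of the model in the mlflow service; supports the `models:/<model_name>/suffix` format and the `runs:/` format.
- **Deploy port**: The port used when deploying the service.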
#### Docker
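- **mlflow server tracking uri**: The MLflow server URI, default http://localhost:5000.
- **Model uri to deploy**: The uri of the model in the mlflow service; supports the `models:/<model_name>/suffix` format and the `runs:/` format.
- **Deploy port**: The port used when deploying the service.
- **Max CPU limit**: For example `1.0` or `0.5`, the same as docker compose.
- **Max memory limit**: For example `1G` or `500M`, the same as docker compose.
## Environment to prepare
### Conda environment configuration
or [install miniconda](https://docs.conda.io/en/latest/miniconda.html#installing) )
or [install miniconda](https://docs.conda.io/en/latest/miniconda.html#installing) )
### Start the mlflow service
### Start the MLflow service
Make sure you have installed mlflow; you can install it with `pip install mlflow`
Make sure you have installed MLflow; you can install it with `pip install mlflow`.
Create a folder where you want to save experiments and models, then start the mlflow service
Create a folder where you want to save experiments and models, then start the mlflow service.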
mkdir mlflow
......@@ -150,9 +148,9 @@ cd mlflow
mlflow server -h -p 5000 --serve-artifacts --backend-store-uri sqlite:///mlflow.db
You can visit the mlflow service (`http://localhost:5000`) page to view experiments and models
You can visit the MLflow service (`http://localhost:5000`) page to view experiments and models.
......@@ -36,19 +36,21 @@ public class MlflowConstants {
public static final String PRESET_BASIC_ALGORITHM_PROJECT = PRESET_REPOSITORY + "#Project-BasicAlgorithm";
public static final String RUN_PROJECT_BASIC_ALGORITHM_SCRIPT = "run_mlflow_basic_algorithm_project.sh";
public static final String RUN_PROJECT_AUTOML_SCRIPT = "run_mlflow_automl_project.sh";
public static final String MLFLOW_TASK_TYPE_PROJECTS = "MLflow Projects";
public static final String MLFLOW_TASK_TYPE_MODELS = "MLflow Models";
public static final String MLFLOW_MODELS_DEPLOY_TYPE_MLFLOW = "MLFLOW";
public static final String MLFLOW_MODELS_DEPLOY_TYPE_DOCKER = "DOCKER";
* template file
public static final String TEMPLATE_DOCKER_COMPOSE = "docker-compose.yml";
* mlflow command
......@@ -86,9 +88,22 @@ public class MlflowConstants {
public static final String MLFLOW_BUILD_DOCKER = "mlflow models build-docker -m %s -n %s --enable-mlserver";
public static final String DOCKER_RREMOVE_CONTAINER = "docker rm -f %s";
public static final String DOCKER_RUN = "docker run --name=%s -p=%s:8080 %s";
public static final String DOCKER_COMPOSE_RUN = "docker-compose up -d";
public static final String SET_DOCKER_COMPOSE_ENV = "export DS_TASK_MLFLOW_IMAGE_NAME=%s\n" +
"export DS_TASK_MLFLOW_CPU_LIMIT=%s\n" +
public static final String DOCKER_HEALTH_CHECK_COMMAND = "for i in $(seq 1 300); " +
"do " +
"[ $(docker inspect --format \"{{json .State.Health.Status }}\" %s) = '\"healthy\"' ] " +
"&& exit 0 && break;sleep 1; " +
"done; docker-compose down; exit 1";
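The shell loop in `DOCKER_HEALTH_CHECK_COMMAND` polls the container's health status up to 300 times, one second apart, and tears the deployment down if it never reports `healthy`. The same polling logic can be sketched in Python as below. The function names are hypothetical, and the status lookup is injected so the loop can be exercised without Docker; the `docker inspect --format` call mirrors the one in the constant but without the JSON quoting.

```python
import subprocess
import time
from typing import Callable


def docker_health_status(container: str) -> str:
    """Return the health status reported by `docker inspect` (e.g. 'healthy')."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip()


def wait_until_healthy(get_status: Callable[[], str],
                       attempts: int = 300,
                       delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_status until it returns 'healthy' or attempts run out."""
    for _ in range(attempts):
        if get_status() == "healthy":
            return True
        sleep(delay)
    return False
```

A caller would pass `lambda: docker_health_status(name)` as `get_status` and, on a `False` result, run `docker-compose down` and exit with a non-zero status, as the shell version does.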
......@@ -76,6 +76,10 @@ public class MlflowParameters extends AbstractParameters {
private String deployPort;
private String cpuLimit;
private String memoryLimit;
public void setAlgorithm(String algorithm) {
this.algorithm = algorithm;
......@@ -196,6 +200,22 @@ public class MlflowParameters extends AbstractParameters {
return deployPort;
public void setCpuLimit(String cpuLimit) {
this.cpuLimit = cpuLimit;
public String getCpuLimit() {
return cpuLimit;
public void setMemoryLimit(String memoryLimit) {
this.memoryLimit = memoryLimit;
public String getMemoryLimit() {
return memoryLimit;
public boolean checkParameters() {
Boolean checkResult = true;
......@@ -242,19 +262,6 @@ public class MlflowParameters extends AbstractParameters {
paramsMap.put("repo_version", MlflowConstants.PRESET_REPOSITORY_VERSION);
public String getScriptPath() {
String projectScript;
if (mlflowJobType.equals(MlflowConstants.JOB_TYPE_BASIC_ALGORITHM)) {
projectScript = MlflowConstants.RUN_PROJECT_BASIC_ALGORITHM_SCRIPT;
} else if (mlflowJobType.equals(MlflowConstants.JOB_TYPE_AUTOML)) {
projectScript = MlflowConstants.RUN_PROJECT_AUTOML_SCRIPT;
} else {
throw new IllegalArgumentException();
String scriptPath = MlflowTask.class.getClassLoader().getResource(projectScript).getPath();
return scriptPath;
public String getModelKeyName(String tag) throws IllegalArgumentException {
String imageName;
if (deployModelKey.startsWith("runs:")) {
......@@ -268,4 +275,15 @@ public class MlflowParameters extends AbstractParameters {
return imageName;
public String getDockerComposeEnvCommand() {
String imageName = "mlflow/" + getModelKeyName(":");
String env = String.format(MlflowConstants.SET_DOCKER_COMPOSE_ENV, imageName, getContainerName(), deployPort, cpuLimit, memoryLimit);
return env;
public String getContainerName(){
String containerName = "ds-mlflow-" + getModelKeyName("-");
return containerName;
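The naming scheme used by `getDockerComposeEnvCommand` and `getContainerName` can be sketched in Python as follows. The body of `getModelKeyName` is not shown in this diff, so `model_key_name` below is an assumption inferred from the expected strings in the tests (`mlflow/22222:1`, `ds-mlflow-22222-1`). Only the four environment variables visible in the test expectations are emitted; the variable carrying the memory limit is not shown in the source and is left out.

```python
def model_key_name(model_uri: str, tag: str) -> str:
    """Turn 'models:/22222/1' into '22222<tag>1' (assumed behavior)."""
    for prefix in ("runs:/", "models:/"):
        if model_uri.startswith(prefix):
            return model_uri[len(prefix):].replace("/", tag)
    raise ValueError(f"unsupported model URI: {model_uri!r}")


def compose_env(model_uri: str, deploy_port: str, cpu_limit: str) -> str:
    """Build the `export ...` lines consumed by the docker-compose template."""
    env = {
        "DS_TASK_MLFLOW_IMAGE_NAME": "mlflow/" + model_key_name(model_uri, ":"),
        "DS_TASK_MLFLOW_CONTAINER_NAME": "ds-mlflow-" + model_key_name(model_uri, "-"),
        "DS_TASK_MLFLOW_DEPLOY_PORT": deploy_port,
        "DS_TASK_MLFLOW_CPU_LIMIT": cpu_limit,
    }
    return "\n".join(f"export {name}={value}" for name, value in env.items())


print(compose_env("models:/22222/1", "7000", "0.5"))
```

Using `:` for the image tag and `-` for the container name keeps both derived from the same model URI while staying valid for each Docker naming rule.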
......@@ -101,7 +101,7 @@ public class MlflowTask extends AbstractTaskExecutor {
public String buildCommand(){
public String buildCommand() {
String command = "";
if (mlflowParameters.getMlflowTaskType().equals(MlflowConstants.MLFLOW_TASK_TYPE_PROJECTS)) {
command = buildCommandForMlflowProjects();
......@@ -146,8 +146,7 @@ public class MlflowTask extends AbstractTaskExecutor {
runCommand = MlflowConstants.MLFLOW_RUN_CUSTOM_PROJECT;
runCommand = String.format(runCommand, mlflowParameters.getParams(), mlflowParameters.getExperimentName(), mlflowParameters.getMlflowProjectVersion());
else {
} else {
runCommand = String.format("Cant not Support %s", mlflowParameters.getMlflowJobType());
......@@ -173,11 +172,19 @@ public class MlflowTask extends AbstractTaskExecutor {
} else if (mlflowParameters.getDeployType().equals(MlflowConstants.MLFLOW_MODELS_DEPLOY_TYPE_DOCKER)) {
String imageName = "mlflow/" + mlflowParameters.getModelKeyName(":");
String containerName = "mlflow-" + mlflowParameters.getModelKeyName("-");
String containerName = mlflowParameters.getContainerName();
args.add(String.format(MlflowConstants.MLFLOW_BUILD_DOCKER, deployModelKey, imageName));
args.add(String.format(MlflowConstants.DOCKER_RREMOVE_CONTAINER, containerName));
args.add(String.format(MlflowConstants.DOCKER_RUN, containerName, mlflowParameters.getDeployPort(), imageName));
} else if (mlflowParameters.getDeployType().equals(MlflowConstants.MLFLOW_MODELS_DEPLOY_TYPE_DOCKER_COMPOSE)) {
String templatePath = getTemplatePath(MlflowConstants.TEMPLATE_DOCKER_COMPOSE);
args.add(String.format("cp %s %s", templatePath, taskExecutionContext.getExecutePath()));
String imageName = "mlflow/" + mlflowParameters.getModelKeyName(":");
args.add(String.format(MlflowConstants.MLFLOW_BUILD_DOCKER, deployModelKey, imageName));
args.add(String.format(MlflowConstants.DOCKER_HEALTH_CHECK_COMMAND, mlflowParameters.getContainerName()));
String command = ParameterUtils.convertParameterPlaceholders(String.join("\n", args), ParamUtils.convert(paramsMap));
......@@ -197,9 +204,15 @@ public class MlflowTask extends AbstractTaskExecutor {
public AbstractParameters getParameters() {
return mlflowParameters;
public String getTemplatePath(String template) {
String templatePath = MlflowTask.class.getClassLoader().getResource(template).getPath();
return templatePath;
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# See the License for the specific language governing permissions and
# limitations under the License.
version: "3"
container_name: "${DS_TASK_MLFLOW_CONTAINER_NAME}"
test: ["CMD", "curl", ""]
interval: 5s
timeout: 5s
retries: 5
\ No newline at end of file
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# See the License for the specific language governing permissions and
# limitations under the License.
echo $data_path
mlflow run $repo -P tool=${automl_tool} -P data_path=$data_path -P params="${params}" -P model_name="${model_name}" --experiment-name="${experiment_name}" --version="${repo_version}"
echo "training finish"
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# See the License for the specific language governing permissions and
# limitations under the License.
echo $data_path
mlflow run $repo -P algorithm=${algorithm} -P data_path=$data_path -P params="${params}" -P search_params="${search_params}" -P model_name="${model_name}" --experiment-name="${experiment_name}" --version="${repo_version}"
echo "training finish"
......@@ -135,18 +135,37 @@ public class MlflowTaskTest {
public void testModelsDeployDocker() throws Exception {
public void testModelsDeployDocker() {
MlflowTask mlflowTask = initTask(createModelDeplyDockerParameters());
"mlflow models build-docker -m runs:/a272ec279fc34a8995121ae04281585f/model " +
"-n mlflow/a272ec279fc34a8995121ae04281585f:model " +
"--enable-mlserver\n" +
"docker rm -f mlflow-a272ec279fc34a8995121ae04281585f-model\n" +
"docker run --name=mlflow-a272ec279fc34a8995121ae04281585f-model " +
"docker rm -f ds-mlflow-a272ec279fc34a8995121ae04281585f-model\n" +
"docker run --name=ds-mlflow-a272ec279fc34a8995121ae04281585f-model " +
"-p=7000:8080 mlflow/a272ec279fc34a8995121ae04281585f:model");
public void testModelsDeployDockerCompose() throws Exception{
MlflowTask mlflowTask = initTask(createModelDeplyDockerComposeParameters());
"cp " + mlflowTask.getTemplatePath(MlflowConstants.TEMPLATE_DOCKER_COMPOSE) +
" /tmp/dolphinscheduler_test\n" +
"mlflow models build-docker -m models:/22222/1 -n mlflow/22222:1 --enable-mlserver\n" +
"export DS_TASK_MLFLOW_IMAGE_NAME=mlflow/22222:1\n" +
"export DS_TASK_MLFLOW_CONTAINER_NAME=ds-mlflow-22222-1\n" +
"export DS_TASK_MLFLOW_DEPLOY_PORT=7000\n" +
"export DS_TASK_MLFLOW_CPU_LIMIT=0.5\n" +
"docker-compose up -d\n" +
"for i in $(seq 1 300); do " +
"[ $(docker inspect --format \"{{json .State.Health.Status }}\" ds-mlflow-22222-1) = '\"healthy\"' ] && exit 0 && break;sleep 1; " +
"done; docker-compose down; exit 1");
private MlflowTask initTask(MlflowParameters mlflowParameters) {
TaskExecutionContext taskExecutionContext = createContext(mlflowParameters);
MlflowTask mlflowTask = new MlflowTask(taskExecutionContext);
......@@ -213,4 +232,16 @@ public class MlflowTaskTest {
return mlflowParameters;
private MlflowParameters createModelDeplyDockerComposeParameters() {
MlflowParameters mlflowParameters = new MlflowParameters();
return mlflowParameters;
......@@ -608,9 +608,6 @@ export default {
zeppelin_paragraph_id: 'zeppelinParagraphId',
'Please enter the paragraph id of your zeppelin paragraph',
zeppelin_parameters: 'parameters',
'Please enter the parameters for zeppelin dynamic form',
jupyter_conda_env_name: 'condaEnvName',
'Please enter the conda environment name of papermill',
......@@ -634,36 +631,38 @@ export default {
jupyter_others: 'others',
'Please enter the other options you need for papermill',
mlflow_algorithm: 'algorithm',
mlflow_algorithm: 'Algorithm',
mlflow_algorithm_tips: 'svm',
mlflow_params: 'parameters',
mlflow_params: 'Parameters',
mlflow_params_tips: ' ',
mlflow_searchParams: 'Parameter search space',
mlflow_searchParams: 'Parameter Search Space',
mlflow_searchParams_tips: ' ',
mlflow_isSearchParams: 'Search parameters',
mlflow_dataPath: 'data path',
mlflow_isSearchParams: 'Search Parameters',
mlflow_dataPath: 'Data Path',
' The absolute path of the file or folder. Ends with .csv for file or contain train.csv and test.csv for folder',
mlflow_dataPath_error_tips: ' data path can not be empty ',
mlflow_experimentName: 'experiment name',
mlflow_experimentName: 'Experiment Name',
mlflow_experimentName_tips: 'experiment_001',
mlflow_registerModel: 'register model',
mlflow_modelName: 'model name',
mlflow_registerModel: 'Register Model',
mlflow_modelName: 'Model Name',
mlflow_modelName_tips: 'model_001',
mlflow_mlflowTrackingUri: 'mlflow server tracking uri',
mlflow_mlflowTrackingUri: 'MLflow Tracking Server URI',
mlflow_mlflowTrackingUri_tips: '',
' mlflow server tracking uri cant not be empty',
mlflow_jobType: 'job type',
mlflow_automlTool: 'AutoML tool',
'MLflow Tracking Server URI can not be empty',
mlflow_jobType: 'Job Type',
mlflow_automlTool: 'AutoML Tool',
mlflow_taskType: 'MLflow Task Type',
mlflow_deployType: 'Deploy Mode',
mlflow_deployModelKey: 'model-uri',
mlflow_deployModelKey: 'Model-URI',
mlflow_deployPort: 'Port',
mlflowProjectRepository: 'Repository',
mlflowProjectRepository_tips: 'github repository or path on worker',
mlflowProjectVersion: 'Project Version',
mlflowProjectVersion_tips: 'git version',
mlflow_cpuLimit: 'Max Cpu Limit',
mlflow_memoryLimit: 'Max Memory Limit',
openmldb_zk_address: 'zookeeper address',
openmldb_zk_address_tips: 'Please enter the zookeeper address',
openmldb_zk_path: 'zookeeper path',
......@@ -694,4 +693,4 @@ export default {
'Please enter threshold number is needed',
please_enter_comparison_title: 'please select comparison title'
\ No newline at end of file
......@@ -637,19 +637,21 @@ export default {
mlflow_registerModel: '注册模型',
mlflow_modelName: '注册的模型名称',
mlflow_modelName_tips: 'model_001',
mlflow_mlflowTrackingUri: 'mlflow server tracking uri',
mlflow_mlflowTrackingUri: 'MLflow Tracking Server URI',
mlflow_mlflowTrackingUri_tips: '',
mlflow_mlflowTrackingUri_error_tips: ' mlflow server tracking uri 不能为空',
mlflow_mlflowTrackingUri_error_tips: ' MLflow Tracking Server URI 不能为空',
mlflow_jobType: '任务类型',
mlflow_automlTool: 'AutoML工具',
mlflow_taskType: 'MLflow 任务类型',
mlflow_deployType: '部署类型',
mlflow_deployModelKey: '部署的模型uri',
mlflow_deployModelKey: '部署的模型URI',
mlflow_deployPort: '监听端口',
mlflowProjectRepository: '运行仓库',
mlflowProjectRepository_tips: '可以为github仓库或worker上的路径',
mlflowProjectVersion: '项目版本',
mlflowProjectVersion_tips: '项目git版本',
mlflow_cpuLimit: '最大cpu限制',
mlflow_memoryLimit: '最大内存限制',
openmldb_zk_address: 'zookeeper地址',
openmldb_zk_address_tips: '请输入zookeeper地址',
openmldb_zk_path: 'zookeeper路径',
......@@ -23,6 +23,8 @@ export function useMlflowModels(model: { [field: string]: any }): IJsonItem[] {
const deployTypeSpan = ref(0)
const deployModelKeySpan = ref(0)
const deployPortSpan = ref(0)
const cpuLimitSpan = ref(0)
const memoryLimitSpan = ref(0)
const setFlag = () => {
model.isModels = model.mlflowTaskType === 'MLflow Models' ? true : false
......@@ -35,13 +37,21 @@ export function useMlflowModels(model: { [field: string]: any }): IJsonItem[] {
() => [model.mlflowTaskType, model.registerModel],
() => [model.mlflowTaskType],
() => {
() => [model.deployType],
() => {
cpuLimitSpan.value = model.deployType === "DOCKER COMPOSE" ? 12 : 0
memoryLimitSpan.value = model.deployType === "DOCKER COMPOSE" ? 12 : 0
......@@ -64,6 +74,18 @@ export function useMlflowModels(model: { [field: string]: any }): IJsonItem[] {
field: 'deployPort',
name: t('project.node.mlflow_deployPort'),
span: deployPortSpan
type: 'input',
field: 'cpuLimit',
name: t('project.node.mlflow_cpuLimit'),
span: cpuLimitSpan
type: 'input',
field: 'memoryLimit',
name: t('project.node.mlflow_memoryLimit'),
span: memoryLimitSpan
......@@ -76,5 +98,9 @@ const DEPLOY_TYPE = [
label: 'DOCKER',
value: 'DOCKER'
......@@ -355,6 +355,8 @@ export function formatParams(data: INodeData): {
taskParams.deployModelKey = data.deployModelKey
taskParams.mlflowProjectRepository = data.mlflowProjectRepository
taskParams.mlflowProjectVersion = data.mlflowProjectVersion
taskParams.cpuLimit = data.cpuLimit
taskParams.memoryLimit = data.memoryLimit
if (data.taskType === 'OPENMLDB') {
......@@ -49,6 +49,8 @@ export function useMlflow({
mlflowJobType: 'CustomProject',
mlflowProjectVersion: 'master',
automlTool: 'flaml',
cpuLimit: '0.5',
memoryLimit: '500M',
mlflowCustomProjectParameters: [],
delayTime: 0,
timeout: 30,
......@@ -336,6 +336,8 @@ interface ITaskParams {
deployType?: string
deployPort?: string
deployModelKey?: string
cpuLimit?: string
memoryLimit?: string
zk?: string
zkPath?: string
executeMode?: string