From 9d6fc92af9ba5e2a0c822eb78dd6371fe72d02d7 Mon Sep 17 00:00:00 2001
From: Eric Gao
Date: Fri, 12 Aug 2022 00:18:43 +0800
Subject: [PATCH] [Doc][Resources] Instruct users to use local storage if they
 have remote storage mounted to local (#11435)

* [Doc][Resources] Instruct users to use local storage if they have remote storage mounted to local (#11427)

* Remove dead link in pyds README

* Add hyperlinks for docs

Co-authored-by: Jiajie Zhong
---
 docs/docs/en/guide/resource/configuration.md | 25 +++++++++++---------
 docs/docs/zh/guide/resource/configuration.md | 19 ++++++++-------
 2 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/docs/docs/en/guide/resource/configuration.md b/docs/docs/en/guide/resource/configuration.md
index 5614d3c35..2f74daffd 100644
--- a/docs/docs/en/guide/resource/configuration.md
+++ b/docs/docs/en/guide/resource/configuration.md
@@ -1,21 +1,24 @@
-# HDFS Resource Configuration
+# Resource Center Configuration
 
-When it is necessary to use the Resource Center to create or upload relevant files, all files and resources will be stored on HDFS. Therefore the following configuration is required.
+- You could use `Resource Center` to upload text files, UDFs and other task-related files.
+- You could configure `Resource Center` to use a distributed file system like [Hadoop](https://hadoop.apache.org/docs/r2.7.0/) (2.6+), a [MinIO](https://github.com/minio/minio) cluster, or remote storage products like [AWS S3](https://aws.amazon.com/s3/), [Alibaba Cloud OSS](https://www.aliyun.com/product/oss), etc.
+- You could configure `Resource Center` to use the local file system. If you deploy `DolphinScheduler` in `Standalone` mode, you could configure it to use the local file system for `Resource Center` without the need for an external `HDFS` system or `S3`.
+- Furthermore, if you deploy `DolphinScheduler` in `Cluster` mode, you could use [S3FS-FUSE](https://github.com/s3fs-fuse/s3fs-fuse) to mount `S3` or [JINDO-FUSE](https://help.aliyun.com/document_detail/187410.html) to mount `OSS` to your machines and use the local file system for `Resource Center`. In this way, you could operate remote files as if they were on your local machines.
 
-## Local File Resource Configuration
+## Use Local File System
 
-For a single machine, you can choose to use local file directory as the upload directory (no need to deploy Hadoop) by making the following configuration.
+### Configure `common.properties`
 
-### Configuring the `common.properties`
+Configure `api-server/conf/common.properties` and `worker-server/conf/common.properties` as follows:
 
-Configure the file in the following paths: `api-server/conf/common.properties` and `worker-server/conf/common.properties`.
+- Change `resource.storage.upload.base.path` to your local directory path. Please make sure the tenant `resource.hdfs.root.user` has read and write permissions for `resource.storage.upload.base.path`, e.g. `/tmp/dolphinscheduler`. `DolphinScheduler` will create the directory you configure if it does not exist.
+- Modify `resource.storage.type=HDFS` and `resource.hdfs.fs.defaultFS=file:///`.
 
-- Change `data.basedir.path` to the local directory path. Please make sure the user who deploy dolphinscheduler have read and write permissions, such as: `data.basedir.path=/tmp/dolphinscheduler`. And the directory you configured will be auto-created if it does not exists.
-- Modify the following two parameters, `resource.storage.type=HDFS` and `resource.hdfs.fs.defaultFS=file:///`.
+> NOTE: Please modify the value of `resource.storage.upload.base.path` if you do not want to use the default value as the base path.
-## Configuring the common.properties
+## Use HDFS or Remote Object Storage
 
-After version 3.0.0-alpha, if you want to upload resources using HDFS or S3 from the Resource Center, you will need to configure the following paths The following paths need to be configured: `api-server/conf/common.properties` and `worker-server/conf/common.properties`. This can be found as follows.
+After version 3.0.0-alpha, if you want to upload resources to `Resource Center` connected to `HDFS` or `S3`, you need to configure `api-server/conf/common.properties` and `worker-server/conf/common.properties`.
 
 ```properties
 #
@@ -44,7 +47,7 @@ data.basedir.path=/tmp/dolphinscheduler
 # resource storage type: HDFS, S3, NONE
 resource.storage.type=NONE
 # resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
-resource.storage.upload.base.path=/dolphinscheduler
+resource.storage.upload.base.path=/tmp/dolphinscheduler
 # The AWS access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
 resource.aws.access.key.id=minioadmin
diff --git a/docs/docs/zh/guide/resource/configuration.md b/docs/docs/zh/guide/resource/configuration.md
index e8edee566..bf6b246c8 100644
--- a/docs/docs/zh/guide/resource/configuration.md
+++ b/docs/docs/zh/guide/resource/configuration.md
@@ -1,21 +1,24 @@
 # 资源中心配置详情
 
-资源中心通常用于上传文件、 UDF 函数,以及任务组管理等操作。针对单机环境可以选择本地文件目录作为上传文件夹(此操作不需要部署 Hadoop)。当然也可以选择上传到 Hadoop or MinIO 集群上,此时则需要有 Hadoop(2.6+)或者 MinIOn 等相关环境。
+- 资源中心通常用于上传文件、UDF 函数,以及任务组管理等操作。
+- 资源中心可以对接分布式的文件存储系统,如[Hadoop](https://hadoop.apache.org/docs/r2.7.0/)(2.6+)或者[MinIO](https://github.com/minio/minio)集群,也可以对接远端的对象存储,如[AWS S3](https://aws.amazon.com/s3/)或者[阿里云 OSS](https://www.aliyun.com/product/oss)等。
+- 资源中心也可以直接对接本地文件系统。在单机模式下,您无需依赖`Hadoop`或`S3`一类的外部存储系统,可以方便地对接本地文件系统进行体验。
+- 除此之外,对于集群模式下的部署,您可以通过使用[S3FS-FUSE](https://github.com/s3fs-fuse/s3fs-fuse)将`S3`挂载到本地,或者使用[JINDO-FUSE](https://help.aliyun.com/document_detail/187410.html)将`OSS`挂载到本地等,再用资源中心对接本地文件系统方式来操作远端对象存储中的文件。
 
-## 本地资源配置
-
-在单机环境下,可以选择使用本地文件目录作为上传文件夹(无需部署Hadoop),此时需要进行如下配置:
+## 对接本地文件系统
 
 ### 配置 `common.properties` 文件
 
 对以下路径的文件进行配置:`api-server/conf/common.properties` 和 `worker-server/conf/common.properties`
 
-- 将 `data.basedir.path` 改为本地存储路径,请确保部署 DolphinScheduler 的用户拥有读写权限,例如:`data.basedir.path=/tmp/dolphinscheduler`。当路径不存在时会自动创建文件夹
-- 修改下列两个参数,分别是 `resource.storage.type=HDFS` 和 `resource.hdfs.fs.defaultFS=file:///`。
+- 将 `resource.storage.upload.base.path` 改为本地存储路径,请确保部署 DolphinScheduler 的用户拥有读写权限,例如:`resource.storage.upload.base.path=/tmp/dolphinscheduler`。当路径不存在时会自动创建文件夹
+- 修改 `resource.storage.type=HDFS` 和 `resource.hdfs.fs.defaultFS=file:///`。
+
+> **注意**:如果您不想用默认值作为资源中心的基础路径,请修改`resource.storage.upload.base.path`的值。
 
-## HDFS 资源配置
+## 对接分布式或远端对象存储
 
-当需要使用资源中心进行相关文件的创建或者上传操作时,所有的文件和资源都会被存储在 HDFS 上。所以需要进行以下配置:
+当需要使用资源中心进行相关文件的创建或者上传操作时,所有的文件和资源都会被存储在分布式文件系统`HDFS`或者远端的对象存储,如`S3`上。所以需要进行以下配置:
 
 ### 配置 common.properties 文件
--
GitLab
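
Taken together, the local-file-system settings this patch documents amount to the following `common.properties` fragment. This is a sketch assembled from the values shown in the diff; `/tmp/dolphinscheduler` is only the example base path, and the same values go into both `api-server/conf/common.properties` and `worker-server/conf/common.properties`:

```properties
# Keep the storage type as HDFS; pointing defaultFS at file:/// makes the
# HDFS client operate on the local file system instead of a real HDFS cluster.
resource.storage.type=HDFS
resource.hdfs.fs.defaultFS=file:///
# Base directory for uploaded resources; DolphinScheduler creates it if missing.
# Example value only - the deploying user needs read/write permission here.
resource.storage.upload.base.path=/tmp/dolphinscheduler
```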