Unverified · Commit 91b59dee · Author: sneh-wha · Committer: GitHub

[doc] Modified dq, monitor, security, resources (#10715)

Parent 2a519ea9
......@@ -265,6 +265,10 @@ export default {
title: 'Resource',
children: [
title: 'Introduction',
link: '/en-us/docs/dev/user_doc/guide/resource/intro.html'
title: 'Configuration',
link: '/en-us/docs/dev/user_doc/guide/resource/configuration.html'
......@@ -657,6 +661,10 @@ export default {
title: '资源中心',
children: [
title: '简介',
link: '/zh-cn/docs/dev/user_doc/guide/resource/intro.html'
title: '配置详情',
link: '/zh-cn/docs/dev/user_doc/guide/resource/configuration.html'
......@@ -28,14 +28,16 @@
- Number of commands waiting to be executed: statistics from the `t_ds_command` table.
- Number of failed commands: statistics from the `t_ds_error_command` table.
- Number of tasks waiting to run: count of the `task_queue` data in ZooKeeper.
- Number of tasks waiting to be killed: count of the `task_kill` data in ZooKeeper.
| **Parameter** | **Description** |
| ----- | ----- |
| Number of commands waiting to be executed | Statistics from the `t_ds_command` table. |
| Number of failed commands | Statistics from the `t_ds_error_command` table. |
| Number of tasks waiting to run | Count of the `task_queue` data in ZooKeeper. |
| Number of tasks waiting to be killed | Count of the `task_kill` data in ZooKeeper. |
### Audit Log
The audit log records who accessed the system, which operations they performed, and when they performed them,
which strengthens system security and simplifies maintenance.
\ No newline at end of file
# Configuration
# HDFS Resource Configuration
The Resource Center is usually used for operations such as uploading files, UDF functions, and task group management. On a single machine, you can designate a local file directory as the upload directory (this does not require deploying Hadoop). Alternatively, you can upload to a Hadoop or MinIO cluster, in which case you need Hadoop (2.6+), MinIO, or a related environment.
When the Resource Center is used to create or upload files, all files and resources are stored on HDFS, so the following configuration is required.
## Local File Resource Configuration
......@@ -13,13 +13,9 @@ Configure the file in the following paths: `api-server/conf/common.properties` a
- Change `data.basedir.path` to the local directory path, for example `data.basedir.path=/tmp/dolphinscheduler`. Make sure the user who deploys DolphinScheduler has read and write permissions on it. The configured directory is created automatically if it does not exist.
- Modify the following two parameters, `resource.storage.type=HDFS` and `resource.hdfs.fs.defaultFS=file:///`.
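Before starting the services, it can help to confirm that the deployment user can actually read and write the directory set in `data.basedir.path`. The following is a minimal pre-flight sketch, not part of DolphinScheduler itself; the function name `check_basedir` is hypothetical.

```python
import os


def check_basedir(path):
    """Create `path` if it is missing, then report whether the current
    user can read, write, and enter it."""
    # exist_ok=True makes this a no-op when the directory already exists.
    os.makedirs(path, exist_ok=True)
    return os.access(path, os.R_OK | os.W_OK | os.X_OK)
```

Run it as the deployment user, for example `check_basedir("/tmp/dolphinscheduler")`; a `False` result suggests that the permissions need fixing before uploads will work.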
## HDFS Resource Configuration
When the Resource Center is used to create or upload files, all files and resources are stored on HDFS, so the following configuration is required.
### Configuring the common.properties
## Configuring the common.properties
After version 3.0.0-alpha, if you want to upload resources using HDFS or S3 from the Resource Center, the following files need to be configured: `api-server/conf/common.properties` and `worker-server/conf/common.properties`. The configuration is as follows.
After version 3.0.0-alpha, if you want to upload resources using HDFS or S3 from the Resource Center, you need to configure the following files: `api-server/conf/common.properties` and `worker-server/conf/common.properties`. The configuration is as follows.
......@@ -42,12 +38,13 @@ After version 3.0.0-alpha, if you want to upload resources using HDFS or S3 from
# user data local directory path, please make sure the directory exists and have read write permissions
# resource storage type: HDFS, S3, NONE
# resource view suffixs
# resource store on HDFS/S3 path, resource file will store to this hadoop hdfs path, self configuration,
# please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
# resource storage type: HDFS, S3, NONE
# resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
# The AWS access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
......@@ -61,10 +58,9 @@ resource.aws.s3.bucket.name=dolphinscheduler
# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler;
# if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
# whether to startup kerberos
......@@ -80,18 +76,16 @@ login.user.keytab.path=/opt/hdfs.headless.keytab
# kerberos expire time, the unit is hour
# resource view suffixs
# resourcemanager port, the default value is 8088 if not specified
# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value;
# If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
# job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
# datasource encryption enable
......@@ -109,8 +103,7 @@ data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
# Whether hive SQL is executed in the same session
# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions;
# if set false, executing user is the deploy user and doesn't need sudo permissions
# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions; if set false, executing user is the deploy user and doesn't need sudo permissions
# network interface preferred like eth0, default: empty
......@@ -120,17 +113,26 @@ sudo.enable=true
# system env path
# development state
# rpc port
# Url endpoint for zeppelin RESTful API
# set path of conda.sh
# Task resource limit state
> **_Note:_**
> **Note:**
> * If only the `api-server/conf/common.properties` file is configured, resource uploading is enabled, but you cannot use the resources in tasks. To use or execute the files in a workflow, you also need to configure `worker-server/conf/common.properties`.
> * If you want to use the resource upload function, the deployment user in [installation and deployment](../installation/standalone.md) must have the relevant permissions.
> * If you are using a Hadoop cluster with HA and want to enable HDFS resource upload, copy the `core-site.xml` and `hdfs-site.xml` files from the Hadoop cluster to `worker-server/conf` and `api-server/conf`. Otherwise, skip this copy step.
\ No newline at end of file
> * If you are using a Hadoop cluster with HA and want to enable HDFS resource upload, copy the `core-site.xml` and `hdfs-site.xml` files from the Hadoop cluster to `worker-server/conf` and `api-server/conf`. Otherwise, skip this copy step.
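
Because both `api-server/conf/common.properties` and `worker-server/conf/common.properties` must be configured, a quick consistency check can catch a file that was missed. The sketch below is illustrative only: it assumes simple `key=value` lines, and the helper names `parse_properties` and `resource_mismatches` are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resource_mismatches(api_text, worker_text, prefix="resource."):
    """Return {key: (api_value, worker_value)} for every key starting with
    `prefix` whose value differs between the two files."""
    api = parse_properties(api_text)
    worker = parse_properties(worker_text)
    keys = {k for k in api.keys() | worker.keys() if k.startswith(prefix)}
    return {
        k: (api.get(k), worker.get(k))
        for k in sorted(keys)
        if api.get(k) != worker.get(k)
    }
```

Pass in the contents of the two files; an empty result means the `resource.*` settings agree, while a key mapped to `None` on one side means that file does not set it.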
# Resource Center Introduction
The Resource Center is typically used for uploading files, UDF functions, and task group management. For a stand-alone
environment, you can select the local file directory as the upload folder (**this operation does not require Hadoop or HDFS deployment**).
You can also choose to upload to a Hadoop or MinIO cluster. In this case, you need Hadoop (2.6+), MinIO, or a related environment.
\ No newline at end of file
......@@ -2,9 +2,9 @@
The task group is mainly used to control the concurrency of task instances and is designed to limit the pressure on other resources (it can also limit the pressure on the Hadoop cluster, although the cluster has its own queue control). When creating a new task definition, you can assign the task to a task group and set its priority within that group.
### Task Group Configuration
## Task Group Configuration
#### Create Task Group
### Create Task Group
......@@ -20,7 +20,7 @@ You need to enter the information inside the picture:
- Resource pool size: The maximum number of concurrent task instances allowed.
#### View Task Group Queue
### View Task Group Queue
......@@ -28,7 +28,7 @@ Click the button to view task group usage information:
#### Use of Task Groups
### Use of Task Groups
**Note**: Task groups apply only to tasks executed by workers. Node types executed by the master, such as [switch] nodes, [condition] nodes, and [sub_process] nodes, are not controlled by the task group. Let's take the shell node as an example:
......@@ -40,13 +40,13 @@ Regarding the configuration of the task group, all you need to do is to configur
- Priority: When tasks are waiting for resources, the master distributes the task with the higher priority to the worker first. The larger the value, the higher the priority.
### Implementation Logic of Task Group
## Implementation Logic of Task Group
#### Get Task Group Resources:
### Get Task Group Resources
When distributing a task, the master checks whether the task is configured with a task group. If it is not, the task is sent to the worker as usual. If a task group is configured, the master first checks whether the remaining size of the task group's resource pool can accommodate the task. If it can, the pool size is reduced by 1 and the task is sent to the worker to run; if not, the task exits distribution and waits to be woken up by other tasks.
#### Release and Wake Up:
### Release and Wake Up
When a task that occupies a task group resource finishes, the resource is released. After the release, the master checks whether any task is waiting in the current task group. If there is one, it marks the waiting task with the highest priority to run and creates a new executable event. The event stores the ID of the task marked to acquire the resource, and that task then obtains the task group resource and runs.
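
The acquire, release, and wake-up flow described above can be sketched as a small in-memory model. This is a toy illustration under stated assumptions, not DolphinScheduler's actual implementation: the class `TaskGroup` and its methods are hypothetical, and ties between equal priorities are broken by arrival order, which the text does not specify.

```python
import heapq
import itertools


class TaskGroup:
    """Toy model of a task group resource pool."""

    def __init__(self, pool_size):
        self.remaining = pool_size        # free slots left in the pool
        self._waiting = []                # heap of (-priority, seq, task_id)
        self._seq = itertools.count()     # arrival order, used as tie-breaker

    def acquire(self, task_id, priority):
        """Take one slot if available; otherwise queue the task and return False."""
        if self.remaining > 0:
            self.remaining -= 1
            return True
        # Negate the priority so that the largest value is popped first.
        heapq.heappush(self._waiting, (-priority, next(self._seq), task_id))
        return False

    def release(self):
        """Free one slot and hand it to the highest-priority waiting task.

        Returns the ID of the woken task, or None if nothing is waiting.
        """
        self.remaining += 1
        if not self._waiting:
            return None
        _, _, task_id = heapq.heappop(self._waiting)
        self.remaining -= 1
        return task_id
```

With a pool size of 1, a second and third task queue up behind the first; when the first task releases its slot, the waiting task with the larger priority value is woken first.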
......@@ -2,20 +2,18 @@
Resource management and file management work in a similar way. The difference is that resource management is used to upload UDF resources, while file management is used to upload user programs, scripts, and configuration files. Supported operations: rename, download, delete.
- Upload UDF resources
- Upload UDF resources: Same as uploading files.
> Same as uploading files.
### Function Management
## Function Management
- Create UDF function
> Click "Create UDF Function", enter the UDF function parameters, select the UDF resource, and click "Submit" to create the UDF function.
> Currently, only temporary Hive UDF functions are supported.
> Click `Create UDF Function`, enter the UDF function parameters, select the UDF resource, and click `Submit` to create the UDF function.
> Currently, only temporary `HIVE` UDF functions are supported.
- UDF function name: enter the name of the UDF function.
- Package name Class name: enter the full path of the UDF function.
- UDF resource: set the resource file corresponding to the created UDF function.
- UDF function name: Enter the name of the UDF function.
- Package name Class name: Enter the full path of the UDF function.
- UDF resource: Set the resource file corresponding to the created UDF function.
# Security (Authorization System)
* Only the administrator account has the authority to operate in the security center, which provides queue management, tenant management, user management, alarm group management, worker group management, token management, and so on. In the user management module, the administrator can grant users access to resources, data sources, projects, etc.
* To log in as the administrator, use the default username and password: `admin/dolphinscheduler123`
- Only the administrator account has the authority to operate in the security center, which provides queue management, tenant management, user management, alarm group management, worker group management, token management, and so on. In the user management module, the administrator can grant users access to resources, data sources, projects, etc.
- To log in as the administrator, use the default username and password: `admin/dolphinscheduler123`.
## Create Queue
......@@ -50,7 +50,7 @@
## Token Management
> Because the back-end interfaces perform a login check, token management provides a way to operate the system by calling those interfaces.
Because the back-end interfaces perform a login check, token management provides a way to operate the system by calling those interfaces.
- The administrator opens the `Security Center -> Token Management` page, clicks the `Create Token` button, selects the expiration time and user, clicks the `Generate Token` button, and then clicks the `Submit` button to create the token for the selected user.
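
Once a token exists, a client passes it in the `token` request header when calling an interface. The following is a minimal sketch using only the Python standard library. The URL and the `projectName` parameter are placeholders, not real DolphinScheduler endpoints or fields, and the helper name `build_token_request` is hypothetical.

```python
import urllib.parse
import urllib.request


def build_token_request(url, token, params):
    """Build (but do not send) a form-encoded POST request that carries
    the token in the `token` header."""
    data = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"token": token}, method="POST"
    )


# Placeholder values for illustration only.
request = build_token_request(
    "http://example.invalid/api", "123", {"projectName": "demo"}
)
```

To send the request, pass it to `urllib.request.urlopen(request)` against a real API address. Building and sending are kept separate here so the header and body can be inspected first.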
......@@ -66,7 +66,6 @@
public void doPOSTParam()throws Exception{
// create HttpClient
CloseableHttpClient httpclient = HttpClients.createDefault();
// create http post request
HttpPost httpPost = new HttpPost("");
httpPost.setHeader("token", "123");
......@@ -96,9 +95,9 @@
## Granted Permissions
* Granted permissions include project permissions, resource permissions, data source permissions, UDF function permissions.
* The administrator can grant normal users access to projects, resources, data sources, and UDF functions that they did not create. Because authorization works the same way for projects, resources, data sources, and UDF functions, we take project authorization as an example.
* Note: Users have all permissions on the projects they create, so those projects are not displayed in the project list or the selected project list.
- Granted permissions include project permissions, resource permissions, data source permissions, UDF function permissions.
- The administrator can grant normal users access to projects, resources, data sources, and UDF functions that they did not create. Because authorization works the same way for projects, resources, data sources, and UDF functions, we take project authorization as an example.
- Note: Users have all permissions on the projects they create, so those projects are not displayed in the project list or the selected project list.
- The administrator enters the `Security Center -> User Management` page and clicks the `Authorize` button of the user who needs to be authorized, as shown in the figure below:
<p align="center">
......@@ -145,7 +144,6 @@ worker.groups=default,test
> Usage environment
- Create a task node in the workflow definition, then select the worker group and the environment corresponding to that worker group. When the task runs, the worker applies the environment first and then executes the task.
......@@ -153,11 +151,9 @@ worker.groups=default,test
## Cluster Management
> Add or update cluster
- Each process can be associated with zero or more clusters to support multiple environments. Currently, only k8s is supported.
> Usage cluster
- After creation and authorization, k8s namespaces and processes are associated with clusters. Each cluster has its own workflows and task instances, which run independently.
......@@ -173,3 +169,5 @@ worker.groups=default,test
- After creation and authorization, you can select it from the namespace drop-down list when editing a k8s task. If the k8s cluster name is `ds_null_k8s`, the task runs in test mode and does not actually operate on the cluster.
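# Resource Center Introduction
The Resource Center is typically used for uploading files, UDF functions, and task group management. For a standalone environment, you can select a local file directory as the upload folder (this does not require a Hadoop deployment). You can also
choose to upload to a Hadoop or MinIO cluster. In this case, you need Hadoop (2.6+), MinIO, or a related environment.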
\ No newline at end of file