The Resource Center is usually used for operations such as uploading files, UDF functions, and task group management. On a single machine, you can designate a local file directory as the upload directory (this does not require deploying Hadoop). Alternatively, you can upload to a Hadoop or MinIO cluster, in which case you need Hadoop (2.6+), MinIO, or another related environment.
When it is necessary to use the Resource Center to create or upload relevant files, all files and resources will be stored on HDFS. Therefore the following configuration is required.
## Local File Resource Configuration
...
...
@@ -13,13 +13,9 @@ Configure the file in the following paths: `api-server/conf/common.properties` a
- Change `data.basedir.path` to the local directory path. Please make sure the user who deploys DolphinScheduler has read and write permissions, for example: `data.basedir.path=/tmp/dolphinscheduler`. The directory you configure will be created automatically if it does not exist.
- Modify the following two parameters, `resource.storage.type=HDFS` and `resource.hdfs.fs.defaultFS=file:///`.
## HDFS Resource Configuration
When it is necessary to use the Resource Center to create or upload relevant files, all files and resources will be stored on HDFS. Therefore the following configuration is required.
### Configuring the common.properties
## Configuring the common.properties
After version 3.0.0-alpha, if you want to upload resources using HDFS or S3 from the Resource Center, the following paths need to be configured: `api-server/conf/common.properties` and `worker-server/conf/common.properties`. This can be found as follows.
After version 3.0.0-alpha, if you want to upload resources using HDFS or S3 from the Resource Center, you need to configure the following files: `api-server/conf/common.properties` and `worker-server/conf/common.properties`. The relevant settings are shown below.
```properties
#
...
...
@@ -42,12 +38,13 @@ After version 3.0.0-alpha, if you want to upload resources using HDFS or S3 from
# user data local directory path, please make sure the directory exists and have read write permissions
# resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
resource.hdfs.root.user=root
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler;
# if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
resource.hdfs.fs.defaultFS=hdfs://localhost:8020
resource.hdfs.root.user=hdfs
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
# Whether hive SQL is executed in the same session
support.hive.oneSession=false
# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions;
# if set false, executing user is the deploy user and doesn't need sudo permissions
# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions; if set false, executing user is the deploy user and doesn't need sudo permissions
sudo.enable=true
# network interface preferred like eth0, default: empty
```
> * If only the `api-server/conf/common.properties` file is configured, resource uploading is enabled, but you cannot use the resources in tasks. If you want to use or execute the files in a workflow, you need to configure `worker-server/conf/common.properties` too.
> * If you want to use the resource upload function, the deployment user in [installation and deployment](../installation/standalone.md) must have relevant operation authority.
> * If you are using a Hadoop cluster with HA, you need to enable HDFS resource upload and copy the `core-site.xml` and `hdfs-site.xml` files from the Hadoop cluster to `worker-server/conf` and `api-server/conf`; otherwise, skip this copy step.
\ No newline at end of file
> * If you are using a Hadoop cluster with HA, you need to enable HDFS resource upload and copy the `core-site.xml` and `hdfs-site.xml` files from the Hadoop cluster to `worker-server/conf` and `api-server/conf`; otherwise, skip this copy step.
The Resource Center is typically used for uploading files, UDF functions, and task group management. For a stand-alone
environment, you can select the local file directory as the upload folder (**this operation does not require Hadoop or HDFS deployment**).
Of course, you can also choose to upload to a Hadoop or MinIO cluster. In this case, you need to have Hadoop (2.6+), MinIO, or another related environment.
The task group is mainly used to control the concurrency of task instances and to limit the pressure on other resources (it can also limit the pressure on a Hadoop cluster, although the cluster has its own queue control). When creating a new task definition, you can assign it to a task group and set the priority with which the task runs in that group.
**Note**: Task groups apply only to tasks executed by workers. Node types executed by the master, such as [switch], [condition], and [sub_process] nodes, are not controlled by the task group. Let's take the shell node as an example:
...
...
@@ -40,13 +40,13 @@ Regarding the configuration of the task group, all you need to do is to configur
- Priority: When tasks are waiting for a resource, the master distributes the task with the highest priority to the worker first. The larger the value, the higher the priority.
### Implementation Logic of Task Group
## Implementation Logic of Task Group
#### Get Task Group Resources:
### Get Task Group Resources
When distributing a task, the master checks whether the task is configured with a task group. If it is not, the task is dispatched to the worker as usual. If a task group is configured, the master checks whether the remaining size of the task group resource pool can accommodate the current task before dispatching it to the worker. If it can, the resource pool is decremented by 1 and the task continues to run; if not, the task exits distribution and waits to be woken up by other tasks.
#### Release and Wake Up:
### Release and Wake Up
When a task that holds a task group resource finishes, the resource is released. After the release, the task group is checked for waiting tasks. If there is one, the waiting task with the highest priority is marked to run and a new executable event is created. The event stores the ID of the task marked to acquire the resource; that task then obtains the task group resource and runs.
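
The acquire, release, and wake-up steps described above can be sketched as a small in-memory model. This is only an illustration of the logic, not DolphinScheduler's actual implementation: the `TaskGroup` class, its method names, and the task IDs are hypothetical, and the sketch assumes that a larger priority value wins and that ties are broken by arrival order.

```python
import heapq
import itertools


class TaskGroup:
    """Toy model of a task group: a fixed-size resource pool plus a priority wait queue."""

    def __init__(self, capacity):
        self.remaining = capacity        # free slots left in the resource pool
        self._waiting = []               # heap of (-priority, arrival order, task_id)
        self._order = itertools.count()  # tie-breaker so earlier arrivals go first

    def try_acquire(self, task_id, priority):
        """Return True if the task may be dispatched now; otherwise queue it and return False."""
        if self.remaining > 0:
            self.remaining -= 1          # "resource pool - 1"
            return True
        heapq.heappush(self._waiting, (-priority, next(self._order), task_id))
        return False

    def release(self):
        """Free one slot, hand it to the highest-priority waiter, and return its ID (or None)."""
        self.remaining += 1
        if not self._waiting:
            return None
        _, _, task_id = heapq.heappop(self._waiting)
        self.remaining -= 1              # the woken task takes the freed slot
        return task_id


group = TaskGroup(capacity=1)
print(group.try_acquire("task-a", priority=1))  # the pool has a free slot
print(group.try_acquire("task-b", priority=1))  # the pool is full, so task-b waits
print(group.try_acquire("task-c", priority=5))  # also waits, with a higher priority
print(group.release())                          # task-a finishes and a waiter is woken
```

In this model the value returned by `release()` plays the role of the task ID stored in the executable event: it identifies which waiting task should acquire the resource next.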
The resource management and file management functions are similar. The difference is that resource management handles UDF uploads, while file management handles user programs, scripts and configuration files. Supported operations: rename, download, delete.
- Upload UDF resources
- Upload UDF resources: Same as uploading files.
> Same as uploading files.
### Function Management
## Function Management
- Create UDF function
> Click "Create UDF Function", enter the UDF function parameters, select the UDF resource, and click "Submit" to create the UDF function.
> Currently, only temporary Hive UDF functions are supported.
> Click "`Create UDF Function`", enter the UDF function parameters, select the UDF resource, and click `Submit` to create the UDF function.
> Currently, only temporary UDF functions of `HIVE` are supported.
- UDF function name: enter the name of the UDF function.
- Package name Class name: enter the full path of the UDF function.
- UDF resource: set the resource file corresponding to the created UDF function.
- UDF function name: Enter the name of the UDF function.
- Package name Class name: Enter the full path of the UDF function.
- UDF resource: Set the resource file corresponding to the created UDF function.
* Only the administrator account has the authority to operate in the security center. It provides functions such as queue management, tenant management, user management, alarm group management, worker group management, token management, etc. In the user management module, the administrator can authorize users to access resources, data sources, projects, etc.
* Administrator login: the default username and password are `admin/dolphinscheduler123`
- Only the administrator account has the authority to operate in the security center. It provides functions such as queue management, tenant management, user management, alarm group management, worker group management, token management, etc. In the user management module, the administrator can authorize users to access resources, data sources, projects, etc.
- Administrator login: the default username and password are `admin/dolphinscheduler123`.
## Create Queue
...
...
@@ -50,7 +50,7 @@
## Token Management
> Since the back-end interface has a login check, token management provides a way to execute various operations on the system by calling interfaces.
Since the back-end interface has a login check, token management provides a way to execute various operations on the system by calling interfaces.
- The administrator enters the `Security Center -> Token Management page`, clicks the `Create Token` button, selects the expiration time and user, clicks the `Generate Token` button, and then clicks the `Submit` button. The token for the selected user is then created.
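
As a rough illustration of calling an interface with a generated token, the sketch below builds (but does not send) an HTTP request that carries the token in a request header. Everything specific here is an assumption made for the example rather than something stated above: the header name `token`, the base URL, the `/projects` path, and the helper name `build_request` are all hypothetical placeholders, so check them against your own deployment.

```python
import urllib.request

# Hypothetical base URL of the API server; adjust to your deployment.
BASE_URL = "http://localhost:12345/dolphinscheduler"


def build_request(path, token):
    """Build a GET request that passes the token in a header (assumed to be named 'token')."""
    return urllib.request.Request(BASE_URL + path, headers={"token": token})


# Placeholder token value; paste the one generated in Token Management.
request = build_request("/projects", "<your-token>")
print(request.full_url)
print(request.header_items())

# To actually send the request against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Note that `urllib` normalizes the header name's capitalization when storing it; HTTP header names are case-insensitive, so this does not change what the server receives semantically.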
* Granted permissions include project permissions, resource permissions, data source permissions, UDF function permissions.
* The administrator can authorize normal users to access projects, resources, data sources and UDF functions that they did not create. Because projects, resources, data sources and UDF functions are all authorized in the same way, we take project authorization as an example.
* Note: Users have all permissions on the projects they created, so those projects are not displayed in the project list or the selected project list.
- Granted permissions include project permissions, resource permissions, data source permissions, UDF function permissions.
- The administrator can authorize normal users to access projects, resources, data sources and UDF functions that they did not create. Because projects, resources, data sources and UDF functions are all authorized in the same way, we take project authorization as an example.
- Note: Users have all permissions on the projects they created, so those projects are not displayed in the project list or the selected project list.
- The administrator enters the `Security Center -> User Management` page and clicks the `Authorize` button of the user who needs to be authorized, as shown in the figure below:
- Create a task node in the workflow definition, select the worker group and the environment corresponding to the worker group. When executing the task, the Worker will execute the environment first before executing the task.
- Each process can be associated with zero or more clusters to support multiple environments. Currently, only k8s is supported.
> Usage cluster
- After creation and authorization, k8s namespaces and processes will be associated with clusters. Each cluster will have separate workflows and task instances running independently.
- After creation and authorization, you can select the namespace from the namespace drop-down list when editing a k8s task. If the k8s cluster name is `ds_null_k8s`, the task runs in test mode and does not actually operate on the cluster.