The Storage System in Apache DolphinScheduler provides a unified interface for storing and retrieving files across various storage backends. It enables resource management for workflows and tasks, allowing users to upload files such as scripts, JAR files, configuration files, and other artifacts that can be used in task execution. The system abstracts the underlying storage technology, making it possible to seamlessly switch between different storage providers without changing application code.
Architecture Overview
The storage system is designed as a pluggable component with a consistent API across different storage implementations. This architecture allows DolphinScheduler to work with multiple storage backends while maintaining a unified interface for resource operations.
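DolphinScheduler supports the following storage backends: the local file system (LOCAL), HDFS, Amazon S3 and S3-compatible storage (S3), Alibaba Cloud OSS, Google Cloud Storage (GCS), Azure Blob Storage (ABS), Huawei Cloud OBS, and Tencent Cloud COS.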
The storage functionality is implemented using a plugin architecture that allows for easy extension and maintenance.
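To make the plugin idea concrete, here is a minimal sketch in Python. It is not DolphinScheduler's actual code (the project itself is written in Java); the names `StorageOperator`, `LocalStorageOperator`, and the four method names are hypothetical and chosen only for illustration. The point is that callers depend on one interface while each backend supplies its own implementation.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageOperator(ABC):
    """Hypothetical common interface that every storage backend implements."""

    @abstractmethod
    def upload(self, resource_name: str, data: bytes) -> None: ...

    @abstractmethod
    def download(self, resource_name: str) -> bytes: ...

    @abstractmethod
    def exists(self, resource_name: str) -> bool: ...

    @abstractmethod
    def delete(self, resource_name: str) -> None: ...


class LocalStorageOperator(StorageOperator):
    """Illustrative backend that keeps resources under a local base directory."""

    def __init__(self, base_path: str) -> None:
        self.base = Path(base_path).resolve()
        self.base.mkdir(parents=True, exist_ok=True)

    def _resolve(self, resource_name: str) -> Path:
        # Reject names that would escape the base directory (e.g. "../x").
        target = (self.base / resource_name).resolve()
        if self.base not in target.parents:
            raise ValueError(f"invalid resource name: {resource_name!r}")
        return target

    def upload(self, resource_name: str, data: bytes) -> None:
        target = self._resolve(resource_name)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def download(self, resource_name: str) -> bytes:
        return self._resolve(resource_name).read_bytes()

    def exists(self, resource_name: str) -> bool:
        return self._resolve(resource_name).is_file()

    def delete(self, resource_name: str) -> None:
        self._resolve(resource_name).unlink()


def run_demo() -> bytes:
    """Caller code is written against StorageOperator, not a concrete backend."""
    with tempfile.TemporaryDirectory() as tmp:
        operator: StorageOperator = LocalStorageOperator(tmp)
        operator.upload("scripts/hello.sh", b"echo hello\n")
        content = operator.download("scripts/hello.sh")
        operator.delete("scripts/hello.sh")
        assert not operator.exists("scripts/hello.sh")
        return content
```

Under this design, adding a new backend means adding one more subclass; code such as `run_demo` does not change.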
The storage system is configured through the common.properties file. Different storage backends require different configuration parameters.
Basic Configuration
# Storage type: LOCAL, HDFS, S3, OSS, GCS, ABS, OBS, COS
resource.storage.type=LOCAL
# Base path for resource storage
resource.storage.upload.base.path=/tmp/dolphinscheduler
When DolphinScheduler starts, it loads the storage configuration and initializes the appropriate storage operator:
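The startup step can be sketched as follows. This is an illustrative Python outline, not the project's real initialization code: `parse_properties`, `init_storage_operator`, `OPERATOR_FACTORIES`, and the two placeholder operator classes are hypothetical names. The fallback to `/tmp/dolphinscheduler` for the base path is an assumption borrowed from the sample configuration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


@dataclass
class LocalOperator:  # placeholder for a local-file-system backend
    base_path: str


@dataclass
class HdfsOperator:  # placeholder for an HDFS backend
    base_path: str
    default_fs: str


# Hypothetical registry mapping resource.storage.type to a factory function.
OPERATOR_FACTORIES: Dict[str, Callable[[Dict[str, str], str], object]] = {
    "LOCAL": lambda props, base: LocalOperator(base),
    "HDFS": lambda props, base: HdfsOperator(
        base, props["resource.hdfs.fs.defaultFS"]
    ),
}


def init_storage_operator(props: Dict[str, str]) -> object:
    """Pick and build the operator named by resource.storage.type."""
    storage_type = props.get("resource.storage.type", "LOCAL").upper()
    # Assumed fallback value, taken from the sample configuration.
    base = props.get("resource.storage.upload.base.path", "/tmp/dolphinscheduler")
    factory = OPERATOR_FACTORIES.get(storage_type)
    if factory is None:
        raise ValueError(f"unsupported storage type: {storage_type}")
    return factory(props, base)
```

For example, passing the parsed contents of a properties file whose `resource.storage.type` is `HDFS` yields an `HdfsOperator`, while an unrecognized type fails fast with a `ValueError` instead of silently falling back.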
The Local Storage option stores files on the local file system of the machine where DolphinScheduler is running. This is the default configuration.
resource.storage.type=LOCAL
resource.storage.upload.base.path=/tmp/dolphinscheduler
Note: When using the LOCAL storage type with multiple DolphinScheduler nodes, each node has its own local file system. This means resources uploaded on one node are not automatically available on other nodes unless you use a shared file system.
For HDFS storage, additional configuration is required:
resource.storage.type=HDFS
resource.hdfs.fs.defaultFS=hdfs://namenode:8020
resource.hdfs.root.user=hdfs
If HDFS with Kerberos authentication is used, additional Kerberos configuration is required.
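A pre-start sanity check for the HDFS settings could look like the sketch below. The helper `missing_hdfs_keys` is hypothetical and not part of DolphinScheduler; it only checks the two HDFS keys shown in this section and says nothing about Kerberos settings.

```python
from typing import Dict, List

# The HDFS keys shown in this section; Kerberos-related keys are not covered.
HDFS_KEYS = ["resource.hdfs.fs.defaultFS", "resource.hdfs.root.user"]


def missing_hdfs_keys(props: Dict[str, str]) -> List[str]:
    """Return the HDFS keys that are absent or empty when the type is HDFS."""
    if props.get("resource.storage.type", "").upper() != "HDFS":
        return []  # nothing to check for other storage types
    return [key for key in HDFS_KEYS if not props.get(key, "").strip()]
```

Calling `missing_hdfs_keys` on a configuration that sets `resource.storage.type=HDFS` but omits `resource.hdfs.root.user` returns a list containing that key, which can be reported before any connection attempt is made.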
For Amazon S3 or S3-compatible storage:
resource.storage.type=S3
AWS connection parameters are specified in the aws.yaml file:
aws:
  s3:
    credentials.provider.type: AWSStaticCredentialsProvider
    access.key.id:
DolphinScheduler also supports storage on Alibaba Cloud OSS, Huawei Cloud OBS, Tencent Cloud COS, Google Cloud Storage, and Azure Blob Storage, each with its own configuration parameters.
In addition to the actual file storage, DolphinScheduler also maintains metadata about resources in its database. The relevant tables include:
The Storage System integrates with other DolphinScheduler components:
Usage Considerations
When selecting a storage type, consider the following: