Description
Search before asking
- I have searched the existing DSIPs and found no similar DSIP.
Motivation
Currently, upgrading Apache DolphinScheduler between major versions (e.g., from 1.3.x to 3.x.x) relies on the official upgrade-schema.sh script. This approach has several limitations for large-scale production environments:
- Downtime Requirement: The master/worker nodes and the metadata database must be offline during the schema upgrade, which is unacceptable for 24/7 SLA requirements.
- All-or-Nothing Risk: It is impossible to migrate only a subset of projects. If an upgrade fails, rolling back a massive database is time-consuming and risky.
- Schema Complexity: Major versions (especially the jump from 1.x to 2.x/3.x) introduced significant changes, such as the decoupling of task and process definitions.
Using Flink-CDC as a migration engine allows for real-time metadata synchronization, gradual "canary" migrations of specific workflows, and zero downtime for the source system.
Design Detail
The migration tool will be implemented as a Flink application that captures changes from the source metadata database and sinks them into the target database after applying version-specific transformations.
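As a rough illustration of the capture, transform, sink flow described above, the sketch below routes one change event to a target-side operation. It assumes events arrive in the Debezium-style envelope (`op`, `before`, `after`) that Flink CDC can emit as JSON; `to_sink_op` and the `transform` callback are hypothetical names, and the real job would be a Flink operator rather than a plain Python function.

```python
def to_sink_op(event, transform):
    """Translate one Debezium-style change event into a (verb, row) pair.

    `event` is assumed to carry "op", "before" and "after" keys.
    `transform` is a hypothetical version-specific row mapper.
    """
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read and update all become an upsert, so a
        # replayed event overwrites the same target row instead of
        # inserting a duplicate
        return ("upsert", transform(event["after"]))
    if op == "d":
        # deletes only carry the previous row image
        return ("delete", transform(event["before"]))
    raise ValueError(f"unsupported op: {op!r}")
```

Mapping inserts and updates to the same upsert verb is one possible design choice here: it keeps the sink idempotent if the job replays events after a restart.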
- Architecture:
  - Source: MySQL/PostgreSQL (Source DS Database) using Flink CDC Connectors.
  - Transformation Layer: A custom MapFunction or ProcessFunction that handles the schema mapping logic. For example:
    - Converting the process_definition_json in 1.3.x into the decoupled task_definition and task_relation in 3.x.x.
    - Generating new snowflake IDs (Global IDs) for the target version.
  - Sink: JDBC Sink (Target DS Database).
- Key Components:
  - Granular Filter: A configuration parameter (e.g., migration.project.codes) to allow users to select specific projects for migration.
  - Stateful Mapping: Use Flink State to maintain the mapping between old IDs and new IDs to ensure consistency across multiple tables.
  - Data Conversion Engine: A dedicated module to parse 1.x JSON strings and reconstruct them into the target version's relational model.
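The three key components can be sketched together in a few lines. This is a minimal illustration, not the proposed implementation: `IdMapper` is a plain dictionary standing in for Flink keyed state, a simple counter stands in for the snowflake generator, and the field names (`tasks`, `preTasks`, `pre_task_code`, `post_task_code`) are simplified assumptions about the 1.x JSON and the target tables that should be checked against the real schemas.

```python
import itertools
import json


class IdMapper:
    """Maps (table, old key) to a newly issued code.

    Stands in for Flink keyed state; the counter is a placeholder
    for a real snowflake ID generator.
    """

    def __init__(self, start=1):
        self._next = itertools.count(start)
        self._codes = {}

    def code_for(self, table, old_key):
        key = (table, old_key)
        if key not in self._codes:
            self._codes[key] = next(self._next)
        return self._codes[key]


def is_selected(project_id, selected_projects):
    """Granular filter: an empty selection means 'migrate everything'."""
    return not selected_projects or project_id in selected_projects


def convert_definition(row, mapper):
    """Split one 1.x-style row into task definitions and task relations."""
    process_code = mapper.code_for("process_definition", row["id"])
    tasks = json.loads(row["process_definition_json"])["tasks"]

    task_defs, relations = [], []
    for task in tasks:
        # task names are scoped by process so two workflows may reuse a name
        code = mapper.code_for("task", (row["id"], task["name"]))
        task_defs.append({
            "code": code,
            "name": task["name"],
            "task_type": task["type"],
            "task_params": json.dumps(task.get("params", {}), sort_keys=True),
        })
        # a task without predecessors gets one relation with pre_task_code 0
        for pre_name in task.get("preTasks") or [None]:
            pre_code = 0 if pre_name is None else mapper.code_for(
                "task", (row["id"], pre_name))
            relations.append({
                "process_definition_code": process_code,
                "pre_task_code": pre_code,
                "post_task_code": code,
            })
    return process_code, task_defs, relations
```

Because the mapper returns the same code for the same old key, converting a row again (for example after an update event) yields the same target codes, which is the consistency property the stateful mapping is meant to provide.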
Compatibility, Deprecation, and Migration Plan
- Compatibility: This feature is an alternative migration path and does not replace the existing upgrade-schema.sh.
  - It supports "Source-Live" mode, where the source system remains read-write while the target system is being populated.
- Deprecation: None.
- Migration Plan:
  - Deploy the Target DolphinScheduler version (fresh install).
  - Configure and start the Flink-CDC migration job.
  - Perform verification on the Target environment (e.g., dry-run workflows).
  - Gradually switch the scheduling traffic from Source to Target by project.
  - Stop the CDC job once all projects are migrated.
Test Plan
- Unit Tests:
  - Validate the JSON transformation logic from 1.3.x to 3.x.x.
  - Test the ID generator and mapping state.
- Integration Tests:
  - End-to-end migration from a standard DS 1.3.5 database to a DS 3.2.2 database.
  - Verify workflow execution on the target side after migration.
- Consistency Tests:
  - Compare the MD5 of process definitions between source and target, computed over a normalized representation because the storage format differs between versions.
  - Validate record counts across all core tables (t_ds_project, t_ds_process_definition, etc.).
- Performance Tests:
  - Benchmark the migration speed for environments with >10,000 workflow definitions.
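The consistency tests could be approached along the following lines. This is a hedged sketch: `definition_digest` and `count_mismatches` are hypothetical helpers, and the canonical form (tasks sorted by name, JSON keys sorted) is an assumption chosen so that the digest ignores ordering differences between the 1.x JSON and the reconstructed target rows.

```python
import hashlib
import json


def definition_digest(tasks):
    """MD5 over a canonical form of a task list.

    Tasks are sorted by name and keys are sorted, so two logically
    equal definitions hash the same regardless of storage order.
    """
    canonical = json.dumps(
        sorted(tasks, key=lambda t: t["name"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def count_mismatches(source_counts, target_counts):
    """Return {table: (source, target)} for tables whose counts differ."""
    return {
        table: (count, target_counts.get(table, 0))
        for table, count in source_counts.items()
        if count != target_counts.get(table, 0)
    }
```

When only a subset of projects is migrated, the source-side counts would need to be restricted to the selected projects before comparing.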
Code of Conduct
- I agree to follow this project's Code of Conduct