Add db migrate job for Gateway upgrades by jchanbcbc · Pull Request #99 · CAAPIM/layer7-operator

jchanbcbc · 2026-06-30T05:33:45Z

Database Migration Job Support for Layer7 Gateway Operator

Overview

This PR introduces a pre-upgrade database migration job for the Layer7 Gateway Operator. It allows Liquibase schema migrations to run as a dedicated Kubernetes Job before Gateway pods start during an upgrade, reducing downtime risk and preventing schema conflicts. This feature requires Gateway 11.2.2 or higher.

What's New

Gateway Entrypoint — Schema Update Modes

The Gateway container (entrypoint.sh) now supports four explicit schema update modes via EXTRA_JAVA_ARGS:

| Mode | Description |
| --- | --- |
| `default` | Gateway runs Liquibase migrations on startup (existing behavior) |
| `skip` | Gateway skips schema update entirely; assumes migrations were run externally |
| `liquibase-only` | Run Liquibase migrations and exit (used by the migration Job) |
| `liquibase-only-with-unlock` | Release the Liquibase `DATABASECHANGELOGLOCK` before migrating, then exit |

Operator — Migration Job (`spec.app.management.database.migrationJob`)

A new opt-in migrationJob section in the Gateway CR spec:

management:
  database:
    migrationJob:
      enabled: true                    # opt-in; default false
      activeDeadlineSeconds: 300       # job timeout; default 5 minutes
      clearLocks: false                # set true to release stale Liquibase locks before migrating
      jdbcUrl: ""                      # optional override to bypass DB proxies; defaults to database.jdbcUrl
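
As a rough illustration of how these keys and their documented defaults could map onto a Go type, here is a minimal sketch. The field names, JSON tags, and the `deadlineSeconds` helper are assumptions made for this example; they are not copied from `api/v1/gateway_types.go`, and the real operator may apply the 300-second default elsewhere (for instance through CRD defaulting).

```go
package main

import "fmt"

// MigrationJob mirrors the YAML keys shown in the PR description.
// Field names and types are assumptions for illustration only.
type MigrationJob struct {
	Enabled               bool   `json:"enabled,omitempty"`
	ActiveDeadlineSeconds *int64 `json:"activeDeadlineSeconds,omitempty"`
	ClearLocks            bool   `json:"clearLocks,omitempty"`
	JdbcUrl               string `json:"jdbcUrl,omitempty"`
}

// defaultDeadlineSeconds is the documented default job timeout (5 minutes).
const defaultDeadlineSeconds int64 = 300

// deadlineSeconds returns the configured timeout, or the default when the
// field was left unset. Hypothetical helper.
func (m MigrationJob) deadlineSeconds() int64 {
	if m.ActiveDeadlineSeconds == nil {
		return defaultDeadlineSeconds
	}
	return *m.ActiveDeadlineSeconds
}

func main() {
	var unset MigrationJob
	fmt.Println(unset.Enabled, unset.deadlineSeconds())
}
```

A pointer is used for the deadline so that "not set" can be told apart from an explicit value.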

Job lifecycle and sequencing:

When migrationJob.enabled: true, the operator creates a Kubernetes Job that runs Liquibase in liquibase-only mode (which requires Gateway 11.2.2 or higher) before the Gateway Deployment is allowed to proceed.
The Gateway Deployment is blocked until the migration job succeeds. If the job fails, an error is logged with instructions and the Gateway remains in its current state; there is no automatic rollout to pods with an unmigrated schema.
If the job's spec changes between kubectl apply calls (e.g. a corrected jdbcUrl, a new image, or a changed clearLocks flag), the operator automatically deletes and recreates the job rather than getting stuck on a stale run.
Failed jobs require manual intervention: run kubectl delete job <name> -n <namespace> to retry.

Automatic skip mode for Gateway pods:

When migrationJob.enabled: true, the operator automatically injects -Dgateway.db.schema-update.mode=skip into EXTRA_JAVA_ARGS for the main Gateway pods. This prevents them from attempting their own Liquibase run, since the Job already handles it. Users do not need to set this manually.
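
The injection described above can be pictured with a small sketch. It assumes EXTRA_JAVA_ARGS is a space-separated list of JVM arguments and that any schema-update flag already present is replaced rather than duplicated; both points are assumptions, and `withSchemaMode` is a hypothetical helper, not the operator's actual function.

```go
package main

import (
	"fmt"
	"strings"
)

// schemaModeFlag is the JVM system property named in the PR description.
const schemaModeFlag = "-Dgateway.db.schema-update.mode="

// withSchemaMode returns extraJavaArgs with exactly one schema-update flag,
// set to mode. A flag that was already present is dropped first
// (assumed behavior for this sketch).
func withSchemaMode(extraJavaArgs, mode string) string {
	kept := make([]string, 0, 4)
	for _, arg := range strings.Fields(extraJavaArgs) {
		if !strings.HasPrefix(arg, schemaModeFlag) {
			kept = append(kept, arg)
		}
	}
	kept = append(kept, schemaModeFlag+mode)
	return strings.Join(kept, " ")
}

func main() {
	fmt.Println(withSchemaMode("-Xmx2g", "skip"))
}
```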

JDBC URL handling:

If migrationJob.jdbcUrl is explicitly set, the operator appends createDatabaseIfNotExist=false as a query parameter (joined with ? or &, depending on whether the URL already has query parameters). This ensures that a misconfigured or mistyped URL fails fast rather than silently creating an unintended database.
If migrationJob.jdbcUrl is not set, the main database.jdbcUrl from the ConfigMap is used as-is, preserving the existing createDatabaseIfNotExist=true behavior for fresh installs.
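
A minimal sketch of the two URL rules above, assuming plain string handling is enough to pick the separator. The function names and the example URLs are hypothetical; the operator's real URL normalization lives in `pkg/gateway/migration_job.go` and may handle more cases.

```go
package main

import (
	"fmt"
	"strings"
)

// noCreateParam is appended to an explicit migration-job URL so that a
// mistyped database name fails instead of creating a new database.
const noCreateParam = "createDatabaseIfNotExist=false"

// effectiveJDBCURL picks the URL the migration Job should use.
// With no override, the main URL is returned unchanged. With an override,
// the parameter is appended using "?" or "&" as appropriate.
func effectiveJDBCURL(mainURL, overrideURL string) string {
	if overrideURL == "" {
		return mainURL
	}
	sep := "?"
	if strings.Contains(overrideURL, "?") {
		sep = "&"
	}
	return overrideURL + sep + noCreateParam
}

func main() {
	mainURL := "jdbc:mysql://proxy:3306/ssg?createDatabaseIfNotExist=true"
	fmt.Println(effectiveJDBCURL(mainURL, ""))
	fmt.Println(effectiveJDBCURL(mainURL, "jdbc:mysql://db:3306/ssg"))
	fmt.Println(effectiveJDBCURL(mainURL, "jdbc:mysql://db:3306/ssg?useSSL=true"))
}
```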

Diskless vs non-diskless config parity:

In diskless mode (default): the Secret is exposed as env vars; migrationJob.jdbcUrl overrides the main JDBC URL via Kubernetes env precedence.
In non-diskless mode (disklessConfig.disabled: true): the Secret is mounted as a node.properties file (matching the main Gateway deployment behavior); migrationJob.jdbcUrl is not used — node.properties wins, consistent with Helm behavior.

clearLocks field:

When clearLocks: true, the job runs in liquibase-only-with-unlock mode, which releases any stale DATABASECHANGELOGLOCK before applying migrations. This is useful for recovering from a previously aborted migration. Changing this field triggers automatic job replacement. To change it on an existing Gateway, kubectl patch with --type=merge is recommended.
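
To make the clearLocks behavior concrete, the sketch below shows one way the Job's mode could be chosen and how a merge-patch body for the flag could be built. Both helpers are hypothetical; only the mode names and the `spec.app.management.database.migrationJob.clearLocks` path come from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jobSchemaMode maps the clearLocks flag to the schema update mode the
// migration Job runs in.
func jobSchemaMode(clearLocks bool) string {
	if clearLocks {
		return "liquibase-only-with-unlock"
	}
	return "liquibase-only"
}

// clearLocksPatch builds a JSON merge-patch body that sets only the
// clearLocks field and leaves the rest of the Gateway spec untouched.
func clearLocksPatch(clearLocks bool) (string, error) {
	patch := map[string]interface{}{
		"spec": map[string]interface{}{
			"app": map[string]interface{}{
				"management": map[string]interface{}{
					"database": map[string]interface{}{
						"migrationJob": map[string]interface{}{
							"clearLocks": clearLocks,
						},
					},
				},
			},
		},
	}
	body, err := json.Marshal(patch)
	return string(body), err
}

func main() {
	body, err := clearLocksPatch(true)
	if err != nil {
		panic(err)
	}
	fmt.Println(jobSchemaMode(true), body)
}
```

The printed body could then be passed to kubectl patch with --type=merge via its -p flag, against the Gateway resource in the target namespace.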

Optional liquibaseLogLevel:

spec.app.management.database.liquibaseLogLevel sets the LIQUIBASE_LOG_LEVEL env var on both the migration job and Gateway pods for debugging.

Files Changed

| Component | File | Change |
| --- | --- | --- |
| Gateway container | `entrypoint.sh` | 4-mode schema update logic |
| Helm chart | `values.yaml`, `db-migration-job.yaml`, `README.md`, `release-notes.md` | Migration job template and documentation |
| Operator CRD | `api/v1/gateway_types.go` | `MigrationJob.ClearLocks`, `Database.LiquibaseLogLevel` |
| Operator job builder | `pkg/gateway/migration_job.go` | Job spec with diskless/non-diskless handling, URL normalization |
| Operator reconciler | `pkg/gateway/reconcile/migration_job.go` | Job lifecycle, deployment blocking, spec-change detection |
| Operator ConfigMap | `pkg/gateway/configmap.go` | Auto-inject `skip` mode, `LIQUIBASE_LOG_LEVEL` |
| Operator controller | `internal/controller/gateway/controller.go` | `Owns(&batchv1.Job{})` to watch Job status changes |

Introduces an opt-in Kubernetes Job that runs Liquibase schema migrations before Gateway pods roll out during an upgrade. Gateway pods are blocked until the job succeeds, preventing pods from starting against an unmigrated schema.

Key behaviors:
- Works with Gateway 11.2.2 and later, which provides four schema update modes in entrypoint.sh: default, skip, liquibase-only, liquibase-only-with-unlock
- Migration job enabled via spec.app.management.database.migrationJob.enabled
- clearLocks field releases stale Liquibase locks before migrating
- Gateway pods automatically use skip mode when migration job is enabled
- Job is auto-replaced when spec changes (image, jdbcUrl, clearLocks)
- Failed jobs block the deployment and require manual deletion to retry
- jdbcUrl override only applies in diskless mode; node.properties wins in non-diskless mode, consistent with Helm behavior

Gateway.Status.MigrationStatus. This addresses an issue where the previous implementation would re-run migrations on every 12-hour reconcile if the completed Job had been deleted.

Changes:
- Add MigrationStatus struct (SpecHash, Complete) to GatewayStatus in gateway_types.go. Once Complete is true, GatewayMigrationJob returns immediately on all future reconciles regardless of Job existence.
- Add DeepCopyInto/DeepCopy for MigrationStatus in zz_generated.deepcopy.go.
- Update CRD schema (security.brcmlabs.com_gateways.yaml) to include the new migrationStatus status fields.
- Rewrite GatewayMigrationJob in reconcile/migration_job.go:
  - Compute a 16-char spec hash from image, effective jdbcUrl, clearLocks, and activeDeadlineSeconds. A hash change (e.g. image upgrade) resets status and triggers a fresh migration automatically.
  - On first enable: write hash to status, create Job, wait.
  - On Job success: write Complete=true to status, unblock Deployment.
  - On Job failure (both pod attempts exhausted): log error with exact kubectl delete command, block Deployment. User deletes Job to retry.
  - On disabled: clean up any orphaned Job.
- Remove migrationJobSpecChanged; spec change detection is now fully handled by the hash comparison.

Recovery after failure: fix the root cause (run kubectl patch with clearLocks:true, or restore the DB to its pre-upgrade state if partially migrated), then run: kubectl delete job <name>-db-migration -n <namespace>
The operator automatically creates a new Job. No kubectl apply needed.
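
The spec hash described in this commit could be computed along the following lines. This is a sketch under stated assumptions: the commit message only says the hash is 16 characters and covers image, effective jdbcUrl, clearLocks, and activeDeadlineSeconds, so the choice of SHA-256, hex encoding, the newline separator, and the function name are all guesses made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// migrationSpecHash returns a 16-character fingerprint of the inputs that
// should trigger a fresh migration when they change.
// Hash algorithm and field separator are assumptions for this sketch.
func migrationSpecHash(image, jdbcURL string, clearLocks bool, deadlineSeconds int64) string {
	parts := []string{
		image,
		jdbcURL,
		strconv.FormatBool(clearLocks),
		strconv.FormatInt(deadlineSeconds, 10),
	}
	sum := sha256.Sum256([]byte(strings.Join(parts, "\n")))
	// Keep the first 16 hex characters (64 bits) of the digest.
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	// Placeholder image names, not real tags.
	before := migrationSpecHash("example/gateway:old", "jdbc:mysql://db:3306/ssg", false, 300)
	after := migrationSpecHash("example/gateway:new", "jdbc:mysql://db:3306/ssg", false, 300)
	fmt.Println(before != after) // an image change produces a different hash
}
```

Comparing the stored SpecHash with a freshly computed one gives a single check that covers every tracked field, which is what lets a dedicated spec-change function be dropped.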

…pec parity
- Reconcile loop no longer short-circuits while the migration job is pending; it runs every op to completion each pass and only stops on a genuine error. The Deployment step alone gates on Gateway.Status.MigrationStatus.
- GatewayMigrationJob returns nil for all "not done yet" states instead of a synthetic ErrMigrationPending; progress is driven by the Job's own watch events (create/update/delete) rather than a fixed poll interval.
- Fixed an edge case where a spec change (e.g. image upgrade) with no existing Job to delete could stall migration Job recreation for up to 12 hours; it now creates the replacement Job in the same reconcile pass when there's nothing to wait on.
- Failure log message no longer embeds a kubectl command.
- The migration Job's pod/container spec now inherits the same settings as the main Gateway Deployment (security contexts, resources, node selector, affinity, tolerations, topology spread constraints, pod annotations/labels) instead of a bare-minimum hand-built spec, and uses the same app.kubernetes.io/* labeling convention as other operator-managed resources.
- Added a regression test for the "spec changed, no existing Job" case; existing migration job tests updated for the new nil-return contract.
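
Taken together, the status-based behavior in these commit messages can be read as a small decision table. The sketch below is one interpretation of it, not the operator's code: the type names, the action names, and the `nextAction` and `deploymentAllowed` helpers are hypothetical, and the real reconciler works with Kubernetes Job objects rather than a simple enum.

```go
package main

import "fmt"

// jobState is a simplified view of the migration Job (assumption).
type jobState int

const (
	jobAbsent jobState = iota
	jobRunning
	jobSucceeded
	jobFailed
)

// migrationStatus mirrors the SpecHash and Complete fields named in the PR.
type migrationStatus struct {
	SpecHash string
	Complete bool
}

// nextAction returns what a reconcile pass would do for the given inputs.
func nextAction(enabled bool, st migrationStatus, wantHash string, job jobState) string {
	if !enabled {
		return "cleanup" // remove any orphaned Job
	}
	if st.SpecHash != wantHash {
		// First enable or spec change: status is reset to the new hash.
		if job == jobAbsent {
			return "create" // nothing to wait on, create in the same pass
		}
		return "replace" // delete the stale Job, then recreate it
	}
	if st.Complete {
		return "none" // already migrated, regardless of Job existence
	}
	switch job {
	case jobAbsent:
		return "create" // e.g. the user deleted a failed Job to retry
	case jobSucceeded:
		return "mark-complete"
	case jobFailed:
		return "block" // log the error and keep the Deployment gated
	default:
		return "wait" // progress arrives via Job watch events
	}
}

// deploymentAllowed is the gate applied by the Deployment step alone.
func deploymentAllowed(enabled bool, st migrationStatus, wantHash string) bool {
	return !enabled || (st.Complete && st.SpecHash == wantHash)
}

func main() {
	st := migrationStatus{SpecHash: "aaaa", Complete: true}
	fmt.Println(nextAction(true, st, "bbbb", jobAbsent), deploymentAllowed(true, st, "bbbb"))
}
```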

jchanbcbc added 2 commits June 29, 2026 22:25

jchanbcbc self-assigned this Jun 30, 2026

jchanbcbc added the IN PROGRESS label Jun 30, 2026

jchanbcbc added 2 commits June 29, 2026 23:37

fix bugs. Improve error handling.

a3ad636

fix to modify / characters in branch names to be replaced with hyphen.

b15b0f3

jchanbcbc removed the IN PROGRESS label Jun 30, 2026

jchanbcbc changed the title ~~Feature/add db migrate job~~ Add db migrate job for Gateway upgrades Jun 30, 2026

jchanbcbc added 7 commits June 30, 2026 16:13

improve re-use of the MigrationJobName function.

994e2f4

Gate on migration status instead of the GatewayMigrationJob

d895dc4

Remove extra comments.

3842db2

Fix bug.

323a6e6

Added control-plane related End2End tests for the database migration.

929890a

jchanbcbc wants to merge 11 commits into develop from feature/add_db_migrate_job
