feat: add label-based sharding for horizontal scaling #270

Open: aashishtomar wants to merge 1 commit into crossplane-contrib:main from aashishtomar:feat/horizontal-scaling-shard-label

What problem does this solve?

Fixes #269 | Related: #212, crossplane/crossplane#2411

Currently, deploying multiple controller replicas with --leader-election only provides active/passive HA — all replicas compete for a single leader election lease, so only one replica reconciles Workspaces at any time. The remaining replicas sit idle.

For deployments managing hundreds of Workspaces, this becomes a throughput bottleneck. Each Workspace reconciliation involves running terraform init, plan, and apply, which are CPU-intensive operations. A single controller can only process a limited number of Workspaces per reconciliation cycle, leading to increased drift detection latency and slower convergence.

What changed?

This PR introduces a --shard-name flag (also configurable via SHARD_NAME env var) that enables label-based sharding. Users assign Workspaces to shards by labeling them with terraform.crossplane.io/shard=<name>, then deploy one controller instance per shard.

cmd/provider/main.go

  • Added --shard-name flag with SHARD_NAME env var support
  • Moved scheme registration before manager creation. This is required because cache.ByObject needs the Workspace types registered in the scheme at startup to determine whether they are cluster-scoped or namespaced. Previously, schemes were registered after ctrl.NewManager(), which worked because the manager auto-created a default scheme — but with ByObject referencing specific types, early registration is necessary.
  • When --shard-name is set, configures cache.ByObject with a label selector on both clusterv1beta1.Workspace and namespacedv1beta1.Workspace. This filters at the informer/watch level — the API server only sends events for matching Workspaces, which is more efficient than filtering in the reconciler.
  • Appends the shard name to the leader election lease ID (e.g., crossplane-leader-election-provider-terraform-shard-0), so each shard gets its own lease and multiple shards can be active simultaneously.
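
The flag resolution and lease naming described above can be sketched with the standard library alone. This is a minimal illustration, not the code in cmd/provider/main.go: the flag package stands in for the provider's actual flag parsing, the names shardName, leaseID, and baseLeaseID are hypothetical, the base lease ID is inferred from the example lease name above, and the flag taking precedence over the env var is an assumption.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// baseLeaseID is inferred from the example lease name in this description.
const baseLeaseID = "crossplane-leader-election-provider-terraform"

// shardName resolves the shard from --shard-name, using SHARD_NAME as the
// default. Assumption: an explicit flag wins over the env var.
func shardName(args []string, getenv func(string) string) (string, error) {
	fs := flag.NewFlagSet("provider", flag.ContinueOnError)
	name := fs.String("shard-name", getenv("SHARD_NAME"), "shard this instance reconciles")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *name, nil
}

// leaseID returns the unchanged base ID when sharding is off, so unsharded
// deployments keep their existing lease. Otherwise it appends the shard name.
func leaseID(shard string) string {
	if shard == "" {
		return baseLeaseID
	}
	return baseLeaseID + "-" + shard
}

func main() {
	shard, err := shardName(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println(leaseID(shard))
}
```

Keeping the unsharded lease ID unchanged is what lets existing deployments upgrade without a leadership handover to a differently named lease.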

internal/controller/gc/gc.go

  • Updated Setup() signature to accept shardName string and pass it to the GarbageCollector via WithShardName().
  • The shard name is used for logging context so operators can distinguish GC logs from different shards.

internal/workdir/workdir.go

  • Added ShardLabel constant (terraform.crossplane.io/shard) as a single source of truth for the label key.
  • Added shardName field and WithShardName() option to GarbageCollector.
  • Critical design decision: The GC always lists ALL workspaces without any shard label filtering. This is intentional — if a shard's GC only listed its own workspaces, it could incorrectly determine that another shard's workspace directories are orphaned and delete them. By listing all workspaces globally, each GC instance can safely determine which directories are truly orphaned (i.e., the workspace no longer exists in any shard).
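
The safety argument above can be illustrated with a small model. This is a sketch under stated assumptions, not the GarbageCollector implementation: it assumes each working directory is named after its Workspace's UID, and orphanedDirs is a hypothetical helper introduced only for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// orphanedDirs returns the directory names that match no live Workspace.
// live must contain every Workspace in the cluster, regardless of shard label.
// Assumption: each directory is named after its Workspace's UID.
func orphanedDirs(dirs []string, live map[string]bool) []string {
	var out []string
	for _, d := range dirs {
		if !live[d] {
			out = append(out, d)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	dirs := []string{"uid-a", "uid-b", "uid-deleted"}

	// Global listing: uid-a belongs to shard-0, uid-b to shard-1.
	all := map[string]bool{"uid-a": true, "uid-b": true}
	fmt.Println(orphanedDirs(dirs, all)) // [uid-deleted]

	// Shard-filtered listing as seen by shard-0: uid-b looks orphaned.
	shard0Only := map[string]bool{"uid-a": true}
	fmt.Println(orphanedDirs(dirs, shard0Only)) // [uid-b uid-deleted]
}
```

The second call shows the failure mode the global listing avoids: with a shard-filtered view, shard-0 would treat shard-1's directory as garbage.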

internal/workdir/workdir_test.go

  • Added TestCollectWithShardName with two table-driven test cases following existing test patterns:
    • ShardedGCListsAllWorkspaces: Sets up a sharded GC (shard-0) with directories from shard-0, shard-1, and a deleted workspace. Verifies that only the deleted workspace's directory is removed — both shard-0 and shard-1 directories are preserved.
    • ShardedGCPreservesOtherShardDirs: Sets up a sharded GC (shard-1) with directories from both shard-0 and shard-1. Verifies that shard-1's GC does not delete shard-0's directories.
  • Added TestWithShardName: Verifies the option function correctly sets the shardName field.
  • Added TestShardLabel: Verifies the constant value matches the expected label key.

All new tests use table-driven patterns with test.MockClient consistent with the existing test suite. No third-party test frameworks are introduced.

Why this approach?

I evaluated three approaches:

| Approach | Pros | Cons |
|---|---|---|
| Label-based sharding (this PR) | Simple, explicit, operator-controlled, informer-level efficiency | Requires labeling workspaces |
| Hash-based sharding (consistent hashing on UID) | Automatic distribution | Hard to reason about operationally, rebalancing complexity |
| ProviderConfig-based partitioning (#2411) | Natural grouping | Requires deeper Crossplane runtime changes |

Label-based sharding was chosen because:

  • It can be implemented entirely within the provider without Crossplane runtime changes
  • It gives operators explicit control over which workspaces go where
  • It uses cache.ByObject for informer-level filtering, which is the most efficient approach (events are filtered at the API server watch, not in the reconciler)
  • There is precedent in the ecosystem — provider-ansible implements sharding using an event filter, though this PR uses informer-level filtering which is more efficient

Does this change user-facing behavior?

No, as long as --shard-name is not set (the default). In that case the controller behaves identically to today: it reconciles all Workspaces and uses the existing leader election lease. This is a purely additive, opt-in feature.

When --shard-name is set:

  • The controller only reconciles Workspaces with the matching terraform.crossplane.io/shard label
  • Unlabeled Workspaces are not reconciled by any sharded instance (operators should either label all Workspaces or keep one unsharded instance as a catch-all)
  • Each shard has its own leader election lease
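
The visibility rules above amount to an equality match on the shard label. The following is a simplified model of that selector, not the informer configuration itself: visible is a hypothetical function, and only the label key is taken from this PR.

```go
package main

import "fmt"

// ShardLabel is the label key introduced by this PR.
const ShardLabel = "terraform.crossplane.io/shard"

// visible reports whether a controller started with the given shard name
// would see a Workspace carrying these labels. An empty shard name models
// the default mode, where no selector is applied.
func visible(shard string, labels map[string]string) bool {
	if shard == "" {
		return true
	}
	return labels[ShardLabel] == shard
}

func main() {
	vpc := map[string]string{ShardLabel: "shard-0"}
	unlabeled := map[string]string{}

	fmt.Println(visible("shard-0", vpc))       // true
	fmt.Println(visible("shard-1", vpc))       // false
	fmt.Println(visible("shard-0", unlabeled)) // false
	fmt.Println(visible("", vpc))              // true
}
```

As modeled here, an unsharded instance also sees labeled Workspaces, so a catch-all instance would overlap with the shards rather than handle only the unlabeled remainder.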

Example deployment

# Deploy shard-0
env:
- name: SHARD_NAME
  value: "shard-0"
args:
- --leader-election
- --shard-name=shard-0

# Deploy shard-1
env:
- name: SHARD_NAME
  value: "shard-1"
args:
- --leader-election
- --shard-name=shard-1

# Assign workspaces to shards
kubectl label workspace my-vpc terraform.crossplane.io/shard=shard-0
kubectl label workspace my-rds terraform.crossplane.io/shard=shard-1

Backward compatibility

  • No API changes — no new CRDs, no schema changes to existing resources
  • No changed defaults — default behavior (no --shard-name) is identical to current behavior
  • No upgrade impact — existing deployments continue to work without any changes
  • Shard label is passive — adding the label to a Workspace has no effect unless a sharded controller is deployed

Tests added

  • TestCollectWithShardName (2 sub-tests): Validates GC correctness in sharded mode
  • TestWithShardName: Validates option function
  • TestShardLabel: Validates label constant

All existing tests continue to pass. Ran make reviewable locally — clean.

Manual testing performed

Tested on a local Kind cluster (3 nodes) with Crossplane 2.2.0 and provider-terraform v1.1.1:

  1. Deployed 3 sharded controller instances (shard-0, shard-1, shard-2)
  2. Created 6 Workspaces using null_resource, distributed across shards
  3. Verified each shard only reconciles its labeled Workspaces (checked controller logs)
  4. Verified each shard has its own leader election lease (kubectl get leases)
  5. Verified all Workspaces reach SYNCED=True, READY=True
  6. Verified GC does not delete directories belonging to other shards
  7. Verified relabeling a Workspace moves it between shards (old shard stops reconciling, new shard picks it up)

Docs impact

This PR adds inline code comments explaining the sharding behavior. User-facing documentation (e.g., a guide on setting up horizontal scaling) could be added in a follow-up once the approach is reviewed and approved.

Anything reviewers should pay special attention to?

  1. Scheme registration ordering — moved before ctrl.NewManager() to support cache.ByObject. This is a structural change to the startup sequence, though the end result is functionally equivalent.
  2. GC listing all workspaces — this is the key safety invariant. If this were ever changed to filter by shard, it could cause data loss (one shard deleting another's terraform state directories).
  3. Unlabeled workspace handling — unlabeled Workspaces are invisible to sharded controllers. This is by design, but operators need to be aware of it.

Commit message

Currently the controller only supports active/passive HA via leader
election: a single replica reconciles all Workspaces regardless of
replica count. For deployments with hundreds of Workspaces, this becomes
a throughput bottleneck since each reconciliation runs terraform
init/plan/apply.

This change introduces a --shard-name flag (also configurable via
SHARD_NAME env var) that partitions Workspaces across controller
instances using the label terraform.crossplane.io/shard=<name>.

Changes:

cmd/provider/main.go:
- Add --shard-name flag with SHARD_NAME env var
- Move scheme registration before manager creation so cache.ByObject
  can resolve types at startup
- Configure cache.ByObject with a label selector on both cluster-scoped
  and namespaced Workspace types when shard name is set
- Append shard name to leader election lease ID for per-shard leaders

internal/controller/gc/gc.go:
- Accept shardName parameter and pass it to the GarbageCollector
  for logging context

internal/workdir/workdir.go:
- Add ShardLabel constant for terraform.crossplane.io/shard
- Add shardName field and WithShardName option to GarbageCollector
- GC intentionally lists ALL workspaces without shard filtering to
  avoid cross-shard directory deletion

internal/workdir/workdir_test.go:
- Add TestCollectWithShardName with two table-driven cases:
  ShardedGCListsAllWorkspaces verifies the GC deletes only directories
  for workspaces that no longer exist in any shard.
  ShardedGCPreservesOtherShardDirs verifies the GC does not delete
  directories belonging to other shards' workspaces.
- Add TestWithShardName to verify the option function
- Add TestShardLabel to verify the constant value

When --shard-name is empty (default), behavior is identical to the
existing single-controller mode for full backward compatibility.

Fixes crossplane-contrib#269

Signed-off-by: AshTom <aashish.tomar@gmail.com>