Skip to content

fix(deployment): Auto-initialize MongoDB replica set on fresh volumes#2270

Open
goynam wants to merge 4 commits into
y-scope:mainfrom
goynam:fix/results-cache-replica-set-init
Open

fix(deployment): Auto-initialize MongoDB replica set on fresh volumes#2270
goynam wants to merge 4 commits into
y-scope:mainfrom
goynam:fix/results-cache-replica-set-init

Conversation

@goynam
Copy link
Copy Markdown
Contributor

@goynam goynam commented May 9, 2026

Summary

  • Adds a postStart lifecycle hook to the results-cache (MongoDB) StatefulSet that automatically initializes the replica set (rs0) when the PVC is fresh/recreated.
  • The hook waits for mongod to accept connections, then runs rs.initiate() only if rs.status() fails (i.e., the replica set is not yet configured). This is idempotent and safe on normal restarts.
  • Bumps Helm chart version to 0.3.2-dev.3.

Problem

When the results-cache pod's PVC is deleted and recreated (e.g., during disaster recovery or testing), MongoDB starts in RSGhost state because mongod.conf has replication.replSetName: "rs0" but no replica set has been initiated on the fresh data directory. All clients then fail with "No servers match selector Primary()".

Fixes #2257

Test plan

  • Deploy the Helm chart with a fresh PVC (delete the existing results-cache PVC first)
  • Verify the MongoDB pod starts successfully and transitions to Primary state
  • Verify clients (results-cache-indices-creator job, query workers, web UI) can connect without errors
  • Verify that a normal pod restart (without PVC deletion) still works correctly (idempotent hook)
  • Run kubectl exec into the results-cache pod and confirm rs.status() shows a healthy primary

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores

    • Bumped deployment version to 0.3.2-dev.3
  • Infrastructure Improvements

    • Worker replica counts now follow the scheduling configuration for better scaling control
    • Results cache now performs automatic MongoDB replica set initialization on startup for improved cache stability and reliability

Review Change Stack

Fixes #2257

Naman Goyal and others added 2 commits May 9, 2026 17:28
…ox conflicts (fixes y-scope#2259)

When running multiple compression/query worker pods, all workers registered
with the same static Celery node name causing RabbitMQ pidbox queue conflicts
and periodic crashes. Append $(HOSTNAME) (which Kubernetes sets to the pod
name) to make each worker's node name unique.

Also bumps Helm chart version to 0.3.2-dev.2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…fixes y-scope#2257)

When the results-cache (MongoDB) pod's PVC is deleted and recreated,
MongoDB starts in RSGhost state because it has replication.replSetName
configured but no replica set initialized. This causes all clients to
fail with "No servers match selector Primary()".

Add a postStart lifecycle hook that waits for mongod to accept
connections, then calls rs.initiate() if the replica set has not already
been initialized. The hook is idempotent: on subsequent restarts where
the replica set is already configured, rs.status() succeeds and
initiation is skipped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@goynam goynam requested a review from a team as a code owner May 9, 2026 11:58
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 9, 2026

Walkthrough

Helm chart version bumped to 0.3.2-dev.3. Compression and query worker Deployments now source replica counts from .Values.scheduling.*.replicas. results-cache StatefulSet adds a postStart hook that polls MongoDB and auto-initializes a single-node replica set if necessary.

Changes

Helm Deployment Configuration Updates

Layer / File(s) Summary
Chart version
tools/deployment/package-helm/Chart.yaml
Chart version changed from 0.3.2-dev.0 to 0.3.2-dev.3.
Worker replica value keys
tools/deployment/package-helm/templates/compression-worker-deployment.yaml, tools/deployment/package-helm/templates/query-worker-deployment.yaml
Deployment spec.replicas references updated to .Values.scheduling.compressionWorker.replicas and .Values.scheduling.queryWorker.replicas respectively.
Results-cache MongoDB postStart init
tools/deployment/package-helm/templates/results-cache-statefulset.yaml
Added a lifecycle.postStart exec hook that polls MongoDB with mongosh, runs rs.status(), and calls rs.initiate(...) to initialize a single-node replica set when needed.

Sequence Diagram(s)

sequenceDiagram
  participant ResultsCacheContainer as results-cache container
  participant Mongosh as mongosh
  participant MongoDB as MongoDB (localhost:27017)
  ResultsCacheContainer->>Mongosh: run db.runCommand('ping') loop (up to 30 attempts)
  Mongosh-->>ResultsCacheContainer: ping succeeds
  ResultsCacheContainer->>Mongosh: execute rs.status()
  alt rs.status() throws
    Mongosh->>MongoDB: rs.initiate({_id: "rs0", members: [{_id: 0, host: "localhost:27017"}]})
    MongoDB-->>Mongosh: replica set initialized
  else rs.status() OK
    Mongosh-->>ResultsCacheContainer: replica set already configured
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Out of Scope Changes check ❓ Inconclusive Changes to query-worker and compression-worker deployment replicas references appear unrelated to MongoDB replica set initialization; unclear if these are necessary or accidental modifications. Clarify whether the changes to .Values.queryWorker.replicas and .Values.compressionWorker.replicas paths are intentional and related to this PR's scope, or if they should be reverted.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main change: adding MongoDB replica set auto-initialization on fresh volumes, which addresses the core issue in #2257.
Linked Issues check ✅ Passed The PR implements Option B from issue #2257 by adding a postStart lifecycle hook that auto-initializes MongoDB replica set, checks replica set status idempotently, and bumps chart version as documented.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch fix/results-cache-replica-set-init

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tools/deployment/package-helm/templates/compression-worker-deployment.yaml (1)

30-91: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

$(HOSTNAME) will not be expanded — Celery receives the literal string.

Kubernetes only substitutes $(VAR_NAME) in command/args using variables declared in spec.containers[].env. HOSTNAME is set by the container runtime, not by Kubernetes, so it is not part of the substitution scope. Because the command invokes python3 directly (array format, no shell), shell expansion does not happen either. The Celery worker will end up with a literal node name compression-worker@$(HOSTNAME), which is the same constant for every pod and will not fix the pidbox conflict the PR aims to address.

Declare a downward-API env var so Kubernetes can substitute it:

🔧 Proposed fix — inject pod name via the downward API
           env:
+            - name: "HOSTNAME"
+              valueFrom:
+                fieldRef:
+                  fieldPath: "metadata.name"
             - {{- include "clp.celeryBrokerUrlEnvVar" . | nindent 14 }}

Alternatively, run the command via a shell:

           command: [
-            "python3", "-u",
-            "/opt/clp/lib/python3/site-packages/bin/celery",
-            "-A", "job_orchestration.executor.compress",
-            "worker",
-            "--concurrency", "{{ .Values.workerConcurrency }}",
-            "--loglevel", "WARNING",
-            "-Q", "compression",
-            "-n", "compression-worker@$(HOSTNAME)"
-          ]
+            "bash", "-c",
+            "exec python3 -u /opt/clp/lib/python3/site-packages/bin/celery -A job_orchestration.executor.compress worker --concurrency {{ .Values.workerConcurrency }} --loglevel WARNING -Q compression -n compression-worker@${HOSTNAME}"
+          ]

Note: The same issue affects query-worker-deployment.yaml (line 90).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/deployment/package-helm/templates/compression-worker-deployment.yaml`
around lines 30 - 91, The command currently uses a literal
compression-worker@$(HOSTNAME) which Kubernetes will not expand because HOSTNAME
isn't defined in spec.containers[].env and the array form bypasses a shell; add
a downward API env var (e.g., name POD_NAME using fieldRef fieldPath:
metadata.name) in the env block and replace the "-n",
"compression-worker@$(HOSTNAME)" entry in the command array with the pod-name
variable (compression-worker@$(POD_NAME)) so Kubernetes will substitute the
actual pod name at runtime; apply the same change pattern to the query-worker
deployment command as well.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/deployment/package-helm/templates/results-cache-statefulset.yaml`:
- Around line 43-61: The chart currently forces mongod into a replica set via
the postStart hook (see lifecycle.postStart exec invoking mongosh and
rs.initiate), which couples fresh-PVC recovery to a successful hook; change to
Option A by removing replication.replSetName and any replicaSet= URI param so
the StatefulSet's mongod runs standalone (no rs.initiate in
lifecycle.postStart), and instead document in the chart docs/values that for
multi-AZ clusters users should set volumeBindingMode: WaitForFirstConsumer or
use EFS to avoid PVC recreation across AZs; update
templates/results-cache-statefulset.yaml to drop replSet config and add the
storage/AZ guidance to the chart README/values comment.
- Around line 54-59: The rs.initiate call uses a hardcoded "localhost:27017"
which breaks stable member identity; change the mongosh --host invocation and
the members[0].host value in the rs.initiate payload to use the StatefulSet pod
FQDN (pod.service.namespace.svc.cluster.local with the :27017 port) injected via
Helm/downward API or template rendering instead of "localhost:27017", so the
rs.initiate and mongosh commands reference the stable pod DNS rather than
localhost.
- Around line 49-61: The postStart hook's unbounded wait and blanket suppression
of errors must be fixed: replace the infinite until loop around mongosh with a
bounded retry (e.g., loop with a max_attempts / timeout and exit non‑zero if not
reachable) so the container fails fast if mongod never becomes available, and
remove the global redirection/|| true on the rs.initiate() step so failures
surface; additionally change the JS catch around rs.status() to only handle
NotYetInitialized (error code 94) and explicitly print/initiate errors to stderr
(so kubernetes logs/events contain diagnostics) while ensuring rs.initiate() is
attempted only when code 94 is observed.

---

Outside diff comments:
In `@tools/deployment/package-helm/templates/compression-worker-deployment.yaml`:
- Around line 30-91: The command currently uses a literal
compression-worker@$(HOSTNAME) which Kubernetes will not expand because HOSTNAME
isn't defined in spec.containers[].env and the array form bypasses a shell; add
a downward API env var (e.g., name POD_NAME using fieldRef fieldPath:
metadata.name) in the env block and replace the "-n",
"compression-worker@$(HOSTNAME)" entry in the command array with the pod-name
variable (compression-worker@$(POD_NAME)) so Kubernetes will substitute the
actual pod name at runtime; apply the same change pattern to the query-worker
deployment command as well.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: be587f7a-838c-49d2-8eae-3b7ba813d557

📥 Commits

Reviewing files that changed from the base of the PR and between 0144808 and b5a158e.

📒 Files selected for processing (4)
  • tools/deployment/package-helm/Chart.yaml
  • tools/deployment/package-helm/templates/compression-worker-deployment.yaml
  • tools/deployment/package-helm/templates/query-worker-deployment.yaml
  • tools/deployment/package-helm/templates/results-cache-statefulset.yaml

Comment on lines +43 to +61
lifecycle:
postStart:
exec:
command:
- "bash"
- "-c"
- |
until mongosh --host localhost:27017 --eval "db.runCommand('ping')" \
--quiet 2>/dev/null; do
sleep 1
done
mongosh --host localhost:27017 --eval '
try { rs.status(); } catch(e) {
rs.initiate({
_id: "rs0",
members: [{_id: 0, host: "localhost:27017"}]
});
}
' 2>/dev/null || true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 🏗️ Heavy lift

Consider Option A (standalone mongod) per the linked issue, and document storage/AZ guidance.

Per the discussion in #2257, Option A — dropping replication.replSetName and any replicaSet= URI parameter so mongod runs as a standalone — was the recommended fix because nothing in the chart appears to require replication semantics on a single-replica StatefulSet. Option B (this PR) works, but it permanently couples every fresh-PVC scenario to a successful postStart hook. Worth confirming the trade-off is intentional; if there's no current consumer of replica-set features (oplog, change streams, multi-member reads), Option A removes a whole failure mode rather than papering over it.

Separately, the linked issue notes that AZ mismatches are what cause the PVC recreation in the first place. Adding a brief note to chart docs/values about volumeBindingMode: WaitForFirstConsumer (or recommending EFS for multi-AZ clusters) would prevent users from hitting this path repeatedly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/deployment/package-helm/templates/results-cache-statefulset.yaml`
around lines 43 - 61, The chart currently forces mongod into a replica set via
the postStart hook (see lifecycle.postStart exec invoking mongosh and
rs.initiate), which couples fresh-PVC recovery to a successful hook; change to
Option A by removing replication.replSetName and any replicaSet= URI param so
the StatefulSet's mongod runs standalone (no rs.initiate in
lifecycle.postStart), and instead document in the chart docs/values that for
multi-AZ clusters users should set volumeBindingMode: WaitForFirstConsumer or
use EFS to avoid PVC recreation across AZs; update
templates/results-cache-statefulset.yaml to drop replSet config and add the
storage/AZ guidance to the chart README/values comment.

Comment thread tools/deployment/package-helm/templates/results-cache-statefulset.yaml Outdated
Comment on lines +54 to +59
mongosh --host localhost:27017 --eval '
try { rs.status(); } catch(e) {
rs.initiate({
_id: "rs0",
members: [{_id: 0, host: "localhost:27017"}]
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider using the StatefulSet pod DNS as the RS member host.

Hardcoding localhost:27017 works for a replicas: 1 StatefulSet, but the linked issue's recommendation was to initiate with the service/pod DNS host so the member identity is stable and portable. If the chart ever changes replicas or if mongod's advertised host needs to match clients' connection string, a localhost member entry can cause subtle replication/discovery problems and forces an rs.reconfig() later.

A drop-in stable alternative for a StatefulSet pod is <pod>.<service>.<namespace>.svc.cluster.local, which can be passed in via the downward API:

♻️ Optional — use pod FQDN for the RS member
           env:
+            - name: "POD_NAME"
+              valueFrom:
+                fieldRef:
+                  fieldPath: "metadata.name"
+            - name: "POD_NAMESPACE"
+              valueFrom:
+                fieldRef:
+                  fieldPath: "metadata.namespace"
-                    mongosh --host localhost:27017 --eval '
-                      try { rs.status(); } catch(e) {
-                        rs.initiate({
-                          _id: "rs0",
-                          members: [{_id: 0, host: "localhost:27017"}]
-                        });
-                      }
-                    ' 2>/dev/null || true
+                    HOST="${POD_NAME}.{{ include "clp.fullname" . }}-results-cache.${POD_NAMESPACE}.svc.cluster.local:27017"
+                    mongosh --host localhost:27017 --eval "
+                      try { rs.status(); } catch(e) {
+                        rs.initiate({
+                          _id: 'rs0',
+                          members: [{_id: 0, host: '${HOST}'}]
+                        });
+                      }
+                    "
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/deployment/package-helm/templates/results-cache-statefulset.yaml`
around lines 54 - 59, The rs.initiate call uses a hardcoded "localhost:27017"
which breaks stable member identity; change the mongosh --host invocation and
the members[0].host value in the rs.initiate payload to use the StatefulSet pod
FQDN (pod.service.namespace.svc.cluster.local with the :27017 port) injected via
Helm/downward API or template rendering instead of "localhost:27017", so the
rs.initiate and mongosh commands reference the stable pod DNS rather than
localhost.

…hook

- Add retry counter (max 30 attempts) to the postStart until loop to
  prevent infinite hangs if MongoDB never accepts connections
- Remove 2>/dev/null and || true from the rs.initiate() call so errors
  are visible in pod logs for debugging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
tools/deployment/package-helm/templates/results-cache-statefulset.yaml (2)

61-68: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Narrow the catch to NotYetInitialized (code 94).

The try { rs.status(); } catch(e) { rs.initiate(...) } block still treats any rs.status() failure as "uninitialized". Now that errors are no longer suppressed and a non‑zero exit from the hook will kill the container, a transient/auth/network failure on rs.status() will trigger an rs.initiate() that can fail for a different reason (e.g., already initialized with a different config), fail the postStart hook, and put the pod into a restart loop — which is more disruptive than the original silent path. Restricting the recovery to e.codeName === "NotYetInitialized" || e.code === 94 keeps the intended fresh-PVC bootstrap and re-throws everything else.

🛡️ Proposed narrowing
                     mongosh --host localhost:27017 --eval '
-                      try { rs.status(); } catch(e) {
-                        rs.initiate({
-                          _id: "rs0",
-                          members: [{_id: 0, host: "localhost:27017"}]
-                        });
-                      }
+                      try {
+                        rs.status();
+                      } catch (e) {
+                        if (e.codeName === "NotYetInitialized" || e.code === 94) {
+                          rs.initiate({
+                            _id: "rs0",
+                            members: [{_id: 0, host: "localhost:27017"}]
+                          });
+                        } else {
+                          throw e;
+                        }
+                      }
                     '
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/deployment/package-helm/templates/results-cache-statefulset.yaml`
around lines 61 - 68, The catch in the mongosh postStart hook currently treats
any rs.status() error as "uninitialized"; change the catch body to only call
rs.initiate(...) when the error indicates NotYetInitialized by checking
e.codeName === "NotYetInitialized" || e.code === 94, and re-throw (or let the
error propagate) for all other errors so transient/auth/network failures don't
trigger an inappropriate rs.initiate() or mask real failures; update the mongosh
script around the rs.status() call accordingly to perform this conditional check
before invoking rs.initiate().

63-66: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Hardcoded localhost:27017 RS member host — prior concern still open.

The member host is still localhost:27017. As noted on the previous commit, using the pod FQDN (<pod>.<service>.<namespace>.svc.cluster.local:27017) injected via the downward API yields a stable member identity and avoids future rs.reconfig() churn if replicas ever changes or if clients expect the advertised host to match the connection string. Not a blocker for replicas: 1, but worth resolving here rather than after the chart ships a baked-in rs0 config.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/deployment/package-helm/templates/results-cache-statefulset.yaml`
around lines 63 - 66, The rs.initiate() call currently hardcodes the member host
to "localhost:27017"; change it to use the pod FQDN injected via the downward
API so the replica set member advertises a stable, cluster-scoped identity.
Locate the rs.initiate invocation and replace the member host literal with the
environment/template variable that contains
"<pod>.<service>.<namespace>.svc.cluster.local:27017" (the value provided via
the downward API), ensuring the string is injected at render/runtime so the
replica set config uses the actual pod FQDN rather than "localhost".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@tools/deployment/package-helm/templates/results-cache-statefulset.yaml`:
- Around line 61-68: The catch in the mongosh postStart hook currently treats
any rs.status() error as "uninitialized"; change the catch body to only call
rs.initiate(...) when the error indicates NotYetInitialized by checking
e.codeName === "NotYetInitialized" || e.code === 94, and re-throw (or let the
error propagate) for all other errors so transient/auth/network failures don't
trigger an inappropriate rs.initiate() or mask real failures; update the mongosh
script around the rs.status() call accordingly to perform this conditional check
before invoking rs.initiate().
- Around line 63-66: The rs.initiate() call currently hardcodes the member host
to "localhost:27017"; change it to use the pod FQDN injected via the downward
API so the replica set member advertises a stable, cluster-scoped identity.
Locate the rs.initiate invocation and replace the member host literal with the
environment/template variable that contains
"<pod>.<service>.<namespace>.svc.cluster.local:27017" (the value provided via
the downward API), ensuring the string is injected at render/runtime so the
replica set config uses the actual pod FQDN rather than "localhost".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5ceb5845-9ffe-459b-b119-e9cc359337d8

📥 Commits

Reviewing files that changed from the base of the PR and between b5a158e and 4475f31.

📒 Files selected for processing (1)
  • tools/deployment/package-helm/templates/results-cache-statefulset.yaml

@junhaoliao junhaoliao requested a review from hoophalab May 12, 2026 13:33
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes seem not related to the current PR

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes seem not related to the current PR

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will likely cause conflicts with the results-cache-indices-creator Job. we have been working on some fix locally over the past week. @hoophalab may coordinate the submission of the changes (either in this PR or a separate PR)

The compression-worker and query-worker node name changes belong in
PR y-scope#2269, not this MongoDB replica set PR. Reverting to main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
tools/deployment/package-helm/templates/compression-worker-deployment.yaml (1)

9-9: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Changes unrelated to PR objectives and breaking change for existing deployments.

The replica count path change from .Values.compressionWorker.replicas to .Values.scheduling.compressionWorker.replicas is unrelated to the MongoDB initialization fix described in the PR objectives. Additionally, this is a breaking change that will cause deployment failures for users with existing values.yaml files that define compressionWorker.replicas at the top level instead of under scheduling.

Verify that .Values.scheduling.compressionWorker.replicas is properly defined:

#!/bin/bash
# Description: Check if the new value path is defined in values.yaml and verify migration path

# Check if old path exists in values files
echo "=== Checking for old path usage ==="
rg -n "compressionWorker:" --type yaml -A 5 -g 'values*.yaml'

# Check if new scheduling path exists
echo -e "\n=== Checking for new scheduling path ==="
rg -n "scheduling:" --type yaml -A 10 -g 'values*.yaml'

# Look for any documentation about this breaking change
echo -e "\n=== Checking for migration documentation ==="
rg -n -i "scheduling.*replicas|migration|breaking" -g 'README*' -g 'CHANGELOG*' -g 'values*.yaml' -g '*.md'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/deployment/package-helm/templates/compression-worker-deployment.yaml`
at line 9, The template change moved the replica path to
.Values.scheduling.compressionWorker.replicas which is a breaking change; update
compression-worker-deployment.yaml to accept the old top-level key as a fallback
so existing values.yaml continue to work: use the replicas field (replicas: {{
... }}) to prefer .Values.scheduling.compressionWorker.replicas but fall back to
.Values.compressionWorker.replicas (or a default) when the scheduling path is
not defined, and add a short comment referencing both keys so maintainers know
why both exist.
tools/deployment/package-helm/templates/query-worker-deployment.yaml (1)

10-10: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Changes unrelated to PR objectives and breaking change for existing deployments.

The replica count path change from .Values.queryWorker.replicas to .Values.scheduling.queryWorker.replicas is unrelated to the MongoDB initialization fix described in the PR objectives. Additionally, this is a breaking change that will cause deployment failures for users with existing values.yaml files that define queryWorker.replicas at the top level instead of under scheduling.

Verify that .Values.scheduling.queryWorker.replicas is properly defined:

#!/bin/bash
# Description: Check if the new value path is defined in values.yaml and verify migration path

# Check if old path exists in values files
echo "=== Checking for old path usage ==="
rg -n "queryWorker:" --type yaml -A 5 -g 'values*.yaml'

# Check if new scheduling path exists
echo -e "\n=== Checking for new scheduling path ==="
rg -n "scheduling:" --type yaml -A 10 -g 'values*.yaml'

# Look for any documentation about this breaking change
echo -e "\n=== Checking for migration documentation ==="
rg -n -i "scheduling.*replicas|migration|breaking" -g 'README*' -g 'CHANGELOG*' -g 'values*.yaml' -g '*.md'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/deployment/package-helm/templates/query-worker-deployment.yaml` at line
10, The change replaced the replica value path from .Values.queryWorker.replicas
to .Values.scheduling.queryWorker.replicas which is a breaking config change;
restore backward compatibility by updating the deployment template that sets
"replicas:" to accept both keys (use a conditional/fallback lookup so it prefers
.Values.scheduling.queryWorker.replicas but falls back to
.Values.queryWorker.replicas or a sensible default) and update docs/values.yaml
to document the new path if you intend to migrate users.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@tools/deployment/package-helm/templates/compression-worker-deployment.yaml`:
- Line 9: The template change moved the replica path to
.Values.scheduling.compressionWorker.replicas which is a breaking change; update
compression-worker-deployment.yaml to accept the old top-level key as a fallback
so existing values.yaml continue to work: use the replicas field (replicas: {{
... }}) to prefer .Values.scheduling.compressionWorker.replicas but fall back to
.Values.compressionWorker.replicas (or a default) when the scheduling path is
not defined, and add a short comment referencing both keys so maintainers know
why both exist.

In `@tools/deployment/package-helm/templates/query-worker-deployment.yaml`:
- Line 10: The change replaced the replica value path from
.Values.queryWorker.replicas to .Values.scheduling.queryWorker.replicas which is
a breaking config change; restore backward compatibility by updating the
deployment template that sets "replicas:" to accept both keys (use a
conditional/fallback lookup so it prefers
.Values.scheduling.queryWorker.replicas but falls back to
.Values.queryWorker.replicas or a sensible default) and update docs/values.yaml
to document the new path if you intend to migrate users.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 26e1e1d9-883a-4762-8d04-62bac4af0e99

📥 Commits

Reviewing files that changed from the base of the PR and between 4475f31 and 4ca19aa.

📒 Files selected for processing (2)
  • tools/deployment/package-helm/templates/compression-worker-deployment.yaml
  • tools/deployment/package-helm/templates/query-worker-deployment.yaml

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Results cache (MongoDB) fails to recover after PVC recreation due to uninitialized replica set

2 participants