fix(deployment): Auto-initialize MongoDB replica set on fresh volumes by goynam · Pull Request #2270 · y-scope/clp

goynam · 2026-05-09T11:58:58Z

Summary

Adds a postStart lifecycle hook to the results-cache (MongoDB) StatefulSet that automatically initializes the replica set (rs0) when the PVC is fresh/recreated.
The hook waits for mongod to accept connections, then runs rs.initiate() only if rs.status() fails (i.e., the replica set is not yet configured). This is idempotent and safe on normal restarts.
Bumps Helm chart version to 0.3.2-dev.3.

Problem

When the results-cache pod's PVC is deleted and recreated (e.g., during disaster recovery or testing), MongoDB starts in RSGhost state because mongod.conf has replication.replSetName: "rs0" but no replica set has been initiated on the fresh data directory. All clients then fail with "No servers match selector Primary()".

Fixes #2257

Test plan

Deploy the Helm chart with a fresh PVC (delete the existing results-cache PVC first)
Verify the MongoDB pod starts successfully and transitions to Primary state
Verify clients (results-cache-indices-creator job, query workers, web UI) can connect without errors
Verify that a normal pod restart (without PVC deletion) still works correctly (idempotent hook)
Run kubectl exec into the results-cache pod and confirm rs.status() shows a healthy primary

🤖 Generated with Claude Code

Summary by CodeRabbit

Chores
- Bumped deployment version to 0.3.2-dev.3
Infrastructure Improvements
- Worker replica counts now follow the scheduling configuration for better scaling control
- Results cache now performs automatic MongoDB replica set initialization on startup for improved cache stability and reliability

Fixes #2257

…ox conflicts (fixes y-scope#2259) When running multiple compression/query worker pods, all workers registered with the same static Celery node name causing RabbitMQ pidbox queue conflicts and periodic crashes. Append $(HOSTNAME) (which Kubernetes sets to the pod name) to make each worker's node name unique. Also bumps Helm chart version to 0.3.2-dev.2. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…fixes y-scope#2257) When the results-cache (MongoDB) pod's PVC is deleted and recreated, MongoDB starts in RSGhost state because it has replication.replSetName configured but no replica set initialized. This causes all clients to fail with "No servers match selector Primary()". Add a postStart lifecycle hook that waits for mongod to accept connections, then calls rs.initiate() if the replica set has not already been initialized. The hook is idempotent: on subsequent restarts where the replica set is already configured, rs.status() succeeds and initiation is skipped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-09T11:59:08Z

Walkthrough

Helm chart version bumped to 0.3.2-dev.3. Compression and query worker Deployments now source replica counts from .Values.scheduling.*.replicas. results-cache StatefulSet adds a postStart hook that polls MongoDB and auto-initializes a single-node replica set if necessary.

Changes

Helm Deployment Configuration Updates

Layer / File(s)	Summary
Chart version `tools/deployment/package-helm/Chart.yaml`	Chart version changed from `0.3.2-dev.0` to `0.3.2-dev.3`.
Worker replica value keys `tools/deployment/package-helm/templates/compression-worker-deployment.yaml`, `tools/deployment/package-helm/templates/query-worker-deployment.yaml`	Deployment `spec.replicas` references updated to `.Values.scheduling.compressionWorker.replicas` and `.Values.scheduling.queryWorker.replicas` respectively.
Results-cache MongoDB postStart init `tools/deployment/package-helm/templates/results-cache-statefulset.yaml`	Added a `lifecycle.postStart` `exec` hook that polls MongoDB with `mongosh`, runs `rs.status()`, and calls `rs.initiate(...)` to initialize a single-node replica set when needed.

Sequence Diagram(s)

sequenceDiagram
  participant ResultsCacheContainer as results-cache container
  participant Mongosh as mongosh
  participant MongoDB as MongoDB (localhost:27017)
  ResultsCacheContainer->>Mongosh: run db.runCommand('ping') loop (up to 30 attempts)
  Mongosh-->>ResultsCacheContainer: ping succeeds
  ResultsCacheContainer->>Mongosh: execute rs.status()
  alt rs.status() throws
    Mongosh->>MongoDB: rs.initiate({_id: "rs0", members: [{_id: 0, host: "localhost:27017"}]})
    MongoDB-->>Mongosh: replica set initialized
  else rs.status() OK
    Mongosh-->>ResultsCacheContainer: replica set already configured
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Out of Scope Changes check	❓ Inconclusive	Changes to query-worker and compression-worker deployment replicas references appear unrelated to MongoDB replica set initialization; unclear if these are necessary or accidental modifications.	Clarify whether the changes to .Values.queryWorker.replicas and .Values.compressionWorker.replicas paths are intentional and related to this PR's scope, or if they should be reverted.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly describes the main change: adding MongoDB replica set auto-initialization on fresh volumes, which addresses the core issue in `#2257`.
Linked Issues check	✅ Passed	The PR implements Option B from issue `#2257` by adding a postStart lifecycle hook that auto-initializes MongoDB replica set, checks replica set status idempotently, and bumps chart version as documented.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

⚔️ Resolve merge conflicts

Resolve merge conflict in branch fix/results-cache-replica-set-init

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tools/deployment/package-helm/templates/compression-worker-deployment.yaml (1)

30-91: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

$(HOSTNAME) will not be expanded — Celery receives the literal string.

Kubernetes only substitutes $(VAR_NAME) in command/args using variables declared in spec.containers[].env. HOSTNAME is set by the container runtime, not by Kubernetes, so it is not part of the substitution scope. Because the command invokes python3 directly (array format, no shell), shell expansion does not happen either. The Celery worker will end up with a literal node name compression-worker@$(HOSTNAME), which is the same constant for every pod and will not fix the pidbox conflict the PR aims to address.

Declare a downward-API env var so Kubernetes can substitute it:

🔧 Proposed fix — inject pod name via the downward API

           env:
+            - name: "HOSTNAME"
+              valueFrom:
+                fieldRef:
+                  fieldPath: "metadata.name"
             - {{- include "clp.celeryBrokerUrlEnvVar" . | nindent 14 }}

Alternatively, run the command via a shell:

           command: [
-            "python3", "-u",
-            "/opt/clp/lib/python3/site-packages/bin/celery",
-            "-A", "job_orchestration.executor.compress",
-            "worker",
-            "--concurrency", "{{ .Values.workerConcurrency }}",
-            "--loglevel", "WARNING",
-            "-Q", "compression",
-            "-n", "compression-worker@$(HOSTNAME)"
-          ]
+            "bash", "-c",
+            "exec python3 -u /opt/clp/lib/python3/site-packages/bin/celery -A job_orchestration.executor.compress worker --concurrency {{ .Values.workerConcurrency }} --loglevel WARNING -Q compression -n compression-worker@${HOSTNAME}"
+          ]

Note: The same issue affects query-worker-deployment.yaml (line 90).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/deployment/package-helm/templates/compression-worker-deployment.yaml`
around lines 30 - 91, The command currently uses a literal
compression-worker@$(HOSTNAME) which Kubernetes will not expand because HOSTNAME
isn't defined in spec.containers[].env and the array form bypasses a shell; add
a downward API env var (e.g., name POD_NAME using fieldRef fieldPath:
metadata.name) in the env block and replace the "-n",
"compression-worker@$(HOSTNAME)" entry in the command array with the pod-name
variable (compression-worker@$(POD_NAME)) so Kubernetes will substitute the
actual pod name at runtime; apply the same change pattern to the query-worker
deployment command as well.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/deployment/package-helm/templates/results-cache-statefulset.yaml`:
- Around line 43-61: The chart currently forces mongod into a replica set via
the postStart hook (see lifecycle.postStart exec invoking mongosh and
rs.initiate), which couples fresh-PVC recovery to a successful hook; change to
Option A by removing replication.replSetName and any replicaSet= URI param so
the StatefulSet's mongod runs standalone (no rs.initiate in
lifecycle.postStart), and instead document in the chart docs/values that for
multi-AZ clusters users should set volumeBindingMode: WaitForFirstConsumer or
use EFS to avoid PVC recreation across AZs; update
templates/results-cache-statefulset.yaml to drop replSet config and add the
storage/AZ guidance to the chart README/values comment.
- Around line 54-59: The rs.initiate call uses a hardcoded "localhost:27017"
which breaks stable member identity; change the mongosh --host invocation and
the members[0].host value in the rs.initiate payload to use the StatefulSet pod
FQDN (pod.service.namespace.svc.cluster.local with the :27017 port) injected via
Helm/downward API or template rendering instead of "localhost:27017", so the
rs.initiate and mongosh commands reference the stable pod DNS rather than
localhost.
- Around line 49-61: The postStart hook's unbounded wait and blanket suppression
of errors must be fixed: replace the infinite until loop around mongosh with a
bounded retry (e.g., loop with a max_attempts / timeout and exit non‑zero if not
reachable) so the container fails fast if mongod never becomes available, and
remove the global redirection/|| true on the rs.initiate() step so failures
surface; additionally change the JS catch around rs.status() to only handle
NotYetInitialized (error code 94) and explicitly print/initiate errors to stderr
(so kubernetes logs/events contain diagnostics) while ensuring rs.initiate() is
attempted only when code 94 is observed.

---

Outside diff comments:
In `@tools/deployment/package-helm/templates/compression-worker-deployment.yaml`:
- Around line 30-91: The command currently uses a literal
compression-worker@$(HOSTNAME) which Kubernetes will not expand because HOSTNAME
isn't defined in spec.containers[].env and the array form bypasses a shell; add
a downward API env var (e.g., name POD_NAME using fieldRef fieldPath:
metadata.name) in the env block and replace the "-n",
"compression-worker@$(HOSTNAME)" entry in the command array with the pod-name
variable (compression-worker@$(POD_NAME)) so Kubernetes will substitute the
actual pod name at runtime; apply the same change pattern to the query-worker
deployment command as well.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: be587f7a-838c-49d2-8eae-3b7ba813d557

📥 Commits

Reviewing files that changed from the base of the PR and between 0144808 and b5a158e.

📒 Files selected for processing (4)

tools/deployment/package-helm/Chart.yaml
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
tools/deployment/package-helm/templates/query-worker-deployment.yaml
tools/deployment/package-helm/templates/results-cache-statefulset.yaml

coderabbitai · 2026-05-09T12:03:47Z

+          lifecycle:
+            postStart:
+              exec:
+                command:
+                  - "bash"
+                  - "-c"
+                  - |
+                    until mongosh --host localhost:27017 --eval "db.runCommand('ping')" \
+                        --quiet 2>/dev/null; do
+                      sleep 1
+                    done
+                    mongosh --host localhost:27017 --eval '
+                      try { rs.status(); } catch(e) {
+                        rs.initiate({
+                          _id: "rs0",
+                          members: [{_id: 0, host: "localhost:27017"}]
+                        });
+                      }
+                    ' 2>/dev/null || true


🧹 Nitpick | 🔵 Trivial | 🏗️ Heavy lift

Consider Option A (standalone mongod) per the linked issue, and document storage/AZ guidance.

Per the discussion in #2257, Option A — dropping replication.replSetName and any replicaSet= URI parameter so mongod runs as a standalone — was the recommended fix because nothing in the chart appears to require replication semantics on a single-replica StatefulSet. Option B (this PR) works, but it permanently couples every fresh-PVC scenario to a successful postStart hook. Worth confirming the trade-off is intentional; if there's no current consumer of replica-set features (oplog, change streams, multi-member reads), Option A removes a whole failure mode rather than papering over it.

Separately, the linked issue notes that AZ mismatches are what cause the PVC recreation in the first place. Adding a brief note to chart docs/values about volumeBindingMode: WaitForFirstConsumer (or recommending EFS for multi-AZ clusters) would prevent users from hitting this path repeatedly.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/deployment/package-helm/templates/results-cache-statefulset.yaml` around lines 43 - 61, The chart currently forces mongod into a replica set via the postStart hook (see lifecycle.postStart exec invoking mongosh and rs.initiate), which couples fresh-PVC recovery to a successful hook; change to Option A by removing replication.replSetName and any replicaSet= URI param so the StatefulSet's mongod runs standalone (no rs.initiate in lifecycle.postStart), and instead document in the chart docs/values that for multi-AZ clusters users should set volumeBindingMode: WaitForFirstConsumer or use EFS to avoid PVC recreation across AZs; update templates/results-cache-statefulset.yaml to drop replSet config and add the storage/AZ guidance to the chart README/values comment.

coderabbitai · 2026-05-09T12:03:47Z

+                    mongosh --host localhost:27017 --eval '
+                      try { rs.status(); } catch(e) {
+                        rs.initiate({
+                          _id: "rs0",
+                          members: [{_id: 0, host: "localhost:27017"}]
+                        });


🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider using the StatefulSet pod DNS as the RS member host.

Hardcoding localhost:27017 works for a replicas: 1 StatefulSet, but the linked issue's recommendation was to initiate with the service/pod DNS host so the member identity is stable and portable. If the chart ever changes replicas or if mongod's advertised host needs to match clients' connection string, a localhost member entry can cause subtle replication/discovery problems and forces an rs.reconfig() later.

A drop-in stable alternative for a StatefulSet pod is <pod>.<service>.<namespace>.svc.cluster.local, which can be passed in via the downward API:

♻️ Optional — use pod FQDN for the RS member

env: + - name: "POD_NAME" + valueFrom: + fieldRef: + fieldPath: "metadata.name" + - name: "POD_NAMESPACE" + valueFrom: + fieldRef: + fieldPath: "metadata.namespace"

- mongosh --host localhost:27017 --eval ' - try { rs.status(); } catch(e) { - rs.initiate({ - _id: "rs0", - members: [{_id: 0, host: "localhost:27017"}] - }); - } - ' 2>/dev/null || true + HOST="${POD_NAME}.{{ include "clp.fullname" . }}-results-cache.${POD_NAMESPACE}.svc.cluster.local:27017" + mongosh --host localhost:27017 --eval " + try { rs.status(); } catch(e) { + rs.initiate({ + _id: 'rs0', + members: [{_id: 0, host: '${HOST}'}] + }); + } + "

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/deployment/package-helm/templates/results-cache-statefulset.yaml` around lines 54 - 59, The rs.initiate call uses a hardcoded "localhost:27017" which breaks stable member identity; change the mongosh --host invocation and the members[0].host value in the rs.initiate payload to use the StatefulSet pod FQDN (pod.service.namespace.svc.cluster.local with the :27017 port) injected via Helm/downward API or template rendering instead of "localhost:27017", so the rs.initiate and mongosh commands reference the stable pod DNS rather than localhost.

…hook - Add retry counter (max 30 attempts) to the postStart until loop to prevent infinite hangs if MongoDB never accepts connections - Remove 2>/dev/null and || true from the rs.initiate() call so errors are visible in pod logs for debugging Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

♻️ Duplicate comments (2)

tools/deployment/package-helm/templates/results-cache-statefulset.yaml (2)

61-68: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Narrow the catch to NotYetInitialized (code 94).

The try { rs.status(); } catch(e) { rs.initiate(...) } block still treats any rs.status() failure as "uninitialized". Now that errors are no longer suppressed and a non‑zero exit from the hook will kill the container, a transient/auth/network failure on rs.status() will trigger an rs.initiate() that can fail for a different reason (e.g., already initialized with a different config), fail the postStart hook, and put the pod into a restart loop — which is more disruptive than the original silent path. Restricting the recovery to e.codeName === "NotYetInitialized" || e.code === 94 keeps the intended fresh-PVC bootstrap and re-throws everything else.

🛡️ Proposed narrowing

                     mongosh --host localhost:27017 --eval '
-                      try { rs.status(); } catch(e) {
-                        rs.initiate({
-                          _id: "rs0",
-                          members: [{_id: 0, host: "localhost:27017"}]
-                        });
-                      }
+                      try {
+                        rs.status();
+                      } catch (e) {
+                        if (e.codeName === "NotYetInitialized" || e.code === 94) {
+                          rs.initiate({
+                            _id: "rs0",
+                            members: [{_id: 0, host: "localhost:27017"}]
+                          });
+                        } else {
+                          throw e;
+                        }
+                      }
                     '

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/deployment/package-helm/templates/results-cache-statefulset.yaml`
around lines 61 - 68, The catch in the mongosh postStart hook currently treats
any rs.status() error as "uninitialized"; change the catch body to only call
rs.initiate(...) when the error indicates NotYetInitialized by checking
e.codeName === "NotYetInitialized" || e.code === 94, and re-throw (or let the
error propagate) for all other errors so transient/auth/network failures don't
trigger an inappropriate rs.initiate() or mask real failures; update the mongosh
script around the rs.status() call accordingly to perform this conditional check
before invoking rs.initiate().

63-66: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Hardcoded localhost:27017 RS member host — prior concern still open.

The member host is still localhost:27017. As noted on the previous commit, using the pod FQDN (<pod>.<service>.<namespace>.svc.cluster.local:27017) injected via the downward API yields a stable member identity and avoids future rs.reconfig() churn if replicas ever changes or if clients expect the advertised host to match the connection string. Not a blocker for replicas: 1, but worth resolving here rather than after the chart ships a baked-in rs0 config.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/deployment/package-helm/templates/results-cache-statefulset.yaml`
around lines 63 - 66, The rs.initiate() call currently hardcodes the member host
to "localhost:27017"; change it to use the pod FQDN injected via the downward
API so the replica set member advertises a stable, cluster-scoped identity.
Locate the rs.initiate invocation and replace the member host literal with the
environment/template variable that contains
"<pod>.<service>.<namespace>.svc.cluster.local:27017" (the value provided via
the downward API), ensuring the string is injected at render/runtime so the
replica set config uses the actual pod FQDN rather than "localhost".

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@tools/deployment/package-helm/templates/results-cache-statefulset.yaml`:
- Around line 61-68: The catch in the mongosh postStart hook currently treats
any rs.status() error as "uninitialized"; change the catch body to only call
rs.initiate(...) when the error indicates NotYetInitialized by checking
e.codeName === "NotYetInitialized" || e.code === 94, and re-throw (or let the
error propagate) for all other errors so transient/auth/network failures don't
trigger an inappropriate rs.initiate() or mask real failures; update the mongosh
script around the rs.status() call accordingly to perform this conditional check
before invoking rs.initiate().
- Around line 63-66: The rs.initiate() call currently hardcodes the member host
to "localhost:27017"; change it to use the pod FQDN injected via the downward
API so the replica set member advertises a stable, cluster-scoped identity.
Locate the rs.initiate invocation and replace the member host literal with the
environment/template variable that contains
"<pod>.<service>.<namespace>.svc.cluster.local:27017" (the value provided via
the downward API), ensuring the string is injected at render/runtime so the
replica set config uses the actual pod FQDN rather than "localhost".

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5ceb5845-9ffe-459b-b119-e9cc359337d8

📥 Commits

Reviewing files that changed from the base of the PR and between b5a158e and 4475f31.

📒 Files selected for processing (1)

tools/deployment/package-helm/templates/results-cache-statefulset.yaml

junhaoliao · 2026-05-12T13:33:42Z

changes seem not related to the current PR

junhaoliao · 2026-05-12T13:33:44Z

changes seem not related to the current PR

junhaoliao · 2026-05-12T13:39:05Z

this will likely cause conflicts with the results-cache-indices-creator Job. we have been working on some fix locally over the past week. @hoophalab may coordinate the submission of the changes (either in this PR or a separate PR)

The compression-worker and query-worker node name changes belong in PR y-scope#2269, not this MongoDB replica set PR. Reverting to main. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

♻️ Duplicate comments (2)

tools/deployment/package-helm/templates/compression-worker-deployment.yaml (1)
9-9: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Changes unrelated to PR objectives and breaking change for existing deployments.

The replica count path change from .Values.compressionWorker.replicas to .Values.scheduling.compressionWorker.replicas is unrelated to the MongoDB initialization fix described in the PR objectives. Additionally, this is a breaking change that will cause deployment failures for users with existing values.yaml files that define compressionWorker.replicas at the top level instead of under scheduling.

Verify that .Values.scheduling.compressionWorker.replicas is properly defined:
#!/bin/bash
# Description: Check if the new value path is defined in values.yaml and verify migration path

# Check if old path exists in values files
echo "=== Checking for old path usage ==="
rg -n "compressionWorker:" --type yaml -A 5 -g 'values*.yaml'

# Check if new scheduling path exists
echo -e "\n=== Checking for new scheduling path ==="
rg -n "scheduling:" --type yaml -A 10 -g 'values*.yaml'

# Look for any documentation about this breaking change
echo -e "\n=== Checking for migration documentation ==="
rg -n -i "scheduling.*replicas|migration|breaking" -g 'README*' -g 'CHANGELOG*' -g 'values*.yaml' -g '*.md'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/deployment/package-helm/templates/compression-worker-deployment.yaml`
at line 9, The template change moved the replica path to
.Values.scheduling.compressionWorker.replicas which is a breaking change; update
compression-worker-deployment.yaml to accept the old top-level key as a fallback
so existing values.yaml continue to work: use the replicas field (replicas: {{
... }}) to prefer .Values.scheduling.compressionWorker.replicas but fall back to
.Values.compressionWorker.replicas (or a default) when the scheduling path is
not defined, and add a short comment referencing both keys so maintainers know
why both exist.
tools/deployment/package-helm/templates/query-worker-deployment.yaml (1)
10-10: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Changes unrelated to PR objectives and breaking change for existing deployments.

The replica count path change from .Values.queryWorker.replicas to .Values.scheduling.queryWorker.replicas is unrelated to the MongoDB initialization fix described in the PR objectives. Additionally, this is a breaking change that will cause deployment failures for users with existing values.yaml files that define queryWorker.replicas at the top level instead of under scheduling.

Verify that .Values.scheduling.queryWorker.replicas is properly defined:
#!/bin/bash
# Description: Check if the new value path is defined in values.yaml and verify migration path

# Check if old path exists in values files
echo "=== Checking for old path usage ==="
rg -n "queryWorker:" --type yaml -A 5 -g 'values*.yaml'

# Check if new scheduling path exists
echo -e "\n=== Checking for new scheduling path ==="
rg -n "scheduling:" --type yaml -A 10 -g 'values*.yaml'

# Look for any documentation about this breaking change
echo -e "\n=== Checking for migration documentation ==="
rg -n -i "scheduling.*replicas|migration|breaking" -g 'README*' -g 'CHANGELOG*' -g 'values*.yaml' -g '*.md'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/deployment/package-helm/templates/query-worker-deployment.yaml` at line
10, The change replaced the replica value path from .Values.queryWorker.replicas
to .Values.scheduling.queryWorker.replicas which is a breaking config change;
restore backward compatibility by updating the deployment template that sets
"replicas:" to accept both keys (use a conditional/fallback lookup so it prefers
.Values.scheduling.queryWorker.replicas but falls back to
.Values.queryWorker.replicas or a sensible default) and update docs/values.yaml
to document the new path if you intend to migrate users.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@tools/deployment/package-helm/templates/compression-worker-deployment.yaml`:
- Line 9: The template change moved the replica path to
.Values.scheduling.compressionWorker.replicas which is a breaking change; update
compression-worker-deployment.yaml to accept the old top-level key as a fallback
so existing values.yaml continue to work: use the replicas field (replicas: {{
... }}) to prefer .Values.scheduling.compressionWorker.replicas but fall back to
.Values.compressionWorker.replicas (or a default) when the scheduling path is
not defined, and add a short comment referencing both keys so maintainers know
why both exist.

In `@tools/deployment/package-helm/templates/query-worker-deployment.yaml`:
- Line 10: The change replaced the replica value path from
.Values.queryWorker.replicas to .Values.scheduling.queryWorker.replicas which is
a breaking config change; restore backward compatibility by updating the
deployment template that sets "replicas:" to accept both keys (use a
conditional/fallback lookup so it prefers
.Values.scheduling.queryWorker.replicas but falls back to
.Values.queryWorker.replicas or a sensible default) and update docs/values.yaml
to document the new path if you intend to migrate users.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 26e1e1d9-883a-4762-8d04-62bac4af0e99

📥 Commits

Reviewing files that changed from the base of the PR and between 4475f31 and 4ca19aa.

📒 Files selected for processing (2)

tools/deployment/package-helm/templates/compression-worker-deployment.yaml
tools/deployment/package-helm/templates/query-worker-deployment.yaml

Naman Goyal and others added 2 commits May 9, 2026 17:28

goynam requested a review from a team as a code owner May 9, 2026 11:58

coderabbitai Bot reviewed May 9, 2026

View reviewed changes

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

junhaoliao requested a review from hoophalab May 12, 2026 13:33

junhaoliao reviewed May 12, 2026

View reviewed changes

fix(deployment): Remove out-of-scope worker deployment changes

4ca19aa

The compression-worker and query-worker node name changes belong in PR y-scope#2269, not this MongoDB replica set PR. Reverting to main. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed May 13, 2026

View reviewed changes

junhaoliao force-pushed the main branch from 4505f8c to 8977316 Compare May 13, 2026 22:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(deployment): Auto-initialize MongoDB replica set on fresh volumes#2270

fix(deployment): Auto-initialize MongoDB replica set on fresh volumes#2270
goynam wants to merge 4 commits into
y-scope:mainfrom
goynam:fix/results-cache-replica-set-init

goynam commented May 9, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 9, 2026 •

edited

Loading

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 9, 2026

Uh oh!

Uh oh!

coderabbitai Bot May 9, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

junhaoliao May 12, 2026

Uh oh!

junhaoliao May 12, 2026

Uh oh!

junhaoliao May 12, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

goynam commented May 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

junhaoliao May 12, 2026

Choose a reason for hiding this comment

Uh oh!

junhaoliao May 12, 2026

Choose a reason for hiding this comment

Uh oh!

junhaoliao May 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

goynam commented May 9, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 9, 2026 •

edited

Loading