Skip to content

feat(hf): query worker cache for downloaded model state when worker is active#3612

Open
georgi wants to merge 57 commits into
mainfrom
feat/hf-worker-docker
Open

feat(hf): query worker cache for downloaded model state when worker is active#3612
georgi wants to merge 57 commits into
mainfrom
feat/hf-worker-docker

Conversation

@georgi

@georgi georgi commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Adds a scope param (local | worker) to the huggingfaceByType tRPC procedure — when worker, the server queries the Docker worker's cache list via bridge.listCachedModels() and filters with the newly-extracted filterModelsByHfType
  • Extracts filterModelsByHfType as a standalone exported function from hf-models.ts (was a private closure inside getModelsByHfType)
  • New useWorkerCachedModels hook fetches and normalises the worker's cached model IDs; HuggingFaceModelMenuDialog and RecommendedModelsView use it to show correct downloaded state when a worker is attached
  • Wires onSelect through RecommendedModelsViewModelListItem so recommended models are selectable in the model picker dialog
  • Fixes auto-switch-to-downloads firing repeatedly; now fires at most once per dialog open via a ref guard
  • Minor: stop click propagation on HF logo link and actions container; add selectable hover styles to ModelListItem

Test plan

  • With no worker attached: model dialog shows local HF cache downloaded state as before
  • With Docker worker attached: model dialog reflects the worker's cache (models downloaded on worker show as downloaded)
  • Clicking a recommended model in the picker selects it
  • Auto-switch to downloads view fires once on open, not on every re-render
  • HF logo link no longer triggers row click/select

🤖 Generated with Claude Code

georgi and others added 30 commits June 8, 2026 10:02
Adds the approved design for deploying the Python worker (nodetool-core /
nodetool-huggingface) as a remote WebSocket worker the TS server attaches to
via NODETOOL_WORKER_URL.

- HF image layers on the core worker image + PyPI CUDA wheels (no nvidia/cuda
  base; host driver supplies CUDA).
- Local-first: compose + Makefile, host port 8787 -> container 7777, CPU-only
  validation on macOS.
- NODETOOL_WORKER_TOKEN Authorization: Bearer handshake (worker process_request
  401 gate + bridge header on connect/reconnect; unset = open).
- Records RunPod/Vast.ai constraints (direct-TCP-only, dynamic endpoint,
  no-public-exposure security, CUDA/volume/cost) for the deferred cloud iteration.
- Notes the verified server-only bridge wiring: validation runs through a
  running server, not `nodetool run`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The remote WebSocket worker can now require a shared-secret token. When
NODETOOL_WORKER_TOKEN is set, the bridge sends `Authorization: Bearer <token>`
on the handshake — on both initial connect and every reconnect (socket creation
is centralized in one _openSocket() path). Unset/empty token sends no header,
preserving the open local/dev behavior.

- PythonBridgeOptions gains workerToken; createPythonBridge threads it from
  options.workerToken ?? process.env.NODETOOL_WORKER_TOKEN.
- Tests assert the exact header value on connect and reconnect, and its absence
  when unset/empty, plus end-to-end threading via the factory.

Pairs with the worker-side process_request 401 gate in nodetool-core.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The local CLI/DSL execution path resolved nodes only from in-process TS
registries, so `nodetool run` / `nodetool workflows run` failed any Python
worker node with "Unknown node type" — only the websocket server wired the
Python bridge. This wires the same capability into the in-process runners.

- runtime: new connectPythonBridgeForGraph() (lazily connects a bridge ONLY
  when the graph has a node type no TS registry resolves; remote when
  NODETOOL_WORKER_URL is set, else local stdio; a remote connect failure is
  surfaced, a missing local worker falls through to the normal error) and
  resolvePythonNodeExecutor() (builds a PythonNodeExecutor from bridge
  metadata, mirroring the server's resolveExecutor). + unit tests.
- dsl run(): connects the bridge, adds a Python fallthrough to resolveExecutor,
  closes the bridge in finally. New RunOptions.bridgeOptions forwards custom
  PythonBridgeOptions for programmatic callers.
- cli `workflows run`: same wiring.

Now `NODETOOL_WORKER_URL=ws://host:port NODETOOL_WORKER_TOKEN=... nodetool run
graph.json` executes Python (incl. HuggingFace worker) nodes on the remote
worker; with the URL unset it uses a local stdio worker.

Verified: runtime/dsl/cli typecheck clean; 6 new helper tests; dsl 138 + cli
228 tests pass; independent adversarial review (lifecycle, transport selection,
close timing, metadata mapping) clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "validate through a server, not nodetool run" caveat is obsolete now that
the CLI/DSL runners wire the Python bridge. Validation can use the CLI directly
(nodetool workflows run graph.json) or the web UI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Manage persistent RunPod Pods (the right shape for a long-lived NodeTool
WebSocket worker) via the REST API, separate from the serverless/GraphQL
endpoint helpers in runpod-api.ts.

- runpod-rest.ts: createPod / getPod / listPods / stopPod / deletePod,
  deployWorkerPod (CPU pod, port + token), waitForPodEndpoint, and direct-TCP /
  HTTP-proxy ws-URL helpers — all on rest.runpod.io/v1. The API key is passed
  EXPLICITLY so callers source it from the per-user secret store, never
  process.env (unlike runpod-api.ts's getApiKey).
- scripts/runpod-worker.ts: a deploy/list/get/destroy driver that reads
  RUNPOD_API_KEY from nodetool_secrets (getSecret), with an env fallback for
  headless/sandboxed contexts where the keychain master key isn't reachable.

Verified end-to-end against a live RunPod account: created a CPU pod from a
GHCR worker image, ran a HuggingFace SentenceSimilarity workflow through the
CLI bridge over wss:// + NODETOOL_WORKER_TOKEN, got a 384-dim embedding back
from the pod, and terminated it — every REST op exercised live.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Vendor the runpod/skills bundle installed via `npx skills add runpod/skills`
(RunPod pod/serverless tooling) alongside the project's nodetool-* skills, with
skills-lock.json pinning them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ocker-only

Collapse the six-target server-deployment matrix to Docker self-host only and
build a GPU-worker provisioning subsystem (packages/compute) with RunPod-pod
and Vast.ai providers, profiles->instances identity, a re-pointable-bridge
attach seam, and a cost guard. Captures the audit findings (dead workflows
field, broken runpod-serverless, untested fly/railway/hf) and parks HF/ZeroGPU,
RunPod-LB-serverless, and workflow-as-job with rationale.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Instruction-only TDD plan (no code samples): server prune to docker-only,
packages/compute worker subsystem with RunPod-pod + Vast providers, DB models,
re-pointable bridge attach seam, cost-guard reaper, tRPC/CLI/UI surfaces.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… hf, runpod-serverless)

Deletes the eight non-Docker deployer source files, their tests, and the
ad-hoc runpod-worker.ts script. This is the "delete" half of an atomic
two-step change: the barrel exports, manager, config union, and CLI still
reference these modules, so `npm run typecheck` is intentionally red until
the companion commit (A2) collapses the config to Docker-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reduces AnyDeploymentSchema to the Docker variant. Deletes the RunPod, GCP,
Fly, Railway, and HuggingFace deployment/state/config schemas and their
getServerUrl/imageConfigFullName helpers from deployment-config.ts; trims the
DeploymentType enum to "docker". Simplifies state.ts revalidateState, the
manager's listDeployments/validate type branches, configure.ts (drop
configureRunPod/configureGCP), and the deploy barrel.

CLI: the deployer factory map, resolveServerUrl, buildStubDeployment, and the
`deploy add --type` choices are now docker-only; removes the dead
`list-gcp-options` command (it enumerated GCP/RunPod targets that no longer
exist). Tests and the @nodetool-ai/deploy stub updated to match, including a
new case asserting the union rejects runpod/gcp/fly/railway/huggingface.

This is the companion of the prior deletion commit; typecheck/lint/tests are
green again.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove the unused `container.workflows` schema field and the
`NODETOOL_WORKFLOWS` env var it emitted in docker-run.ts and compose.ts.
The var was written but read nowhere — "deploy a workflow" was always a
misnomer since every target ships the same generic server that reads
workflows from its DB. Also drops the matching `workflows` param from
the configure helper.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds DB-native persistence for the GPU-worker provisioning subsystem
(spec §5): worker_profiles (declarative presets, unique name) and
worker_instances (ephemeral, billing-sensitive live handles).

- Drizzle table defs in schema/workers.ts, registered in the schema
  barrel and the package index.
- A create_worker_tables migration (20260608_000000) so migrateSqliteDb
  provisions both tables; mirrored into the initDb/initTestDb schema SQL.
- Test covers table+column existence via the migration routine and the
  unique-name constraint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add function-based CRUD accessors over the worker_profiles table:
createWorkerProfile, getWorkerProfile, listWorkerProfiles,
updateWorkerProfile (bumps updated_at), deleteWorkerProfile. Duplicate
names are rejected with a thrown Error. spec round-trips as JSON via the
Drizzle jsonText custom type. Exported from the package barrel.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Relocate the RunPod REST transport client from deploy into compute as an
internal provider transport (git mv runpod-rest.ts; drop its deploy barrel
export). Add RunpodPodProvider implementing WorkerProvider over that
transport: provision creates a pod from the spec image, polls to running,
and returns the wss:// proxy attach handoff; status maps RunPod pod states
to WorkerStatus; stop issues the real DELETE teardown; list maps live pods
to ProviderInstance[]. API key is injected via the constructor, never read
from process.env.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add WorkerManager orchestrating the profiles→instances identity model over
injectable models accessors. Profile CRUD delegates to the model layer;
provision() resolves the per-target provider, generates a worker token when
the profile's token_policy is "generate", calls provider.provision, and
persists a worker_instances row transitioning provisioning → running. stop()
tears down the provider resource and marks the instance stopped; stopAll()
stops every non-stopped instance; list()/status() read the registry and
refresh from the provider. The provider API key loads from the secret store
per target (RUNPOD_API_KEY / VAST_API_KEY) with an env fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add setTarget(url, token) to WebsocketPythonBridge so the worker-attach
flow can re-point the live bridge at a provisioned GPU worker without
restarting the process. No-op when the url is unchanged; otherwise the
current socket is torn down (rejecting in-flight requests), the new
url/token replace the (now mutable) instance fields, and a reconnect is
forced through the existing reconnect path. An empty/missing token sends
no Authorization header. Env (NODETOOL_WORKER_URL/NODETOOL_WORKER_TOKEN)
stays the constructor default.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add attach/detach/getActiveWorker to WorkerManager. attach persists the
singleton settings pointer `active_worker_instance_id`, marks the instance
`attached`, and returns the `{ wsUrl, token }` connection for the caller to
apply to the Python bridge (the manager stays free of a runtime dependency).
detach clears the pointer and reverts the instance to `running`;
getActiveWorker reads the pointer and resolves the instance.

Settings access is injected (getSetting/setSetting/deleteSetting), defaulting
to the Setting model under the local desktop user, so the lifecycle is
unit-tested with fakes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add WorkerManager.reconcile(): diff provider.list() against the DB
registry per target. Mark running/attached instances absent from the
provider's live list as stopped (died/killed out-of-band); surface
provider-live refs not tracked in the DB as orphans; return a
{ orphans, liveCount, estimatedCostUsd } summary. Already-stopped rows
are left untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a `worker` tRPC router over the WorkerManager (`@nodetool-ai/compute`):
profiles list/create/delete, instances list, provision/stop/stopAll/status/
reconcile, and attach/detach. Attach re-points the live Python bridge at the
worker's {wsUrl, token}; detach points it back at the env/stdio default.

The server bootstrap constructs one WorkerManager, injects it plus a
`repointPythonBridge` callback into the tRPC context, runs `reconcile()` on
boot, and starts the reaper loop (cleared on shutdown).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the `nodetool worker` verbs on a single WorkerManager: profile
add/list/rm, create (--profile or inline --target/--gpu/--image, with
--attach), list, status <id>, stop <id> | --all. The CLI unlocks the DB +
master key like the deleted runpod-worker.ts script did; the manager loads
provider API keys from the secret store with an env fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Vast.ai-backed WorkerProvider: searches a rentable GPU offer, launches the
worker image with the worker token forwarded as env, polls the instance to
running, and derives the direct ws://<ip>:<port> attach URL. stop() destroys
the instance (real teardown). Registered under target "vast" in the
WorkerManager provider factory.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add docs/worker-deployment.md covering the profiles→instances model,
provisioning/attaching from the CLI and UI, the cost guard (idle-stop,
TTL, orphan reconcile), secret keys (RUNPOD_API_KEY/VAST_API_KEY), and
supported targets (RunPod pod, Vast.ai) plus the manual-local-worker note.

Rewrite docs/deployment.md to the server (Docker self-host) vs worker
(rent GPU) split; delete the runpod/gcp server-deploy guides and their
sidebar/_config/llms.txt references; update the README deployment tagline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…works

The cost guard's reaper measures idle time from a worker instance's
last_activity_at, but that timestamp was only ever set at creation —
touchWorkerInstance existed and was tested but never called in production.
An attached worker was therefore reaped at created_at + idle_timeout
regardless of active node execution.

WebsocketPythonBridge now emits "activity" on every outbound frame to the
remote worker, and the server touches the active worker instance on that
event (throttled to once per 10s, failures swallowed). The bridge is the
designated heartbeat source the reaper's contract already assumed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…time

The cost guard's estimated_cost_usd was always null/0: provision() never
passed a cost when creating the instance, no provider returned one, and
ProvisionResult had no cost field. reconcile()'s aggregation and the
WorkersPanel display were therefore dead.

- Add optional costUsd to ProvisionResult.
- RunPod provider reports the pod's costPerHr; Vast provider reports the
  chosen offer's dph_total (captured during offer search).
- Pass result.costUsd into createWorkerInstance, persisting it on the
  instance for UI display and reconcile() cost aggregation.

Tests cover the provider->manager->instance flow and the reconcile sum
without the prior manual cost injection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…res fresh workers

WorkerManager.attach marked an instance attached but left last_activity_at
pinned to creation time. A GPU worker whose provision + attach outran its
profile's idle_timeout_minutes (cold starts run minutes; idle windows can be
short) carried a stale activity timestamp and had emitted no bridge frame yet,
so the next reaper pass stopped the worker the user had just attached to before
any work ran. Attaching is itself activity, so it now refreshes the idle clock
alongside the status transition.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
georgi and others added 14 commits June 9, 2026 13:14
Give workers a durable volume (HF_HOME on /workspace) sized per profile,
and a stop=pause / resume / terminate lifecycle across the RunPod-pod and
Vast providers, the WorkerManager, and the reaper (idle -> pause,
TTL -> terminate). Surface resume/terminate plus a per-provider API-key
presence check (secret store OR env, matching how provisioning resolves
keys, so the UI does not false-warn on an env key) in the workers panel.
Redesign the profile form with curated GPU/disk/idle defaults and add a
Vast.ai API key entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
List, download, and delete HuggingFace models on an attached worker,
reusing the ModelManager UI and the existing /ws/download progress
surface. Adds models.* bridge methods (listCachedModels / downloadModel /
deleteCachedModel / supportsModelManagement), scope=worker on the model
tRPC routes plus a worker download relay, and a Local<->Worker scope
toggle that trusts the worker's downloaded flags instead of the local
cache.

Protocol versioning stays additive: BRIDGE_PROTOCOL_VERSION -> 2 advertises
models.*, but the new MIN_BRIDGE_PROTOCOL_VERSION floor stays at 1 so v1
workers still connect and degrade gracefully (gated via
supportsModelManagement), and shipping does not require a new nodetool-core
release first. Worker downloads are cancellable end-to-end (the relay
passes the web's composite id as the bridge request id; cancel_download
routes to the bridge), the scope falls back to Local on detach, and a
mid-download error surfaces a single error frame.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
#	packages/deploy/src/index.ts
#	packages/runtime/src/python-websocket-bridge.ts
#	packages/runtime/tests/python-websocket-bridge.test.ts
#	web/src/components/panels/PanelBottom.tsx
#	web/src/stores/BottomPanelStore.ts
Promote PythonBridge to a real interface (PythonBridgeBase implements it)
and add SwappableBridge — a stable bridge reference that delegates to a
current target and re-points listeners on swap(). This gives the server a
single bridge handle whose target can follow an attached worker without
reconnecting the consumers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Attach now swaps the live Python bridge (via SwappableBridge) to the
worker's {wsUrl, token} so ALL Python work — execution and model
management — runs on the attached worker; detach swaps back to local.

Add a worker readiness probe: WorkerManager.connectionInfo() exposes an
instance's connection target read-only, and worker.health opens a
transient bridge (the same discover handshake attach uses) to report
whether the worker is actually serving. The panel polls it while a
worker is `running` and shows Booting… → Ready, gating Attach until the
worker answers — so it's clear when a booting worker is healthy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a "CPU only (no GPU)" machine option to the RunPod profile form with
a vCPU selector; the spec carries vcpu so the provider provisions a CPU
pod (computeType: "CPU"). cpuFlavorIds is left unset — RunPod auto-selects
an available flavor via cpuFlavorPriority: "availability".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nloads

Model Manager source toggle (Installed | Recommended | Hub):
- Recommended lists the curated catalog aggregated from node metadata.
- Hub searches the live HuggingFace Hub (new models.huggingfaceHubSearch
  endpoint) by text + pipeline tag; the category sidebar always shows the
  full pipeline-tag catalog so a category can be picked before searching;
  results are labeled "Top N" since the Hub paginates.
Downloads route to the attached worker whenever one is attached.

Toolbar/sidebar streamlining: standardize the toggles via a new
`segmented` ToggleGroup variant (fixed height, smaller font), drop the
redundant All/Downloaded filter and the duplicate category icon, replace
the editor-scoped NodeSelect with SelectField for the sort control, and
fix the max-size slider spacing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rker

The published worker image (built from nodetool-core) is the
`nodetool-worker` GHCR package, but the profile default and fixtures
pointed at `ghcr.io/nodetool-ai/worker` — a package that doesn't exist,
so the default never pulled. Point them at `nodetool-worker:latest`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Profiles saved before the image rename carry ghcr.io/nodetool-ai/worker —
a GHCR package that never existed, so RunPod fails the pull with "denied".
Normalize it to ghcr.io/nodetool-ai/nodetool-worker when building the
provision spec, so already-stored profiles provision correctly without a
migration or a UI rebuild. Idempotent (the bad name isn't a substring of
the correct one).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixed directly in the SQLite db instead.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
READYT is RunPod's transitional "ready to run" desiredStatus, distinct
from RUNNING. Without it mapPodStatus() fell through to `provisioning`,
so the panel showed the pod as endlessly provisioning. Map READYT → running
in both the status mapper and the provisioning poll loop.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…s active

When a Docker worker is attached, the HF model dialog now queries the
worker's cache list instead of the local FS cache. Adds a `scope` param
to the `huggingfaceByType` tRPC procedure (local | worker), extracts
`filterModelsByHfType` as a reusable exported function, and wires
`onSelect` through `RecommendedModelsView` → `ModelListItem` so
recommended models can be selected directly. Fixes auto-switch-to-downloads
firing more than once per dialog open.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 10, 2026

Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
nodetool-marketing be5f583 Commit Preview URL

Branch Preview URL
Jun 10 2026, 09:04 PM

georgi commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator Author

CI Quality Gate is failing. Please rebase onto current main and fix the build. The worker cache query feature and useWorkerCachedModels hook look well-designed, just needs green CI.


Generated by Claude Code

@georgi georgi enabled auto-merge June 11, 2026 20:02

@georgi georgi left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR goes far beyond its title — it's closer to "add full worker deployment infrastructure" than "query worker cache for downloaded model state." 153 files with +15K lines is extremely hard to review meaningfully.

Recommendations:

  1. Fix CI — Quality Gate is failing
  2. Break this into smaller PRs — compute package, deploy package, CLI commands, HF cache query, etc. Each should be independently reviewable and testable.
  3. Update the title to reflect the actual scope

The current form is too large for a thorough review. Please split and resubmit.


Generated by Claude Code

@georgi georgi left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs work before merge:

  1. Merge conflictsmergeable_state is "dirty". Please rebase onto main and resolve conflicts.
  2. CI failing — Quality gate (npm run check) failed.
  3. Scope too broad — 153 files changed / 57 commits for a model-cache query feature. The branch likely includes unrelated changes or a stale merge from main. Please isolate the HF worker cache feature into a focused PR for reviewable size.

The feature concept (querying worker cache for downloaded model state) is solid, but the PR needs a rebase and scope reduction.


Generated by Claude Code

@georgi georgi left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feature looks well-scoped (HF worker cache querying). Quality Gate is failing though — please rebase on latest main (6 PRs were just merged) and fix CI. Once green, this is ready for a deeper feature review.


Generated by Claude Code

@georgi georgi left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core worker-cache feature is well-designed — tRPC backward compatibility is preserved via .default("local"), filterModelsByHfType extraction is clean, auto-switch ref guard is correct, and backend test coverage is good.

Must fix:

  1. CI Quality Gate is failing — lint/typecheck/tests not passing.
  2. PR scope is far too large — 1383 files changed with 111k+ insertions, vastly beyond the described HF worker cache feature. This branch has accumulated many unrelated changes and should be rebased/split into focused PRs.
  3. useWorkerCachedModels query key["worker-cached-models"] has no worker-id qualifier. If the active worker changes, stale cache data persists until the 30s staleTime expires. Include activeWorker?.id in the query key.

Minor:
4. The HF logo link was removed from ModelListItem entirely (replaced with plain div) rather than just stopping propagation — this changes UX (model name no longer links to HF page). Was this intentional?
5. No frontend unit tests for useWorkerCachedModels.

The feature itself is solid once scoped correctly and CI passes.


Generated by Claude Code

georgi commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator Author

Needs significant cleanup before merge:

  1. Merge conflicts — mergeable_state: dirty
  2. Quality Gate failing
  3. 153 files / 57 commits — scope creep; the described feature (querying worker cache) shouldn't require this many changes. Consider rebasing, squashing, and splitting unrelated changes.
  4. Likely conflicts with feat(model-menu): fold recommended downloads into the model list #3698 (model menu, now merged) — the model picker changes overlap.

Needs: rebase onto current main, resolve conflicts, squash commits, and get CI green.


Generated by Claude Code

@georgi georgi left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has 153 files changed (15K+ additions, 10K+ deletions) for what should be a focused HF worker cache feature. The enormous diff suggests the branch accumulated unrelated changes or diverged significantly from main. CI Quality Gate is failing.

Recommend cleaning up the branch to contain only the relevant HF worker cache changes, or splitting into smaller PRs. Rebase needed before merge.


Generated by Claude Code

@georgi georgi left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has 153 files and 57 commits with CI failing. 11 days old with no code updates since Jun 11. Likely has merge conflicts. Needs a rebase and CI fix before merge.


Generated by Claude Code

georgi commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author

Review: Needs splitting

This PR has grown to 153 files, 57 commits, and +15K/-10K lines. The original HF worker cache feature is buried under unrelated changes spanning packages/deploy, packages/compute, packages/models, packages/runtime, packages/websocket, packages/cli, and docs.

Issues:

  • Scope creep: The PR mixes a feature (worker cache querying) with infrastructure changes (deploy, compute, docs rewrites). At this scale, effective code review is impractical.
  • CI failing: App Tests fail, which is a red flag given the breadth of changes.
  • Merge conflicts: Branch has diverged significantly from main.

Recommendation: Split this into smaller, focused PRs:

  1. Deploy/docs infrastructure changes
  2. Compute package changes
  3. The actual HF worker cache feature (with useWorkerCachedModels hook and model picker UI)

Each would be reviewable, testable, and mergeable independently.


Generated by Claude Code

@georgi georgi left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The described feature (HF worker cache querying) is reasonable, but this PR is too large to review confidently: 153 files changed, 25k+ lines of churn across 57 commits. The diff exceeds GitHub's rendering limit. Please rebase on main and consider splitting into smaller, independently reviewable PRs.


Generated by Claude Code

@georgi georgi left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is too large to review safely: 153 files changed, 57 commits, 15,000+ lines added, 10,500+ lines deleted. The diff exceeds GitHub's 20k-line rendering limit.

While the feature (querying worker cache for downloaded model state) is well-scoped in the description, the actual changeset is far broader — 57 commits accumulated over 15 days suggests unrelated changes got folded in.

Recommendation: Break this into smaller, focused PRs:

  1. Extract filterModelsByHfType as a standalone function (pure refactor)
  2. Add the scope param to huggingfaceByType tRPC procedure + bridge integration
  3. useWorkerCachedModels hook + UI changes in the model picker dialog
  4. Fix the auto-switch-to-downloads repeated firing
  5. Minor click-propagation / hover style fixes

Each would be reviewable in isolation and mergeable incrementally. As-is, the risk of regression is too high to merge without a very thorough manual review.


Generated by Claude Code

@georgi georgi left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has been open since June 10 and CI is red (quality gate failure from June 11). The branch is significantly behind main at this point. Needs a rebase onto current main and CI must pass before this can be merged.


Generated by Claude Code

@georgi georgi left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has several issues that need addressing:

  1. Merge conflicts — the branch is in a dirty mergeable state and needs a rebase onto main.
  2. CI failing — the quality gate is failing.
  3. Scope creep — the PR description mentions HF worker cache querying, but the diff touches 153 files with 15,326 additions and 10,546 deletions across 57 commits. This is far beyond the described feature scope and makes review very difficult. Consider splitting this into smaller, focused PRs.

The described feature (querying worker cache for downloaded model state) sounds useful, but the PR needs to be rebased and its scope tightened before it can be reviewed and merged.


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant