feat(cli): honor --catalog/OUTPUT_CATALOG_ID across workflow commands #283

Draft
bnchrch wants to merge 1 commit into main from ben/angry-villani-20acd5

bnchrch (Collaborator) commented Jun 23, 2026

Problem

output workflow start/run <workflow> <scenario> could block ~30s before the (fast) workflow actually started.

  • The scenario resolver ran a catalog preflight against the API default catalog (GET /workflow/catalog) — not the catalog the command executes against — even though start/run already send the correct catalog in the POST body.
  • When that default catalog has no worker polling it (common in worktrees / shared local dev), the lookup waited on Temporal's ~30s query deadline before falling back to convention.
  • workflow test and workflow dataset generate had the same inconsistency: test routed its per-dataset runs, eval run, and eval-registration preflight to the default catalog; dataset generate resolved scenarios against it. Neither accepted --catalog.

Solution

  • Resolver — thread an optional catalog through resolveInput → resolveScenarioPath → fetchWorkflowPath → fetchWorkflowCatalog, which already routes to GET /workflow/catalog/:id when given a catalog. Convention-based fallback is unchanged.
  • start / run — pass the resolved flags.catalog into scenario resolution (POST body was already correct).
  • test — add --catalog (env OUTPUT_CATALOG_ID); route the eval-registration preflight, each per-dataset run, and the eval run to it.
  • dataset generate — add --catalog (env OUTPUT_CATALOG_ID); route scenario resolution and the workflow run to it.
  • All flags reuse the existing char: 'c' + env: OUTPUT_CATALOG_ID block from list/start/run, so OUTPUT_CATALOG_ID=X behaves identically to --catalog X. list, start, run, test, and dataset generate now share one rule: explicit --catalog, then OUTPUT_CATALOG_ID, then API default.
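
To make the shared rule concrete, here is a minimal sketch of the precedence and the two catalog routes. It is a model of the behavior, not the CLI's code: in the CLI the flag definition itself reads OUTPUT_CATALOG_ID, and the names `resolveCatalog`, `catalogPath`, and `Env` are hypothetical. Only the precedence order and the two GET routes come from the description above.

```typescript
// Hypothetical helpers; only the precedence rule and the two routes are from the PR.
type Env = Record<string, string | undefined>;

// Explicit --catalog wins, then OUTPUT_CATALOG_ID, then the API default.
// The API default is represented here as `undefined` (an assumption).
function resolveCatalog(flagCatalog: string | undefined, env: Env): string | undefined {
  if (flagCatalog) return flagCatalog;
  const fromEnv = env["OUTPUT_CATALOG_ID"];
  return fromEnv ? fromEnv : undefined;
}

// With a catalog, target GET /workflow/catalog/:id; otherwise the default route.
function catalogPath(catalog?: string): string {
  return catalog
    ? `/workflow/catalog/${encodeURIComponent(catalog)}`
    : "/workflow/catalog";
}
```

Treating an empty string as "not set" is a choice made for this sketch; the real flag parser may behave differently.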

Test plan

  • npm run lint passes
  • tsc --noEmit (CLI) passes
  • CLI test suite — 699 pass; new/updated coverage in scenario_resolver.spec.ts, run.spec.ts, start.spec.ts, and new test_eval.spec.ts / dataset/generate.spec.ts assert the catalog reaches GET /workflow/catalog/:id and the run/eval request bodies
  • Manual smoke (needs a multi-catalog stack): with a worker polling a worktree catalog but none on the API default, OUTPUT_CATALOG_ID=os-workflows npx output workflow start gemini_test default shows GET /workflow/catalog/os-workflows in the API log and no ~30s stall
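
The spec coverage above (the catalog reaching GET /workflow/catalog/:id) can be pictured with a recorder injected in place of the HTTP client. This is a rough sketch, not the actual specs: the function names echo the resolver chain, but the signatures, the injected `get`, the response shape, and the directory layout are all assumptions made for illustration.

```typescript
// Sketch only: signatures, response shape, and paths below are assumptions.
type Get = (path: string) => { workflows: Record<string, string> };

function fetchWorkflowCatalog(get: Get, catalog?: string) {
  const path = catalog
    ? `/workflow/catalog/${encodeURIComponent(catalog)}`
    : "/workflow/catalog";
  return get(path);
}

function fetchWorkflowPath(get: Get, workflow: string, catalog?: string): string | undefined {
  return fetchWorkflowCatalog(get, catalog).workflows[workflow];
}

function resolveScenarioPath(get: Get, workflow: string, scenario: string, catalog?: string): string {
  // Hypothetical stand-in for the convention-based fallback.
  const dir = fetchWorkflowPath(get, workflow, catalog) ?? `workflows/${workflow}`;
  // Hypothetical scenario file layout.
  return `${dir}/scenarios/${scenario}.json`;
}

// Recorder standing in for the HTTP client: it logs every requested path.
const requested: string[] = [];
const fakeGet: Get = (path) => {
  requested.push(path);
  return { workflows: { gemini_test: "src/workflows/gemini_test" } };
};

// The catalog passed at the top of the chain should surface in the request path.
const withCatalog = resolveScenarioPath(fakeGet, "gemini_test", "default", "os-workflows");
// Without a catalog, the default route is requested.
const withoutCatalog = resolveScenarioPath(fakeGet, "gemini_test", "default");
```

After both calls, `requested` holds the catalog-specific route first and the default route second, which is the property the new specs are described as asserting.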

Follow-up (out of scope)

  • 2 pre-existing failures in copy_assets.spec.ts stem from the native-copyfiles copy-assets build step (options.exclude must be a function) leaving dist/templates/* unpopulated. Unrelated to this change — worth a separate ticket.
  • resolveWorkflowDir() shares the same default-catalog fetchWorkflowPath call but resolves locally first in practice, so it doesn't hit the stall; left as-is to keep this change minimal.

Closes OUT-491

Thread the resolved catalog through scenario resolution and add --catalog
to `test` and `dataset generate`, so start/run/test/dataset generate use
the same catalog rule as `list`. Removes the ~30s default-catalog preflight
stall in worktrees where the default catalog has no worker.