Skip to content

Reduce per-request annotation workflow latency (post-LSP follow-up) #147

@neuromechanist

Description

@neuromechanist

Context

After #144 / #146 landed the persistent hed-lsp client, the per-request "Initializing annotation workflow..." gap dropped from 20-60s to ~10-12s. The remaining time is dominated by the LLM-based keyword extraction inside _semantic_preprocess_node (src/agents/workflow.py), not by the LSP call itself.

Measurement in prod (docker exec hedit python ...):

  • HedLspClient.spawn_stdio + initialize: 0.52 s
  • One batched hed/suggest with 5 queries: 0.46 s
  • Same call cached: 0.01 s

So LSP is fine. The slow piece is self.feedback_llm.ainvoke(...) in _extract_keywords. In prod that LLM is wired to the evaluation model (qwen/qwen3.6-35b-a3b @ wandb), which is ~7-10 s for a "extract 5 nouns" task. Even claude-haiku-4.5 takes ~7-9 s when extended thinking is on by default.

A keyword extraction call with claude-haiku-4.5 + thinking: disabled + max_tokens=200 runs in ~1 s with identical-quality output.

Sub-issues

  • B. Use fast LLM (thinking disabled) for keyword extraction. Single-PR fix.
  • C. Run semantic_preprocess in parallel with the first annotate LLM call so the perceived pre-annotate window goes to ~0. Folds hints in on retry if they arrive in time. Separate PR after B lands.

Acceptance

  • Wall time between request start and "Entering annotate node" log line drops from ~10 s to <2 s for typical descriptions (server warm, hed-lsp child already spawned).
  • Annotation quality unchanged (same first-attempt valid rate over a 20-prompt benchmark).

Metadata

Metadata

Assignees

No one assigned

    Labels

    component: agentsRelated to LangGraph agentspriority: highHigh priority - important for upcoming releasetype: performancePerformance improvements

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions