Proj/byterover tool mode#663
Conversation
…rser Lands the M1 keystone task: element vocabulary, per-element validators, data-driven registry, and HTML parser wrapper. No behavior change yet — no consumer wires the registry until T3 (curate path) and T4 (query path). Element vocabulary (5 elements): - bv-topic — root container; required `path`, optional importance/maturity/ recency/updatedat (lowercase per HTML5 normalization) - bv-rule — rule statements (severity, id) - bv-decision — decision records (id) - bv-bug — bug runbook entries (id, severity) - bv-fix — fix runbook entries (id) Architecture: - Data-driven `ELEMENT_REGISTRY` is the production-track guardrail. M2 vocabulary expansion is purely additive — new elements register via one entry + one schema/validator pair, no consumer code changes. - `makeAttributeValidator` factory eliminates per-element boilerplate; reduces M2's 12 additional element validators to one-line bindings. - HTML parser is parser-library-agnostic at the API boundary (`DocumentNode` / `ElementNode` / `TextNode`). parse5 used internally; consumers don't see parse5 types. - Attribute-case normalization is documented in `ElementNode` JSDoc: HTML5 lowercases attribute names at parse time, so schema keys must match parser output (lowercase), not source HTML (camelCase). Tests (106 new): - 5 per-element validator suites (61 cases total) — each ≥10 cases - Registry shape + wiring + metadata (14 cases) - HTML parser: basic parsing, malformed-input handling, walkElements, getInnerText, serializeHtml round-trip (22 cases) - End-to-end round-trip on a sample fixture covering all 5 elements (9 cases) Verification: - typecheck clean - lint 0 errors (242 warnings, all pre-existing baseline) - 7715 tests passing, 0 failing - build clean Adds parse5@^8.0.1 as a runtime dep — the W3C-spec HTML parser used by jsdom; types ship in-tree (no @types/parse5 needed). Justified vs hand- rolled parsing because robust HTML tag-soup handling is non-trivial and this is the standard library for the job.
- bv-topic schema: tighten recency regex to ^\d+(\.\d+)?$ (rejects '.', '..1', '1..2' which previously slipped past the regex), drop redundant Number.isFinite for style consistency with importance, accept timezone offsets in updatedat via {offset: true}
- html-parser: replace `as unknown as Parse5DocumentFragment` cast with parse5 defaultTreeAdapter factories (createDocumentFragment, createElement, createTextNode, appendChild) — no more structural casts in the serialize path
- html-parser: getInnerText now inserts a separator between sibling element children so compact emit (<p>foo.</p><p>bar.</p>) does not merge tokens for BM25
- html-parser: document <template> content limitation (parse5 places template children in .content, not childNodes)
- schema docstrings + test names: replace 'warn-only behaviour' with 'parse-and-skip' (no warning is actually emitted in M1)
- sample-topic round-trip: add explicit case-normalization regression test for updatedAt → updatedat
feat: [ENG-2737] HTML render layer foundation — element registry + parser
Rewrite the canonical curate tool description (loaded by ToolDescriptionLoader and presented to the agent on every curate operation) from the structured JSON-args shape (rawConcept/narrative/facts/etc.) to the M1 HTML-emission contract using the closed `<bv-*>` vocabulary defined by T1. Coverage: - All 5 elements (bv-topic, bv-rule, bv-decision, bv-bug, bv-fix) with full per-element attribute schemas: required/optional, value enums, format constraints (importance integer 0-100, recency numeric 0-1, ISO-8601 updatedat, severity enums per element, etc.). - Allowed-children semantics (any/inline/block) + pairing convention (bug+fix as siblings). - Path format (slash-separated snake_case) + domain guidelines. - Output contract: HTML only, exactly one bv-topic root, no preamble/fences/ commentary, lowercase attribute names per HTML5 spec, closed vocabulary, no invented attributes, no clarifying questions. - Three worked examples (bug+fix runbook, rule+decision pair, general topic) to anchor the model's element-pairing and id-naming conventions. Add a sanity test (test/unit/server/infra/render/curate-prompt.test.ts) that loads curate.txt and asserts every registered ELEMENT_NAME, every per-element enum value, and every output-contract directive is present. Guards against silent drift when M2 expands the vocabulary or refactors attribute schemas. Authoring fluency check (M1 T2 spike, separate harness in local-auto-test/curate-fluency/, not in this repo): 20/20 generated outputs validated against the M1 element registry on Sonnet 4.5. Gate cleared on first full run after one prompt-tuning iteration. Decision: proceed to T3. Full report: research repo at features/html-memory-conversion/milestones/01-experiment/fluency-report.md
Field-by-field review of curate.txt against the actual rendered .md
context files (frontmatter + ## Reason + ## Raw Concept + ## Narrative
+ ## Facts) surfaced two issues with the initial T2 draft:
1. <bv-topic> carried runtime-signal attributes (importance, maturity,
recency, updatedat) that were explicitly migrated to a sidecar store
under research/features/runtime-signals/ (per-user, per-machine state
that caused noisy `vc status` diffs and team-share conflicts). Drop
them from the schema; replace with the actual frontmatter fields the
markdown writer renders: title (required), summary, tags, keywords,
related.
2. The 5-element vocabulary covered Rules + three M1 net-new elements
(Decision/Bug/Fix) but left every other rendered MD section as
free-form HTML — losing the structural mapping the writer needs.
Expand M1 vocabulary to 16 elements (added 11), each mapping to a
specific section in the rendered .md output:
- bv-reason → ## Reason
- bv-task, → ## Raw Concept (Task / Changes / Files / Flow)
bv-changes,
bv-files,
bv-flow
- bv-structure, → ## Narrative (Structure / Dependencies /
bv-dependencies, Highlights / Examples / Diagrams);
bv-highlights, bv-rule already covered Rules
bv-examples,
bv-diagram (with type + title attrs for verbatim diagram preservation)
- bv-fact (subject/category/value attrs) → ## Facts list
Each element follows the existing makeAttributeValidator pattern; the
data-driven registry stays the single source of truth.
Tests:
- Update bv-topic test to drop runtime-signal cases and assert the
new required-attributes (path + title); add a regression case
asserting passthrough still tolerates legacy importance/maturity/
recency without enforcing them.
- Add registry test asserting (a) bv-topic exposes the new
optionalAttributes, (b) runtime signals are NOT in the registry
metadata, (c) every registered validator accepts its own empty node
+ rejects mismatched-tag.
- Consolidate the 9 attribute-free new elements into one shared
text-only-elements.test.ts (same trivial schema; per-element
duplication wouldn't add coverage).
- Add dedicated bv-diagram.test.ts (type enum) and bv-fact.test.ts
(category enum, subject/value pass-through).
- Update sample-topic.html fixture to exercise every new element +
the new frontmatter attributes; assert no runtime signals leak in.
- Update curate-prompt.test.ts to assert the expanded vocabulary,
category/type enums for bv-fact/bv-diagram, and that the prompt
explicitly excludes runtime signals from bv-topic attributes.
Re-run the M1 T2 fluency check on the same 20 fixtures with the new
vocabulary: 20 / 20 valid (gate cleared); 100% cohort-appropriate
placement (bug+fix → bv-bug+bv-fix; rule+decision → bv-rule+bv-decision;
general → bv-reason); 122 bv-fact instances extracted across the run
(~6 per fixture). Fluency report updated:
research/features/html-memory-conversion/milestones/01-experiment/fluency-report.md
Open contract violation persists: 18/20 outputs (90%) wrap in code
fences despite explicit prompt instruction. Recommendation for T3
unchanged — strip in the response parser, don't iterate the prompt.
Second writer audit (markdown-writer.ts:270-354) found 3 rawConcept sub-fields without an HTML home: rawConcept.timestamp, rawConcept.author, rawConcept.patterns. Add the missing elements so the M1 vocabulary covers every section the writer renders. Vocabulary additions (16 → 19 elements): - <bv-timestamp> — text-only; renders as `**Timestamp:**` under `## Raw Concept`. Distinct from frontmatter createdAt/updatedAt (system-set) — this is the date the concept's data represents. - <bv-author> — text-only; renders as `**Author:**`. - <bv-pattern> — structured (flags + description as attrs, pattern as text content). Multiple siblings inside <bv-topic> collected into the `**Patterns:**` bullet list. Mirrors how <bv-fact> works. Sanitiser: - stripCodeFenceWrapper(html) added to html-parser.ts. Strips a single outer ``` ... ``` (any language tag) wrapper from the LLM response. Inner fences (e.g., <pre><code> blocks inside <bv-diagram>) survive. T3's curate executor calls this before parseHtml. Defensive sanitisation generalises better than chasing the model's quirk via prompt iteration — observed fence-wrap rate is 70% on Sonnet 4.5 even with explicit prompt instruction to the contrary. Tests: - bv-pattern dedicated test (bv-pattern.test.ts) for the structured attrs. - text-only-elements.test.ts extended for bv-timestamp + bv-author (same trivial schema as the other text-only elements). - html-parser.test.ts: 7 new cases for stripCodeFenceWrapper covering ```html, ```xml, no-language ```, leading/trailing whitespace, unwrapped passthrough, mismatched-fence passthrough, inner-fence preservation, and parse-after-strip end-to-end. - sample-topic-roundtrip.test.ts: renderable-MD coverage assertion extended to require all 15 dedicated body-section elements (was 12). - sample-topic.html fixture exercises the 3 new elements. Re-ran the M1 T2 fluency check on the same 20 fixtures with the 19-element vocabulary. Results: - 20 / 20 valid (gate cleared) - 100% cohort-appropriate placement (unchanged) - bv-timestamp 60% adoption (used where input has a clear date) - bv-author 10% (used only on inputs that name an owner) - bv-pattern 0% — model correctly did not over-emit; no fixture contained regex content. Restraint is good behaviour. - Cost: $0.61 (vs. $0.60 for the 16-element draft; negligible) Updated fluency-report.md in research repo with final coverage matrix and the 19-element → markdown-writer mapping for T3.
…Query/CurateLogEntry Add per-call token usage (input, output, cached, cache-creation), latency tiers (searchMs, llmMs, totalMs), and format mode (markdown vs html) to QueryLogEntry and CurateLogEntry. Token rollup uses a per-task TaskUsageAggregator subscribed to llmservice:usage events on the agent bus; LoggingContentGenerator emits the events on both streaming and non-streaming paths. AiSdkContentGenerator folds result.usage and providerMetadata.anthropic.cacheCreationInputTokens into a synthesized rawResponse so per-provider extraction lands consistently across Anthropic, OpenAI, and Google. Live-validated against 30 curates × 3 providers; all four token fields populate end-to-end with cacheCreationTokens correctly Anthropic-only.
[ENG-2738] HTML-emission contract in curate.txt + fluency check
- core/interfaces no longer reaches into infra: CurateUsageRecord moved to core/domain/entities/curate-log-entry, IUsageAggregator added in core/interfaces/telemetry/, TaskUsageAggregator now `implements` it. - Consolidate curate telemetry assembly inside CurateExecutor: runAgentBody invokes reportTelemetry on both happy and error paths; agent-process passes onTelemetry that forwards via task:curateResult. Removes the duplicated inline assembly in agent-process. - QueryExecutor.buildTiming prefers TaskUsageAggregator.getLlmMs() over the executeOnSession wall-clock so llmMs counts only LLM-call time, not tool-execution overhead. Falls back to wall-clock for tests. - CurateLogHandler error finalization merges telemetryFields(state.usage) so failed curates don't underreport cost when the executor's best-effort error-path reportTelemetry has populated state.usage. - OOD short-circuit in QueryExecutor routes through formatDetector.detect([]) instead of returning format: undefined, so future HTML-aware detectors don't get silently bypassed. - DRY: extract makeUsageListener helper in agent-process and reuse for both curate and query handlers. - Document that PROVIDER_TYPES iteration in LoggingContentGenerator only matches 'aiSdk' in production (all live traffic flows through AiSdkContentGenerator).
…ocation Re-review flagged the gap that error-path reportTelemetry being silent could mask a future regression where failed-curate telemetry disappears from disk. Pin both directions of the contract: - happy path: onTelemetry fires once before runAgentBody returns - error path: onTelemetry fires once before the throw propagates so the daemon can still emit task:curateResult ahead of task:error
feat: telemetry producer for QueryLogEntry + CurateLogEntry
…or wiring Wires the M1 HTML context-tree experiment into the curate flow. When the new BrvConfig.useHtmlContextTree flag is enabled, curate emits a <bv-topic> HTML document using T1's element registry (19 elements after T2's expansion) and atomically writes it to .brv/context-tree/<path>.html. When the flag is off (the default), the existing markdown curate path runs unchanged. Rolled-in T2 review fixes (PR #628 auto-review): - curate.txt: split the "Standard HTML inside <bv-*>" clause per allowed-children semantics. Inline-only subset (code/strong/em) for bv-rule, bv-task, bv-flow, bv-fact, bv-pattern, bv-timestamp, bv-author. Full block subset for bv-topic and the block-content elements. Removes the M1-passthrough/M2-strict drift the reviewer flagged. - curate.txt: prefix the examples block with a disclaimer that the fences are for prompt readability only and the agent's actual output must be bare HTML. Lower-cost mitigation for the model's stubborn fence-wrap habit (T2's stripCodeFenceWrapper still handles the remaining cases). - curate-prompt.test.ts: enum-value tests now anchor to the section after each element header instead of whole-file string match. Catches "severity moved from bv-bug to bv-decision" drift. - curate-prompt.test.ts: new test that parses every fenced example block from curate.txt with parseHtml and runs each <bv-*> element through its registered validator. Catches example typos (severity="hihg") AND vocabulary drift in one assertion. T3 proper: BrvConfig: - New optional boolean field useHtmlContextTree (default undefined == false). Plumbed through the JSON validator, withReviewDisabled, withVersion, and withSpace withers. withoutSpace already used spread, so it picks up the new field automatically. 8 new BrvConfig unit tests cover round-trip + wither preservation. format-detector (src/server/infra/render/format/format-detector.ts): - Two pure functions: getFormatForWrite(config) reads the flag, getFormatForRead(filePath) inspects the extension. Default markdown. 12 unit tests cover both branches plus edge cases (case-insensitive extensions, directory-named-foo.html with .md inside, etc.). html-writer (src/server/infra/render/writer/html-writer.ts): - Public surface: validateHtmlTopic(html) (pure validation, no I/O) and writeHtmlTopic({contextTreeRoot, rawHtml}) (validation + atomic write). Pipeline: 1. stripCodeFenceWrapper (T2-provided) on raw input. 2. parseHtml + walkElements; require exactly one <bv-topic> and a non-empty path attribute. 3. Run every typed bv-* element through its registered validator. Reject on any failure with structured errors. 4. Atomically write via DirectoryManager.writeFileAtomic to <contextTreeRoot>/<path>.html. - Path-traversal hardening: backslashes normalised, `..` segments rejected, leading slashes stripped, resolved path must stay under contextTreeRoot. 15 unit tests cover validity branches, fence variants, atomic-write success/failure, and traversal attempts. curate executor wiring: - CurateExecuteOptions gains useHtmlContextTree?: boolean. - runAgentBody branches the runtime prompt's final-step instruction: HTML mode tells the agent its FINAL RESPONSE is the bv-topic document and explicitly forbids tools.curate calls (which would write a sibling .md file). MD mode keeps the existing "verify via result.applied[].filePath" instruction unchanged. - After agent.executeOnSession returns, HTML mode routes through a new handleHtmlCurateResponse() that calls writeHtmlTopic and shapes a CurationStatus from the result (success/failed). MD mode continues to call parseCurationStatus. - 6 new unit tests cover: valid topic written, fence-wrapped response stripped, missing bv-topic → failed, invalid attribute → failed, flag=false uses MD path, flag=undefined uses MD path. agent-process plumbing: - The daemon's curate dispatcher reads cachedBrvConfig.useHtmlContextTree and forwards it on every CurateExecuteOptions call. Default false remains production-safe; the bench's Configuration B sets it to true via .brv/config.json. Tests: 7892 pass, 0 fail. Typecheck clean, lint clean, build clean. Out of scope (per the M1 plan): - Storage path migration to .brv/render/ — M2. - IR (ComponentIR) compilation — M2. - Multi-tier query / structural selectors — T4. - to-injection/to-evidence/to-summary/to-md adapters — M2. - Migration tool for existing .brv/context-tree/ — M2.
HTML is the curate format; there is no MD/HTML branching to control. Removing the flag simplifies the executor (one prompt, one response handler), the BrvConfig schema (no per-project toggle), and the format-detector (no write-side decision — only read-side dispatch remains for legacy MD files). Code changes: - BrvConfig: remove the optional `useHtmlContextTree` field, the JSON validator branch, all wither plumbing, and the dedicated test block. - ICurateExecutor: remove `useHtmlContextTree` from CurateExecuteOptions. - CurateExecutor: collapse the prompt branch — every curate runs the HTML-emission instruction. Always route the agent's response through `handleHtmlCurateResponse`. Remove the dead `parseCurationStatus` method (the JSON-status-block path is gone). - agent-process: drop the `useHtmlContextTree` plumbing on the curate dispatch site. - format-detector: drop `getFormatForWrite` (no decision to make); keep `getFormatForRead` so the query/search read path can still route legacy `.md` files transparently. Drop the BrvConfig import. Test changes: - BrvConfig: drop the 8-case `useHtmlContextTree` describe block. - format-detector: drop the 4 `getFormatForWrite` cases. - curate-executor-html-mode: drop the two MD-fallback cases (no flag to toggle); the file is now strictly HTML-emission tests. Comment cleanup (separate concern, same commit): Strip milestone scaffolding (M1, M2, T1–T7) from comments across the render layer. Comments now describe logic and decisions, not the project plan that scaffolded them. Specifically: - "M1 light validation; ADR-007 §13 strict validation is M2" → "Light validation; strict validation per ADR-007 §13 is future work" - "M1 element registry"/"M1 vocabulary" → "element registry"/"closed bv-* vocabulary" - "T3 wires the writer", "T4 owns the index", etc. → describe the consumer by role, not by milestone task - "M1 keeps the existing .brv/context-tree/ storage" → describe what the resolver encodes, not which milestone made the decision - Test descriptions: "(parse-and-skip — M1 light validation)" → "(parse-and-skip — light validation)" Pipeline: - 7878 tests pass (lost 14 with the dropped flag — 8 BrvConfig + 4 format-detector write + 2 executor MD-fallback). - Typecheck clean, lint clean, build clean.
End-to-end smoke testing surfaced a gap: the executor's runtime prompt
told the agent to emit HTML as its final response, but the base system
prompt (system-prompt.yml) had ~500 lines of detailed guidance for the
old structured-args curate flow (UPSERT/ADD/UPDATE/MERGE/DELETE with
rawConcept/narrative/facts content shape). The system prompt's
specifics overrode the runtime instruction; the agent kept building
JS data structures via code_exec and never emitted HTML, so the writer
saw nothing to validate and no .html file landed.
Rewrite the curate-related sections of system-prompt.yml to match the
HTML-emission contract:
- Update <TOOL_QUICK_REFERENCE>, <WORKFLOW_SUMMARY>,
<PATH_TITLE_CLARIFICATION>, <PROHIBITED_BEHAVIORS>, <DIRECTORIES>,
and the <SANDBOX_PROGRAMMING> curate example to instruct HTML
emission. The agent's final response IS the bv-topic document; do
NOT call tools.curate; do NOT wrap in code fences.
- Replace the entire ## Curation Workflow section (~500 lines, lines
275–786) with a tighter HTML-focused workflow (~217 lines):
- Keep the recon → mapExtract → dedup → groupBySubject preprocessing
helpers (still useful for chunked/large inputs).
- Document the bv-topic frontmatter (path/title required; summary,
tags, keywords, related optional) and explicitly call out that
importance/maturity/recency/updatedat are sidecar runtime signals,
not authored.
- Map MD sections to bv-* elements (Reason → bv-reason, Raw Concept
sub-fields → bv-task/bv-changes/bv-files/bv-flow/bv-timestamp/
bv-author/bv-pattern, Narrative subsections → bv-structure/
bv-dependencies/bv-highlights/bv-rule/bv-examples/bv-diagram,
Facts → bv-fact, runbooks → bv-decision/bv-bug/bv-fix).
- Document inline-content vs block-content elements and the standard
HTML allowed inside each.
- Keep facts extraction guidance, content quality requirements,
temporal preservation, and content preservation rules — translated
to the bv-* element vocabulary.
- Drop the operation-type table (UPSERT/ADD/UPDATE/MERGE/DELETE),
Two-Part Context Model section, domainContext/topicContext/
subtopicContext (these belonged to the structured-args flow that
no longer exists).
- Update the query-side temporal-reasoning hint to reference
bv-timestamp content + sidecar runtime signals instead of
rawConcept/narrative timestamps.
- Update the related-topics search hint to reference the `related`
attribute on bv-topic alongside the legacy `relations` frontmatter
on .md topics.
Net change: 196 insertions, 494 deletions. The rewritten prompt is
~217 lines shorter and aligned with the actual emission contract in
curate.txt.
Verified end-to-end via local-auto-test/curate-html-e2e/ harness:
- 3/3 fixtures (bug-fix, rule-decision, general-context) curate to
registry-valid HTML topic files (Sonnet 4.5).
- File paths land at <projectDir>/.brv/context-tree/<path>.html as
expected.
- Per-fixture latency: 19.3s, 30.5s, 27.3s.
Pipeline: 7878 unit tests pass, lint clean, typecheck clean, build
clean.
…e time
The previous markdown writer rendered `createdAt` and `updatedAt` in
the YAML frontmatter of every topic file. The HTML writer was dropping
both — the LLM doesn't author them and the writer wasn't injecting
them either, so HTML topics had no temporal metadata on disk.
Inject both as `<bv-topic>` attributes during the writer's atomic
write, after fence-stripping and registry validation:
- `updatedat` is always set to the current ISO-8601 instant.
- `createdat` is preserved from the existing file on disk if one
exists at the resolved path; otherwise it is set to the current ISO
instant. Re-writes leave `createdat` alone, advance `updatedat`.
- Any value the LLM authored for these attributes is overridden — the
agent is not allowed to choose its own timestamps.
Implementation: surgical regex edit on the bv-topic opening tag (no
parse → re-serialize round-trip) so the LLM's formatting (whitespace,
attribute order, quoting style) survives intact. Two helpers added to
html-writer.ts:
- setBvTopicAttributes(html, attrs) — replace-or-insert each attr.
- readExistingTopicAttribute(filePath, name) — surgical read of an
attribute value from an existing on-disk topic without parsing the
whole document.
Tests:
- Existing assertions about file content equaling VALID_TOPIC updated
to match-with-injected-timestamps (body preserved verbatim, bv-topic
opening tag now has createdat/updatedat).
- Three new cases under "system-managed timestamps":
1. First write injects both timestamps within the test window.
2. Re-writes preserve createdat, advance updatedat.
3. LLM-authored values for either attribute are overridden by the
system value.
Verified end-to-end via local-auto-test/curate-html-e2e/ — all 3
fixtures produce HTML topic files with system-injected createdat and
updatedat ISO-8601 timestamps on the bv-topic root, alongside the
LLM's frontmatter (path/title/summary/tags/keywords/related).
Pipeline: 18 html-writer unit tests pass (3 new), full suite green,
typecheck clean, lint clean.
CI test failure (curate-executor-html-mode.test.ts:79) — the file-content
assertion still expected the LLM's raw HTML; now matches the
timestamp-injected file by checking the body is preserved verbatim and
that createdat/updatedat are present. Same fix in the fence-stripping
test variant.
Review feedback addressed:
1. Telemetry format hardcoded to 'markdown' (curate-executor.ts:485).
The flag is gone; HTML is the only curate format. Updated to emit
'html' and reworded the doc comment. Updated the regression test in
curate-executor.test.ts:491 to assert 'html' accordingly.
2. validateHtmlTopic does not catch path-traversal (html-writer.ts).
The doc comment promised "safe to use the topic" but a payload like
`<bv-topic path="../../../etc/passwd">` returned `{ok: true}` —
path-traversal was only caught later when topicPathToFilePath threw.
That works end-to-end (executor catches and surfaces as failed
status), but is a foot-gun for any future caller using
validateHtmlTopic standalone (preview, dry-run).
Move the path-segment safety check into validateHtmlTopic and
surface as a new structured error kind 'unsafe-path'.
topicPathToFilePath keeps path.resolve as defence-in-depth. Added
two unit tests covering `..` segments and single-`.` segments;
updated the existing executor-side test to expect a validation
failure (with `unsafe-path` errors) instead of a thrown exception.
3. Attribute-validation errors swallowed the `field` discriminant
(curate-executor.ts:344). The writer captures which attribute
failed (`severity`, `id`, etc.) but the executor's
verification.missing string only included the tag. So a malformed
`severity="urgent"` was logged as `attribute-validation
(bv-rule): …` rather than `attribute-validation (bv-rule.severity):
…`. Updated the mapping to include the field for
attribute-validation errors. Non-blocking but improves bench-log
debuggability.
Reviewed but obsolete (not actioned):
- "format-detector.ts ships an unused getFormatForWrite" — removed in
the flag-drop commit f701cd2.
- "agent-process.ts double-coerces useHtmlContextTree" — also obsolete
(flag dropped).
Pipeline: 7883 unit tests pass, lint clean, typecheck clean, build
clean. E2e smoke (curate-html-e2e/) green on the rebuilt CLI.
[ENG-2739] curate path — HTML writer + format detector + executor wiring
…ch routing
Wires the search/query path to read HTML topic files transparently
alongside markdown ones. Curate emits HTML; the legacy MD path stays
in for `brv swarm` and existing trees that haven't migrated. The BM25
index is format-agnostic (HTML inner-text decodes to the same plain
text the markdown tokenizer expects); the indexer dispatches on file
extension at read time.
New modules:
- src/server/infra/render/reader/html-reader.ts — readHtmlTopicSync(html)
returns {bodyText, elements, topicAttributes}. bodyText is the
entity-decoded inner text, ready to feed BM25. elements is a flat,
document-order list of every typed `<bv-*>` node with its attribute
map. topicAttributes lifts the bv-topic frontmatter (path, title,
summary, tags, keywords, related). Forgiving on malformed input
(parse5 returns a best-effort tree).
- src/server/infra/render/reader/element-axis-index.ts — in-memory
ElementAxisIndex. Two query keys:
findByTag(tag) → paths
findByAttribute(tag, attr, value) → paths
Reverse path→keys map keeps invalidation O(keys-touched) instead of
full-scan. Built in lockstep with the BM25 index; same lifetime,
same invalidation triggers. Today's consumer is the optional
`elementHint` filter on SearchOptions; the structural-selector
grammar will hook in here without further reshaping.
Search service changes (search-knowledge-service.ts):
- File discovery: glob extended from `**/*.md` to `**/*.{html,md}`.
- Indexing: new readIndexableContent() helper dispatches on
getFormatForRead(filePath). HTML files route through readHtmlTopicSync
(the BM25 input is bodyText, not raw markup) and contribute their
element list to the structural-axis index. MD files index their raw
content as before.
- IndexedDocument gains a `format: 'html' | 'markdown'` field, set
from the file extension at index time and surfaced on every
SearchKnowledgeResult entry. Telemetry/query-log consumers
(T5-shipped QueryLogEntry.format) populate from this without
format-detection at the call site.
- CachedIndex gains `elementAxisIndex: ElementAxisIndex`, populated
during buildFreshIndex from the per-document elements walk. The
empty-tree fallback also constructs an empty index so callers can
treat the field as non-nullable.
- SearchOptions gains `elementHint?: ElementHint`. When present, the
search composes a MiniSearch filter that intersects scope +
element-axis matches before BM25 ranking. No caller passes the hint
today; M2 selector grammar is the consumer.
- INDEX_SCHEMA_VERSION bumped to 6 so previously-cached indexes built
against MD-only schema rebuild on next query.
- Title resolution: HTML files use the bv-topic title attribute; MD
files retain the existing extractTitle frontmatter walk.
- The structural-selector hint is also exposed on the public
SearchKnowledgeOptions as `SearchKnowledgeElementHint` — wired
through ToolsSDK so SDK consumers can pass it without going through
the internal SearchOptions surface.
Tests:
- html-reader (9 cases): bodyText extraction, entity decoding,
attribute lift, elements-list document order + de-duplication,
unknown-bv-* skipping, missing bv-topic graceful behaviour,
malformed input non-throw.
- element-axis-index (10 cases): findByTag/findByAttribute, empty-array
for unknown tags, case-sensitivity on values, single-file multiple-
contribution dedup, remove() drops all keys + cleans empty key sets,
remove() on unknown path no-op, clear().
- search-knowledge-service-html-routing (8 cases): HTML files indexed,
format field populated for both HTML and MD, raw markup is NOT
searchable (inner-text only), bv-topic title attribute lifts as
doc title, elementHint pre-filter (no-match returns empty,
tag-only restricts, tag.attr=value restricts further).
Pipeline: 7910 unit tests pass (27 new), typecheck clean, lint clean
(0 errors), build clean.
Out of scope (per the M1 plan):
- Structural selector grammar — M2 (this PR ships the substrate).
- findReferences reverse-graph — M2.
- Signal-precise cache invalidation — mtime-based today is sufficient.
- BM25 ranking parity test (MD vs HTML same content) — M2 bench
measures end-to-end; the routing tests above cover the
determinism boundary.
- HTML BM25 input now concatenates `<bv-topic>` summary/tags/keywords/related
into the indexable content. Without this, a term living only in those
attributes ranked HTML topics far below the markdown equivalent (where
YAML frontmatter is fed verbatim into BM25).
- HTML title fallback uses `?.trim() || fallbackTitle` so a `title=""` or
whitespace-only attribute doesn't leak into BM25's 3x-boosted title field.
- `ElementHint` (service + tools-sdk) becomes a discriminated union so
`attribute` and `value` must be supplied together at compile time.
- `ElementAxisIndex` storage swaps the stringly-keyed
`${tag}.${attr}=${value}` table for nested Maps — eliminates the
`.`/`=`-in-attribute collision class without a delimiter discipline.
- `html-reader` lifts attributes off the FIRST `<bv-topic>` encountered
unconditionally rather than the first non-empty one (matches the
contract on malformed input).
- Glob discovery does two parallel passes (`**/*.html`, `**/*.md`)
instead of relying on the underlying engine to expand `{html,md}`.
Tests: +7 (BM25 attribute-payload parity, title-fallback, FS-backed
reader round-trip, topic-attribute lift contract, attribute-name/value
collision robustness x3). All 7917 pass; lint + typecheck clean.
[ENG-2740] query path — HTML reader + element-axis index + search routing
…sponse
Tier 2 (`brv query`'s direct-search response — no LLM) reads the full
file at `query-executor.ts:701` and ships the bytes back to the user
through `formatDirectResponse`. For HTML topics that's raw
`<bv-topic ...><bv-rule severity="must">...</bv-rule>...` markup.
Two consequences observed during the M1 multi-tier audit:
1. The 5000-char content budget in `direct-search-responder.ts:11`
burns on tags and attribute syntax before reaching meaningful
text. Long topics get truncated mid-element.
2. Any LLM-shaped consumer downstream (Tier 4 agent re-reads, the
bench harness, third-party integrations) re-parses the HTML.
Wasted tokens; lossy semantics.
Stripping every tag (bodyText only) loses the typed-element signals
the format exists to carry — severity on rules, subject/value on
facts, id on decisions. New `renderHtmlTopicForLlm(html)` walks the
parsed tree and renders each `<bv-*>` element with a semantic prefix:
- `<bv-rule severity="must" id="r-1">x</bv-rule>` →
`- **Rule** [must] (r-1): x`
- `<bv-fact subject="alg" value="RS256">…</bv-fact>` →
`- **Fact** (subject=alg, value=RS256): …`
Wired into `tryDirectSearchResponse`: when `result.format === 'html'`
the full content goes through the renderer; markdown topics pass
through verbatim. Renderer is forgiving on malformed input (mirrors
the rest of the parse5-driven reader pipeline); on any throw we fall
back to raw bytes so a single bad topic can't blank the response.
Tests: +15 (12 renderer, 2 Tier 2 HTML branch, 1 Tier 2 markdown
regression guard). 7931 pass; lint + typecheck + build clean.
fix: Tier 2 direct response renders HTML topics into structured markdown
Post-HTML-migration, every QueryLogEntry on disk reports `format: "markdown"` regardless of whether the recall touched HTML or markdown topics. Telemetry consumers (query-log summaries, analytics dashboards, the brv-bench harness) cannot distinguish HTML vs markdown traffic — every chart flat-lines as "all markdown" even on a fully migrated tree. Root cause: T5 (ENG-2741) introduced `MarkdownOnlyFormatDetector` as a deliberate stub with the explicit comment "swap when the format-detector task lands the real extension-aware detector." T3 (ENG-2739) shipped `format-detector.ts` for read-routing inside `search-knowledge-service` but did not swap the executor's binding. The stub stayed wired as the default in `QueryExecutor`. This change: - Adds `ExtensionAwareFormatDetector` — inspects each `matchedDoc.path` extension and returns `'html'` if any HTML doc was retrieved, `'markdown'` for legacy-only recalls, `undefined` for empty recalls (cache hit, OOD short-circuit, tier 4 LLM-only). - Single-`'html'` policy is deliberate: any HTML doc is the load-bearing signal for telemetry rollups in a partially-migrated tree. Reporting `'markdown'` for a mixed result would hide HTML traffic from cost / coverage dashboards. - Strips shared-source `[alias]:` prefixes before extension parsing so shared-source paths classify the same as local. - Defaults to `'markdown'` for extensionless paths (defensive). - Wires as the default `formatDetector` binding in `QueryExecutor`. - Keeps `MarkdownOnlyFormatDetector` for tests that pin pre-migration semantics; updates its docstring + comment to reflect the new role. Tests: +9 (extension-aware detector full coverage). 7940 pass; lint runs through the pre-existing submodule config issue, otherwise clean.
ByteRover-provider runs produce QueryLogEntry records with no
`inputTokens` / `outputTokens` / `cachedInputTokens` /
`cacheCreationTokens` even though Tier 3 calls take seconds of llmMs.
Telemetry consumers cannot compute per-query cost.
Root cause: T5's telemetry pipeline depends on
`GenerateContentChunk.rawResponse` being set on the terminating
chunk so `LoggingContentGenerator.generateContentStream()` can
capture the last non-undefined `rawResponse` and feed it to
`pickRawUsage()` (`.usage ?? .usageMetadata`). The AI-SDK content
generator wires this on its `'finish-step'` case but
`ByteRoverLlmHttpService.extractContentFromResponse()` (the simulated
stream backing every `ByteRoverContentGenerator` call) yielded chunks
without ever attaching the backend response. Result: `lastRawResponse`
stays undefined, no `llmservice:usage` event fires, no tokens land on
the log entry.
Fix: attach the full `GenerateContentResponse` as `rawResponse` on the
terminating chunk in both branches of `extractContentFromResponse` —
empty-content and full-content paths. The `pickRawUsage` helper
already handles the Gemini-shaped (`usageMetadata`) and Claude-shaped
(`usage`) payloads the ByteRover backend returns through this code
path, and the existing `extractUsage('google'|'anthropic')` mappers
take it from there.
Tests: +3 (terminating-chunk rawResponse coverage across content,
empty-parts, and function-call termination cases). 7943 pass.
The companion prompt loaded into every curate-mode invocation
(`curate-detail-preservation.yml`, auto-discovered by
`companion-contributor.ts:discoverCompanionPrompts('curate')` against
the `curate-*.yml` glob) still instructed the agent to "store in
narrative.rules / narrative.highlights / narrative.diagrams /
narrative.structure / narrative.dependencies / narrative.examples /
rawConcept.flow / content.facts / snippets" — every one of which is a
JSON-schema field on the pre-T3 curate-tool API that no longer exists.
T3 (ENG-2739) dropped `useHtmlContextTree` and made HTML the sole
emission format, but T2 (ENG-2738, the HTML-emission contract) only
updated `curate.txt` and missed this companion file.
Empirically observable on the proj/html-mem-conversion branch: a
LoCoMo-bench curate against `conv_26/session_1` produced a `<bv-topic>`
with one giant `<bv-rule>` element wrapping a markdown narrative —
the agent saw "store in narrative.rules" and treated `<bv-rule>` as
the closest match. Other sessions in the same run emitted clean
typed-element shapes; the variance correlates with whether the
legacy companion prompt dominated that turn's context.
This change rewrites the file so every preservation rule maps to the
correct `<bv-*>` element from the closed vocabulary
(`bv-rule`, `bv-task`, `bv-fact`, `bv-decision`, `bv-bug`, `bv-fix`,
`bv-flow`, `bv-diagram`, `bv-highlights`, `bv-structure`,
`bv-dependencies`, `bv-examples`, `bv-pattern`). General preservation
principles (completeness over conciseness, exact preservation, never
summarize) are kept verbatim. Adds an explicit guard at the end of
the prompt telling the agent NOT to use `narrative.*` / `rawConcept.*` /
`content.*` JSON-schema field names so any residual model bias toward
the pre-migration API surfaces is suppressed.
Tests: no test fixture asserts on this file's contents (it's a runtime
agent prompt, exercised via end-to-end curate runs). YAML still parses;
full suite green at 7943.
Two items from the review pass on this branch:
1. `extractContentFromResponse` had three terminating-chunk paths;
only two received `rawResponse` in the original fix. The third
branch — `candidates` array missing or empty (line 193) — fires
when Gemini blocks output via safety filter or Claude returns a
refusal-shaped response. Both providers populate `usageMetadata`
/ `usage` on those responses, so dropping `rawResponse` here
silently loses billable tokens on every safety-block /
refusal turn. Trivial completion: forward `response` from this
branch too, matching the other two. New test pins the contract.
2. JSDoc rewrite on `stripSharedAlias` to justify why the helper is
retained. Today `path.extname` happens to handle
`[alias]:rel/path.html` correctly (the colon lives in the dirname,
not the basename), so the strip is defensive rather than
load-bearing. The helper stays for two reasons noted in the
docstring: keeping the detector independent of
`getFormatForRead`'s internals (so a future switch to URL-style
parsing doesn't silently break the alias case), and making the
alias-stripping branch testable in isolation. Prevents a future
reader from deleting it as apparent dead code.
Tests: +1 (candidates-empty terminating-chunk rawResponse contract).
7944 pass.
fix: extension-aware format detector + byterover tokens + curate prompt
…aceholder
First milestone of byterover-tool-mode: ByteRover stops being an agent
for curate/query and becomes a tool the user's primary agent calls. The
user's LLM (Claude Code, Cursor, etc.) supplies completions; byterover
validates and persists. No provider config inside byterover.
This commit ships the wire-protocol surface for the curate side. The
orchestrator behind it is a placeholder — TKT 02 lands the real state
machine; TKT 03 fills in real prompts; TKT 04 authors the SKILL.md
that drives the loop from Claude Code.
Wire contract (documented at docs/curate-protocol.md):
Kickoff (env-gated):
BRV_CURATE_TOOL_MODE=1 brv curate "<text>" --format json
→ {status: needs-llm-step, sessionId, step: generate-html, prompt}
Continuation (--session implies tool mode):
brv curate --session <id> --response "<text>" --format json
→ {status: done, filePath} | {status: failed, errors[]}
Without the env var, brv curate runs today's legacy agent-driven path
unchanged — fully additive. The env var is the temporary opt-in; TKT 02
replaces it with a BrvConfig field as part of the real orchestrator.
Placeholder semantics:
- Kickoff always returns one `needs-llm-step` with a stub prompt
- Continuation always returns `done` on first valid response
- State on disk at `.brv/sessions/curate-<id>/state.json` (TKT 02
moves into daemon task-session sandbox vars when search-service
integration lands)
- Empty response → `failed` with kind=empty-response (session kept
live for retry); unknown sessionId → `failed` with
kind=unknown-session (terminal)
Tests: 12 placeholder unit tests covering kickoff, continuation
success/failure paths, on-disk state lifecycle, and envelope-contract
guarantees per the protocol doc. Manual round-trip smoke verified the
CLI happy path end-to-end.
…-, doc Four items from the code review: - Helper API redesign: `assertNoRemovedFlags` (which threw a bare `Error`) is now `findRemovedFlagMessage` returning `string | undefined`. Caller drives the response — JSON envelope when `--format json` is set, `this.error(...)` otherwise. Yields the canonical oclif `› Error:` rendering instead of a raw stack-leaking throw. - JSON-mode envelope: both `brv curate --format json` and `brv query --format json` now emit a structured rejection envelope on a removed flag instead of dumping a plain-stderr line. Detected with a new `argvRequestsJsonFormat(argv)` helper that mirrors the recognised forms (`--format json`, `--format=json`). - `--` terminator: the scan now stops at `--`, matching POSIX convention. Same applies to the JSON-format detector. Lets users on permissive-parse `Query` pass a literal query that contains a hyphen-prefixed token without hitting a false-positive rejection (e.g. `brv query -- what does --timeout do`). - Stale SKILL.md `-f` / `--files` references: line 639 (the "File access" paragraph that described the removed --files flow) and lines 658–660 (error-table rows for `Maximum 5 files allowed` / `File type not supported`) replaced with a single row pointing agents at the new migration-error message shape. The other three doc files were already clean. Tests - 15 unit cases for the helper (was 8; +7 for `--` terminator, --flag=value, and argvRequestsJsonFormat). - Full suite: 8521 passing, 16 pending, 0 failing. - Typecheck clean. Lint 0 errors / 262 pre-existing warnings.
refactor: [ENG-2880] drop legacy flags from brv curate / brv query
|
Claude finished @danhdoan's task in 3m 52s —— View job Re-review of PR #663 after the ENG-2880 batch (PR #687)Two PR-specific commits since Score: 4.85 / 5 — net improvement from 4.75/5. The longest-standing live finding on this PR is now closed; only cosmetic / defer-defensible items remain. Ready to merge. What changed since
|
| Commit | Scope |
|---|---|
c3f17aa |
ENG-2880 — drops legacy -d/-f/--detach/--timeout from brv curate and --timeout from brv query. Adds shared src/oclif/lib/removed-flags.ts (findRemovedFlagMessage + argvRequestsJsonFormat + CURATE_REMOVED_FLAGS + QUERY_REMOVED_FLAGS). Updates SKILL.md, brv-instructions.md, workflow.md, command-reference.md to match. |
c4c4a52 |
ENG-2880 review fix — JSON envelope on rejection (was a bare throw), this.error(...) instead of stack-leaking throw new Error(), -- terminator honored, scrubs SKILL.md lines 639 + 658-660 of stale -f / --files references. Bumps the helper's test count from 8 → 15. |
Strengths
- Helper-shape is well-chosen.
findRemovedFlagMessagereturningstring | undefinedlets the caller drive the response — curate usesemitToolModeEnvelope(structured{errors: [{kind: 'removed-flag', message}], ok: false, status: 'failed'}), query useswriteJsonResponsewith its existing{error, status}shape — and either falls back tothis.error(message, {exit: 1})for the human path. Much better than a barethrow new Error(...)which would have stack-leaked through oclif's parser into--format jsoncallers' stdout. --terminator +--flag=valueform. Line 40 stops scanning at the standard POSIX terminator, which fixes thebrv query -- what does --timeout dofalse-positive on permissive-parse (relevant becauseQueryisstrict = false). Line 42 matches the=form viastartsWith('${f}='), and the=separator naturally prevents--folder-extraprefix collisions.- Symmetric scan-then-parse. Both commands run the scan before
await this.parse(...), so the migration message surfaces under strict and permissive parse modes alike. Curate (strict) wouldn't have parsed the flag anyway; query (permissive) would have silently swallowed it as positional — both now reject coherently. - Doc cleanup is complete.
SKILL.mdlines 639 + 658-660 (the "File access" paragraph and three error-table rows forMaximum 5 files allowed/File does not exist/File type not supported) are gone, replaced with a single migration-error row.brv-instructions.md/command-reference.md/workflow.mdlikewise scrubbed. The agent-side description no longer instructs agents to use removed flags. - Test design. 15 unit cases cover
undefinedon no-match, long-form match, short-alias match,--flag=valueform, first-match short-circuit,---terminator (both "stops scanning" and "still matches BEFORE the terminator"), and the newargvRequestsJsonFormatdiscriminator. The "matches BEFORE--" assertion is the load-bearing case — without it, a user typingbrv curate --detach -- "ok"could slip a removed flag through.
Issues
1. nit (new, stale docstring) — useHtmlContextTree reference in CurateLogEntry.format doc
src/server/core/domain/entities/curate-log-entry.ts:57-61 still says:
Format mode of the curate output. `'html'` when `useHtmlContextTree` is
on, else `'markdown'`. Settled at task start, not derived from output.
But the PR description explicitly lists useHtmlContextTree as a removed flag — HTML is now the only curate output format. The 'markdown' literal is only retained for back-compat with log entries persisted before the format migration. Also a stray *. on line 60. One-block JSDoc rewrite (suggestion posted inline). Non-blocking.
2. observation (carry-over, non-blocking) — error messages still reference removed brv providers connect/switch commands
Confirmed still standing across:
src/oclif/lib/daemon-client.ts:36, 38, 41, 181-182, 274-277src/oclif/commands/dream.ts:144src/tui/utils/error-messages.ts:17, 18, 21src/server/infra/daemon/task-validation.ts:26, 36
brv dream is still the live reachable path. Deferred across multiple reviews; the team's pattern has been post-merge polish. Now that ENG-2880 has knocked down the longest carry-over, this is the next-oldest live item. Fix this →
3. observation (carry-over, non-blocking) — <existing-topic> block still inlines system-managed timestamps verbatim
src/server/core/domain/render/curate-prompt-builder.ts:119-123 unchanged. OUTPUT_CONTRACT (line 168) tells the model not to emit createdat/updatedat, but the inlined existing-topic visibly shows them. With tool-mode-dream MERGE candidates now exercising the UPDATE → correction-loop path under meta.previousSummary, worth a one-regex strip pass before the prompt builder shape settles permanently.
4. observation (carry-over, non-blocking) — topic-loader.ts regex parser inconsistent with index-generator.ts parse5
src/server/infra/dream/tool-mode/topic-loader.ts:43, 110-155 still uses BV_TOPIC_RE + parseAttributes, while src/server/infra/context-tree/index-generator.ts:208-222 uses parse5-backed parseHtml / walkElements for the same task. Entity-decoded title attributes diverge (& in the regex path stays literal). Latent — topics are writer-produced today — but worth aligning.
5. observation (carry-over, non-blocking) — CLI ConsoleLogger emits chalk on stderr under --format json
src/oclif/lib/curate-session.ts:380 (sidecar-bump) and :423 (index regen) still construct new ConsoleLogger() unconditionally. dream.ts:33 handles this with format === 'json' ? new NoOpLogger() : new ConsoleLogger(). Cheap to plumb format through ContinueOptions.
Praise
- The two-commit shape.
c3f17aadoes the surface removal + helper;c4c4a52does the four polish items from the synchronous code review (JSON envelope,this.error,--terminator, doc residue). The split keeps each diff focused and reviewable — no "address review feedback" mega-commit. Even the migration sentences read well at human scale: "Pack the folder content into the<bv-topic>HTML directly before calling brv curate" tells the agent what to do, not just what's removed. - Tests track the helper's invariants, not its implementation. Cases assert on
--short-circuit behavior,--flag=valuematching, and first-match priority — the user-observable contract — rather than internal control flow. Future contributors can refactor the loop without breaking the test suite if the semantics hold. - Catch on the
--terminator. The original commit (c3f17aa) didn't include the--terminator handling; it was added in the review-fix commit (c4c4a52) to handlebrv query -- what does --timeout do. Permissive-parse onQuerymakes this a real ergonomic concern, and the fix is one line in the helper plus one line in the format detector. - Doc/code parity. When the agent-side description still mentioned
-f/--files, an LLM would have happily emitted those flags into the nextbrv curatecall and gotten the new migration error — a confusing loop. Scrubbing both the command and the SKILL.md it ships closes that gap end-to-end.
Test plan suggestions
- Optional: integration test that
brv curate "ctx" --detach --format jsonemits the structured rejection envelope on stdout and exits non-zero. Would also lock thatsuccess: falsesurvives thewriteJsonResponsewrapper. - Optional:
brv query -- --timeout 600(positional content containing a removed flag token) succeeds — the--terminator path needs an end-to-end smoke test, not just the unit-level helper test.
Final disposition
LGTM. ENG-2880 is high-quality follow-up — clean helper shape, complete doc parity, 15-case test coverage with the load-bearing -- and --flag=value cases explicit. Carry-over items are all known and have consistently been polished post-merge.
Todo
- Identify what changed since
4e84552 - Spot-check the ENG-2880 surface removal + helper + tests
- Verify outstanding carry-over items from prior reviews
- Post inline comments + final summary with score
· Branch:proj/byterover-tool-mode
| export function findRemovedFlagMessage(argv: string[], removed: RemovedFlag[]): string | undefined { | ||
| for (const token of argv) { | ||
| if (token === '--') return undefined | ||
| for (const {flags, migration} of removed) { | ||
| const matched = flags.find((f) => token === f || token.startsWith(`${f}=`)) | ||
| if (matched) return formatRejection(matched, migration) | ||
| } | ||
| } | ||
|
|
||
| return undefined | ||
| } |
There was a problem hiding this comment.
praise: Closes the longest-standing finding from this PR (carry-over across 7 prior reviews). The helper-shape is well-chosen:
- Returns
string | undefinedso the caller owns the response shape (JSON envelope vsthis.error(...)) — much better than the originalthrow new Error()shape, which would have stack-leaked through oclif and broken--format jsonparsers. --terminator is honored (line 40), which fixes thebrv query -- what does --timeout dofalse-positive case for permissive-parse.--flag=valueform is matched correctly viastartsWith('${f}='), and the=separator naturally prevents--folder-extraprefix collisions.- The error message names the literal token the user typed AND surfaces a migration sentence, not a generic "removed" line.
The 15-case test file (including the ---terminator cases at lines 45-51 and 71-73) locks the contract. Nice end-to-end execution on a finicky bit of CLI ergonomics.
| /** | ||
| * Format mode of the curate output. `'html'` when `useHtmlContextTree` is | ||
| * on, else `'markdown'`. Settled at task start, not derived from output. | ||
| *. | ||
| */ | ||
| format?: 'html' | 'markdown' |
There was a problem hiding this comment.
nit (stale doc): The docstring says 'html' when useHtmlContextTree is on, else 'markdown', but the PR description explicitly removes the useHtmlContextTree flag — HTML is now the only curate output format. Two consequences:
- The condition is no longer expressible: there's no flag to check. New writes are always
'html'. - The
'markdown'branch is only set for the back-compat read path (markdown topics still parse viaMarkdownWriter), which is the reader concern — but this field tags curate outputs.
If format can still legitimately be 'markdown' (e.g. legacy operations replayed from older log entries), say so explicitly. Otherwise, simplify to 'html' literal and drop the optional. Either way, the useHtmlContextTree reference here will mislead the next contributor doing log-entry archaeology.
| /** | |
| * Format mode of the curate output. `'html'` when `useHtmlContextTree` is | |
| * on, else `'markdown'`. Settled at task start, not derived from output. | |
| *. | |
| */ | |
| format?: 'html' | 'markdown' | |
| /** | |
| * Format of the curate output. Tool-mode curate always writes HTML | |
| * (`<bv-topic>`); the `'markdown'` literal is retained for back-compat | |
| * with log entries persisted before the format migration. Settled at | |
| * task start, not derived from output. | |
| */ | |
| format?: 'html' | 'markdown' |
Also: stray *. on line 60 inside the docblock — minor JSDoc lint.
Tool-mode subcommands (`brv dream scan / finalize / undo / sessions /
cancel`, shipped in ENG-2858) cover the same workflow deterministically
— daemon enumerates candidates, agent decides via `brv curate`, daemon
archives via `finalize`. No provider required. With tool-mode as the
project default the legacy LLM-driven entry surface is orphaned, so
remove it cleanly.
`brv dream` (no subcommand) is now a thin topic root that prints a
migration hint. `--force`, `--timeout`, `--undo` on the topic root are
gone; users hit `brv dream undo` instead. The `buildUndoDeps` export
stays here since the `undo` subcommand imports it.
Removed
- `src/oclif/commands/dream.ts` LLM-driven `run()` body (was a 200-line
block dispatching dream tasks through the daemon). Topic root keeps
the `buildUndoDeps` helper for the `undo` subcommand.
- `case 'dream'` in `agent-process.ts` — daemon-side LLM dispatch.
- `AgentIdleTimeoutPolicy.onAgentIdle` dream-eligibility branch and
`preDispatchCheck` for `task.type === 'dream'` in `brv-server.ts`.
The auto-dispatch had no provider-config guard, so on tool-mode
projects it would silently try to dispatch and fail — removal closes
that latent issue. Idle agents are now just cleaned up.
- `src/server/infra/executor/dream-executor.ts` + its test.
- `src/server/infra/dream/dream-trigger.ts` + its test.
- `src/server/infra/dream/operations/{consolidate,synthesize,prune}.ts`
+ their tests. Tool-mode subcommands use `src/server/infra/dream/
tool-mode/*` instead.
- `src/server/infra/dream/parse-dream-response.ts` +
`dream-response-schemas.ts` + their tests — no consumer after the
operations files are gone.
- `src/oclif/lib/timeout-deprecation.ts` + its test — only consumer
was the legacy dream command.
Trimmed
- `test/commands/dream.test.ts` rewritten as a 1-case smoke that
asserts the new topic-root behaviour (prints the subcommand listing).
- `test/unit/infra/runtime-signals/sidecar-failure-logging.test.ts`
dropped the `dream operations` describe block (consolidate / prune
sidecar warnings). Other describe blocks (curate-tool, archive
service, manifest, search, commit-6 wiring) keep their coverage.
Kept
- `src/server/infra/dream/dream-undo.ts`, `dream-state-service.ts`,
`dream-lock-service.ts`, `dream-log-store.ts` — LLM-agnostic services
reused by the tool-mode pipeline.
- `src/server/infra/dream/tool-mode/*.ts` — the subcommand pipeline.
- `src/oclif/commands/dream/{scan,finalize,sessions,cancel,undo}.ts`.
Verified
- npm run typecheck: clean
- npm run lint: 0 errors (256 pre-existing warnings)
- npm test: 8337 passing, 16 pending, 0 failing (was 8521 — net -184
from the dead test deletions; matches the deleted file inventory).
Migration
- `brv dream` (no subcommand) now prints "Use a subcommand: brv dream
{scan|finalize|undo|sessions|cancel}" and exits 0.
- `brv dream --force` / `--timeout` / `--undo` removed. Use the
subcommands.
- Background dream is no longer daemon-scheduled; the agent decides
when to run `brv dream scan` (e.g. post-curate burst, on session
start). The v2 model is agent-driven cadence.
…lose schema/post-work straggler gates
Two should-fix items + comment cleanup from the reviewer:
1. DREAM_REMOVED_FLAGS was exported and unit-tested but never consumed.
`brv dream --timeout 30` printed the topic-root hint instead of the
migration message that names --timeout. Wired
`findRemovedFlagMessage(this.argv, DREAM_REMOVED_FLAGS)` into the
topic root with the same JSON-envelope / this.error branching the
curate and query commands use.
2. Dropped 'dream' from the transport schemas + post-work gate so any
straggler `type: 'dream'` task that still hits the daemon (older
client, stale queue, internal stray producer) fails at the schema
boundary instead of silently completing with result = ''.
- TaskTypeSchema + TaskExecuteSchema type enums: 'dream' removed.
- TaskExecuteSchema.force field: removed (dream was the only consumer).
- TaskExecuteSchema.trigger enum: 'agent-idle' removed (idle-trigger
dispatch is gone). 'cli' and 'manual' remain.
- TaskCreateRequestSchema.force field: removed.
- agent-process.ts post-work gate: dropped `type === 'dream'`.
- task-router.ts: dropped the orphan `data.force` plumbing.
Cosmetic
- task-router.test.ts: swapped the `type: 'dream'` fixtures (used to
exercise the generic preDispatchCheck plumbing) to `type: 'curate'`.
The tests stay — they cover the gate mechanism, not dream-specific
logic.
- review.ts description: dropped the stale `brv dream needsReview`
bullet. Tool-mode dream-finalize writes `needsReview: false`
unconditionally; the toggle has no dream-side effect to describe.
- Cleared stale `dream-executor` comment refs in curate-session.ts,
dream-state-service.ts, agent-process.ts, and two test files.
- Topic-root exit code: kept exit 0 (matches oclif topic-root default)
with a clarifying code comment noting the deliberate choice.
Kept
- `'agent-idle'` in dream-log-schema.ts — backwards-read compat for
existing dream-log entries on disk. No new entries will carry that
trigger; existing files still parse.
Verified
- npm run typecheck clean
- npm run lint 0 errors (256 pre-existing warnings)
- npm test: 8337 passing, 16 pending, 0 failing
Three FE-breaking output classes addressed by prompt updates across the 7 LLM-visible curate surfaces (SKILL.md, curate-prompt-builder, MCP tool description, registry, curate.txt, system-prompt.yml, curate-detail-preservation.yml): - <li> content must be plain text — no leading -, *, •, or 1./2. markers (the renderer adds them via CSS, so a literal prefix double-bullets) - <bv-diagram> body goes directly inside the element with HTML entities for <, >, & — no <![CDATA[…]]> wrap (HTML5 parses CDATA as a bogus comment and the first --> in mermaid closes it, corrupting the source) - `related` distinguishes files from folders by suffix: file targets end in .html, folder/domain targets stay bare. Replaces the prior "no .html extension" guidance in SKILL.md. Also adds a read-only related-ref warner that resolves each ref against .brv/context-tree/ and surfaces warnings for broken (neither file nor folder exists) and ambiguous (both exist) refs. Surfaced via a new optional `warnings` field on the curate done envelope (both CLI tool-mode and daemon curate-html-direct paths). The warner never mutates the attribute and never rejects writes — refs stay advisory so legitimate forward references (target curated later) still ship. Fourth issue from the ticket (<bv-index> description attribute) is deferred to a follow-up because index.html is server-generated and needs an end-to-end config + capture UX, not a prompt or content validator change.
…lope, race-guard stat calls Two real issues from review: - MCP curate path was dropping warnings. CurateHtmlDirectResult gained an optional `warnings?: readonly string[]` field and renderEnvelope now appends ⚠ lines under the ✓ confirmation when present, matching the CLI text-mode + JSON envelope behavior. - The warner used existsSync + statSync, which is race-prone — a target deleted between the two calls would throw and bubble out of writeHtmlTopic, turning a successful curate into a reported failure even though the topic is already on disk. Switched to a single swallow-on-error statSync helper (safeStatIsFile / safeStatIsDir). Both return false on any error, so a transient FS error surfaces the ref as "broken" rather than failing the post-write check. Added a test that chmods a parent dir to 0 to exercise the EACCES path. The reviewer's "lower priority" suggestion to also plumb warnings into the legacy curate-executor.ts in-daemon path is deferred — that surface is being phased out and the fix would expand CurationStatus shape. Other reviewer notes filed as follow-ups (not addressed here): - lstatSync vs statSync for symlink handling (today the comment overstates the symlink guarantee; the resolved-path check only validates the input string, not the link target) - case-insensitive FS edge case on macOS/Windows - skip ambiguity warning when ref carries an explicit .html suffix - .md legacy-extension strip for the dwindling MD topics
The previously-warned "both <ref>.html and <ref>/ exist" case is not actually a failure mode — the suffix the agent wrote IS the choice (bare → folder, .html → file per the convention) and the FE routes by suffix. The other on-disk shape is irrelevant. Removing the case and all references to it: - checkRef now reduces to a single broken check: warn only when neither <ref>.html nor <ref>/ exists. - HtmlWriteSuccess + CurateSessionEnvelope JSDocs dropped the "broken / ambiguous" mention. - Removed the ambiguous-ref test block from related-ref-warner.test.ts (and the both-exist no-warning block that I had repurposed mid-cycle). The remaining tests cover the actual behavior cleanly. - Updated the MCP renderer multi-warning test to use two broken-ref fixtures instead of broken + ambiguous. Also addressed two PR #690 review nits in the same pass: - Stale comment in related-ref-warner.ts:82-87 still referenced the old existsSync + statSync TOCTOU shape; rewrote to describe what the current safeStat helpers actually do. - Added MCP rendering tests for the new ⚠ branch (warnings present, warnings empty, warnings field absent — defends against pre-commit daemon payloads). Mirrors the CLI-side coverage added previously. Two review nits deferred and resolved as follow-ups: - chmodSync(…, 0) test fragility under root uid (Docker node:* images). - WebUI Tasks panel does not yet decode/render the new warnings field.
…nits
The warner previously passed if EITHER `<ref>.html` or `<ref>/` existed.
That left a symmetric false-negative class to the dropped ambiguous
case: a `.html` ref pointing at a folder (or a bare ref pointing at a
file) passed silently even though the FE routes by suffix and would
show a dead pill.
Tightened to probe only the form the agent's suffix picked:
- `.html` suffix → check `<ref>.html` (file form) only
- bare → check `<ref>/` (folder form) only
A shape mismatch on disk is no longer ignored — it now surfaces as a
broken-ref warning, with a coaching hint on the bare-form case ("add
.html if you meant a file") since "agent forgot the .html suffix" is
the typical mistake. Updated the JSDoc, tightened the broken-ref
messages to name the missing shape ("no file at …" vs "no folder at
…"), and replaced the two now-impossible "bare ref matches file" /
"`.html` ref matches folder" no-warning tests with their inverses
asserting the new warnings.
Same commit also addresses two cosmetic review nits:
- `renderEnvelope` docblock in brv-curate-tool.ts:266-275 updated to
describe the head + ⚠ lines shape (was "a single confirmation line").
- Stale sequencing comment in html-writer.test.ts:392 ("before the
disk write") corrected to "after the atomic write".
fix: add HTML output contract rules and related-ref warner
|
Claude finished @danhdoan's task in 5m 40s —— View job Re-review of PR #663 after the ENG-2883 batch (PR #690)Four PR-specific commits since Score: 4.85 / 5 — held from prior review. Quality of the new code is high; remaining items are all known carry-overs that have been deferred across multiple reviews. Ready to merge. What changed since
|
| Commit | Scope |
|---|---|
8d5ded0 |
ENG-2883 — adds 3 HTML output contract rules (<li> plain-text, <bv-diagram> body-direct + entity-escape, related file/folder suffix convention) across 7 LLM surfaces (SKILL.md, curate-prompt-builder.ts, brv-curate-tool.ts, registry.ts, curate.txt, system-prompt.yml, curate-detail-preservation.yml). Adds related-ref-warner.ts + wiring through HtmlWriteSuccess.warnings. |
f6ec3c2 |
ENG-2883 review fix — surfaces warnings in the MCP envelope (CurateHtmlDirectResult.warnings?), renderEnvelope appends ⚠ <text> lines, swaps existsSync+statSync for race-resilient safeStatIsFile / safeStatIsDir helpers (test exercises EACCES via chmod 0 on the parent dir). |
ffe6856 |
ENG-2883 review fix — drops the "ambiguous (both exist) refs" warning case (the suffix IS the choice; the other shape is irrelevant); rewrites stale TOCTOU comment; adds MCP rendering tests for the three wire shapes (warnings: [...], warnings: [], warnings absent — defends against pre-PR daemon payloads). |
5887e2a |
ENG-2883 review fix — tightens to probe ONLY the chosen shape; replaces the now-impossible "bare matches file" / ".html matches folder" no-warn tests with their inverses (suffix-mismatch IS a warning now). Adds coaching hint on the bare-form case ("add .html if you meant a file") since "agent forgot the .html suffix" is the typical mistake. |
Strengths
- The four-commit arc is a textbook "ship → review → tighten" sequence. Each commit's body explains WHAT changed, WHY (with the specific failure mode it prevents), and what was DEFERRED (with reasoning).
5887e2ain particular catches a symmetric false-negative that the original commit had — probing both forms silently accepted dead-pill scenarios where the agent's suffix and the on-disk shape disagreed. Replacing the now-impossible no-warn tests with their inverses (the new behavior) is the right shape for a "we changed the contract" follow-up. - Race-resilience is documented in source.
safeStatIsFile / safeStatIsDir(related-ref-warner.ts:103-117) swallow ALL errors as "not present" with an explicit comment at l.82-89 explaining why: the warner runs post-write and must never turn a successful curate into a reported failure. The chmod-0 test (related-ref-warner.test.ts:146-167) exercises the EACCES path concretely, not just the ENOENT happy path. - Test coverage spans all three layers. The warner unit tests (16 cases) cover happy paths, broken refs, suffix-mismatch, race-resilience, and path-traversal. The writer test (
html-writer.test.ts:391-427) confirms warnings flow throughHtmlWriteSuccess. The CLI side (curate-session.test.ts:168-198) and MCP side (brv-curate-tool.test.ts:434-525) both lock the wire shape (present, empty, absent — including the "older daemon payload" defensive case at line 491). - 3 output-contract rules across 7 surfaces — and they actually agree.
<li>plain-text rule appears verbatim in 7 places.<bv-diagram>no-CDATA + entity-escape rule appears in 7 places with the same-->example. Therelatedfile-vs-folder suffix convention is consistent acrossSKILL.md, the MCP description, the prompt builder, the system prompt, andcurate.txt. The diagram preservation note incurate-detail-preservation.yml:47-57even includes a worked mermaid example with corrected entities. Previously this kind of multi-surface change had drifted between SKILL.md and the prompt builder; this batch lands them in sync. - Honest deferral. The PR fix: add HTML output contract rules and related-ref warner #690 commit messages explicitly list what wasn't addressed and why: legacy
curate-executor.tswarnings plumbing (the surface is being phased out, would expandCurationStatus), WebUI Tasks panel doesn't decode the newwarningsfield yet, symlink resolution (today's comment overstates the guarantee), case-insensitive FS edge case,chmod 0test fragility under root uid (Dockernode:*images). This is the right level of transparency — the operator running real-corpus tests in two weeks won't be confused about whether something is a known gap vs. a real bug.
Issues
1. suggestion — .htm suffix inconsistency with the file-reader
Posted inline at related-ref-warner.ts:52. wantsFile = stripped.endsWith('.html') is case-sensitive and only matches .html — but file-context-file-reader.ts:95-98 matches both .html AND .htm (the May-15 fix), and format-detector.ts:24 treats both as HTML. So a topic that exists on disk as foo/bar.htm (migrated, manual moves) is readable as a topic but unreachable from a related="@foo/bar.htm" ref — the warner probes for a folder and surfaces it as broken.
2. observation (UX, batch curate) — forward refs surface as warnings on every first-touch
Posted inline at related-ref-warner.ts:17-19. The header comment correctly says "a forward reference to a topic about to be curated is legit" — and the warner stays advisory rather than rejecting. But the ⚠ still SURFACES on every such ref. For tool-mode dream link/merge candidate flows (paired cross-refs across two topics by design), this is the dominant case. Worth a thought once you have telemetry on how often warnings fire on real-corpus dream runs.
3. observation (security, defer-defensible) — symlink defense-in-depth at related-ref-warner.ts:74-80
Posted inline. The comment says symlinks "could theoretically escape; reject any resolution that does" but path.resolve() operates on the string only — it does NOT follow symlinks. The follow-up lstatSync vs statSync swap is already on the list per f6ec3c2. The actual security impact is minimal here (the warner only returns booleans, no content read).
4. issue (carry-over) — error messages still reference removed brv providers connect/switch commands
Confirmed still standing across:
src/oclif/lib/daemon-client.ts:36, 38, 41, 181-182, 274-277src/oclif/commands/dream.ts:144src/tui/utils/error-messages.ts:17, 18, 21src/server/infra/daemon/task-validation.ts:26, 36
brv dream is still the live reachable path. Now the longest-standing carry-over since ENG-2880 closed the curate-flags one. Fix this →
5. observation (carry-overs, non-blocking)
<existing-topic>block incurate-prompt-builder.ts:119-123still inlines system-managed timestamps verbatim. Heavier traffic now with tool-mode-dream MERGE candidates exercising the UPDATE path undermeta.previousSummary. One-regex strip pass before the prompt builder shape settles permanently.topic-loader.tsregex parser still inconsistent withindex-generator.tsparse5. Latent — topics are writer-produced today — but worth aligning.- CLI
ConsoleLoggerchalk-on-stderr under--format json(curate-session.ts:380, 423).dream.ts:33solves this withformat === 'json' ? new NoOpLogger() : new ConsoleLogger().
Praise
- Posted inline at
related-ref-warner.ts:86-96— tightening to probe only the suffix the agent chose is the right call, and replacing the "ambiguous" warning with asymmetric-mismatch coverage is a clean net improvement. - Coaching hint design. The "bare ref + only file present" warning at l.96 includes
add ".html" if you meant a file— that's the specific text an agent encountering the typical "I forgot the suffix" mistake needs to see. Cheap to write, high-leverage on the agent's correction loop. The asymmetry (only the bare-form gets the hint, not the file-form) reflects empirical understanding of which mistake is more common. - MCP envelope render-test triad.
brv-curate-tool.test.ts:434-525covers three wire shapes (warnings: [...],warnings: [],warningsabsent) — the third case is the load-bearing one for backward compatibility with pre-PR daemon payloads. Most teams would have written just the happy path. - Diagram preservation prompt cross-cut. Updating
curate-detail-preservation.yml:47-57to use-->in the mermaid example (instead of leaving the prior-->arrows that would now trip the renderer) shows the kind of multi-file consistency discipline that prevents prompt/contract drift. The corrected example doubles as inline documentation of the rule.
Test plan suggestions
- A test for the
.htmref case (issue 1) — either assert the current "broken" warning text mentions the suffix mismatch, or add.htmto the wantsFile set and assert clean resolution against a seededfoo/bar.htmfile. - A test for the symlink-inside-contextTreeRoot case (issue 3) — even if you defer the
lstatSyncswap, a regression test that locks the current behavior (symlink-to-outside-folder probed viastatfollows the link and returns isDirectory()=true → no warning) at least documents what we have today.
Final disposition
LGTM. ENG-2883 is high-quality follow-up work — multi-surface prompt-contract alignment that previously had drifted, a well-tested read-only warner with race-resilient stat probes, and a textbook iterative review-tightening sequence across four commits. The remaining carry-overs are all known, well-documented, and have consistently been polished post-merge.
Todo
- Identify what changed since
c4c4a52(8d5ded0→f6ec3c2→ffe6856→5887e2a) - Spot-check the ENG-2883 warner + envelope + prompt-contract updates
- Verify outstanding carry-over items from prior reviews
- Post inline comments and final summary with score
· Branch:proj/byterover-tool-mode
| // target is reported as broken rather than thrown. Probe only the | ||
| // shape the agent chose: a `.html` ref against a folder (or a bare | ||
| // ref against a file) is a dead pill the FE cannot route, so it | ||
| // must surface even when the "other" shape happens to exist. | ||
| if (wantsFile) { | ||
| if (safeStatIsFile(candidate)) return undefined | ||
| return `related ref "${originalRef}" was not found — no file at "${relative}.html" under the context tree` | ||
| } | ||
|
|
||
| if (safeStatIsDir(candidate)) return undefined | ||
| return `related ref "${originalRef}" was not found — no folder at "${relative}/" under the context tree (bare refs target folders; add ".html" if you meant a file)` |
There was a problem hiding this comment.
praise: Tightening this to probe only the suffix the agent chose is the right call — and the commit message captures the rationale precisely: the suffix IS the decision, and probing both forms would silently accept a dead-pill (a .html ref + only-folder-on-disk would FE-404 either way). The pair with the .html "coaching hint" on bare-vs-file mismatch (l.96) is a nice ergonomic touch — that's the typical "agent forgot the .html" mistake.
The test in related-ref-warner.test.ts:124-142 ("warns when an explicit .html ref points at a folder instead of a file" / "warns when a bare ref points at a file instead of a folder") locks the contract from both directions. Replacing the previous "ambiguous" warning with this asymmetric-mismatch coverage is a clear net improvement.
The post-write race-resilient stat path (l.103-117) is also well-considered: a transient FS error surfacing the ref as "broken" rather than throwing out of writeHtmlTopic is the right priority order — the topic is already on disk, the warner's job is advisory only.
| const wantsFile = stripped.endsWith('.html') | ||
| const withoutExt = wantsFile ? stripped.slice(0, -'.html'.length) : stripped |
There was a problem hiding this comment.
suggestion (consistency with the file-reader): endsWith('.html') here is case-sensitive and only matches the .html suffix — but src/server/infra/context-tree/file-context-file-reader.ts:95-98 matches both .html AND .htm, and format-detector.ts:24 treats both as HTML. So a topic that exists on disk as foo/bar.htm (migrated content, older OS conventions, manual moves) is readable as a topic but unreachable from a related="@foo/bar.htm" ref — the warner here treats .htm as a bare ref, probes for a folder, and flags it as broken even though the file exists.
Three options ranked by surface area:
-
Cheapest: broaden the suffix test:
const lower = stripped.toLowerCase() const wantsFile = lower.endsWith('.html') || lower.endsWith('.htm') const withoutExt = wantsFile ? stripped.slice(0, -(lower.endsWith('.html') ? 5 : 4)) : stripped
This matches the file-reader's surface.
-
Cleaner: add a small
parseRefSuffix(stripped) → {wantsFile, withoutExt}helper that's shared between the warner and any future ref consumer (FE router, BM25 indexer). -
Or just document that
.htmrefs are explicitly unsupported and surface a dedicatedunsupported-extensionwarning.
Latent — topics today are writer-produced as .html — but this is a small wedge between two sibling code paths that the file-reader already had aligned. Case-sensitivity is also listed in the deferred follow-ups (f6ec3c2 commit body), and a single .toLowerCase() on stripped closes that nit at the same time.
| * The warner never mutates the attribute and never rejects the write | ||
| * — refs are advisory metadata, and a "forward reference" to a topic | ||
| * about to be curated is legit. |
There was a problem hiding this comment.
observation (UX, batch curate): The header note says "a forward reference to a topic about to be curated is legit" — and the warner correctly stays advisory rather than rejecting. But the warning still SURFACES on every such ref, which means an agent doing batch curation (curate A → curate B → curate A again to add related="@b"`` is the natural pattern, but curate A with related="@B" first → curate B produces one ⚠ on A's write).
For tool-mode dream's link/merge candidate flows (which by design produce paired cross-refs across two topics) this is the dominant case. The agent gets a ⚠ on the first write, then no ⚠ on the second — but the user-facing log line will read like a real problem.
Two thoughts (defer-defensible — not blocking):
- Two-pass curate in the orchestrator. Tool-mode dream finalize could write both halves of a link/merge pair, then re-check refs once both are on disk and drop warnings that resolved. Adds complexity; only pays for itself if forward-ref noise becomes an observed pain.
- Cheap-now signal in the warning text. When the resolved ref's domain folder DOES exist but the target file/folder doesn't, hint at "forward reference?" alongside the existing "no file at …" / "no folder at …" message. Tells the agent (and reviewer) "this is the routine batch case", not "this is a typo".
Worth a thought once you have telemetry on how often warnings fire on real-corpus dream runs. The asymmetric-mismatch coaching at l.96 already shows good intuition for "shape of the typical mistake"; this is the same principle applied to the other common case.
| // Defence in depth: even after segment filtering, the resolved path | ||
| // must remain inside the context-tree root. A symlink in `contextTreeRoot` | ||
| // could theoretically escape; reject any resolution that does. | ||
| const rootResolved = path.resolve(contextTreeRoot) | ||
| if (!isInsideRoot(candidate, rootResolved)) { | ||
| return `related ref "${originalRef}" resolves outside the context-tree root and was not checked` | ||
| } |
There was a problem hiding this comment.
observation (security, defer-defensible): The comment says "A symlink in contextTreeRoot could theoretically escape; reject any resolution that does." But path.resolve() does NOT follow symlinks — it operates on the string only. If <contextTreeRoot>/security is a symlink pointing at /etc, then path.resolve(contextTreeRoot, 'security/passwd') still returns <contextTreeRoot>/security/passwd (string-wise), isInsideRoot returns true, and the subsequent safeStatIs{File,Dir} would follow the symlink and probe /etc/passwd.
The current commit message (f6ec3c2) already flags this as a follow-up: "lstatSync vs statSync for symlink handling (today the comment overstates the symlink guarantee; the resolved-path check only validates the input string, not the link target)". The actual security impact is minimal here: the warner only does stat().isFile() / isDirectory() and returns a boolean — no content is read, no path is logged in detail beyond the warning string the agent is going to see anyway. So an attacker who controls contextTreeRoot symlinks could only learn "is /etc/passwd a file? yes." through warning suppression — not much of an exfiltration vector.
Still worth either:
- updating the comment to be honest about the guarantee (the input-string check, not the link target), or
- swapping
statSyncforlstatSyncso symlinks are reported as not-the-expected-shape (a symlink isFile() = false, isDirectory() = false, isSymbolicLink() = true — both probes would surface the ref as broken, which is the correct user-facing result).
Cheap to fix at the same time as the case-insensitive FS deferral. Non-blocking.
…nt path normalization) The dream-scan output emits candidate paths with .html (e.g. auth/jwt.html) and the documented dream→curate merge workflow tells the agent to write the survivor at that path. The writer previously appended .html unconditionally, producing auth/jwt.html.html on disk while reporting status: done and bypassing the path-exists guard — silent data corruption during merge: the survivor's merged content landed in a stale file while the agent archived the loser thinking the merge had succeeded. topicPathToFilePath now strips a trailing .html before appending. Both path="x/y" (bare convention) and path="x/y.html" (the form dream-scan emits) collapse to the same on-disk file at x/y.html and trigger the path-exists guard identically. 4 new tests in html-writer.test.ts cover: .html-suffixed path writes to single-extension file; path-exists fires across both forms in both orderings; --overwrite leaves only one file.
… capture pre-archive mtime/signals so dream undo is bit-exact Three coupled changes that touch the search-service contract and the tool-mode dream finalize/undo path. Search service contract: - ISearchKnowledgeService gains refreshIndex(): invalidates the cached MiniSearch index so the next search() call rebuilds from disk regardless of TTL. Required because the service's 5s TTL fast-path would otherwise serve a cached index that pre-dates just-loaded topics, surfacing zero candidates on a fresh scan. - SearchKnowledgeOptions gains combineWith?: 'AND' | 'auto' | 'OR'. Default 'auto' preserves the historical AND-first-with-OR-fallback behavior. Internal SearchOptions references the public option's type to keep the two declarations from drifting if a new value is added. - bm25-pair-discovery passes combineWith: 'OR'. Multi-word distinctive titles (e.g. "Redis Caching Layer") AND-combine to terms only the source itself contains; OR-mode lets cross-pairs surface based on any-term overlap. Dream finalize / undo signal+mtime restore: - finalizeDreamSession captures each archived file's mtimeMs (via stat) and current RuntimeSignals (via runtimeSignalStore.get) before the rename, and returns them on the result alongside previousTexts. - DreamLogSchema PRUNE op gains optional previousSignals + previousMtimes (backward-compat — older logs still undo via fallback). - agent-process dream-finalize branch writes the new metadata into each PRUNE op. - DreamUndoDeps gains optional runtimeSignalStore. undoPrune restores mtime via utimes() and signals via runtimeSignalStore.set() when the PRUNE op carries the new fields; falls back gracefully when not. - buildUndoDeps threads runtimeSignalStore through. scanDreamCandidates now invokes searchService.refreshIndex() after loadToolModeTopics and before the parallel candidate-generator block, so the first scan after seed produces full results. Tests cover: refreshIndex called pre-discovery; combineWith: 'OR' passed to search; mtime+signal capture on finalize; mtime+signal restoration on undo; backward-compat fallback path; mock updates for the extended ISearchKnowledgeService in unrelated test files.
Both subcommands are intentional v1 placeholders — the daemon doesn't
persist session state, so dream sessions always returns [] and dream
cancel is a no-op. The previous surface advertised them as if they
were functional: descriptions said 'List active...' / 'Discard a...',
JSON envelopes returned success-looking shapes ({sessions: [], status:
ok} / {status: cancelled}), and machine-readable consumers that
branched on the response had no way to know the daemon was never
consulted.
This is a cheap-honesty fix, not a real persistence implementation.
Real persistence is a separate ticket.
Changes:
- DreamSessions.description prefixed with '[v1 stub]' and now names
the stateless behavior inline.
- DreamCancel.description prefixed with '[v1 stub]' and now names the
no-op behavior inline.
- Both JSON envelopes gain a 'note' field disclosing v1 status so
machine consumers see the caveat at the same key they'd branch on.
- Topic-root listing in brv dream's run() body flags both subcommands
as '[v1 stub]' in the one-liner hint.
4 tests in test/commands/dream.test.ts cover: each static description
contains 'v1 stub'; each --format json envelope's data.note discloses
stateless / no-op semantics.
…, worked example A skill-discovery sub-agent test surfaced gaps in §7 (brv dream): prose described the surface but no sample JSON envelopes existed, prune-signal thresholds were left to the agent to guess, and there was no end-to-end walkthrough — so consumers had to grep the source to learn field shapes. Additions to §7: - Sample dream scan --format json envelope (sessionId + candidates with one example each of link / merge / prune / synthesize shapes, including signals + daysSinceModified for prune). - Sample dream finalize response (archived, skipped, logId, status). - Prune signal thresholds inline: importance < 35 → low-importance, draft >60d / validated >120d → stale-mtime, maturity 'core' never surfaced. - 'Worked example — prune the stalest topic (no curate detour)': three-block scan → finalize → undo walkthrough. - Path-attribute warning in the merge bullet so agents stripping the scan-emitted .html before <bv-topic path=> see the guidance. - Updated undo description to reflect the symmetric mtime + signal restoration shipped in this PR — was 'mtime resets to current time'; is 'content, original mtime, and sidecar runtime signals... all restored'. Companion change in prune-candidates.ts: sync-hint comment near the threshold constants pointing at SKILL.md §7, so future threshold changes don't silently drift the agent-facing docs.
…566ac) Reviewer caught a real race in the refreshIndex() shipped in f5566ac. The orphan-builder problem: T0: caller A invokes search() — acquireIndex starts buildA; state.buildingPromise = buildA. The IIFE body is still awaiting fileSystem.listDirectory and the eventual buildFreshIndex. T1: scanDreamCandidates calls refreshIndex(). Old code cleared state.cachedIndex AND state.buildingPromise immediately and returned. T2: caller C invokes search() — sees both fields cleared, kicks off a NEW build (buildB). state.buildingPromise = buildB. T3: buildA finally completes and writes state.cachedIndex = freshIndexA (line 1016 of acquireIndex). T4: buildB completes and writes state.cachedIndex = freshIndexB. If buildA happens to settle AFTER buildB (e.g. slower disk read, larger source set), the older freshIndexA wins — reintroducing the exact TTL-stale-data scenario Fix #2 was meant to prevent. Realistic on a daemon with concurrent brv search / brv query / dream-scan tasks. Fix per reviewer's accepted shape: refreshIndex() now AWAITS any in-flight buildingPromise before clearing cached state. By the time the await returns, the in-flight build has either published its result (which is then cleared) or rejected (swallowed — callers don't care about that build's outcome, they just want a clean slate). No orphan publisher can write back after invalidation because the publishing point is now in the past relative to the clear. New test file search-knowledge-service-refresh.test.ts covers: - the contract under test (refreshIndex MUST NOT resolve while a build is in flight) - idempotent behavior on cold state (no in-flight build) - graceful clear when the in-flight build rejects Not a follow-up sub-agent regression replay: the race manifests only under concurrent operations on a single SearchKnowledgeService instance, which the 12-scenario test matrix (per-scenario daemons, serial workflow) does not exercise. The unit test at the contract boundary is the right pin.
Five items from the latest codex review batch on PR #689. - agent-process.ts: drop the dead-code `?? ''` fallback on previousTexts. archived[] paths are invariant-guaranteed to have all three fields (text + mtime + signals) populated together, so the fallback was already unreachable — and if it ever fired, the resulting `{[path]: undefined}` would violate the new PRUNE schema's z.record(z.string(), z.number()) for previousMtimes. All three lookups now agree: trust the invariant uniformly. - search-knowledge-service.ts: extend refreshIndex docstring with a maintainer note explaining that the explicit `state.buildingPromise = undefined` after the await is defensive (acquireIndex's finally block already nulls it). Also documents the benign double-build inefficiency when a fresh acquireIndex() races into the gap between await-resolve and clear — not a correctness issue, just non-obvious. - SKILL.md §7: soften the merge bullet's path-attribute warning. The prior "drop the trailing .html — the writer always appends it" instruction was load-bearing before the path-doubling fix shipped; now that the writer normalizes idempotently (both bare and .html-suffixed forms resolve to the same on-disk file), the instruction reduces to agent-facing cognitive load. Updated to "either form works; the writer normalizes" so the agent doesn't have to track which form scan emitted. - dream.test.ts: add two regression tests for the DREAM_REMOVED_FLAGS wire-up in dream.ts's run() body. (1) text mode: --timeout 30 rejects with this.error throwing before this.log runs; (2) json mode: --timeout 30 --format json emits a JSON envelope with success:false / data.status:'error' / data.error containing '--timeout'. Pins the early-exit ordering so a future refactor that drops the findRemovedFlagMessage(this.argv, ...) call fails the suite instead of silently regressing to pre-removal state. - schemas.ts: hoist TaskTypeSchema declaration above TaskExecuteSchema (the schema previously sat ~100 lines below) and replace TaskExecuteSchema.type's inline z.enum(...) with a reference to TaskTypeSchema. The doc block at the original declaration site claimed it was the single source of truth, but the inline enum on TaskExecuteSchema.type silently violated that — the only outlier; TaskCreatedSchema.type (now line 605) and TaskCreateRequestSchema.type already referenced TaskTypeSchema. Hoist needed to avoid a Zod TDZ at module load. No behavioral change; schemas now share one literal enum source.
refactor: [ENG-2884] remove legacy LLM-driven brv dream path
|
Claude finished @danhdoan's task in 7m 51s —— View job Re-review of PR #663 after the ENG-2884 batch (PR #689)Two PR-specific blocks since Score: 4.85 / 5 — held from prior review. The refactor is unusually clean for its size (32 files deleted, net -5825 LOC) and the in-cycle bug catches show real review discipline. What changed since
|
| Commit(s) | Scope |
|---|---|
13d83a6 |
ENG-2884 refactor — removes dream-executor.ts, dream-trigger.ts, consolidate/synthesize/prune LLM ops, dream-response-schemas.ts, parse-dream-response.ts, timeout-deprecation.ts. brv dream is now a thin topic root printing a migration hint. Net –4000 LOC, –14 files. |
7d77a8e |
ENG-2884 PR #689 review — wires DREAM_REMOVED_FLAGS into the topic root (was exported but never consumed); drops 'dream' + force + 'agent-idle' from TaskTypeSchema / TaskExecuteSchema / TaskCreateRequestSchema. Closes the straggler-task gap (a stale type:'dream' now fails at the schema boundary instead of silently completing with empty result). |
7a397ba |
Strip trailing .html in topicPathToFilePath so path="x/y" and path="x/y.html" both resolve to the same on-disk file — fixes the merge-workflow silent-data-loss bug where the survivor landed in x/y.html.html while the agent archived the loser thinking the merge succeeded. |
f5566ac |
searchService.refreshIndex() + SearchKnowledgeOptions.combineWith + pre-archive mtime/signals capture on finalize. DreamLogSchema.PRUNE gains optional previousSignals / previousMtimes. undoPrune restores via utimes() + runtimeSignalStore.set(). |
2e138a6 |
Discloses [v1 stub] status on dream sessions / dream cancel in descriptions + JSON envelopes (data.note: 'v1: ...'). |
53004fd |
SKILL.md §7 — sample envelopes, prune-signal thresholds, worked example. Sync-hint comment on the prune threshold constants pointing at SKILL.md. |
8c7c1cc |
Race-fix on refreshIndex from PR #689 reviewer catch — awaits in-flight build before clearing, so orphan builders can't write back after invalidation. Maintainer-note documents the benign double-build window. New search-knowledge-service-refresh.test.ts (3 cases). |
e3f96e8 |
PR #689 codex review fixes — drop dead-code ?? '' fallback on previousTexts (invariant-guaranteed populated together); hoist TaskTypeSchema above TaskExecuteSchema to make it the single source for both (was inline-duplicated on TaskExecuteSchema.type); SKILL.md path-attribute note softened (writer normalizes either form). |
Strengths
- Refactor scope is right-sized. Removed exactly the dead surface: LLM-driven entry, daemon-side dream-trigger, agent-idle dispatch branch, response parsers, timeout-deprecation helper (no consumer post-removal). Kept LLM-agnostic services (
dream-undo,dream-state-service,dream-lock-service,dream-log-store) that the tool-mode pipeline reuses. The commit message's "Removed / Trimmed / Kept" tri-list is unusually clear. 8c7c1ccrace fix — see inline comment atsearch-knowledge-service.ts:1123. The orphan-builder window was easy to miss and the unit-test contract (refresh must not resolve while a build is in flight) is the cheapest possible pin.- mtime + signal capture on finalize. The metadata that drove the prune decision (importance, maturity, recency, mtime) is now restorable, so a topic archived as low-importance or stale-mtime returns with the same prune-eligibility on the next scan. Without this, undo silently mutated observable state and the topic stopped re-surfacing — a quiet correctness regression that would only show up under repeated dream cycles. The three-case test triad in
dream-undo.test.ts(capture-and-restore, restore-via-utimes, restore-via-signal-store, legacy-log-fallback) is the right shape. - Path-doubling fix is symmetric. Two reading orderings +
confirmOverwriteround-trip = 4 new tests; not just the doubled-extension case in isolation. See inline comment athtml-writer.ts:274for one small follow-up (.htmextension symmetry). - Schema-source consolidation.
TaskTypeSchemais now the single source forTaskExecuteSchema.type,TaskCreateRequestSchema.type, andTaskCreatedSchema.type— previouslyTaskExecuteSchema.typehad an inlinez.enum(...)that silently violated the "single source of truth" doc above the declaration.e3f96e8hoisted the declaration to fix the Zod TDZ at module-load and removed the inline duplicate. Quiet drift that nobody flagged in three prior reviews. - Honest v1-stub disclosure.
[v1 stub]prefix in command descriptions +data.notefield in JSON envelopes fordream sessions/dream cancel. Machine consumers branching onsessions.lengthorstatus: 'cancelled'see the caveat at the same key shape they'd inspect.
Issues
1. observation (semantic deviation, defer-defensible) — previousSignals captured for never-tracked topics
Posted inline at dream-session.ts:221. runtimeSignalStore.get() returns validateOrDefault(...) — i.e. system defaults when the sidecar has no entry. On a never-tracked topic, undo writes those defaults via set(), creating a sidecar entry that didn't exist before the archive. Strictly speaking, that's not "undo to original state" — it's "undo to original-state + defaults". Defensible (a future access would mutate the defaults anyway), but worth one line in the docstring.
2. suggestion (consistency) — strip .htm too in topicPathToFilePath
Posted inline at html-writer.ts:274. Latent today (dream-scan only emits .html), but /\.html?$/i matches both .html and .htm at no extra cost, and the file-reader / format-detector already handle both — this is the only path-shape resolver that doesn't.
3. nit (CLI behavior asymmetry) — JSON-mode removed-flag rejection exits 0, text-mode exits 1
Posted inline at dream.ts:94. Same pattern as the curate/query precedent, so consistency with the rest of the CLI is preserved. But set -e shell wrappers see different exit codes for the same logical error depending on the --format flag. Either document, or set process.exitCode = 1 before the return.
4. issue (stale comment, post-ENG-2884) — task-router.ts:138-139 still references the removed dream consumer
The preDispatchCheck doc says "Used for dream task type to enforce gates 1-3 (time, activity, queue) even on the CLI dispatch path — mirrors the idle-trigger pre-check pattern." But the 'dream' task type is gone from TaskTypeSchema, and a grep shows no caller passes a preDispatchCheck anymore (only the type plumbing remains). The new task-router.test.ts correctly swapped its fixtures to type: 'curate' — this comment is the last straggler.
Worth either updating to "Currently unused — kept for future per-type gating" or trimming the dream-specific paragraph. (Can't post inline since this exact line wasn't touched in the PR's diff.)
5. observation (carry-over) — daemon-client.ts / task-validation.ts still reference brv providers connect/switch / /provider REPL
Carry-over from prior reviews — but the reachable-impact analysis has shifted significantly since the prior review. With the LLM-driven dream path gone, the validateProviderForTask gate (agent-process.ts:513) is now skipped for all five user-facing tool-mode types (search, query-tool-mode, curate-html-direct, dream-scan, dream-finalize). The only paths that still hit provider validation are the legacy 'curate' / 'curate-folder' / 'query' task types — and the user-facing CLI commands (brv curate, brv query) have all been migrated to the new tool-mode types.
Real-world reachability is now near-zero unless a stale MCP client or an older CLI binary still produces legacy task types. Worth a one-pass scrub regardless, but the urgency from the May 19 review is gone.
6. observations (carry-overs, unchanged)
topic-loader.tsregex parser still inconsistent withindex-generator.tsparse5.<existing-topic>block incurate-prompt-builder.ts:119-123still inlines system-managed timestamps.- CLI
ConsoleLoggerchalk-on-stderr under--format json(sidecar-bump + index-regen incurate-session.ts).
Praise
- PR refactor: [ENG-2884] remove legacy LLM-driven brv dream path #689 review catch on the refresh-index race — saving a real correctness bug from landing. Pulling the cycle in three days (ship
f5566ac→ reviewer catches the orphan-builder window → ship8c7c1ccwith explicit unit-test contract) is the model "ship → review → tighten" arc. - Codex review-fix discipline in
e3f96e8— five items, each scoped to one file, each with a why-this-matters line in the commit message. TheTaskTypeSchemahoist+dedupe fixes drift that three prior human review rounds missed because it lived behind a "single source of truth" doc comment that was technically lying. - Test count delta is honest. Net
-184tests is exactly what the deleted file inventory predicts — no orphan suites pointing at vanished modules, no zombie tests pinning long-dead behavior. - Migration text quality.
dream cancel's text-mode output (l.41-46) tells the user that curate writes between scan and cancel are NOT rolled back bybrv dream undo, and points tobrv review rejectas the right rescue. That's the specific rescue path a confused user needs — and most teams would have shipped a generic "session cancelled" line. - Sub-agent test design.
search-knowledge-service-refresh.test.tsinjectsstate.buildingPromisedirectly rather than going through a full IFileSystem mock, because the contract under test is exactly that interaction. Pragmatic, well-scoped, leaves a 28-line file that any future maintainer can read top-to-bottom.
Final disposition
LGTM. ENG-2884 is high-quality refactor work — the size doesn't represent risk because the removed surface was orphaned and the kept surface is fully covered. The mid-cycle catches on 8c7c1cc (race) and e3f96e8 (schema drift) are the kind that pay for themselves immediately. Remaining items are all defer-defensible or post-merge polish.
Todo
- Identify what changed since
5887e2a - Spot-check the ENG-2884 refactor + ENG-2884 sub-fixes (
13d83a6,7d77a8e,7a397ba,f5566ac,2e138a6,53004fd,8c7c1cc,e3f96e8) - Verify outstanding carry-over items from prior reviews
- Post inline comments + final summary with score
· Branch:proj/byterover-tool-mode
| * point that encodes that convention. | ||
| */ | ||
| function topicPathToFilePath(contextTreeRoot: string, topicPath: string): string { | ||
| const normalized = topicPath.replaceAll('\\', '/').replace(/^\/+/, '').replace(/\.html$/i, '') |
There was a problem hiding this comment.
praise: Idempotent .html stripping is the right shape for the dream→curate handoff. The doc above (l.260-271) names the failure mode it prevents (auth/jwt.html.html, silent path-exists bypass, archived loser + stale survivor), and the four new tests in html-writer.test.ts lock the contract from both orderings (bare-first-then-suffixed + suffixed-first-then-bare + confirmOverwrite round-trip). That's the right test triad for a "two input shapes must resolve to one file" invariant — it would have been easy to just test the doubled-extension case in isolation and miss the symmetry.
One micro-nit: the case-insensitive regex /\.html$/i correctly handles .HTML etc., but file-context-file-reader.ts:95-98 and format-detector.ts:24 both treat .htm as HTML too. If path="x/y.htm" ever shows up (manual migration, legacy import), this resolver would keep the .htm suffix, append .html, and produce x/y.htm.html — same doubled-extension problem this fix was designed to prevent, just one extension off.
Latent today (dream-scan emits .html only), but for full symmetry with the reader, consider:
const normalized = topicPath.replaceAll('\\', '/').replace(/^\/+/, '').replace(/\.html?$/i, '')The s? matches .html and .htm identically. Same regex character cost.
| * may start a parallel build. That's a benign double-build | ||
| * inefficiency, not a correctness issue. | ||
| */ | ||
| async refreshIndex(): Promise<void> { |
There was a problem hiding this comment.
praise (the race fix in 8c7c1cc): The original refreshIndex() in f5566ac had the orphan-builder race — clearing cachedIndex + buildingPromise while a prior acquireIndex() build was still in flight, then the orphan build resolved and wrote back to state.cachedIndex (line ~1016), defeating the invalidation. Awaiting the in-flight before clearing is the right shape: by the time the await returns, the in-flight has either published its result (which is then cleared) or rejected (which is swallowed — the caller doesn't care about the orphan's outcome). No publisher can write back after invalidation because the publishing point is now in the past relative to the clear.
The maintainer note about state.buildingPromise = undefined being defensive (l.1111-1121) is unusually thoughtful — it identifies the only post-fix race window (a fresh acquireIndex() that captures a new buildingPromise between await resolution and the explicit clear, where the clear then nulls the new pointer leaving a benign parallel double-build) and explicitly classifies it as performance-only, not correctness. That class of "we know about this edge case and it's intentional" comment is exactly what prevents the next contributor from "fixing" the non-bug.
The unit test in test/unit/agent/tools/search-knowledge-service-refresh.test.ts pins the contract (refresh-must-not-resolve-while-building) directly via state.buildingPromise injection — the cheapest test that locks the behavior at the seam. Cold-state idempotency and rejected-build cases round it out. Three cases, every one load-bearing.
| content = await readFile(source, 'utf8') | ||
| const stats = await stat(source) | ||
| mtimeMs = stats.mtimeMs | ||
| signals = await runtimeSignalStore.get(relPath) |
There was a problem hiding this comment.
observation (semantic deviation on never-tracked topics): runtimeSignalStore.get(relPath) never returns undefined — validateOrDefault ensures a default RuntimeSignals is returned when the sidecar has no entry for relPath (runtime-signal-store.ts:49-52). For a topic that was created without ever going through the sidecar-writing path (rare today, but possible: hand-authored files, restored from a non-tool-mode archive, future migration tools), the captured previousSignals[path] is the default object, not "no signals".
On undo, undoPrune then writes those default signals via runtimeSignalStore.set() — which creates a sidecar entry that didn't exist before the archive. Not strictly an "undo to original state" — the post-undo sidecar inventory now has one more entry than the pre-archive inventory. Defensible (the defaults are now explicit, and the next access bump will mutate them anyway), but worth noting in the docstring or capturing as a discriminator.
If you want strict bit-exactness on this edge case, two options:
- Probe the sidecar's raw existence via
keyStorage.getbefore capturing, and storeundefinedinpreviousSignals[path]when the sidecar entry was genuinely absent. ThenundoPruneeither callsset()ordelete()accordingly. - Have
runtimeSignalStore.getRaw(relPath): Promise<RuntimeSignals | undefined>so the dream-finalize path can distinguish "present-as-defaults" from "absent".
Non-blocking — the deviation only affects never-tracked topics, which are not the dream-scan target population. Worth one line in the previousSignals docstring at l.150-163: "For topics with no pre-archive sidecar entry, the captured snapshot is the system default — undo will create a sidecar where one didn't exist."
| const removed = findRemovedFlagMessage(this.argv, DREAM_REMOVED_FLAGS) | ||
| if (removed) { | ||
| if (argvRequestsJsonFormat(this.argv)) { | ||
| writeJsonResponse({command: 'dream', data: {error: removed, status: 'error'}, success: false}) |
There was a problem hiding this comment.
nit (consistency): The text-mode branch uses this.error(removed, {exit: 1}) (l.98), which throws and sets exit 1. The JSON-mode branch uses writeJsonResponse(...) then return — implicitly exit 0 — but the envelope carries success: false / data.status: 'error'.
The discrepancy is intentional and matches the curate/query precedent (both do the same thing), but a CLI caller wrapping brv dream with set -e will get different shell-script behavior between brv dream --timeout 30 (fails) and brv dream --timeout 30 --format json (succeeds). Two clean options if scripts that branch on exit code matter:
- Document the asymmetry in
removed-flags.ts(it's an implicit contract today). - Set a non-zero exit after the JSON envelope:
process.exitCode = 1; return. The envelope still flushes to stdout cleanly, and shell callers see consistent failure semantics.
Not blocking — the JSON envelope's success: false is the machine-readable truth, and set -e users typically inspect .success. But this is the same asymmetry that bit MCP integrations on the pre-ENG-2837 curate path; worth being explicit about it.
Summary
Lands the full byterover tool-mode project end-to-end:
HTML context-tree format. The 19-element
<bv-*>vocabulary(
bv-topic,bv-task,bv-pattern,bv-rule,bv-diagram, …) isnow the on-disk format for curated topics. Parser, writer, validator
registry, element-axis index, and LLM-ready renderer all ship in
src/server/infra/render/. Markdown topics keep working via theformat detector + extension-aware reader.
Tool mode is the default. Both
brv curateandbrv queryrunwithout any byterover LLM provider configured.
BRV_*_TOOL_MODEenv-var gates have been removed (ENG-2815). The calling agent owns
synthesis; byterover retrieves, renders, validates, and writes.
MCP migrated to provider-free.
brv-queryswaps totask:create type='query-tool-mode'and returns the structuredenvelope (ENG-2836).
brv-curateis replaced bybrv-curate-html— accepts pre-authored<bv-topic>HTML and routesthrough the writer directly, no LLM (ENG-2837). After this lands,
MCP integrations (Claude Desktop / Cursor MCP / etc.) inherit the
no-provider property the CLI already has.
New
brv read <path>command (ENG-2769) for single-topic renderEditorial topic viewer in webui (ENG-2810) for
bv-*HTMLtopics, hardened against untrusted HTML.
Telemetry producers (ENG-2741) attach token/latency/format
metadata to
QueryLogEntryandCurateLogEntryvia a sharedTaskUsageAggregator.What's new
Curate
<bv-topic>schema validation and atomictmp+rename writes (ENG-2737/2738/2739).
CurateOrchestratorstate machine: generate → validate → correct→ write → retry loop (ENG-2766).
ELEMENT_REGISTRY, withprompt-injection-hardened delimiters (ENG-2767).
kickoffSession/continueSessioninstead of the legacy daemon
runAgentBodypath (ENG-2765).createdat/updatedatinjected at write time —the model is not allowed to author these.
confirmOverwriteguard returns existing content for the caller tomerge instead of clobbering silently.
Query
the
canRespondDirectlyconfidence threshold (ENG-2811/2812). Thethreshold was a byterover-side judgment; calling agents now own the
"is this match good enough?" decision.
SearchKnowledgeServiceso HTML topics renderidentically to MD in Tier 2.
docs/curate-protocol.mddocuments the wire envelope.MCP
brv-querymigrated to the tool-mode envelope; renders matchedtopics as markdown sections.
brv-curatereplaced bybrv-curate-html(breaking — see below).src/shared/transport/query-tool-mode-content.tsand
src/shared/transport/curate-html-content.ts.SKILL.md
skippedSharedCountsemantics(ENG-2814).
Breaking changes
MCP
brv-curatetool input shape changes. The previous{context, files, folder}natural-language inputs are gone. The newtool (
brv-curate-html/curate-html-directtask type) accepts{html, confirmOverwrite?}— pre-authored<bv-topic>HTML fromthe calling agent. MCP integrations relying on the natural-language
curate input must update. See
src/server/templates/skill/SKILL.mdfor the authoring guide.
BRV_CURATE_TOOL_MODE/BRV_QUERY_TOOL_MODEenv vars removed.No-op now — tool mode is the only path. Remove from any local
scripts.
useHtmlContextTreeflag removed. HTML is the only curateoutput format. Existing markdown topics still read fine via the
format detector; new writes are always HTML.
Tickets shipped
ENG-2737, ENG-2738, ENG-2739, ENG-2740, ENG-2741, ENG-2765, ENG-2766,
ENG-2767, ENG-2768, ENG-2769, ENG-2810, ENG-2811, ENG-2812, ENG-2814,
ENG-2815, ENG-2822, ENG-2836, ENG-2837.
Rollout
.brv/context-tree/*.mdtopics keep working — read pathis format-aware.
.html. A separate migration script(Phat, parallel work) can re-author legacy MD topics into the
<bv-topic>HTML format and stash originals under.brv/_archive/context-tree-md/. Not included in this PR.brv providers disconnect <all>will now find MCPbrv-queryandbrv-curate-htmlwork end-to-end — same property as the CLI.Test plan
npm run typecheckcleannpm run lintcleannpm test— full unit + integration suite (153 files changed,~15k LOC added; new tests cover HTML reader/writer/elements,
curate-session, query-retrieval, MCP tool-mode envelopes,
telemetry aggregator, format detectors, prompt builder)
npm run buildclean.brv/brv-query+brv-curate-htmlfromClaude Desktop or Cursor MCP
and does not block this PR