ARIA: evidence-linking for working-memory tracking by holoduke · Pull Request #354 · holoduke/myagent

holoduke · 2026-06-02T10:10:55Z

Summary

Adds optional evidence fields (evidenceObsId, evidenceSender, evidenceTs) to shortTermTracking items, so that when the brain marks an item resolved (afgerond, Dutch for "completed") it must point at the specific message that proved the status change.

Display format: X afgerond (evidence: msg <id> from <sender> at <ts>)

Why

In today's capability-demo failure (2026-06-02, 11:01–11:41), Ilse sent a one-word "Oke!" about camping-kleden, and the brain mis-attributed it as acknowledgment of an agenda-request that ARIA had never actually answered. Gillis corrected this at 11:44.

Root cause: tracking items had no link back to the observation that supposedly resolved them. The brain could mark agenda-request afgerond from any nearby positive-feeling string, regardless of whether that string was actually about the agenda.

Aligns with the existing feedback_working_memory_honesty rule — never mark tasks "afgerond" without direct evidence.

What changed

backend/memory/types.ts — new ShortTermTrackingItem union: string | { text, evidenceObsId?, evidenceSender?, evidenceTs? }. Plain strings remain valid for in-flight notes; structured form is required for resolved items.
backend/memory/working-memory.ts — trackingItemText() helper renders structured items with an inline evidence suffix for prompt/UI display.
backend/brain-prompt.ts — new EVIDENCE-LINKING guideline tells the model: if you cannot point at a specific message that completes the item, do NOT mark it resolved.
backend/system-prompt.ts, backend/memory/reconstruction.ts — use the helper to render tracking items.
frontend/app/types/aria.ts + pages/overview.vue + pages/memory.vue — UI handles both forms via the same helper.
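
A minimal sketch of how the union and the helper described above might look. The field names and the display format come from this PR description. The concrete field types (all strings), the interface name EvidencedTrackingItem, and the handling of partially filled evidence are assumptions, not the actual contents of backend/memory/types.ts or working-memory.ts.

```typescript
// Sketch only: field names follow the PR text; types and edge cases are assumed.
interface EvidencedTrackingItem {
  text: string;
  evidenceObsId?: string; // assumed: observation id stored as a string
  evidenceSender?: string;
  evidenceTs?: string; // assumed: timestamp stored as an ISO-8601 string
}

// Plain strings stay valid for in-flight notes; resolved items use the object form.
type ShortTermTrackingItem = string | EvidencedTrackingItem;

// Renders an item for prompt/UI display, appending
// "(evidence: msg <id> from <sender> at <ts>)" when evidence is present.
function trackingItemText(item: ShortTermTrackingItem): string {
  if (typeof item === "string") return item;

  const { text, evidenceObsId, evidenceSender, evidenceTs } = item;
  // Assumption: without an observation id there is nothing to cite,
  // so the bare text is shown.
  if (evidenceObsId === undefined) return text;

  let suffix = `evidence: msg ${evidenceObsId}`;
  if (evidenceSender !== undefined) suffix += ` from ${evidenceSender}`;
  if (evidenceTs !== undefined) suffix += ` at ${evidenceTs}`;
  return `${text} (${suffix})`;
}

// Usage with made-up values:
console.log(trackingItemText("agenda-request: waiting for reply"));
console.log(
  trackingItemText({
    text: "agenda-request afgerond",
    evidenceObsId: "obs-123",
    evidenceSender: "example-sender",
    evidenceTs: "2026-06-02T11:30:00Z",
  }),
);
```

Keeping the rendering in one helper means the prompt, the reconstruction path, and the two frontend pages cannot drift apart in how they display evidence.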

Test plan

npx tsc --noEmit passes
vitest run on working-memory/cognitive-load/scene-predictor/digest-template tests — 58/58 pass
After deploy: verify next time the brain marks something afgerond, the structured form is emitted with a valid evidenceObsId
After deploy: verify a stray one-word "Oke!" no longer triggers a spurious afgerond on an unrelated tracking item
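
The first post-deploy check could be automated along the following lines. This is a hypothetical sketch: hasValidEvidence, the TrackingItem alias, and the idea of passing in a set of known observation ids are all illustrative assumptions and are not part of this PR.

```typescript
// Hypothetical local copy of the item shape; field types are assumed to be strings.
type TrackingItem =
  | string
  | { text: string; evidenceObsId?: string; evidenceSender?: string; evidenceTs?: string };

// Returns true only when the item is in structured form AND its evidenceObsId
// refers to an observation that actually exists. Plain strings never qualify,
// because they carry no evidence link at all.
function hasValidEvidence(item: TrackingItem, knownObsIds: ReadonlySet<string>): boolean {
  if (typeof item === "string") return false;
  const id = item.evidenceObsId;
  return id !== undefined && knownObsIds.has(id);
}

// Usage with made-up ids:
const known = new Set(["obs-1", "obs-2"]);
console.log(hasValidEvidence({ text: "agenda-request afgerond", evidenceObsId: "obs-2" }, known));
console.log(hasValidEvidence("agenda-request afgerond", known));
```

A check like this only confirms that the cited message exists. Whether that message is actually about the tracked item still depends on the EVIDENCE-LINKING guideline in the brain prompt.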

🤖 Generated with Claude Code

Daily and weekly summaries in working-memory previously cut off at a hard char count, often mid-word.

Replace with smartTruncate(): prefer the last sentence boundary (. ! ?) within the budget, fall back to last whitespace, then hard cap. Boundary must lie within 60% of the budget to avoid losing most of the content to a stray early period.

Same char budgets (200 daily / 300 weekly), just smarter cutoff.

Intent-summary: Recent-days/Recent-weeks summaries truncated mid-word, costing parse-effort on every reflect tick.
Intent-tokens: truncation, midword, summary, boundary, readability, temporal
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
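
The truncation order described in this commit message (sentence boundary, then whitespace, then hard cap) can be sketched as follows. This is not the committed implementation. In particular, the sketch assumes that the 60% rule means the cut must fall at or beyond 60% of the budget, that the rule applies to the whitespace fallback as well, and that the parameter is called minRatio.

```typescript
// Sketch of boundary-aware truncation under the assumptions stated above.
function smartTruncate(text: string, budget: number, minRatio = 0.6): string {
  if (text.length <= budget) return text;

  const head = text.slice(0, budget);
  // Earliest acceptable cut position, so a stray early period cannot
  // throw away most of the content.
  const minCut = Math.floor(budget * minRatio);

  // 1. Prefer the last sentence-ending character inside the budget.
  const sentenceEnd = Math.max(
    head.lastIndexOf("."),
    head.lastIndexOf("!"),
    head.lastIndexOf("?"),
  );
  if (sentenceEnd >= 0 && sentenceEnd + 1 >= minCut) {
    return head.slice(0, sentenceEnd + 1);
  }

  // 2. Otherwise fall back to the last whitespace inside the budget.
  let lastSpace = -1;
  for (let i = head.length - 1; i >= 0; i--) {
    if (/\s/.test(head.charAt(i))) {
      lastSpace = i;
      break;
    }
  }
  if (lastSpace > 0 && lastSpace >= minCut) {
    return head.slice(0, lastSpace);
  }

  // 3. No usable boundary: hard cap at the budget.
  return head;
}

// Usage with the budgets named in the commit message (sample text is made up):
const daily = smartTruncate("Short day. Nothing else happened.", 200);
const weekly = smartTruncate("word ".repeat(100), 300);
console.log(daily.length <= 200, weekly.length <= 300);
```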

Adds optional evidence fields (obsId, sender, ts) to shortTermTracking items so the brain must point at WHICH message proved a status change when marking items resolved/afgerond. The display format is: 'X afgerond (evidence: msg <id> from <sender> at <ts>)'.

Today's capability-demo failure (11:01-11:41) had Ilse send a one-word "Oke!" about camping-kleden which the brain mis-attributed as acknowledgment of an agenda-request that ARIA never actually answered. Without a structural evidence link, the brain has no constraint forcing it to verify the "Oke!" actually addressed the open item; it only needed the strings to feel adjacent.

Implementation:
- New ShortTermTrackingItem union (string | { text, evidenceObsId?, evidenceSender?, evidenceTs? })
- trackingItemText() helper renders structured items with the evidence suffix for prompt/UI display
- Brain prompt teaches the model the structured form and explicitly states: if you cannot point at a specific message that completes the item, do NOT mark it resolved
- Frontend types/views handle both forms via the same helper

Plain-string entries remain valid (back-compat) for in-flight notes.

Intent-summary: working memory hallucinated task completion from an unrelated short acknowledgment because tracking items had no link back to the message that supposedly resolved them
Intent-tokens: evidence, hallucination, completion, working-memory, ack, attribution, audit
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

ARIA and others added 2 commits June 1, 2026 10:06

holoduke wants to merge 2 commits into main from aria/evidence-linked-tracking
