
ARIA: filter sa_moltbook sub-agent run summaries from commitment extraction #350

Open
holoduke wants to merge 2 commits into main from aria/filter-subagent-commitments


@holoduke
Owner

Problem

Today's reflect-tick surfaced two phantom commitments from a sa_moltbook run:

  • "I'll reply to the 6 highest-signal comments"
  • "Let me write a helper that handles verification"

These are intra-run scratch text from the sub-agent's own run summary/details fields. The sub-agent either already completed those actions inside that same run or moved on — they were never promises made to a human channel.

Surfacing them in the COMMITMENT REVIEW block adds noise and pushes ARIA toward creating goals that have no real-world referent.

Root cause

getRecentMoltbookActivity() in backend/brain-ticks.ts sources recentMoltbookActivity entirely from run.details || run.summary of the sa_moltbook sub-agent — i.e. sub-agent self-narration about its own run, not the actual public Moltbook post bodies.

buildCommitmentsBlock() in backend/brain-prompt.ts then ran extractAndClassifyCommitments() over those transcripts indiscriminately, treating "I'll reply to X" as if it were a promise made by ARIA to a human.

Fix

  • Skip extractAndClassifyCommitments() on recentMoltbookActivity since the entire source is sub-agent task transcripts (already-executed intra-run narrative).
  • Keep the activity in the prompt for context, but label the section as "sa_moltbook sub-agent run summaries — already executed, NOT personal commitments".
  • Add an ACTION REQUIRED note instructing reflect not to treat those phrases as promises.
  • recentOutgoingActivity (whatsapp DMs, email, brain messages to human channels) still runs through the extractor — that's where commitment language has a real audience.
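
The routing described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the diff in this PR: the `Commitment` type, the function signatures, the output layout, and the regex-based stand-in for `extractAndClassifyCommitments()` are all hypothetical, and the section label is paraphrased. Only the function and variable names mentioned above come from the description.

```typescript
// Hypothetical sketch of the routing described above; not the PR's actual diff.

interface Commitment {
  text: string;
  source: string;
}

// Toy stand-in for the real extractor (assumption): flags messages that
// start with first-person commitment phrasing such as "I'll" or "Let me".
function extractAndClassifyCommitments(messages: string[], source: string): Commitment[] {
  const pattern = /^(i'll|i will|let me)\b/i;
  return messages
    .map((m) => m.trim())
    .filter((m) => pattern.test(m))
    .map((m) => ({ text: m, source }));
}

// Assumed signature: the real buildCommitmentsBlock() may take different inputs.
function buildCommitmentsBlock(
  recentOutgoingActivity: string[],
  recentMoltbookActivity: string[],
): string {
  // Human-channel messages still go through the extractor.
  const commitments = extractAndClassifyCommitments(recentOutgoingActivity, "outgoing");

  const lines: string[] = ["COMMITMENT REVIEW"];
  for (const c of commitments) {
    lines.push(`- [${c.source}] ${c.text}`);
  }

  // Sub-agent run summaries are shown for context only and never extracted.
  if (recentMoltbookActivity.length > 0) {
    lines.push(
      "sa_moltbook sub-agent run summaries (already executed, NOT personal commitments):",
    );
    for (const entry of recentMoltbookActivity) {
      lines.push(`  ${entry}`);
    }
    lines.push(
      "ACTION REQUIRED: do not treat phrases in the sub-agent summaries as promises.",
    );
  }

  return lines.join("\n");
}
```

With this shape, a phrase such as "I'll reply to the 6 highest-signal comments" still appears in the prompt as context when it comes from `recentMoltbookActivity`, but it never becomes a `Commitment` entry.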

Verification

npx tsc --noEmit passes.

Memory context

  • n_subagentcomm01 (insight, 2026-05-23): the very observation that triggered this fix.

🤖 Generated with Claude Code

ARIA and others added 2 commits May 20, 2026 22:06
Drift audit 2026-05-20 flagged that the "Active threads" section of the
working-memory prompt was being polluted by promotional/automated streams
(currently the AutoScout24 "Nieuwe matches voor je Zoekopdracht" newsletter
was the *only* active thread shown). Real signal-to-noise on that section
had dropped to 0%.

Defense in depth:
- working-memory.ts: introduce isNewsletterParticipant() and reject new
  threads whose sender/chat matches noreply / no-reply / notifications. /
  newsletter / savedsearches / mailings. / updates@ / bounce, or known
  one-way notification domains (autoscout24, schoolkassa, rdw, anwb
  notifications). Also sweep any pre-existing newsletter threads on every
  update tick — fixes the currently-stuck AutoScout24 entry.
- brain-prompt.ts: filter active threads at render time using the same
  helper, so even if a newsletter slips past the write-time guard via
  another path, the prompt stays clean.
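
A sender check of the kind described in this commit message might look like the sketch below. It is an illustration only: the real `isNewsletterParticipant()` in working-memory.ts may be structured differently, the `Thread` type and `filterActiveThreads()` helper are hypothetical, and matching notification domains against the labels after "@" is an assumption. The anwb rule is left out because its exact form is not spelled out.

```typescript
// Hypothetical sketch of a newsletter/automation sender check; the real
// isNewsletterParticipant() in working-memory.ts may differ.

// Substrings taken from the commit message.
const AUTOMATION_MARKERS: string[] = [
  "noreply", "no-reply", "notifications.", "newsletter",
  "savedsearches", "mailings.", "updates@", "bounce",
];

// One-way notification domains named in the commit message. Assumption:
// they are compared against whole domain labels, so "rdw" does not match
// an unrelated address that merely contains those letters.
const NOTIFICATION_DOMAINS: string[] = ["autoscout24", "schoolkassa", "rdw"];

function isNewsletterParticipant(participant: string): boolean {
  const p = participant.trim().toLowerCase();
  if (AUTOMATION_MARKERS.some((marker) => p.includes(marker))) {
    return true;
  }
  const at = p.lastIndexOf("@");
  if (at === -1) {
    return false;
  }
  // Drop trailing characters such as ">" from "Name <user@host>" forms.
  const domain = p.slice(at + 1).replace(/[^a-z0-9.-]/g, "");
  return domain.split(".").some((label) => NOTIFICATION_DOMAINS.includes(label));
}

// Hypothetical thread shape, for illustration only.
interface Thread {
  participant: string;
  subject: string;
}

// Render-time guard: applies the same helper when building the prompt.
function filterActiveThreads(threads: Thread[]): Thread[] {
  return threads.filter((t) => !isNewsletterParticipant(t.participant));
}
```

Using one helper at both write time and render time keeps the two guards from drifting apart.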

Intent-summary: Newsletter and automation senders were being promoted to "active conversation threads" in the working-memory prompt, crowding out real conversations Gillis is in.
Intent-tokens: newsletter, noise, active-threads, prompt-pollution, working-memory, automation-sender, filter

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…agent run summaries

The reflect-tick commitment-review surface was mining intra-run scratch
text from sa_moltbook sub-agent run summaries/details (e.g. "I'll reply
to the 6 highest-signal comments", "let me write a helper that handles
verification") and surfacing them as personal commitments needing
follow-through. Those phrases were sub-agent self-narration about
actions it already executed within that same run — not promises to a
human channel.

Fix in buildCommitmentsBlock (backend/brain-prompt.ts): the Moltbook
activity coming from getRecentMoltbookActivity() is sourced entirely
from sub-agent run summary/details fields, so stop running
extractAndClassifyCommitments() over it. Still show the activity for
context, but explicitly label the section as "already executed, NOT
personal commitments" and add an action-line note telling reflect not
to treat those phrases as promises.

extractAndClassifyCommitments() is still applied to
recentOutgoingActivity (whatsapp DMs, email, brain messages to human
channels), where the audience is actually a human and commitment
language is meaningful.

Intent-summary: phantom commitments were being surfaced from sub-agent intra-run narrative text because the commitment extractor did not distinguish sub-agent task transcripts from real human-channel messages.
Intent-tokens: subagent, commitment, attribution, phantom, moltbook, transcript, narration

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>