
fix(rooms): clear stranded compaction active state at turn settlement#67

Merged
lsaether merged 3 commits into main from fix/compaction-stale-active-state
May 28, 2026

Conversation


@lsaether lsaether commented May 28, 2026

Owner

Addresses item 1 of #66.

Problem

compaction_state.active is set by a Hermes context compression started stderr line and cleared by the matching done. If the done never arrives — the compaction failed, or its line was dropped by the lossy stderr pump (STDERR_CAPACITY) — the flag stayed true indefinitely, so /debug/sessions and session/attach snapshots reported a perpetual "compacting" state to late joiners.
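To make the failure mode concrete, here is a minimal sketch of a flag driven purely by stderr lines. The struct shape, the `on_stderr_line` helper, the exact marker strings, and the assumption that the count increments on `done` are all illustrative guesses, not the project's actual code.

```rust
/// Hypothetical snapshot state. Field names follow the PR text; types are assumed.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct CompactionState {
    pub active: bool,
    pub compaction_count: u64,
}

/// Apply one stderr line to the state (hypothetical helper).
/// The marker substrings below are assumptions made for illustration.
pub fn on_stderr_line(state: &mut CompactionState, line: &str) {
    if line.contains("context compression started") {
        state.active = true;
    } else if line.contains("context compression done") {
        // Assumed: a completed compaction clears the flag and bumps the count.
        state.active = false;
        state.compaction_count += 1;
    }
    // Any other line leaves the state untouched.
}
```

If the `done` line is never delivered, nothing in this sketch ever resets `active`, which is the stranded state the PR describes.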

Fix

Hermes compacts within a prompt turn, so by the time that turn's response settles, any compaction it triggered has either completed (cleared by done) or been abandoned. Turn settlement is the bound — the same signal a live client uses to clear its own "compacting" affordance (amux/turn_complete).

  • Live path: clear the transient active / pending_hermes_session_id at turn settlement. compaction_count and last_* history are durable and untouched. No frame emitted — live clients already clear on amux/turn_complete.
  • Durable path (added after first review): the live clear is in-memory only, so it didn't survive replay-store restart — hydration rebuilds compaction_state from persisted frames, and a persisted context_compaction_started with no done resurrected active = true. Fixed by applying the same bound during hydration: a persisted amux/turn_complete clears a still-active compaction. amux/turn_complete is broadcast (and persisted) on every prompt-turn settlement, so no new marker frame is needed.
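The live-path clear described above could look roughly like the sketch below. The field types, the `last_completed_at_ms` stand-in for the `last_*` history, and the method name `clear_transient_on_turn_settlement` are assumptions; only the field names `active`, `pending_hermes_session_id`, and `compaction_count` come from this PR.

```rust
/// Hypothetical shape of the per-room compaction snapshot state.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct CompactionState {
    // Transient: only meaningful while a compaction is in flight.
    pub active: bool,
    pub pending_hermes_session_id: Option<String>,
    // Durable: never touched by the settlement clear.
    pub compaction_count: u64,
    pub last_completed_at_ms: Option<u64>, // assumed stand-in for the last_* history
}

impl CompactionState {
    /// Called when a prompt turn settles. Resets only the transient fields
    /// and returns true if a stranded compaction was actually cleared.
    pub fn clear_transient_on_turn_settlement(&mut self) -> bool {
        if !self.active {
            return false; // nothing in flight, nothing to clear
        }
        self.active = false;
        self.pending_hermes_session_id = None;
        true
    }
}
```

Returning a `bool` is a design choice of this sketch so that a caller could log the stranded case; the PR itself only states that no frame is emitted.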

Tests

  • stranded_compaction_active_clears_at_turn_settlement — live clear on turn settlement, no fabricated count.
  • completed_compaction_is_unaffected_by_turn_settlement_clear — happy-path guard.
  • hydration_clears_stranded_compaction_via_persisted_turn_complete: started → turn_complete → restart asserts restored active == false (verified it fails without the hydration arm).

cargo test: 89 lib + 82 integration tests green; cargo clippy -D warnings and cargo fmt --check are clean.

Scope note

Closes out the actionable part of #66. The other two items are accepted/deferred — rationale in this comment on #66:

  • Item 2 (future double-rotation when Hermes ships _meta.hermes) — contingent on a structured upstream signal not being pursued while we support today's Hermes.
  • Item 3 (stderr/stdout ordering race) — inherent, bounded, documented in code.

#66 stays open until this merges; I'll close it then.

Refs #66.

lsaether added 2 commits May 28, 2026 14:24
`compaction_state.active` is set by a Hermes `context compression
started` stderr line and cleared by the matching `done`. If the `done`
never arrives — the compaction failed, or its line was dropped by the
lossy stderr pump — the flag stayed `true` forever, so /debug/sessions
and session/attach snapshots reported a perpetual "compacting" state to
late joiners (issue #66, item 1).

Hermes compacts within a prompt turn, so by the time that turn's
response settles any compaction it triggered has completed or been
abandoned. Clear the stranded transient flag at turn settlement — the
same signal a live client uses to clear its own "compacting" affordance
(amux/turn_complete). Only the transient active/pending fields reset;
compaction_count and the last_* history are durable and untouched. No
frame is emitted (live clients already clear on amux/turn_complete; this
only realigns snapshot state for future attachers).

Tests: stranded `started` with no `done` clears on turn settlement
without fabricating a count; a completed compaction is unaffected.

Refs #66.
…e restart

The turn-settlement clear added in this branch was in-memory only and
emitted no frame, so it didn't survive restart: hydration rebuilds
compaction_state purely from persisted frames, and a persisted
context_compaction_started with no matching _done resurrected
active=true after a restart — defeating the snapshot fix for persistent
rooms.

Teach hydration the same turn-settlement bound the live path uses: when
rebuild_compaction_from_frame replays a persisted amux/turn_complete and
compaction is still marked active, clear the transient active/pending
fields (count + history untouched). amux/turn_complete is broadcast and
therefore persisted on every prompt-turn settlement, so no new lifecycle
marker frame is needed.
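A minimal sketch of that hydration bound, assuming frames can be reduced to their method names. `replay_frame` stands in for `rebuild_compaction_from_frame`, whose real signature is not shown here; `hydrate`, the struct shape, and the `context_compaction_done` frame name (inferred from "_done") are likewise assumptions.

```rust
/// Hypothetical compaction snapshot state rebuilt during hydration.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct CompactionState {
    pub active: bool,
    pub pending_hermes_session_id: Option<String>,
    pub compaction_count: u64,
}

/// Replay one persisted frame, identified by its method name (assumed shape).
pub fn replay_frame(state: &mut CompactionState, method: &str) {
    match method {
        "context_compaction_started" => state.active = true,
        "context_compaction_done" => {
            state.active = false;
            state.pending_hermes_session_id = None;
            state.compaction_count += 1;
        }
        // Turn settlement bounds a compaction: clear only the transient fields.
        "amux/turn_complete" if state.active => {
            state.active = false;
            state.pending_hermes_session_id = None;
        }
        _ => {} // every other frame is a no-op for compaction state
    }
}

/// Rebuild the state from persisted frames in order, as after a restart.
pub fn hydrate(frames: &[&str]) -> CompactionState {
    let mut state = CompactionState::default();
    for method in frames {
        replay_frame(&mut state, method);
    }
    state
}
```

In this sketch, `hydrate(&["context_compaction_started", "amux/turn_complete"])` restores `active == false` with a count of 0, while a lone `context_compaction_started` still restores `active == true`.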

Test: started -> turn_complete -> restart asserts restored
compaction.active == false (verified it fails without the hydration arm).

Refs #66.
clippy::collapsible_match (newer on CI's toolchain than my local one)
flagged the `if compaction_state.active` nested inside the
`amux/turn_complete` match arm. Rewrite as a guarded arm
`"amux/turn_complete" if compaction_state.active =>`. Behavior is
identical — a turn_complete with no active compaction falls through to
the no-op arm. Verified against rustc/clippy 1.96.0 locally.
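The equivalence of the two forms can be sketched in isolation. The functions below are hypothetical reductions that operate on a bare `bool` rather than the real `compaction_state`; they only illustrate the nested `if` versus the guarded arm.

```rust
/// Nested form: the shape the commit says clippy::collapsible_match flagged.
#[allow(clippy::collapsible_match)]
pub fn nested(active: &mut bool, method: &str) {
    match method {
        "amux/turn_complete" => {
            if *active {
                *active = false;
            }
        }
        _ => {}
    }
}

/// Guarded form: the arm matches only while a compaction is active.
/// A turn_complete with no active compaction falls through to the no-op arm.
pub fn guarded(active: &mut bool, method: &str) {
    match method {
        "amux/turn_complete" if *active => *active = false,
        _ => {}
    }
}
```

Both functions leave `active` unchanged for any other method name and for a `turn_complete` that arrives while `active` is already `false`.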
@lsaether lsaether merged commit d2bd079 into main May 28, 2026
1 check passed
