
bug(pi): resume fails — promoted short session-id can't be resolved by --session (every follow-up errors "no agent_end") #565

Description

Symptom

On the channelo auditor-pi Pi chat (cwd /home/nathan/auditor-toolkit/main), the first message in a fresh Pi session works, but every follow-up fails with (verbatim from Telegram):

error · pi · 4s
pi finished without an agent_end event
session: 019e4936 · resumed

(Two iPhone screenshots captured locally at incoming/file_7869.jpg and incoming/file_7870.jpg, sessions 019e491f and 019e4936.) Pi is effectively single-turn over Telegram — any second message in a session errors.

Repro: send 2+ messages in any one Pi chat. First works; second onward errors.

Root cause (confirmed from channelo logs + on-disk session files)

The resume token Untether stores and replays cannot be resolved by Pi's --session flag.

Three consecutive runs, chat -5017167845:

| time | resume token | --session arg | result |
|---|---|---|---|
| 16:26:19 | (fresh) | /home/nathan/.pi/agent/sessions/--home-nathan-auditor-toolkit-main--/2026-05-21T06-26-19...+00-00_356a44a199d14842ae8ae3ac97343f86.jsonl (file path) | ✅ AgentEnd, ok=True → then pi.session.promoted new=019e4936 old=<path>, session.resume.saved resume=019e4936 |
| 16:27:25 | 019e4936 | --session 019e4936 (bare short id) | rc=0 in ~1.5s, no AgentEnd → pi.stream.no_agent_end |

So a working fresh run (resumed by path) gets its resume token overwritten with a short session id, and the next turn feeds that id to --session, which can't resolve it.
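
The overwrite can be pictured with a small model. This is only a sketch, not the Untether code: `ResumeState`, `promote`, and `session_args` are hypothetical names, and the session path is a placeholder because the real one is truncated in the log. Only the `--session` flag, the short id, and the order of events come from the runs above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResumeState:
    """Hypothetical stand-in for the per-chat resume token Untether stores."""
    resume: str


def promote(state: ResumeState, short_id: str) -> None:
    # Models pi.session.promoted + session.resume.saved: the stored path
    # is replaced by the short id.
    state.resume = short_id


def session_args(state: ResumeState) -> List[str]:
    # Models the next turn: whatever is stored is passed to --session verbatim.
    return ["--session", state.resume]


# Placeholder path; the real session file path is elided in the log above.
state = ResumeState(resume="/path/to/session.jsonl")
first_turn = session_args(state)   # path form: the turn that worked
promote(state, "019e4936")
second_turn = session_args(state)  # bare short id: the turn that failed
print(first_turn, second_turn)
```

The path that worked is still on disk after promotion; it is just no longer what gets replayed.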

Why the short id can't resolve — two compounding defects:

  1. Filename ≠ Pi's internal session id. _new_session_path (src/untether/runners/pi.py:612) names files with a uuid4 hex, e.g. ..._356a44a199d14842ae8ae3ac97343f86.jsonl, but Pi writes its own UUIDv7 id inside the file: 019e4936-af45-7124-8749-8fdb5350b045. Verified by comparison:

    • Native pi-created file: ..._019e48fe-8caa-70eb-86ae-f89cac979aca.jsonl → internal id 019e48fe-8caa-70eb-86ae-f89cac979aca (filename embeds the id).
    • Untether-created file: ..._356a44a199d14842ae8ae3ac97343f86.jsonl → internal id 019e4936-af45-7124-8749-8fdb5350b045 (no relation).

    Pi 0.74.1 documents --session <path|id> as "specific session file or partial UUID", but it can't map 019e4936 back to the 356a44a1…-named file.

  2. Truncation to a non-unique prefix. _short_session_id (src/untether/runners/pi.py:102-109) cuts the id at the first -, yielding 019e4936. That segment is the top 32 bits of the UUIDv7 48-bit millisecond timestamp, so it changes only about every 65.5 seconds (2^16 ms). Sessions created close in time collide on it (the session dir shows many 019e48fd/48fe/4936… ids), so it's an ambiguous lookup key even when a match exists.
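
Both defects can be checked from the ids quoted above. The sketch below is illustrative only: `short_session_id` is a stand-in written from the description of `_short_session_id` (cut at the first `-`), not a copy of it, and the filenames are just the suffixes shown in this issue with their elided prefixes left as `...`. It relies on the standard UUIDv7 layout, in which the first 12 hex digits are a 48-bit Unix timestamp in milliseconds.

```python
from datetime import datetime, timezone


def short_session_id(session_id: str) -> str:
    """Stand-in for _short_session_id as described above: cut at the first '-'."""
    return session_id.split("-", 1)[0]


def uuid7_unix_ms(session_id: str) -> int:
    """Read the 48-bit Unix millisecond timestamp from a UUIDv7 string."""
    return int(session_id.replace("-", "")[:12], 16)


full_id = "019e4936-af45-7124-8749-8fdb5350b045"
short_id = short_session_id(full_id)

# The 8-hex-digit prefix is only the top 32 bits of the 48-bit timestamp,
# so every id minted inside the same 2**16 ms window shares it.
window_ms = 1 << 16
window_start_ms = int(short_id, 16) << 16
created = datetime.fromtimestamp(uuid7_unix_ms(full_id) / 1000, tz=timezone.utc)

# Filename suffixes quoted in this issue (leading parts are elided in the source).
untether_name = "..._356a44a199d14842ae8ae3ac97343f86.jsonl"
native_name = "..._019e48fe-8caa-70eb-86ae-f89cac979aca.jsonl"

print("created (UTC):", created.isoformat())
print("prefix window (s):", window_ms / 1000)
print("short id in Untether filename:", short_id in untether_name)
print("short id in native filename:",
      short_session_id("019e48fe-8caa-70eb-86ae-f89cac979aca") in native_name)
```

The decoded creation time should fall on the same minute as the 06-26-19 stamp in the session filename, which supports reading the prefix as a coarse timestamp rather than an identifier.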

The promotion (_maybe_promote_session_id, pi.py:112-126; introduced/refined in dd31bbf and 8cf4246) was meant to produce a short, typable resume line — but the promoted value is fed straight into --session on the next turn (pi.py:462-465), where it doesn't resolve. The footer resume line pi --session 019e4936 is broken for users for the same reason.

Affected code

  • _maybe_promote_session_id / _short_session_id: src/untether/runners/pi.py:102-126
  • build_args --session branch: src/untether/runners/pi.py:462-465
  • _new_session_path filename scheme: src/untether/runners/pi.py:612-625
  • new_state: src/untether/runners/pi.py:492-500

Proposed fixes (choose/validate)

  • (A, recommended) Resume by the file path Untether already owns. Stop overwriting the resume token with the short id; persist the session file path as the resume token and pass --session <path> every turn (path form is explicitly supported and provably works above). Keep _short_session_id only for footer display, decoupled from the resume value.
  • (B) Name session files by Pi's own id so partial-UUID resume resolves — harder, since the id isn't known until Pi emits the SessionHeader.
  • (C) Don't truncate (promote the full UUIDv7) — insufficient alone, because the Untether filename still doesn't contain it; needs (A) or (B) too.
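
A minimal sketch of option (A), assuming the resume record can carry two fields. `PiResume`, `with_display_id`, and `build_session_args` are hypothetical names chosen for illustration; the real `build_args` and state types in pi.py may look quite different. The point is only that the value passed to `--session` and the value shown in the footer are stored separately.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass(frozen=True)
class PiResume:
    """Hypothetical resume record: the path is the token, the short id is display-only."""
    session_path: Path
    display_id: Optional[str] = None


def with_display_id(resume: PiResume, full_session_id: str) -> PiResume:
    # Promotion only sets the footer label; session_path is carried over unchanged.
    return PiResume(resume.session_path, full_session_id.split("-", 1)[0])


def build_session_args(resume: PiResume) -> List[str]:
    # Always resume by the file Untether created, never by the short id.
    return ["--session", str(resume.session_path)]


# Placeholder path; the real session file path is elided in this issue.
resume = PiResume(Path("/path/to/session.jsonl"))
promoted = with_display_id(resume, "019e4936-af45-7124-8749-8fdb5350b045")
print(build_session_args(promoted), promoted.display_id)
```

With this split, a change to how the footer id is shortened cannot affect whether the next turn resumes.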

Guard/UX: on a resumed run, treat rc=0 + no AgentEnd as a clear "couldn't resume session" error (and/or fall back to --continue) instead of the cryptic "finished without an agent_end event".
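
The guard could look roughly like the following. This is a sketch under assumptions: `describe_pi_exit` and its parameters are hypothetical, the message wording is a suggestion, and the `--continue` fallback mentioned above is not modelled here.

```python
from typing import Optional


def describe_pi_exit(returncode: int, saw_agent_end: bool, resumed: bool) -> Optional[str]:
    """Return a user-facing error message, or None if the run completed normally."""
    if saw_agent_end:
        return None
    if resumed and returncode == 0:
        # Clean exit with no AgentEnd on a resumed run: most likely the
        # --session value did not resolve to a session.
        return "pi could not resume the session (exited without an agent_end event)"
    # Anything else keeps the existing generic message.
    return "pi finished without an agent_end event"


print(describe_pi_exit(returncode=0, saw_agent_end=False, resumed=True))
```

Keeping the generic message for non-resumed runs avoids blaming session resolution for unrelated failures.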

Why this wasn't auto-caught

  • Not a crash — rc=0, and pi.stream.no_agent_end is only a WARNING, so untether-issue-watcher error patterns don't file it. Follow-up: consider promoting pi.stream.no_agent_end on a resumed run to error-tier signal.
  • Pi version split / suspected regression: channelo runs Pi 0.74.1; the dev-bot host (lba-1, @untether_dev_bot) runs Pi 0.72.1 (both advertise --session <path|id>). The promoted id likely resolved under 0.72's partial-UUID lookup but no longer resolves under 0.74's, so Tier-1 integration on the older dev-bot Pi may still pass. Confirm by running a multi-turn Pi resume on dev-bot (0.72.1), and test the fix on both versions. If 0.72 also fails, the bug is latent across versions and Tier-1 wasn't exercising Pi multi-turn.

Verification (for the fix)

  • Unit (tests/test_pi_runner.py): build_args on resume passes a resolvable token; promotion no longer clobbers the resume value used for --session.
  • Integration (@untether_dev_bot Pi chat): message, then a follow-up referencing the first — second turn must succeed and retain context. Confirm on channelo auditor-pi after rollout.

Metadata

Assignees

No one assigned

    Labels

    • bug (Something isn't working)
    • engine:pi (Pi CLI, badlogic/pi-mono)
    • severity:major (Significant functionality broken or seriously degraded; workaround exists but painful)
