
docs: pre-1.0 patch judgement note + Pi model footer matrix fix #469

Open
Nathan Schram (nathanschram) wants to merge 21 commits into master from feature/pi-roadmap-and-doc-fixes

Conversation

@nathanschram

Member

Summary

  • Add Pre-1.0 minor judgement paragraph to .claude/rules/release-discipline.md clarifying when opt-in additive features can ship in patches vs minors. Codifies precedent already in use (#409 shipped a config addition in a patch).
  • Fix engine compatibility matrix in README.md: the Pi "Model in footer" cell changes from "—" to ✅⁷, with footnote 7 explaining the supplementary StartedEvent mechanism. Code at src/untether/runners/pi.py:290-307 already does this (since #225); the matrix was stale.

Why

Both fixes surfaced during the Pi engine audit that produced the upcoming roadmap.

The judgement-note edit removes ambiguity for the v0.35.8 RPC runner decision (#170). The matrix fix is a pure code-doc reconciliation — no functional change.

Test plan

  • No code changed — docs only
  • Format: git diff --stat → 2 files, 4 insertions, 1 deletion
  • Local lint (skipped — no Python/code files touched)

Notes

🤖 Generated with Claude Code

Enables `[claude] extra_args = ["--chrome"]` so Untether-spawned Claude
Code sessions can opt into the Claude-in-Chrome extension — previously
the `mcp__claude-in-chrome__*` tool namespace was absent from Untether
sessions because Claude Code 2.1.x gates it behind `--chrome` /
`CLAUDE_CODE_ENABLE_CFC=1`, and Untether never passed the flag.

Mirrors `codex.extra_args` and `pi.extra_args`. Flags Untether manages
internally (`-p`, `--print`, `--output-format`, `--input-format`,
`--resume`/`-r`, `--continue`/`-c`, `--permission-mode`,
`--permission-prompt-tool`) are rejected at config-load with a
`ConfigError` so duplicate-argv surprises fail fast. User args land on
argv after the managed stream-json prelude and before resume / model /
effort / allowed-tools / permission flags, preserving the trailing
`-p <prompt>` (or stdin prompt under permission-mode) position.

- src/untether/runners/claude.py: add `extra_args` field, thread
  through `build_args`, parse + validate in `build_runner`
- tests/test_build_args.py: +8 tests (argv ordering, permission-mode
  argv, multi-flag order, build_runner parsing, reserved-flag rejection
  for individual flags and `key=value` prefixes)
- docs/reference/config.md, docs/reference/runners/claude/runner.md:
  document the new key, including reserved-flag list
- CHANGELOG.md: v0.35.3 (unreleased) entry
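
A minimal sketch of the reserved-flag check described above, assuming a plain list of strings as input. `validate_extra_args`, `RESERVED_FLAGS`, and the local `ConfigError` class are illustrative stand-ins, not the actual code in `runners/claude.py`; only the flag list itself comes from this commit message.

```python
# Hypothetical sketch: names and structure are illustrative only.
RESERVED_FLAGS = frozenset({
    "-p", "--print", "--output-format", "--input-format",
    "--resume", "-r", "--continue", "-c",
    "--permission-mode", "--permission-prompt-tool",
})


class ConfigError(ValueError):
    """Stand-in for the project's config-load error type."""


def validate_extra_args(extra_args: list[str]) -> list[str]:
    """Reject flags that the runner manages itself; return a copy otherwise."""
    for arg in extra_args:
        # `--flag=value` is checked by its `--flag` prefix.
        flag = arg.split("=", 1)[0]
        if flag in RESERVED_FLAGS:
            raise ConfigError(f"extra_args may not contain managed flag {flag!r}")
    return list(extra_args)
```

With this shape, `validate_extra_args(["--chrome"])` passes through unchanged, while `["--resume=abc"]` fails at load time rather than producing a duplicate flag on argv.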

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: staging 0.35.3rc1

Stage Claude extra_args (#407) for TestPyPI. This rc1 is the wheel the Mac
Untether instance will install to validate Claude-in-Chrome end-to-end per
docs/audits/2026-04-21-claude-in-chrome-test-plan.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* deps: bump lxml 6.0.2→6.1.0 and python-dotenv 1.2.1→1.2.2

pip-audit flagged two new transitive CVEs after PR #408 merged:
- lxml 6.0.2: CVE-2026-41066 (fix 6.1.0) — pulled via sulguk
- python-dotenv 1.2.1: CVE-2026-28684 (fix 1.2.2) — pulled via
  pydantic-settings

Both have clean fixes. Lockfile-only change; pyproject.toml constraints
unchanged. Local pip-audit clean after bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(security): Group 1A hygiene — 8 issues

Bundles eight low-risk security hygiene fixes for v0.35.3:

- #205 — split runner.start log so prompt content stays at DEBUG
- #206 — flip AMP dangerously_allow_all default to False (opt-in only)
- #207 — Pi session dir created with mode 0o700 + chmod existing
- #208 — extend stderr sanitisation to /Users, /private/var, /tmp,
        /var, /opt, /srv, /etc, /usr/local, /app, /workspace, /root
- #211 — replace stat()+read_bytes() with capped streaming read in
        anyio worker thread; closes TOCTOU window on /file get
- #213 — add OPENAI_PROJECT_KEY_RE for sk-proj-... redaction (the
        underscore/hyphen char set is not covered by the generic
        sk- pattern)
- #402 — bump Pygments 2.19.2 → 2.20.0 via uv lock (CVE-2026-4539
        ReDoS, transitive)
- #403 — replace 123456789:ABCdef… placeholder bot tokens with
        <BOT_ID>:<BOT_TOKEN> in non-test paths (onboarding.py,
        install.md, llms-full.txt); test fixtures kept as-is for
        GitHub-UI dismissal

All 2410 tests pass; ruff check + format clean; uv lock --check ok.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: silence bandit B108 false positive + ignore CVE-2026-3219

- bandit B108 fires on the new /tmp/ regex pattern in
  _PATH_PATTERNS at runner.py — regex for stderr redaction, not
  a hardcoded temp-file write. Suppressed with `# nosec B108`
  matching the existing render.py:111 pattern.

- pip-audit now flags pip 26.0.1 → CVE-2026-3219 (advisory
  published recently; no fix available upstream). Added to the
  --ignore-vuln list alongside CVE-2026-4539 (pygments — kept
  for posterity even though #402 lockfile bump fixed it).

No source/test code changes. CI-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
)

`_daily_cost` is a module-level tuple updated via read-modify-write
in record_run_cost(). Concurrent finalize_run callers could both
read (today, X), both write (today, X + cost), and lose one run's
cost — letting a malicious or runaway concurrent workload defeat
the per-day budget gate.

Fix: wrap the RMW block in a `threading.Lock`. Critical section is
a single tuple assignment (sub-microsecond), so the lock is fine
under both async (cooperative) and threaded callers without an
async-signature ripple. get_daily_cost() also acquires the lock for
snapshot consistency.

Trade-off note: kept the function sync rather than pivoting to
`anyio.Lock` because that would require updating the 6 sync test
call sites and the 1 sync caller in runner_bridge.py — needless
churn for a sub-microsecond critical section.

Test: new ThreadPoolExecutor-driven fuzz test (16 workers, 200
calls) asserts the observed total equals n * unit_cost — would
fail under racing RMW.
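
The pattern can be sketched as follows. This is an illustration under assumptions, not the project's module: the day-rollover handling, the ISO date key, and the `fuzz` helper are hypothetical, and "200 calls" is read here as 200 calls in total.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from datetime import date

_lock = threading.Lock()
_daily_cost: tuple[str, float] = ("", 0.0)  # (ISO day, running total)


def record_run_cost(cost: float) -> None:
    """Add one run's cost to today's total under the lock."""
    global _daily_cost
    today = date.today().isoformat()
    with _lock:
        day, total = _daily_cost
        if day != today:  # assumed rollover behaviour: reset on a new day
            total = 0.0
        _daily_cost = (today, total + cost)


def get_daily_cost() -> float:
    """Return today's total from a consistent snapshot."""
    today = date.today().isoformat()
    with _lock:
        day, total = _daily_cost
    return total if day == today else 0.0


def fuzz(workers: int = 16, calls: int = 200, unit_cost: float = 0.25) -> float:
    """Hammer record_run_cost from a thread pool and return the observed total."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda _: record_run_cost(unit_cost), range(calls)))
    return get_daily_cost()
```

Because the read and the write both happen inside the lock, no interleaving can drop an increment, so the observed total equals `calls * unit_cost`.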

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings the voice transcription API key into parity with `bot_token`
(closed #196): SecretStr masks the value in repr()/str()/tracebacks
and any accidental structlog serialisation. Access the raw value
via `.get_secret_value()` at the transport boundary.

Changes:
- `settings.py`: field type `NonEmptyStr | None` → `SecretStr | None`;
  new `_validate_voice_key_not_empty` validator preserves the prior
  no-empty-string contract by normalising `""`/whitespace to None
- `telegram/bridge.py`: `TelegramBridgeConfig.voice_transcription_api_key`
  annotation → `SecretStr | None`; `update_from()` unchanged (assigns
  SecretStr to SecretStr)
- `telegram/loop.py:2208`: sole unwrap point — call
  `.get_secret_value()` only when non-None before passing to
  `transcribe_voice` (OpenAI SDK still wants raw `str | None`)
- `telegram/voice.py`: unchanged; boundary stays at the loop caller

Tests:
- `test_settings.py`: new `test_voice_transcription_api_key_is_secret_str`
  (round-trip + repr/str masking), `_empty_string_normalised_to_none`
  (whitespace → None), `_default_none` (omitted → None)
- `test_bridge_config_reload.py`: hot-reload tests updated to use
  `.get_secret_value()` for value comparison
- `test_telegram_backend.py`: updated build_and_run assertion

All 2413 tests pass; ruff check + format clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bump rc1 → rc2 to publish a fresh staging wheel that includes:

- #431 — Group 1A security hygiene (8 issues: #205, #206, #207, #208,
        #211, #213, #402, #403)
- #432 (issue #379): daily cost tracker race (threading.Lock guard)
- #433 (issue #378): voice_transcription_api_key SecretStr

rc1 (b6c6ad6) only carried #407 (Claude extra_args). rc2 supersedes
it on TestPyPI.

No CHANGELOG entry — per release-discipline.md §"Staging / rc
versions", entries batch into the stable bump.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ult (#409) (#435)

Self-installed Untether users in heterogeneous environments need to
thread credential-manager tokens (1Password, Doppler, Vault, Infisical,
…) into engine subprocesses. Today the env allowlist is hard-coded in
`utils/env_policy.py` so adding a single var requires a fork + release.

Changes:
- `utils/env_policy.py`:
  - new `is_allowed_with_extras(name, extra_exact=, extra_prefix=)`
  - `filtered_env()` extended with `extra_prefix=` parameter
  - new `log_user_extensions_once()` — module-level latch emits one
    `env_policy.user_extension` INFO per process when user extras are
    active, so the operator sees the addition in journalctl
- `settings.py` `SecuritySettings`:
  - `env_extra_allow: list[str]` (default `[]`)
  - `env_extra_prefix_allow: list[str]` (default `[]`)
  - field validators reject empty/whitespace and enforce `[A-Z_][A-Z0-9_]*`
- `runners/claude.py`, `runners/pi.py`:
  - new `_load_env_extras()` helper (best-effort settings load — never
    blocks a run on a config error, mirrors the env_audit pattern)
  - threads extras through `filtered_env()` + `log_user_extensions_once()`
- `utils/env_audit.py` `audit_proc_env()`:
  - new `user_extra_exact=`/`user_extra_prefix=` params so user-allowed
    names aren't false-flagged as `claude.env_audit.leaked_var`
- Built-in defaults: `BWS_ACCESS_TOKEN` promoted into `_EXACT_ALLOW`
  (Bitwarden Secrets Manager — common enough to ship as a default).
- Docs: `docs/reference/config.md` `[security]` table, CLAUDE.md
  features list.

Tests: +19 across `tests/test_env_policy.py` (8 user-extension cases +
log latch), `tests/test_env_audit.py` (4 user-extras cases), and
`tests/test_settings.py` (7 round-trip + validator cases).
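
A rough sketch of how the extended allowlist check could behave. The function names and the `[A-Z_][A-Z0-9_]*` rule come from this commit message; the built-in entries other than `BWS_ACCESS_TOKEN`, the keyword-only signatures, and `validate_env_name` are assumptions made for illustration.

```python
import re

# Illustrative built-ins; the real lists live in utils/env_policy.py.
_EXACT_ALLOW = frozenset({"PATH", "HOME", "BWS_ACCESS_TOKEN"})
_PREFIX_ALLOW = ("LC_",)
_NAME_RE = re.compile(r"[A-Z_][A-Z0-9_]*")


def validate_env_name(name: str) -> str:
    """Hypothetical validator: reject empty, whitespace, or malformed names."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid env var name: {name!r}")
    return name


def is_allowed_with_extras(name, *, extra_exact=(), extra_prefix=()):
    """True if `name` is allowed by the built-ins or by the user's extras."""
    if name in _EXACT_ALLOW or name in extra_exact:
        return True
    # str.startswith accepts a tuple of candidate prefixes.
    return name.startswith(_PREFIX_ALLOW + tuple(extra_prefix))


def filtered_env(env, *, extra_exact=(), extra_prefix=()):
    """Return only the variables that pass the allowlist."""
    return {
        key: value
        for key, value in env.items()
        if is_allowed_with_extras(
            key, extra_exact=extra_exact, extra_prefix=extra_prefix
        )
    }
```

With `extra_exact=["OP_SERVICE_ACCOUNT_TOKEN"]` and `extra_prefix=["DOPPLER_"]`, those variables pass through to the engine subprocess while everything else outside the built-ins is still dropped.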

`uv run pytest` → 2432 passed, 2 skipped; ruff clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bump rc2 → rc3 to publish a fresh staging wheel that includes #435.

Cumulative since rc1:
- #431 — Group 1A security hygiene (8 issues: #205, #206, #207, #208,
        #211, #213, #402, #403)
- #432 (issue #379): daily cost tracker race (threading.Lock guard)
- #433 (issue #378): voice_transcription_api_key SecretStr
- #435 (issue #409): user-extensible env allowlist + BWS_ACCESS_TOKEN default

No CHANGELOG entry — per release-discipline.md §"Staging / rc versions",
entries batch into the stable bump.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
) (#437)

#377 fix:
- `TelegramTransportSettings` gains `allow_any_user: bool = False` (opt-in
  escape hatch) and `_validate_allowed_user_ids_or_optin` model_validator
  raising ValueError when `allowed_user_ids == []` and `allow_any_user is
  False`. Pre-v0.35.3 the empty default silently shipped open bots —
  this is the v0.35.3 promotion of the warning to a hard ConfigError.
- `TelegramBridgeConfig` and `update_from()` carry the new field through
  hot-reload; backend constructs with the value.
- `telegram/loop.py` drops the per-update `security.no_allowed_users`
  warning (validator now blocks startup) and emits
  `security.allow_any_user` INFO every boot when the opt-out is in
  effect.
- `config_migrations.py` `_migrate_legacy_telegram` relocates a top-level
  `allow_any_user` key into `[transports.telegram]` alongside `bot_token`
  / `chat_id` so legacy configs migrate cleanly.
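
The startup gate can be illustrated with a standard-library stand-in. The real settings class is a pydantic model with a `model_validator`; the dataclass below, its name, and the error wording are assumptions that only mirror the rule described above.

```python
from dataclasses import dataclass, field


@dataclass
class TelegramTransportSketch:
    """Stand-in for TelegramTransportSettings (dataclass, not pydantic)."""

    allowed_user_ids: list[int] = field(default_factory=list)
    allow_any_user: bool = False

    def __post_init__(self) -> None:
        # An empty allowlist is only accepted with the explicit escape hatch.
        if not self.allowed_user_ids and not self.allow_any_user:
            raise ValueError(
                "allowed_user_ids is empty; list user IDs or set "
                "allow_any_user = true"
            )
```

The four cases named in the tests map directly onto this: defaults raise, a populated allowlist passes, `allow_any_user=True` alone passes, and both set together pass.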

CHANGELOG: backfilled `## v0.35.3 (unreleased)` with `### breaking`,
`### changes`, `### fixes` subsections covering all 13 issues that
shipped in rc1-rc4 (#205, #206, #207, #208, #211, #213, #377, #378,
#379, #402, #403, #407, #409). Per release-discipline.md the section
heading stays `(unreleased)` until the dev → master stable bump
populates the date.

Docs sweep:
- `docs/how-to/security.md` — required-allowlist wording, dev/demo
  opt-out callout, env_extra_allow / env_extra_prefix_allow extension
  guide, sk-proj redaction note, voice-key SecretStr note.
- `docs/how-to/troubleshooting.md` — new top-of-page section for
  `allowed_user_ids is empty` startup error.
- `docs/how-to/group-chat.md` — required wording.
- `docs/how-to/operations.md` — `env_extra_allow` + `allow_any_user`
  added to hot-reloadable list.
- `docs/tutorials/install.md` — `allowed_user_ids` added to all three
  example configs (assistant / workspace / handoff).
- `docs/reference/config.md` — `allow_any_user` row added,
  `allowed_user_ids` flipped to required, AMP `dangerously_allow_all`
  default note flipped to `false`.
- `docs/reference/runners/amp/runner.md` — flag is now optional;
  `dangerously_allow_all = false` example.
- `docs/reference/env-vars.md` — `BWS_ACCESS_TOKEN` default mention,
  `[security] env_extra_*` extension subsection.

Test fixtures:
- ~30 test fixtures across `test_settings`, `test_cli_*`,
  `test_projects_config`, `test_telegram_backend`,
  `test_bridge_config_reload`, `test_config_watch`,
  `test_config_path_env`, `test_onboarding*`, `test_runtime_loader`,
  `test_settings_contract`, `test_exec_bridge` patched to add
  `allow_any_user = true` (or `"allow_any_user": True`) where the
  fixture exercises non-allowlist behaviour. Tests that specifically
  cover #377 use `populated allowlist` cases.

#377 tests: 4 new in `test_settings.py` covering block + opt-out +
populated + both-set.

GitHub housekeeping (parallel to this commit, not in the diff):
- Closed #205, #206, #207, #208, #211, #213, #378, #379, #402, #403,
  #409 with implementation references. #377 closes via this PR's body.

Version: 0.35.3rc3 → 0.35.3rc4 (`pyproject.toml`, `uv.lock`).

Verification: 2436 tests pass / 2 skipped (~68s). Ruff check + format
clean. uv lock --check in sync.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the literal "Basic dXNlcjpwYXNz" string in test_malformed_bearer_header
with a runtime-constructed header so GitHub's secret-scanner stops flagging it.
The test still asserts verify_auth rejects Basic auth — Untether webhooks only
accept Bearer + HMAC.
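
One way to build such a header at runtime, shown as a sketch; `build_basic_header` is a hypothetical helper name and the actual test may construct the value differently.

```python
import base64


def build_basic_header(user: str = "user", password: str = "pass") -> str:
    """Assemble a Basic auth header without a literal token in the source."""
    credentials = f"{user}:{password}".encode("ascii")
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```

The encoded value only exists at runtime, so a pattern-based scanner reading the source file has nothing to match.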

The corresponding GitHub secret-scanning alert is a genuine false positive (test
fixture, not a real credential) and will be dismissed in the GitHub UI as
"Used in tests / false positive".

Closes #404
…-approve safety (#380) (#442)

The 2026-04-20 audit (§ASI02) flagged
``ControlRewindFilesRequest`` and ``ControlMcpMessageRequest`` as worth
a deeper look because rewind could in principle undo state that drove a
prior denial decision and MCP messages could carry tainted payloads
from a compromised MCP server.

Audit verdict: both are safe to auto-approve under the current upstream
Claude Code 2.1.x trust model.

- mcp_message: Untether is a transport pass-through; the message
  payload is opaque storage and is never inspected, executed, or
  rendered. A compromised MCP server is the inherent threat model of
  any MCP server, not specific to auto-approve. Routing this through
  Telegram approval would not block the payload.
- rewind_files: rewind is user-initiated upstream (the model cannot
  trigger it autonomously). Untether's per-session approval state
  (_PLAN_EXIT_APPROVED, _DISCUSS_APPROVED, _HANDLED_REQUESTS) is NOT
  mutated by rewind. Subsequent writes still pass through the standard
  ControlCanUseToolRequest gate.

No code change beyond:

1. Multi-paragraph safety-invariant comment in
   src/untether/runners/claude.py near _AUTO_APPROVE_TYPES, including
   the re-audit trigger (upstream semantic change to either subtype).
2. 3 regression-lock tests in
   tests/test_claude_control.py::TestAutoApproveSafetyInvariant
   that fail loudly if the auto-approve path starts inspecting payloads
   or coupling to per-session approval state.
3. Audit memo at docs/audits/2026-04-27-380-auto-approve-scope-review.md.

Closes #380

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (#440)

The chat-level message-routing command (`all` / `mentions` / `clear`)
shared a name with the unrelated webhook/cron triggers system, which
became increasingly confusing as `/config` grew separate trigger pages.

User-visible changes:
- New `/listen` command (`all`/`mentions`/`clear`) replaces `/trigger`
- `/trigger` continues to work as a deprecated alias for one release
  cycle and prepends a one-line deprecation notice
- `/config → 📡 Listen` page replaces `📡 Trigger`
- Home page summary renders `Listen: all` instead of `Trigger: all`
- Bot command menu lists `listen` instead of `trigger`

Internal renames:
- `telegram/trigger_mode.py` → `telegram/listen_mode.py`
- `commands/trigger.py` → `commands/listen.py`
- Type `TriggerMode` → `ListenMode`
- Function `resolve_trigger_mode` → `resolve_listen_mode`
- ChatPrefsStore / TopicStateStore: new `*_listen_mode` methods;
  legacy `*_trigger_mode` methods preserved as one-release aliases

Storage: msgspec field is still named `trigger_mode` for backward
compat with existing `telegram_chat_prefs_state.json` /
`telegram_topics_state.json` files. No migration is needed.

Tests: full suite passes (2438 passed, 2 skipped). Two new tests in
test_telegram_agent_trigger_commands.py cover the deprecation prefix
and clean `/listen` output. test_config_command toast expectations
updated to "Listen: ...".

Closes #297

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a global pause control for the trigger system (crons + webhooks)
accessible via /config in Telegram. During pause:
- Cron scheduler skips its tick — run_once crons are NOT consumed and
  fire on the next matching tick after resume
- Webhook server returns 503 (with Retry-After: 60) instead of
  dispatching, so external monitors can distinguish paused-but-up from
  healthy. Returns 404 for unknown paths as before
- /health endpoint surfaces {"status":"paused","paused":true}

Pause is in-memory only — restart auto-resumes. This is the safe
default per the issue's recommendation, and mirrors /at scheduler
behaviour.
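
The pause semantics above can be sketched as a small in-memory gate. Everything here is illustrative: the class and method names are hypothetical, the 200 status and the healthy `/health` payload are assumptions, and checking the unknown-path 404 before the pause 503 is one reading of "as before".

```python
from dataclasses import dataclass


@dataclass
class TriggerGateSketch:
    """In-memory pause flag; a fresh instance (restart) starts unpaused."""

    known_paths: frozenset[str]
    paused: bool = False

    def webhook_status(self, path: str) -> tuple[int, dict[str, str]]:
        """Return (status code, extra headers) for an incoming webhook."""
        if path not in self.known_paths:
            return 404, {}
        if self.paused:
            return 503, {"Retry-After": "60"}
        return 200, {}  # assumed success status

    def cron_tick(self, due_ids: list[str]) -> list[str]:
        """Return the cron ids to fire; a paused tick fires and consumes nothing."""
        return [] if self.paused else list(due_ids)

    def health(self) -> dict[str, object]:
        if self.paused:
            return {"status": "paused", "paused": True}
        return {"status": "ok", "paused": False}  # assumed healthy payload
```

Because a paused tick returns nothing instead of marking crons as fired, a run_once cron stays pending until the first matching tick after resume.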

UI:
- New /config home-page row "⏸ Pause triggers" / "▶️ Resume triggers"
  appears only when triggers are configured
- New dedicated "📡 Triggers" page (config:tg) showing state + counts
  with Pause/Resume button; gracefully handles no-trigger-manager
  and zero-config cases
- /ping shows "⏸ triggers paused: … (suspended)" indicator while paused

Tests: 15 new tests across test_trigger_manager.py (8 pause toggle
behaviours including 503 webhook check), test_ping_command.py
(2 paused/resumed indicators), and test_config_command.py
(5 TestTriggersPage covering unavailable/empty/pause/resume/toast).
Full suite: 2445 passed, 2 skipped.

Closes #294

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…fication (#438) (#443)

Adds [watchdog] claude_stream_idle_timeout_ms (default 300_000 ms,
range 30 s – 30 min) so deployments hitting upstream Anthropic API
stalls on long opus 4.7 1M plan-mode generations can raise the
watchdog without forking the codebase. Untether's Claude runner reads
the value via setdefault — shell-set CLAUDE_STREAM_IDLE_TIMEOUT_MS
still wins. Settings load failure falls back to the hardcoded 300_000
default with a debug log entry.

Type-A vs Type-B classification on the failure message:

- Type A — mid-generation stall (num_turns >= 1 && duration_api_ms > 0).
  Often legitimate long opus reasoning that exceeded the watchdog.
  Inline hint suggests raising the new config knob.
- Type B — cold-start zero-byte stall (num_turns <= 1 && duration_api_ms
  == 0). Upstream API outage — raising the timeout will NOT help.
  Inline message says so explicitly.
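
The two conditions and the setdefault precedence can be written out as below. The function names and return labels are hypothetical; the thresholds, the 30 s to 30 min range, and the `CLAUDE_STREAM_IDLE_TIMEOUT_MS` variable are taken from the text above.

```python
def classify_idle_timeout(num_turns: int, duration_api_ms: int) -> str:
    """Label a stream idle timeout as Type A, Type B, or unclassified."""
    if num_turns >= 1 and duration_api_ms > 0:
        return "A"  # mid-generation stall: raising the timeout may help
    if num_turns <= 1 and duration_api_ms == 0:
        return "B"  # cold-start zero-byte stall: raising it will not help
    return "unclassified"


def apply_idle_timeout(env: dict[str, str], configured_ms: int = 300_000) -> dict[str, str]:
    """Set the timeout in a subprocess env unless the shell already set it."""
    if not 30_000 <= configured_ms <= 1_800_000:
        raise ValueError("timeout must be between 30 s and 30 min")
    # setdefault leaves an existing shell-provided value untouched.
    env.setdefault("CLAUDE_STREAM_IDLE_TIMEOUT_MS", str(configured_ms))
    return env
```

Note that the two types overlap on `num_turns == 1`; `duration_api_ms` is what separates them.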

Auto-retry on Stream idle timeout deferred to v0.35.4 pending upstream
Anthropic stabilisation (8 duplicate api:anthropic issues filed
2026-04-17→26 across macOS/Windows/web/WSL).

Tests: 5 new tests in test_claude_runner.py. Full suite 2460 passed,
2 skipped. Lint clean.

Closes #438

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…410) (#444)

Promotes claude_usage.schema_mismatch from one-shot per-process to
per-call counter so the issue-watcher catches ongoing API-shape drift
instead of just the first hit. Structured event carries a cumulative
`count` field; new runner_bridge.get_usage_schema_mismatch_count()
exposes the counter for the debug page.

UsageCacheStats added to utils/usage_cache.py tracking last successful
fetch wall time, cache age, last-error class+message; populated on
every fetch path including stale-while-error fallbacks.

_read_token_expiry_ms() added to telegram/commands/usage.py so the
OAuth token expiry can be surfaced without raising on missing
credentials (best-effort: returns None on any read failure).

/usage debug appends a 🔧 debug block (HTML) showing:
- last successful fetch (UTC ISO + age + fresh/stale label)
- last error (class + message, 120-char truncated)
- OAuth token expiry (with hh/mm remaining)
- cumulative schema-mismatch counter

Operator-facing signal so the next time the subscription footer goes
silent, the root cause is visible without grepping journalctl.

Tests: 5 new in test_usage_cache.py::TestCacheStatsObservability;
1 in test_command_engine_gates.py::TestUsageDebugMode; existing
test_schema_mismatch_warning_fires_once repurposed to assert per-call
firing with cumulative counts. Full suite: 2465 passed, 2 skipped.

Closes #410

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n + last-fired history + /stats breakdown (#271) (#445)

Tier 2: `/config → ⏰ Triggers` now lists every cron and webhook configured
for the current chat. Crons render as `id · describe_cron(...) · proj · eng ·
last X` and webhooks as `id · path · auth · proj · eng · last X`. Lists are
scoped via `crons_for_chat`/`webhooks_for_chat` with the bridge default_chat_id
fallback, capped at 10 entries with an overflow marker, and omitted when the
chat has no triggers (pause/resume controls remain regardless).

Tier 3: new `triggers/history.py` JSON store at
`<config_path>.with_name("triggers_history.json")`. Records `time.time()`
after every successful cron dispatch (cron.py:130) and webhook dispatch
(dispatcher.py:dispatch_webhook + dispatch_action). Recording is best-effort:
an OSError on write is logged as `triggers.history.write_failed` and swallowed.

`/stats` appends `(N triggered, M manual)` per engine line and on the totals
row when at least one count > 0. `DayBucket`/`AggregatedStats` carry additive
`triggered_count`/`manual_count` with `.get(..., 0)` fallbacks so existing
stats.json files load cleanly. `runner_bridge.handle_message` resolves the
split via `triggered=bool(context and context.trigger_source)`.
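
A sketch of the back-compat loading and the `/stats` suffix, under assumptions: `DayBucketSketch`, its `run_count` field, and `breakdown_suffix` are hypothetical; only the two additive counters and the `.get(..., 0)` fallback are from the description above.

```python
from dataclasses import dataclass


@dataclass
class DayBucketSketch:
    """Stand-in for DayBucket with the two additive counters."""

    run_count: int = 0  # hypothetical pre-existing field
    triggered_count: int = 0
    manual_count: int = 0

    @classmethod
    def from_dict(cls, raw: dict) -> "DayBucketSketch":
        # Older stats.json entries lack the new keys, so default them to 0.
        return cls(
            run_count=raw.get("run_count", 0),
            triggered_count=raw.get("triggered_count", 0),
            manual_count=raw.get("manual_count", 0),
        )


def breakdown_suffix(triggered: int, manual: int) -> str:
    """Render the breakdown only when at least one count is non-zero."""
    if triggered > 0 or manual > 0:
        return f" ({triggered} triggered, {manual} manual)"
    return ""
```

An entry written before this change loads with both counters at zero, so its line renders without the suffix.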

28 new tests: 10 in test_triggers_history.py (round-trip, corrupt JSON,
version mismatch, persistence), 7 in test_session_stats.py (triggered/manual
split, back-compat with old format), 3 in test_stats_command.py (breakdown
present/omitted/totals), 7 in test_config_command.py::TestTriggersPagePerChat
(crons listed, webhooks listed, chat filtering, default_chat_id fallback,
last-fired rendering, overflow cap), 2 in test_trigger_cron.py (cron firing
records last_fired + history failure resilience), 2 in
test_trigger_dispatcher.py (webhook records last_fired + history failure
resilience). Full suite: 2496 passed, coverage 82.18%.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…) (#446)

After a Claude bidirectional session emits `result`, the CLI keeps stdin
open so multi-turn sessions don't re-spawn. In practice this leaves a
400 MB RSS subprocess + ~200 TCP sockets idling for 30+ minutes between
prompts, and from the user's perspective the session looks "stuck" —
final message rendered, no further indication of state.

Option D hybrid:
- New `[watchdog].post_result_idle_enabled = true` (kill switch) and
  `[watchdog].post_result_idle_timeout = 600.0` (30s–1h) in settings.
- `ClaudeStreamState.result_received_at` armed by `translate_claude_event`
  on every `StreamResultMessage` (re-armed per turn so multi-turn works).
- New `ClaudeRunner._post_result_idle_watchdog` task runs in the existing
  `run_impl` task group when `use_control_channel` is True. Polls the
  timer; when the deadline passes, calls `this_proc_stdin.aclose()`
  (same mechanism as the normal-flow exit at line 2412, just earlier).
  CLI hits stdin EOF and exits gracefully (rc=0).

- Auto-continue safety: the existing `_should_auto_continue` gate
  excludes `last_event_type == "result"` (locked by
  `test_skips_result_event_type` in test_exec_bridge.py), so the clean
  rc=0 exit will not phantom-resume the session.
- Approval-state guard: if `_REQUEST_TO_SESSION` or `_PENDING_ASK_REQUESTS`
  has live entries for this session, defer the close (re-arm the timer)
  to avoid orphaning a button-click control_response in flight.
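
The watchdog's per-poll decision can be reduced to a pure function, sketched here under assumptions. `post_result_idle_action` and its string results are hypothetical; the real task polls inside the runner's task group and closes stdin itself.

```python
from typing import Optional


def post_result_idle_action(
    now: float,
    result_received_at: Optional[float],
    timeout: float,
    has_pending_approvals: bool,
) -> str:
    """Decide what one watchdog poll should do.

    Returns "wait", "defer" (re-arm the timer), or "close_stdin".
    """
    if result_received_at is None:
        return "wait"  # no result yet, so the timer is not armed
    if now - result_received_at < timeout:
        return "wait"  # deadline not reached
    if has_pending_approvals:
        return "defer"  # approval guard: do not orphan an in-flight response
    return "close_stdin"
```

Re-arming on every result means a new turn resets `result_received_at`, so an active multi-turn session is never closed mid-conversation.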

UX hint #1: a supplementary `StartedEvent` with `meta={"complete":
"✓ turn complete"}` is emitted alongside `CompletedEvent` on successful
results (the supported pattern for late-arriving meta per
runner-development.md). `markdown.format_meta_line` renders it in the
footer so the user sees the turn boundary immediately. Errored results
don't get the hint (no false "complete" tag on a failure).

Two structlog events for ops:
- `claude.post_result_idle.deferred` — approval guard suppressed close
- `claude.post_result_idle.closing_stdin` — deadline passed, stdin closed

7 new tests in test_claude_runner.py: result-event arms timer, emits
turn-complete meta, skips meta on error, watchdog fires when clean,
watchdog defers when pending approval, format_meta_line renders the hint
when present and omits it when absent. Full suite: 2503 passed.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#447)

Closes #269. The four settings groups in the issue had different states:
- [footer]: already loads fresh per-message via _load_footer_settings (no work)
- [cost]: already loads fresh per-call inside _check_cost_budget (no work)
- [watchdog]: already loads fresh per-run via _load_watchdog_settings at the
  top of handle_message (no work — verified, applies on next run)
- [progress]: was baked in at startup via MarkdownFormatter constructor +
  ExecBridgeConfig.min_render_interval — this PR closes that gap

Changes:
- markdown.py: new MarkdownFormatter.refresh_from(progress_settings) updates
  max_actions + verbosity from a fresh ProgressSettings snapshot. Tolerates
  missing/invalid attributes (clamps negative max_actions to 0; ignores
  unknown verbosity values).
- telegram/bridge.py: new TelegramPresenter.refresh_progress_settings()
  delegates to formatter.refresh_from.
- runner_bridge.py: new _load_progress_settings() sibling of
  _load_footer_settings / _load_watchdog_settings; handle_message reads it
  fresh per-run, calls cfg.presenter.refresh_progress_settings(...) via
  duck-typed getattr (Presenter is a Protocol, so we don't add to it), and
  threads progress_cfg.min_render_interval into each ProgressEdits instance
  instead of the startup snapshot. Per-chat /verbose overrides downstream
  of _resolve_presenter reconstruct from the refreshed defaults.
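
The tolerant refresh can be sketched like this. `FormatterSketch` and the set of valid verbosity values are assumptions; the clamp to 0, the ignore-unknown rule, and the missing-attribute tolerance follow the description above.

```python
from types import SimpleNamespace

# Hypothetical set of accepted values; the real ones are not listed here.
_VALID_VERBOSITY = frozenset({"compact", "verbose"})


class FormatterSketch:
    """Stand-in for MarkdownFormatter with a tolerant refresh."""

    def __init__(self, max_actions: int = 5, verbosity: str = "compact") -> None:
        self.max_actions = max_actions
        self.verbosity = verbosity

    def refresh_from(self, progress_settings: object) -> None:
        """Apply a fresh settings snapshot, skipping anything unusable."""
        max_actions = getattr(progress_settings, "max_actions", None)
        if isinstance(max_actions, int) and not isinstance(max_actions, bool):
            self.max_actions = max(0, max_actions)  # clamp negatives to 0
        verbosity = getattr(progress_settings, "verbosity", None)
        if isinstance(verbosity, str) and verbosity in _VALID_VERBOSITY:
            self.verbosity = verbosity  # unknown values are ignored
```

For example, `refresh_from(SimpleNamespace(max_actions=-3, verbosity="loud"))` leaves `max_actions` at 0 and keeps the previous verbosity.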

Out of scope (entry-point limitation): engine + command registration still
require pipx upgrade / restart. Documented on the issue.

8 new tests in tests/test_meta_line.py: TestMarkdownFormatterRefresh covers
max_actions update, verbosity update, negative clamp, invalid-verbosity
rejection, missing-attribute tolerance, presenter delegation. Plus
_load_progress_settings defaults / error-fallback. Full suite: 2511 passed.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All 9 v0.35.3 Group 2 issues have now landed on dev:

- #404 — secret-scanning alert (PR #439)
- #297 — /trigger → /listen rename + alias (PR #440)
- #294 — master trigger pause/resume toggle (PR #441)
- #380 — auto-approve scope review (PR #442)
- #438 — claude_stream_idle_timeout_ms + Type-A/B classification (PR #443)
- #410 — subscription usage observability + /usage debug (PR #444)
- #271 — trigger visibility Tier 2 + Tier 3 (PR #445)
- #333 — Claude post-result idle timeout + ✓ turn complete UX hint (PR #446)
- #269 — hot-reload [progress] settings (PR #447)

Bumps to TestPyPI for staging via @hetz_lba1_bot once integration tests
U1-U7 pass against @untether_dev_bot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small docs-only updates:

- .claude/rules/release-discipline.md — add "Pre-1.0 minor judgement"
  paragraph clarifying that opt-in additive features behind config flags
  may ship in patches if they preserve all current defaults. Codifies
  precedent set by [security] env_extra_allow (#409 in v0.35.3).
  Removes ambiguity for upcoming v0.35.8 (Pi RPC, #170) decision.

- README.md — fix engine compatibility matrix: Pi "Model in footer" cell
  changes from "—" to "✅⁷" with footnote explaining Pi populates the
  footer model from a supplementary StartedEvent carrying the model name
  extracted from message_end (#225). Code at src/untether/runners/pi.py:290-307
  has implemented this since #225 landed; matrix was stale.

Related: docs/plans/2026-05-02-pi-engine-enhancements.md (local plan,
gitignored) — covers the broader Pi enhancement roadmap that motivated
both fixes. Issues created: #460-#468 + promotions of #170 → v0.35.8 and
#180 → v0.35.9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 3, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ad69bb2d-6469-4511-9081-202522ec07bf

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


- docs/reference/dev-instance.md: add Engine auth & defaults section
  (matrix of CLI ver, auth method, default provider/model, cred location
  for Claude/Codex/Gemini/OpenCode/Pi/AMP), smoke verification commands,
  per-engine notes (OpenCode google-whitelist=[], Pi Kimi K2.6 zero-cost,
  Gemini trusted folders + #471, AMP mode-not-model routing). Also
  document the three special-purpose instances (demo, dev-hf, dev-ws)
  and the lockfile PID-reuse race
- docs/reference/integration-testing.md: cross-reference the new auth
  section so test runs check current state before tier 1
- CLAUDE.md, .claude/rules/dev-workflow.md: extend 2-instance section to
  the full 5-instance topology, clarifying that demo/dev-hf/dev-ws share
  the editable .venv but are not on the release path
- .claude/hooks.json: dev-workflow-guard now recognises all 4
  editable-source services (untether-dev, untether-dev-hf,
  untether-dev-ws, untether-demo) and only blocks restarts of staging

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Base automatically changed from dev to master May 26, 2026 06:20