Releases: CelestialCreator/pocket-pi

v0.4.0 — UI automation + baked control skill

13 May 18:59

v0.4.0 — agent has the phone and the screen

Pocket Pi's first release where the agent can actually drive other apps. On top of v0.3's phone surface (notifications, intents both directions, share-sheet, camera, mic, location, clipboard, deep-link inbox), v0.4 adds a full UI-automation surface and the playbook to use it.

What's new

19 new /ui/* tools for the agent, mirroring an AccessibilityService vendored from KarryViber/orb-eye (MIT). The agent can now:

  • Read any app's element tree (pocket_pi_ui_screen, pocket_pi_ui_find)
  • Dispatch taps, swipes, long-presses, scrolls, multi-finger gestures (pocket_pi_ui_tap, _click, _swipe, _scroll, _long_press, _gesture)
  • Set text on focused fields (pocket_pi_ui_type)
  • Take screenshots (Android 11+, pocket_pi_ui_screenshot)
  • Fire global actions — back / home / recents / notifications shade / quick settings (pocket_pi_ui_global)
  • Buffer system notifications + long-poll a window-change + notification event channel for proactive behaviour (pocket_pi_ui_notifications, pocket_pi_ui_poll_events)

All routed through the same bearer-token-gated localhost bridge as v0.3 — no companion APK, no root, no shell setup.

Baked-in pocket-pi-android-control skill at ~/.pi/agent/skills/. Shipping the tools alone wasn't enough — flash-tier models would call pocket_pi_intent_send once, hit a wall, and give up. The skill anchors the fallback chain (deep link → generic intent → launch by package + UI drive → verify) with six worked examples and anti-patterns. With it loaded, the same models that bailed yesterday now complete full chains.
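
The fallback chain the skill describes can be pictured as an ordered list of strategies with a verification step after each attempt. The sketch below is only an illustration of that control flow, not the skill's actual contents: the function name, the strategy names, and the stub callables are all hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each rung of the chain is a (name, attempt) pair. attempt() returns True
# when the tool call itself succeeded. All names here are hypothetical.
Strategy = Tuple[str, Callable[[], bool]]


def run_fallback_chain(strategies: Sequence[Strategy],
                       verify: Callable[[], bool]) -> Optional[str]:
    """Try each strategy in order and return the name of the first one
    that both succeeds and passes verification, or None if all fail."""
    for name, attempt in strategies:
        try:
            ok = attempt()
        except Exception:
            ok = False  # a failed tool call means: move on to the next rung
        if ok and verify():
            return name
    return None


if __name__ == "__main__":
    chain = [
        ("deep_link", lambda: False),            # e.g. no handler for the URI
        ("generic_intent", lambda: False),       # e.g. intent not resolved
        ("launch_and_drive_ui", lambda: True),   # launch by package, then tap
    ]
    print(run_fallback_chain(chain, verify=lambda: True))
```

The point of the structure is that a single failed call never ends the task: the agent only gives up once every rung has been tried and verified.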

One-time setup on first install

Android forbids programmatic enablement of AccessibilityService. The app's onboarding pane deep-links straight to Settings → Accessibility → Pocket Pi — tap "Use Pocket Pi", accept the system consent dialog, and the pane self-dismisses. UI tools fail gracefully with 403 until the toggle is on.

Validated on emulator

  • Fresh install → bootstrap → AccessibilityPane → Settings deep-link → toggle → consent → service bound (singleton non-null within 2 s of the poll)
  • All 19 /ui/* endpoints return ok end-to-end (info, screen, find, tap, click, type, swipe, scroll, long_press, gesture, global, wait, notifications, screenshot, events/poll)
  • Driving Settings → Battery navigation via find + tap → verified by screencap
  • Proactive notification reactor: agent calls ui_poll_events (long-poll), notify fires from a side channel, agent unblocks within ~100 ms and reads the title/body back
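
For readers who want to picture the long-poll from the client side, here is a minimal sketch of building such a request against the localhost bridge. The address 127.0.0.1:9998 and the bearer-token gating come from the v0.3.0 notes; the exact path `/ui/events/poll`, the POST method, and the `timeout` field are assumptions made for illustration, not the documented wire format.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9998"  # bridge address from the v0.3.0 notes


def build_poll_request(token: str, timeout_s: int = 30) -> urllib.request.Request:
    """Build (but do not send) a long-poll request for UI events.

    The path, HTTP method, and JSON body are assumptions for illustration.
    """
    body = json.dumps({"timeout": timeout_s}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/ui/events/poll",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def poll_once(token: str, timeout_s: int = 30) -> dict:
    """Block until the bridge reports an event or the poll window closes."""
    req = build_poll_request(token, timeout_s)
    # Give the socket slightly longer than the assumed server-side window.
    with urllib.request.urlopen(req, timeout=timeout_s + 5) as resp:
        return json.load(resp)
```

With `urllib`, a 403 from the bridge (which the setup notes say is returned until the Accessibility toggle is on) surfaces as `urllib.error.HTTPError`, so a caller would catch that and prompt for the toggle rather than retrying blindly.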

What's next (v0.5)

  • Working shell-session tab in the dashboard (node-pty android-arm64 prebuild still missing; stub no-ops on tab open)
  • Background location escalation when a real use case lands
  • Custom-prefix bootstrap so applicationId can move off com.termux

Credits

  • orb-eye by KarryViber, for the AccessibilityService that powers v0.4's UI surface. Vendored under MIT with attribution preserved in source.
  • Pi coding agent by Mario Zechner + earendil-works, for the runtime that powers every chat turn.
  • pi-agent-dashboard + pi-anthropic-messages by BlackBelt Technology, for the WebView chat UI and the Anthropic OAuth bridge.
  • Termux — for the Linux-on-Android runtime.

Install

Sideload pocket-pi-v0.4.0.apk (68 MB, arm64-v8a). Allow "install from this source" on first run. First launch downloads/extracts the bundled runtime (~30 s on modern phones).

v0.3.1 — dashboard 0.5.3 + apt-retry fix

13 May 11:35

Bumps the bundled chat UI from pi-agent-dashboard@0.4.6 to 0.5.3 and fixes a postinstall bug that masked critical apt packages on flaky mirrors. Backwards-compatible with v0.3.0; users on v0.3.0 should re-run setup or wipe app data to get the postinstall fix.

What's new

Bigger dashboard

  • Pinned @blackbelt-technology/pi-agent-dashboard@0.5.3 (was 0.4.6). 7 minor/patch versions of upstream improvements:
    • Node 25 crash safety net — unhandledRejection + uncaughtException handlers in cli.ts so a misbehaving plugin can't kill the dashboard under Node 25's fatal-by-default async-fault policy.
    • tsx → jiti consolidation. The dashboard's spawn shim resolves jiti from pi's tree and re-execs node with --import <jiti>; we no longer hand-roll the loader.
    • Chat markdown: agent can inline local images (![alt](/abs/path.png)) and LaTeX math ($$...$$).
    • /skill:foo invocations render as collapsible cards instead of dumping the expanded body into the chat bubble.
    • Slash-command ":" / "-" aliases: /skill-foo and /skill:foo both work.
    • Provider catalogue auto-mirrors pi's full provider list (DeepSeek, Fireworks, Cerebras, HuggingFace, Vertex, Bedrock, etc.) — was a hardcoded 8-item subset.
    • Tunnel watchdog — auto-recycles stale zrok tunnels (toggle off by default in this build since we don't bundle zrok).
  • PiBridge simplified to invoke pi-dashboard start directly via the npm-installed shim, instead of node --import tsx-loader cli.ts start.

Postinstall robustness

  • apt-retry second pass. The original postinstall ran xargs -a packages.txt pkg install -y 2>/dev/null || true once and only retried nodejs if npm was missing — so when a Termux mirror flaked mid-batch, packages like git, python, ripgrep silently never installed, breaking pi-anthropic-messages (which needs git clone) and leaving the dashboard with a "1 required missing" amber banner. Now a second pass identifies what didn't land (via command -v + dpkg -s) and re-runs just the missing set.
  • Legacy @mariozechner/pi-coding-agent scope dropped from npm-packages.txt. The canonical @earendil-works/pi-coding-agent@0.74+ supersedes it and ships jiti; the old scope tops out at 0.73.x and its bin/pi symlink collides with the new scope. Upgrading from v0.3.0? Postinstall explicitly runs npm uninstall -g @mariozechner/pi-coding-agent first.
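
The second-pass idea (check what actually landed, then retry only the missing set) can be sketched as follows. The real postinstall is a shell script; this Python version is an illustration only, and the function names and the injectable `install` callback are hypothetical.

```python
import shutil
import subprocess
from typing import Callable, Iterable, List


def is_installed(pkg: str) -> bool:
    """True if a command named like the package is on PATH, or dpkg knows it.

    The command name does not always equal the package name (ripgrep
    installs `rg`), which is why the dpkg check is needed as a fallback.
    """
    if shutil.which(pkg):
        return True
    try:
        result = subprocess.run(["dpkg", "-s", pkg],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        return False  # dpkg itself is not available on this system
    return result.returncode == 0


def missing_packages(wanted: Iterable[str],
                     check: Callable[[str], bool] = is_installed) -> List[str]:
    """Packages from `wanted` that the check does not find."""
    return [pkg for pkg in wanted if not check(pkg)]


def second_pass(wanted: Iterable[str],
                install: Callable[[List[str]], None],
                check: Callable[[str], bool] = is_installed) -> List[str]:
    """Retry only what is missing; return whatever is still missing after."""
    missing = missing_packages(wanted, check)
    if missing:
        install(missing)
    return missing_packages(missing, check)
```

Returning the still-missing list (instead of swallowing failures with `|| true`) is what lets a caller surface a precise error rather than a generic "1 required missing" banner.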

Diagnostic noise suppression

  • mkdir -p $HOME/.pi-dashboard so the dashboard's "Managed install not created" warning stops firing. Pocket Pi never uses that path; everything lives under $PREFIX via tool-overrides.json.
  • Pre-seed ~/.pi/dashboard/config.json with tunnel.enabled=false. Zrok-watchdog noise gone; re-enable in Settings → General → Tunnel if you actually want tunneling.
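
A pre-seed step like the one above might look roughly like this. It assumes that `tunnel.enabled=false` corresponds to a nested JSON object (`{"tunnel": {"enabled": false}}`); the notes do not spell out the file's shape, so treat that layout and the function name as assumptions.

```python
import json
from pathlib import Path


def preseed_tunnel_disabled(config_path: Path) -> dict:
    """Ensure the config has tunnel.enabled == False, keeping other keys.

    Assumes a nested {"tunnel": {"enabled": ...}} layout (not confirmed).
    """
    config = {}
    if config_path.exists():
        try:
            config = json.loads(config_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            config = {}  # unreadable config: start from an empty one
    if not isinstance(config, dict):
        config = {}
    tunnel = config.get("tunnel")
    if not isinstance(tunnel, dict):
        tunnel = {}
    tunnel["enabled"] = False
    config["tunnel"] = tunnel
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Note that this sketch forces the flag off on every run; a postinstall that should respect a user who later re-enabled tunneling would instead write the file only when it does not exist yet.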

Install

  1. Download pocket-pi-v0.3.1.apk (aarch64 only, ~39 MB) below.
  2. adb install -r pocket-pi-v0.3.1.apk or sideload normally.
  3. First-launch postinstall is ~5–10 min on Wi-Fi. Grant Camera / Mic / Location / Notifications when prompted.

Upgrading from v0.3.0: if you want the apt-retry fix to take effect, either:

  • Force-stop com.termux + clear app data + relaunch — full fresh postinstall. Cleanest. Or
  • After install, open the app, wait 15 s for the recovery UI to surface, tap Re-run setup — refreshes the postinstall script from the new APK and runs it. Faster but might leave some legacy state.

Not yet done

Carried over from v0.3.0:

  • UI automation: deferred to v0.4. The plan is to vendor the droidrun-portal AccessibilityService when ready.
  • Background location: foreground only.
  • Other OAuth providers (Gemini CLI, Codex, Copilot, Antigravity): Sign-In completes but no Pi-side protocol bridge is bundled. Use the API-key path.
  • Working shell-session tab: node-pty has no android-arm64 prebuild.

v0.3.0 — phone-surface bridge for the agent

13 May 08:01

The agent now has the phone. A localhost HTTP server (127.0.0.1:9998) gated by a per-launch bearer token exposes the device's surface to Pi — without any companion APK.

What's new

  • Notifications: termux_notify (title, content, priority).
  • Outgoing Android intents: pocket_pi_intent_send (generic), pocket_pi_intent_open_url, pocket_pi_intent_dial, pocket_pi_intent_settings. The agent can open Wi-Fi/Bluetooth/permission screens, dial a number, view a URL, fire any Intent action.
  • Incoming intents: share-target ("Share to Pocket Pi" for text/* + image/*) and pi://agent/... deep-link intent-filters. Payloads queue into $HOME/.pi/agent/inbox/<ts>-<rand>.json and the agent drains via pocket_pi_inbox_list / pocket_pi_inbox_pop.
  • Location: termux_location (gps / network / fused, foreground), via a transient LocationFgService so the privacy indicator only shows during a fix.
  • Camera: termux_camera_photo (front/back), CameraX-backed CameraFgService. Output lands in ~/.pi/agent/captures/<ts>.jpg.
  • Microphone: pocket_pi_mic_record (1–300 s, AAC/.m4a), via MicFgService.
  • Clipboard + battery + toast + TTS + share-sheet: termux_clipboard_{get,set}, termux_battery_status, termux_toast, termux_tts_speak, termux_share.
  • pocket-pi-api shell shim in $PREFIX/bin/: pocket-pi-api notify '{"title":"…","content":"…"}', pocket-pi-api camera/photo '{"camera":"back"}', etc., from any Termux session inside the app.
  • Dashboard tool overrides for node, npm, git, ps, pgrep (Settings → Tools now shows them as ✓ via override).
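
The inbox described above is a queue on disk: one JSON file per incoming payload, drained oldest-first. The sketch below illustrates that pattern; it is not the implementation behind pocket_pi_inbox_list / pocket_pi_inbox_pop, and it assumes the `<ts>` prefix is fixed-width so that sorting file names matches arrival order.

```python
import json
from pathlib import Path
from typing import List, Optional


def inbox_list(inbox: Path) -> List[Path]:
    """Queued payload files, oldest first.

    Assumes file names start with a fixed-width timestamp, so a plain
    lexicographic sort matches arrival order.
    """
    if not inbox.is_dir():
        return []
    return sorted(inbox.glob("*.json"), key=lambda p: p.name)


def inbox_pop(inbox: Path) -> Optional[dict]:
    """Read and delete the oldest payload; None when the inbox is empty."""
    for path in inbox_list(inbox):
        payload = json.loads(path.read_text(encoding="utf-8"))
        path.unlink()  # remove only after a successful read
        return payload
    return None
```

Deleting the file only after a successful parse means a crash mid-read leaves the payload queued rather than lost.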

Install

  1. Download pocket-pi-v0.3.0.apk (aarch64 only, ~68 MB) below.
  2. Sideload — tap the APK on the phone (allow install from unknown sources) or adb install pocket-pi-v0.3.0.apk.
  3. Open the app. First launch runs the bootstrap (~5–10 min on Wi-Fi). Grant Camera / Mic / Location / Notifications when prompted.
  4. When the dashboard loads, tap its ⚙ → Providers → add at least one provider (Claude Pro/Max OAuth or any API key). Then chat.

Security model

  • The HTTP API binds 127.0.0.1 only — not exposed off-device.
  • A 32-byte bearer token is regenerated on every service start and written to $PREFIX/etc/pocket-pi/api-token (mode 0600, same UID as Termux).
  • Apps that don't share Pocket Pi's UID can't read the token; off-device callers can't reach 127.0.0.1.
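
As a rough illustration of the token handling described above, the sketch below generates 32 random bytes, writes them to a file created with mode 0600, and builds the matching Authorization header. The app itself does this on the Android side; the hex encoding, the function names, and the use of Python here are assumptions for illustration, and the permission behaviour assumes a POSIX filesystem.

```python
import os
import secrets
from pathlib import Path


def write_token(path: Path) -> str:
    """Generate a fresh token and store it readable by the owner only."""
    token = secrets.token_hex(32)  # 32 random bytes; hex encoding is an assumption
    path.parent.mkdir(parents=True, exist_ok=True)
    # Remove any previous token so the new file is created with 0600
    # instead of inheriting whatever permissions the old file had.
    try:
        path.unlink()
    except FileNotFoundError:
        pass
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(token)
    return token


def auth_header(path: Path) -> dict:
    """Header a same-UID client would send to the localhost bridge."""
    return {"Authorization": "Bearer " + path.read_text().strip()}
```

Because the token is regenerated on every service start, a client should re-read the file when it sees an authentication failure instead of caching the value indefinitely.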

Not yet done

  • UI automation — deferred to v0.4. Plan is to vendor droidrun/droidrun-portal's Kotlin AccessibilityService so the agent can read screens + dispatch taps/swipes against other apps. Irreducible UX cost: the user has to manually enable Accessibility in Settings (Android forbids a runtime dialog).
  • Background location ("Allow all the time") — foreground only this release.
  • Other OAuth providers end-to-end (Gemini CLI, Codex, Copilot, Antigravity) — Sign-In completes but no Pi-side protocol bridge is bundled. Use the API-key path.
  • Working shell-session tab in the dashboard — node-pty has no android-arm64 prebuild; chat/files/tasks work, terminal tab will fail.

Credits

Pocket Pi is just packaging. The actual agent engine + chat UI are someone else's work:

Pocket Pi v0.2.1

12 May 13:40

Patch release. Removes the defaultProvider=nvidia preseed so fresh installs prompt the user to pick a provider through the dashboard's native settings instead of landing on NVIDIA with no key behind it. No API keys (or placeholders for any provider) ship in the APK.

Install

  • Download: pocket-pi-v0.2.1.apk (40 MB, aarch64 only)
  • Sideload or `adb install pocket-pi-v0.2.1.apk`.

Providers — what works

See the README provider matrix for the authoritative table. Short version:

  • OAuth, end-to-end: Anthropic (Claude Pro/Max). Uses pi-anthropic-messages as the protocol bridge. Sign-In opens the device's default browser via an `xdg-open` shim → Android `ACTION_VIEW`. Cross-device manual paste of the callback URL also works.
  • OAuth Sign-In completes but unusable (no Pi-side bridge bundled): Google Gemini CLI, ChatGPT Plus/Pro (Codex), GitHub Copilot, Antigravity. Use the API-key path for these vendors instead.
  • API key paste — fully working: OpenAI, Anthropic API, Google Gemini (AI Studio key), Mistral, Groq, xAI, Z.ai, OpenRouter, NVIDIA NIM.

Changes vs v0.2.0

  • Drop `defaultProvider`/`defaultModel` preseeding in postinstall — settings.json starts empty; dashboard prompts for provider on first launch.
  • Drop the empty `~/.config/nvidia/api-key` placeholder from the bootstrap HOME skel.
  • README: open with credits to the upstream projects (Pi by Mario Zechner / earendil-works, pi-agent-dashboard + pi-anthropic-messages by BlackBelt Technology, Termux), honest provider matrix, drop "team testers" framing.

Carry-over from v0.2.0

  • Chat UI: pi-agent-dashboard. Slash commands, model switcher, session history, native settings cog.
  • Compose UI: bootstrap splash + recovery only (inline Restart Pi / Re-run setup after 15s stall).

Known limits

  • Shell-session tab inside the dashboard is stubbed (node-pty has no android-arm64 prebuild).
  • `applicationId` still `com.termux` (avoids a Docker-based bootstrap rebuild).
  • Old Android WebView builds (< Chrome 120) may render a blank dashboard. Real devices auto-update; emulator system images can lag.

Credits

Pocket Pi is packaging. The agent engine is Pi by Mario Zechner (now at earendil-works). The chat UI is pi-agent-dashboard by BlackBelt Technology. Linux-on-Android runtime is Termux.

Pocket Pi v0.2.0

12 May 13:24

Pocket Pi v0.2.0 — POC, shippable

Drop-in single-APK Pi coding agent for Android. The chat UI is now pi-agent-dashboard (slash commands, session history, model switcher, native settings all built in). NVIDIA NIM is pre-seeded so first chat is one tap away.

Install

  • Download: pocket-pi-v0.2.0.apk (40 MB)
  • Sideload (allow unknown sources) or adb install pocket-pi-v0.2.0.apk.
  • First launch runs the bootstrap (3–5 min on Wi-Fi).

What's new vs v0.1

  • Chat UI: pi-agent-dashboard replaces the prior pi-mobile PWA. Built-in slash commands, model switcher, native settings cog.
  • Provider config now lives in the dashboard. Anthropic Claude Pro/Max OAuth, ChatGPT, GitHub Copilot, Gemini CLI all surfaced; API-key paste for OpenAI / Anthropic / Mistral / Groq / Gemini.
  • xdg-open shim in postinstall so OAuth Sign-In opens the device's default browser (Chrome) via am ACTION_VIEW. Cross-device manual paste of the callback URL also works.
  • pi-anthropic-messages bundled for proper tool-call rendering on Claude OAuth providers.
  • Compose UI reduced to bootstrap splash + recovery (inline Restart Pi / Re-run setup buttons after a 15s stall).

Known limits

  • Shell-session tab inside the dashboard is stubbed (node-pty has no android-arm64 prebuild).
  • applicationId is still com.termux (avoids a Docker-based bootstrap rebuild).
  • Old Android WebView builds (< Chrome 120) may render a blank dashboard. Real devices auto-update; emulator system images can lag.

Notes

Same fast-tracked POC. Whether to invest in productizing (custom prefix bootstrap, real applicationId, Play Store) vs. building a proper native Android client is what this POC is meant to inform.

Pocket Pi v0.1.3 (POC)

11 May 15:53

Changes

  • New launcher icon — replaces the Greek π with the pi.dev brand mark (the tetris-style "P" + "i" dot from https://pi.dev/logo-auto.svg).
  • Resilient npm install — if a single npm package fails (e.g. one with a broken native build), postinstall logs a WARN and keeps going instead of aborting the whole bootstrap. Found while spiking the pi-agent-dashboard alternative UI; this fix is independently useful.
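
The "warn and keep going" behaviour can be sketched as a loop that records failures instead of aborting. The actual postinstall is a shell script; the function names below are hypothetical, and the use of a global (`-g`) install is an assumption.

```python
import logging
import subprocess
from typing import Callable, Iterable, List

log = logging.getLogger("postinstall")


def npm_install_global(pkg: str) -> bool:
    """Install one package; True on success. The -g flag is an assumption."""
    try:
        return subprocess.run(["npm", "install", "-g", pkg]).returncode == 0
    except FileNotFoundError:
        return False  # npm itself is missing


def install_all(packages: Iterable[str],
                install: Callable[[str], bool] = npm_install_global) -> List[str]:
    """Install every package, logging a warning for each failure.

    Returns the list of packages that failed, so the caller can report
    them without having aborted the rest of the bootstrap.
    """
    failed = []
    for pkg in packages:
        if not install(pkg):
            log.warning("npm install failed for %s; continuing", pkg)
            failed.append(pkg)
    return failed
```

Installing packages one at a time is slower than a single batched call, but it is what makes it possible to isolate the one package with a broken native build.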

Install

Sideload over your existing install — your keys, selected model, and chat history are preserved on-device.

Pocket Pi v0.1.2 (POC)

11 May 13:59

Fixes

  • Provider chip shows up for every key you've saved. Before, a provider only appeared in the chip row if it was in models.json. If you saved an OpenRouter key on v0.1.0 (or earlier) it never made it into the picker. v0.1.2 reconciles on every sheet open — saved key → chip in the row.
  • Add any model. Pi's predefined per-provider list isn't exhaustive (OpenRouter alone has hundreds). v0.1.2 adds a Custom model section in the Config sheet: paste a model id (e.g. qwen/qwen-2.5-coder-32b-instruct), give it a friendly name, tap Add model, then Use this model.
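
The chip reconciliation amounts to taking the union of providers listed in models.json and providers with a saved key. A minimal sketch, assuming models.json maps provider names to model lists and that saved keys are available as a list of provider names (both shapes are assumptions, as is the function name):

```python
from typing import Dict, Iterable, List


def reconcile_chips(models_by_provider: Dict[str, List[str]],
                    saved_key_providers: Iterable[str]) -> List[str]:
    """Providers to show as chips: everything in models.json, plus any
    provider that has a saved key but no models.json entry yet."""
    chips = list(models_by_provider)
    for provider in saved_key_providers:
        if provider not in chips:
            chips.append(provider)
    return chips
```

Running this on every sheet open, as v0.1.2 does, means a key saved by an older build shows up without any migration step.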

Install

Sideload over your existing v0.1.x — your keys and selected model are preserved on-device.

Pocket Pi v0.1.1 (POC)

11 May 13:37

Fixes

  • Chat now actually replies. Pi 0.74 reads defaultProvider/defaultModel from settings.json — without them the agent had nothing to call, so sending a message looked like it did nothing. v0.1.1 ships a first-run default (NVIDIA Qwen3 Coder 480B) and an in-app picker so anyone can switch.
  • Active model picker in the Config sheet (⚙ icon): provider chips + per-model rows + "Use this model" button. Reads what's available from models.json; writes the active selection to settings.json and restarts Pi.
  • Saving an API key now also seeds the active model if none is set yet, so the "save key + send hi" flow works on a fresh install.
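
The "seed only if nothing is set" rule from the last fix can be sketched as a small pure function over the settings object. The `defaultProvider` / `defaultModel` keys come from the notes above; the function name and the idea of passing in a provider's first model are assumptions for illustration.

```python
def seed_active_model(settings: dict, provider: str, first_model: str) -> dict:
    """Return settings with an active model filled in if none was set.

    An existing selection is never overwritten, so saving a second key
    does not silently switch the model the user already picked.
    """
    updated = dict(settings)
    if not updated.get("defaultProvider") or not updated.get("defaultModel"):
        updated["defaultProvider"] = provider
        updated["defaultModel"] = first_model
    return updated
```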

Install

Sideload the APK below. If you already have v0.1.0, just install over it — your keys and configs are preserved. After install, open the app and tap ⚙ to pick a model.

Pocket Pi v0.1.0 (POC)

11 May 12:11

First POC build for team testing.

Install

  1. Download pocket-pi-v0.1.0.apk below.
  2. Sideload on an aarch64 Android phone (adb install pocket-pi-v0.1.0.apk, or tap the file on the device after enabling install-from-unknown-sources).
  3. Open the app. First launch takes 3–5 min on Wi-Fi (bootstrap + npm install).
  4. Once you see "Send a message to your Pi agent", tap the ⚙ at the top right.
  5. Paste at least one API key (NVIDIA NIM is free at https://build.nvidia.com/, OpenRouter / OpenAI / Anthropic / Groq also wired up).
  6. Tap Save keys → Restart Pi.
  7. If any extensions are missing, tap Re-run setup (idempotent, safe).

Status

  • ✓ All 8 Pi extensions register, 17 tools online
  • ✓ NVIDIA NIM + OpenRouter tested end-to-end
  • ✓ Recovery UI if pi-webserver doesn't bind within 15s
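
The 15 s recovery trigger boils down to "poll a local port until it accepts a connection or a deadline passes". The sketch below shows that check in isolation; the app's real logic lives in its Android UI layer, and the function name, polling interval, and use of a plain TCP connect are assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int,
                  deadline_s: float = 15.0,
                  interval_s: float = 0.25) -> bool:
    """True once host:port accepts a TCP connection, False if the
    deadline passes first (the point at which a recovery UI would show)."""
    end = time.monotonic() + deadline_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            pass  # not listening yet: fall through and retry
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

Using `time.monotonic()` keeps the deadline stable even if the wall clock is adjusted during first-launch setup.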

Known limitations

  • Slash commands (/session, /clear, /model, …) currently fall through to the LLM as plain text — planned for the next iteration.
  • Chat history doesn't persist across tab switches yet.
  • applicationId is still com.termux (requires a custom bootstrap rebuild to change).
  • A cosmetic ghost-icon row may render at the top of the chat on some Android WebView builds.

Feedback wanted

Whether to keep iterating on this Termux-fork-in-an-APK approach or rewrite as a proper native Android client that talks to Pi over the network — that's what this POC is meant to inform.