Merged
45 commits
d66af63
wasm-sandbox: add denyx-interpreter crate (Phase 2 scaffold)
mlainez May 14, 2026
2d053d7
interpreter: enable Print extension in Globals
mlainez May 14, 2026
f766ea2
examples: add wasm-smoke harness for Phase 2 acceptance
mlainez May 14, 2026
6913e65
runtime-starlark: add publishable byte-slice crate (Phase 3 + 7)
mlainez May 14, 2026
21d3078
host: add WasmRunner scaffold (Phase 4.1)
mlainez May 14, 2026
95a9765
interpreter: add denyx_alloc/denyx_dealloc allocator exports (Phase 4.2)
mlainez May 14, 2026
47d6501
host: wire fs.read through wasmtime + Policy gate (Phase 4.3)
mlainez May 14, 2026
5781442
host: wire fs.write through wasmtime + Policy gate (Phase 4.4)
mlainez May 14, 2026
a23e9aa
host: wire fs.delete through wasmtime + Policy gate (Phase 4.5)
mlainez May 14, 2026
8df1664
host: wire env.read through wasmtime + Policy gate (Phase 4.6)
mlainez May 14, 2026
eeba160
host: wire subprocess.exec through wasmtime + Policy gate (Phase 4.7)
mlainez May 14, 2026
2e17da3
host: wire net.http_{get,post,put,patch,delete} through wasmtime (Pha…
mlainez May 14, 2026
8a3ec1d
host: add fuel-based preemption to WasmRunner (Phase 5.1)
mlainez May 14, 2026
579003e
cli: add --use-wasm flag dispatching to WasmRunner (Phase 5.2)
mlainez May 14, 2026
67ae3b6
mcp + multistep: add --use-wasm opt-in (Phase 5.3)
mlainez May 14, 2026
6a6a1e9
interpreter: match in-process Runner's full extension set
mlainez May 14, 2026
c2fc513
host: route Wasm http_* through finalize_http_response (3xx → error)
mlainez May 14, 2026
6739292
host: wire TaintRegistry through Wasm path (Phase 4.9)
mlainez May 14, 2026
1907482
host: wire AuditSink emission through Wasm closures (Phase 4.10)
mlainez May 14, 2026
a788db9
host: cache wasmtime Engine + Module on WasmRunner (perf)
mlainez May 14, 2026
d4f5901
mcp: add denyx_fs_read_range + denyx_fs_replace tools (perf)
mlainez May 14, 2026
1f42cd5
host: wire ConfirmHook through Wasm closures (Phase 4.11)
mlainez May 14, 2026
6e706fa
host + mcp: fs.read_range — true bounded read at IO layer
mlainez May 14, 2026
c1228d3
host: close outbound taint + subprocess env filtering parity gaps
mlainez May 14, 2026
eb899b7
docs: wasm-sandbox migration — new reference doc + threat-model deltas
mlainez May 14, 2026
d05c50d
bench + docs: actual per-invocation cost is 481 ms cold, not ~5 ms
mlainez May 14, 2026
126fb0b
docs: multistep eval 36/36 on the wasm path — close the validation gate
mlainez May 14, 2026
5af1d8d
runtime-starlark + host: AOT-precompile the .wasm to .cwasm
mlainez May 14, 2026
2806573
host + harness: pentest surfaces two real regressions on the wasm path
mlainez May 14, 2026
622df66
docs: wasm-sandbox threat assessment — new attack surface + pentest r…
mlainez May 14, 2026
2a55350
docs: Round 2 pentest result — 0 LEAK across 23 Sonnet attempts on wasm
mlainez May 14, 2026
b1aa8cc
docs: Opus 4.7 Round 2 pentest result + honest defense-layer accounting
mlainez May 14, 2026
7e8f637
pentest harness: rewrite prompt to force breadth across all defense l…
mlainez May 15, 2026
2e0ec05
docs: Round 2 v2 result on wasm — full coverage, 0 LEAK across 50 att…
mlainez May 15, 2026
886cb48
pentest harness: typed-error capture + fs.replace prompt fix
mlainez May 15, 2026
22c0f3f
docs: Round 2 v3 result on wasm — typed-error layer-by-layer accounting
mlainez May 15, 2026
1cceffd
pentest harness v4: Starlark cheatsheet + retry contract + fail-scrub…
mlainez May 15, 2026
7c6efb3
docs: Round 2 v4 — 5 defense layers validated, accidental 30% → 8.9%
mlainez May 15, 2026
8f2fd7f
doctor + probes: detect wasm-vs-deadline parity gap, validate confirm…
mlainez May 15, 2026
b1ee86d
wasm-runner: enforce [runtime].max_seconds — close deadline parity gap
mlainez May 15, 2026
09811f2
docs: backfill commit hash c3ca651 for the wasm deadline fix
mlainez May 15, 2026
70c7113
pentest harness v5 + wasm-sandbox docs: multi-seed (n=2) + Interprete…
mlainez May 15, 2026
5518d07
docs: Round 2 v6 — while-fix validated, 112 attempts / 0 LEAK across …
mlainez May 15, 2026
bb5d445
docs: wasm is the recommended Starlark runtime; bwrap moves to advanc…
mlainez May 15, 2026
742ec8d
gitignore: cover nested target/ dirs (examples/*/target)
mlainez May 15, 2026
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,5 +1,5 @@
# Rust build artifacts
-/target
+**/target/
**/*.rs.bk

# Python
180 changes: 180 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,186 @@ breaking API changes between minor versions until they hit `1.0.0`.

## [Unreleased]

- **Wasm-path pentest evidence.** Wasm path pentested across 5
LLM-driven runs (Opus n=3 + Sonnet n=2), 112 attempts, 0 LEAK /
0 DERIVED_LEAK / 0 WEAK_LEAK. 7/8 designed defense layers
empirically validated by the LLM panel; deadline by deterministic
probe (the pentest policies don't enable `runtime.max_seconds`).
Full layer-by-layer accounting in `docs/wasm-sandbox.md`. Sample
size is small (white-box, single-shot per shape); see the round
reports for caveats.

### Added

- **Wasmtime-sandboxed Starlark runner.** New `--use-wasm` flag on
`denyx run` and `denyx-mcp` routes evaluation through a
`wasm32-wasip1`-compiled `starlark-rust` interpreter running
inside `wasmtime`. The policy gate stays in Rust on the host
side — the wasm path is functionally equivalent to the
in-process runner on every security boundary documented in
`docs/04-security-threat-model.md`, plus fuel-based preemption
and interpreter-bug containment. Currently opt-in via
`--use-wasm`; default in next release once
`denyx-runtime-starlark` publishes to crates.io (tracked in
#64). Two new crates underpin this: `denyx-interpreter` (NOT
published — source for the `.wasm` artefact) and
`denyx-runtime-starlark` (published — ships the pre-built
`.wasm` as a `&[u8]`).
- **Fuel-based preemption** *(`--use-wasm` only)*. wasmtime's
per-instruction fuel budget (`DEFAULT_WASM_FUEL = 200_000_000`)
traps runaway pure-CPU loops within ~1 sec as
`DenyxError::RuntimeLimit` (exit code 6 — same as `max_seconds`
on the in-process runner). Closes the gap where
`for _ in range(10**9): pass` runs forever despite
`[runtime].max_seconds`, because wall-time deadlines don't catch
pure-CPU loops.
- **`denyx_fs_read_range(path, offset, limit)` MCP tool + Starlark
builtin.** Bounded read at the IO layer via `File::seek` +
`Read::take(limit)`. Same `read_allow` gate as `fs.read`. For
surgical reads of large files, reduces both wire bytes (across
the MCP boundary) and disk-read cost.
- **`denyx_fs_replace(path, old, new)` MCP tool.** Read-modify-write
with an exactly-one-match guard. Refuses if `old` occurs 0 or 2+
times in the file — ambiguous patches fail loudly instead of
applying silently. Goes through `fs.read` + `fs.write` gates;
**not atomic** under concurrent writes (same semantics as plain
`fs.write`).
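
The two file tools described above can be sketched with the standard library alone. This is a minimal illustration, not the code from `denyx-host`: the function names `read_range` and `replace_exactly_once`, the `ReplaceError` type, and the rejection of an empty `old` string are all assumptions made for the example, and the policy gates (`read_allow`, `fs.write`) are assumed to have run before these helpers are called.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical helper: read at most `limit` bytes starting at byte `offset`.
/// The policy check is assumed to have passed before this is called.
fn read_range(path: &Path, offset: u64, limit: u64) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    // Jump to the requested offset instead of reading the prefix.
    file.seek(SeekFrom::Start(offset))?;
    let mut buf = Vec::new();
    // `take` caps the reader, so at most `limit` bytes are read from disk.
    file.take(limit).read_to_end(&mut buf)?;
    Ok(buf)
}

/// Hypothetical error type for the exactly-one-match guard.
#[derive(Debug, PartialEq)]
enum ReplaceError {
    EmptyPattern,     // assumption: an empty `old` is rejected outright
    NoMatch,          // `old` does not occur
    Ambiguous(usize), // `old` occurs more than once (non-overlapping count)
}

/// Replace `old` with `new` only if `old` occurs exactly once in `content`.
fn replace_exactly_once(content: &str, old: &str, new: &str) -> Result<String, ReplaceError> {
    if old.is_empty() {
        return Err(ReplaceError::EmptyPattern);
    }
    // `str::matches` counts non-overlapping occurrences.
    match content.matches(old).count() {
        0 => Err(ReplaceError::NoMatch),
        1 => Ok(content.replacen(old, new, 1)),
        n => Err(ReplaceError::Ambiguous(n)),
    }
}
```

Because the guard works on an in-memory copy and the write happens afterwards, another writer can still change the file between the read and the write, which is why the entry above calls the tool not atomic.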

### Changed

- `denyx-host` gains `wasmtime` and `wasmtime-wasi` as workspace
dependencies. The in-process `Runner` is unchanged; the new
`WasmRunner` lives alongside.
- The Starlark interpreter's globals now include the same
`LibraryExtension` set on both paths (`Print, StructType,
NamespaceType, Json, Map, Filter, Debug`) — required for the
Wasm path's parity with the in-process runner.

### Operator-facing notes

- `--use-wasm` prints a one-line warning to stderr listing the
current deferral set. Audit log shape, exit codes, and error
messages are byte-identical to the in-process runner except
for `DenyxError::RuntimeLimit`'s reason string
(`"wasm fuel exhausted after N units"` vs `"wall-time deadline
exceeded"`).
- Cold-call cost on the wasm path is ~16.5 ms median per
`WasmRunner` instance — down from ~481 ms in earlier builds.
`denyx-runtime-starlark`'s `build.rs` AOT-precompiles the
embedded `.wasm` to a wasmtime serialized module (`.cwasm`) on
the host architecture; `WasmRunner` loads it via
`Module::deserialize` (single-digit ms) instead of JIT-compiling
the raw `.wasm` (~480 ms). If deserialize fails (different
wasmtime version, different Config flags, target-architecture
mismatch), the runner falls back transparently to JIT-compiling
the raw `.wasm` — same behaviour as before AOT existed.
Amortized per-call cost inside an already-instantiated runner is
~4 µs vs ~3 µs for the in-process runner — statistically
indistinguishable. Measured by `scripts/bench-wasm-runner.py`.

### Not yet validated (gates on promoting `--use-wasm` to default)

- ~~No multistep-eval rerun against the final wasm path.~~ ✓
**Closed 2026-05-14.** Both runners reach 32-36/36 on
`qwen2.5-coder:7b` depending on whether the LLM emits literal-
argument shapes that the verifier catches statically (failing
the harness's success-with-redaction check) or variable-argument
shapes that flow through the runtime redactor. Both outcomes
are valid security behaviour; the harness's task definition is
sensitive to LLM-emission shape. The deterministic exfil probe
is the more informative parity signal — **10 REDACTED, 2
WEAK_LEAK, 0 LEAK on both runners**, identical.
- **Round 2 v6 — while-fix validation (2026-05-15).** Single Opus
run validating the v6 cheatsheet's `while`-keyword correction.
23 attempts, $0.79, **0/23 accidental** (vs 3/55 in v5 Opus).
Zero `while` keywords in any of 23 script bodies. The one
WASM_TRAP fire is the same quadratic-string-concat shape as v5
(Interpreter-bug containment, designed defense). Aggregated
Round 2 v5+v6: **112 attempts across 5 runs, 0 LEAK / 0
DERIVED_LEAK / 0 WEAK_LEAK**, accidental 3/112 = 2.7% (all
Opus-side script bugs, no cheatsheet gaps remaining). 7 of 8
designed defense layers adversarially validated by the LLM
panel.
- **Round 2 v5 multi-seed pentest (2026-05-15).** n=2 per model
(4 runs total) with a tightened Starlark cheatsheet. **89
attempts combined, 0 LEAK / 0 DERIVED_LEAK.** v5 first-times:
- Interpreter-bug containment empirically confirmed (1
WASM_TRAP fire — Opus's quadratic string-concat caused the
Starlark interpreter to `process::abort` inside the wasm
guest; the sandbox contained it cleanly). 7 of 7 named
designed defense layers (excluding deadline, which the
pentest policy doesn't enable) now have at least one
adversarial fire from the LLM panel.
- Confirm hook fired twice in v5 (vs 0 in v4) — Opus seed 1
on encoded-secret fs.delete paths.
Reproducibility: Sonnet n=2 produced identical verdict
distributions seed-to-seed (17 attempts each, same buckets);
Opus n=2 varied materially (31 vs 24 attempts, different
bucket emphasis). Accidental dropped from 8.9% (v4) to 4.5%
(v5); all 4 residuals were Opus, with 2 sharing a single root
cause (`while` keyword inside def — the v5 cheatsheet
incorrectly implied `while` works inside def, but the Starlark
Standard dialect rejects `while` EVERYWHERE; corrected in
follow-up commit).
- **WasmRunner wall-time deadline parity (2026-05-15).** Surfaced
by `examples/local_executor/probe_layer_variants.py --variant
deadline`: in-process Runner 3/3 PASS, wasm Runner 0/3 PASS —
`wasm_runner.rs` had zero references to `check_deadline`,
`start_time`, or `max_seconds`. `[runtime].max_seconds` was
silently dropped on the wasm path. Closed by threading a
`start_time: Instant` through `WasmState` and calling a new
`check_wasm_deadline` helper at the top of every effecting
Func closure (11 builtins). Validation: 2 new wasm-runner
unit tests + `probe_layer_variants.py` now 7/7 PASS (4 confirm
+ 3 deadline). 140/140 `denyx-host` tests green. Doctor
diagnostic added in commit `5a34fa4` still flags the same
combination for older builds where the runner fix hasn't
shipped yet.
- ~~No pentest re-run against the wasm path.~~ **Closed 2026-05-15
via four harness iterations.**
- **v1 → v3**: methodology evolution recorded in commits
`9e0ab61`, `735a741`, `328b89d`. v3 final: 46 attempts,
0 LEAK, 32 designed-defense fires, 30% accidental
(Starlark parser rejecting Python idiom).
- **v4 prompt + classifier** (commit `07e6349` — Starlark
dialect cheatsheet, retry-doesn't-count-toward-coverage
contract, [REDACTED]-in-error-body matcher for fail-scrub):
Opus 24 attempts ($3.76, truncated by 60-min wallclock),
Sonnet 21 attempts ($0.52). **0 LEAK / 0 DERIVED_LEAK
across 45 attempts.** Layer-by-layer:
- Outbound-taint refusal (POLICY_DENY): 23 fires
- Wasm fuel preemption: 5 fires
- Print redactor: 6 fires (partial credit)
- Verifier static check: 3 fires (Opus only; both models
routinely bypass via name concat — verifier is
opportunistic, not load-bearing)
- **Fail-message scrubber (FAIL_SCRUB): 4 fires** —
previously misclassified in v2/v3 as STARLARK_RUNTIME;
the v3 doc's "not adversarially validated" claim was
wrong. `fail(secret)` produces `fail: [REDACTED]` on
every attempt; the scrubber has fired 100% of the time.
- Accidental (parser + runtime): 4/45 = 8.9% (down from
30% in v3) — Starlark dialect rejections, not Denyx.
Five distinct designed defense layers empirically validated
across the round. Confirm-hook and deadline layers still
unmeasured (pentest policy doesn't enable `requires_approval`
or `runtime.max_seconds`). Sample size n=1 per model per round,
single seed, white-box. `fs.replace` guard not reachable
through this harness — unit-tested. Round 2 v3 (tool-poisoning)
scoped to in-process; wasm migration does not change that
surface.
- No pentest re-run against the wasm path. Round 1 and Round 2 v3
reports cover the in-process runner only.
- CI doesn't yet stage the `.wasm` into `denyx-runtime-starlark`
before `cargo publish`. `cargo install denyx-cli` from
crates.io would not work until that lands. **This is the sole
remaining default-blocker.**
- Fuel budget is hardcoded; no `[runtime].max_wasm_fuel` policy
field yet.
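
The wall-time deadline fix listed above (a start `Instant` carried in the runner state and checked at the top of every effecting builtin) follows a pattern that can be sketched as below. This is an illustration under stated assumptions, not the code in `wasm_runner.rs`: `WasmStateSketch`, `RuntimeLimit`, `check_deadline`, and `fs_read_builtin` are hypothetical names, and reusing the in-process reason string for the wasm path is also an assumption.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the per-run state the host keeps.
struct WasmStateSketch {
    start_time: Instant,      // captured once when the run begins
    max_seconds: Option<u64>, // `[runtime].max_seconds`, if the policy sets it
}

/// Hypothetical stand-in for the runtime-limit error.
#[derive(Debug, PartialEq)]
struct RuntimeLimit(String);

/// Return an error once the configured wall-time budget is used up.
fn check_deadline(state: &WasmStateSketch) -> Result<(), RuntimeLimit> {
    match state.max_seconds {
        Some(limit) if state.start_time.elapsed() >= Duration::from_secs(limit) => {
            Err(RuntimeLimit("wall-time deadline exceeded".to_string()))
        }
        // No limit configured, or still within budget.
        _ => Ok(()),
    }
}

/// Every effecting builtin calls the check first, before doing any I/O.
fn fs_read_builtin(state: &WasmStateSketch, path: &str) -> Result<String, RuntimeLimit> {
    check_deadline(state)?;
    // Placeholder for the real policy gate and file read.
    Ok(format!("<contents of {path}>"))
}
```

A check like this only runs when a builtin is called, so a pure-CPU loop that never calls a builtin is left to the fuel budget rather than to the deadline.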

See [docs/wasm-sandbox.md](docs/wasm-sandbox.md) for the full
parity table, threat-model differences, and open work.

## [0.3.0] — 2026-05-11

### Added