Commit 3e66370

feat: [US-400] - [Set up CI test pipeline, push, and iterate until CI is fully green]
1 parent 55fea5e commit 3e66370

2 files changed: +18 -5 lines changed

registry/tool/sandbox/tests/vm-integration.test.ts

Lines changed: 6 additions & 5 deletions
@@ -19,6 +19,7 @@ import { createSandboxFs, createSandboxToolkit } from "../src/index.js";
 
 let sandbox: SandboxAgentContainerHandle;
 const sandboxBasePath = "/tmp/agent-os-sandbox-vm";
+const SANDBOX_VM_TEST_TIMEOUT_MS = 120_000;
 
 const hasWasm = existsSync(coreutils.commandDir);
 const skipReason = process.env.SKIP_SANDBOX_TESTS
@@ -40,11 +41,11 @@ beforeAll(async () => {
       `Failed to prepare sandbox base path ${sandboxBasePath}: ${mkdir.stderr}`,
     );
   }
-}, 150_000);
+}, SANDBOX_VM_TEST_TIMEOUT_MS);
 
 afterAll(async () => {
   if (sandbox) await sandbox.stop();
-});
+}, SANDBOX_VM_TEST_TIMEOUT_MS);
 
 describe.skipIf(skipReason)("VM integration", () => {
   let vm: AgentOs;
@@ -63,11 +64,11 @@ describe.skipIf(skipReason)("VM integration", () => {
       ],
       toolKits: [createSandboxToolkit({ client: sandbox.client })],
     });
-  });
+  }, SANDBOX_VM_TEST_TIMEOUT_MS);
 
   afterEach(async () => {
     await vm.dispose();
-  });
+  }, SANDBOX_VM_TEST_TIMEOUT_MS);
 
   // -- Filesystem mount tests --
 
@@ -98,7 +99,7 @@ describe.skipIf(skipReason)("VM integration", () => {
     const result = await vm.exec("cat /sandbox/shell-read.txt");
     expect(result.exitCode).toBe(0);
     expect(result.stdout).toBe("read by shell");
-  });
+  }, SANDBOX_VM_TEST_TIMEOUT_MS);
 
   // -- Toolkit shim installation --
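
Reviewer note on the hunks above: Vitest accepts a timeout in milliseconds as the last argument of hooks and tests, which is what the commit relies on. Vitest's built-in defaults are 5s for tests and 10s for hooks, so the 30s figure in the CI log presumably comes from the repository's own Vitest configuration (`testTimeout` / `hookTimeout`). The sketch below is a hypothetical alternative to the hard-coded constant, not code from this commit: `suiteTimeoutMs`, `TimeoutOptions`, and the `override` string are illustrative names, and the two budget values are copied from the diff and the progress notes.

```typescript
// Hypothetical helper, not part of the commit (which hard-codes 120_000).
const DEFAULT_BUDGET_MS = 30_000; // budget reported by the failing CI log
const VM_SUITE_BUDGET_MS = 120_000; // SANDBOX_VM_TEST_TIMEOUT_MS in this commit

interface TimeoutOptions {
  /** True for suites that start a real sandbox agent or AgentOs VM. */
  bootsRealVm: boolean;
  /** Optional raw override, e.g. the value of an environment variable. */
  override?: string;
}

/** Pick one timeout to pass to every hook and test in a suite. */
function suiteTimeoutMs(opts: TimeoutOptions): number {
  if (opts.override !== undefined) {
    const parsed = Number(opts.override);
    // Ignore empty, non-numeric, zero, or negative overrides.
    if (Number.isFinite(parsed) && parsed > 0) return parsed;
  }
  return opts.bootsRealVm ? VM_SUITE_BUDGET_MS : DEFAULT_BUDGET_MS;
}

// Usage inside a Vitest file (the env var name is an assumption):
//
//   const TIMEOUT = suiteTimeoutMs({
//     bootsRealVm: true,
//     override: process.env.SANDBOX_VM_TIMEOUT_MS,
//   });
//   beforeAll(async () => { /* start sandbox */ }, TIMEOUT);
//   afterAll(async () => { /* stop sandbox */ }, TIMEOUT);
```

Keeping a single value per suite, as the commit does, means the hooks and the test bodies cannot drift apart; note that the `beforeAll` budget actually drops from 150s to 120s in this change.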

scripts/ralph/progress.txt

Lines changed: 12 additions & 0 deletions
@@ -3,6 +3,7 @@
 - When TS integration tests boot `AgentOs.create({ software })`, build the software list from packages whose `commandDir` actually exists locally; optional registry packages with missing `wasm/` directories abort VM startup before the target behavior can be tested.
 - For network-capable WASM command packages in the current runtime, prefer a tiny `wasi-spawn` wrapper that launches guest `node` over linking directly to `wasi-http`; the generic WASM runner does not expose `host_net` imports yet.
 - After a fresh workspace install, linked registry agent/software packages still need their built `dist/` outputs before adapter-backed or software-backed `packages/core` Vitest suites are meaningful; otherwise Vite fails on package entry resolution before test logic runs.
+- Sandbox VM-integration tests that boot a real `AgentOs` instance per case need explicit hook/test timeouts under Turbo CI load; the default 30s Vitest budget is too tight once the sandbox mount plus WASM shell path are exercised in parallel with `packages/core`.
 - Sidecar service tests must use a unique `compile_cache_root` per test sidecar instance; sharing one import-cache root lets the one-time stale-cache sweep delete another parallel test's `register.mjs` and runner assets.
 - When moving grouped test files, update root `package.json` test scripts and `scripts/benchmarks/bench-utils.ts` path constants along with the test files themselves; those hard-coded references are part of the test layout contract.
 - Sidecar integration tests that override `AGENT_OS_NODE_BINARY` must hold a shared lock across the full test body, including sibling tests in the same binary that use the real Node runtime; the env var is process-global and fake-node shims otherwise leak into unrelated tests.
@@ -311,3 +312,14 @@ Started: Sat Apr 5 2026
 - Deterministic sidecar networking tests are more reliable when they drive the `net` sync-RPC surface directly instead of relying on guest event-loop timing around socket close semantics.
 - Quality checks: `cargo test --package agent-os-sidecar -- --nocapture` passed, `cargo test --package agent-os-sidecar -- --ignored --test-threads=1` passed, `pnpm --dir packages/core exec vitest run tests/session/session-comprehensive.test.ts tests/sidecar/native-process.test.ts` passed, and `pnpm --dir packages/core exec vitest run tests/session/protocol.test.ts` passed.
 ---
+## 2026-04-06 07:38:47 PDT - US-400
+- What was implemented
+- Pulled the failed GitHub Actions logs for run `24035754667`, identified the remaining CI-only failure as `registry/tool/sandbox/tests/vm-integration.test.ts` exhausting Vitest's default 30s hook budget, and raised the timeout only for that VM-backed sandbox suite.
+- Files changed
+- `registry/tool/sandbox/tests/vm-integration.test.ts`
+- `scripts/ralph/progress.txt`
+- **Learnings for future iterations:**
+- The sandbox toolkit package can pass locally and still fail in the root Turbo run because its VM integration suite starts a real sandbox agent plus a fresh `AgentOs` VM per test while `packages/core` is running in parallel; treat it like the other heavy sidecar/session suites when budgeting hook timeouts.
+- GitHub job logs are readable before the overall workflow finishes by querying the failed job log endpoint directly; for this run the actionable signal was `tests/vm-integration.test.ts > should cat a sandbox file from the WASM shell` reporting `Hook timed out in 30000ms`.
+- Quality checks: `CI=1 AGENTOS_E2E_NETWORK=1 pnpm --dir registry/tool/sandbox exec vitest run` passed, and `CI=1 pnpm --dir packages/core exec vitest run tests/session/session-comprehensive.test.ts tests/sidecar/native-process.test.ts` passed while the sandbox VM integration suite ran concurrently.
+---
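
Reviewer note on the log-fetching learning above: the progress entry does not show how the job log was queried, so the following is only a sketch of one way to do it with the GitHub REST API. It assumes the "list jobs for a workflow run" and "download job logs" endpoints, a token with read access to Actions, and placeholder owner/repo values supplied by the caller. `FetchLike`, `timeoutLinesForRun`, and the other helper names are hypothetical.

```typescript
// Minimal shape of a workflow job as returned by the GitHub REST API.
interface WorkflowJob {
  id: number;
  name: string;
  status: string;
  conclusion: string | null;
}

// Narrow fetch signature so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown>; text(): Promise<string> }>;

const API_ROOT = "https://api.github.com";

function jobsUrl(owner: string, repo: string, runId: number): string {
  return `${API_ROOT}/repos/${owner}/${repo}/actions/runs/${runId}/jobs`;
}

function jobLogUrl(owner: string, repo: string, jobId: number): string {
  return `${API_ROOT}/repos/${owner}/${repo}/actions/jobs/${jobId}/logs`;
}

/** Keep only jobs that have already concluded with a failure. */
function failedJobs(jobs: WorkflowJob[]): WorkflowJob[] {
  return jobs.filter((job) => job.conclusion === "failure");
}

/** Extract log lines such as "Hook timed out in 30000ms". */
function findTimeoutLines(log: string): string[] {
  return log.split(/\r?\n/).filter((line) => /timed out in \d+ms/i.test(line));
}

/** Map each failed job name to the timeout lines found in its log. */
async function timeoutLinesForRun(
  fetchImpl: FetchLike,
  owner: string,
  repo: string,
  runId: number,
  token: string,
): Promise<Record<string, string[]>> {
  const headers = {
    Authorization: `Bearer ${token}`,
    Accept: "application/vnd.github+json",
  };
  const res = await fetchImpl(jobsUrl(owner, repo, runId), { headers });
  if (!res.ok) throw new Error(`Listing jobs failed: HTTP ${res.status}`);
  const body = (await res.json()) as { jobs: WorkflowJob[] };

  const found: Record<string, string[]> = {};
  for (const job of failedJobs(body.jobs)) {
    // The logs endpoint redirects to a plain-text download.
    const logRes = await fetchImpl(jobLogUrl(owner, repo, job.id), { headers });
    if (!logRes.ok) throw new Error(`Log fetch failed for job ${job.id}: HTTP ${logRes.status}`);
    found[job.name] = findTimeoutLines(await logRes.text());
  }
  return found;
}
```

In Node 18 or later the global `fetch` can be passed as `fetchImpl`, for example `timeoutLinesForRun(fetch, owner, repo, 24035754667, token)` with the run ID quoted in the entry above; injecting it also makes the helper easy to stub in tests.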
