fix: green the failing unit test suite by lLukDreams · Pull Request #1296 · XiaomiMiMo/MiMo-Code

lLukDreams · 2026-06-24T11:20:43Z

Issue for this PR

Closes #

Type of change

Bug fix

What does this PR do?

Repairs the unit suite. The failures were a mix:

Stale tests (code changed, assertion didn't):

Drop /root from the forbidden-dirs assertion — it was intentionally removed from the blocklist.
Update the cache_control expectation to the rolling-tail marker placement the provider transform now uses.

Real bugs:

A subtask that fails before onReady lost its metadata; seed sessionId/model up front so the error tool state keeps it.
.claude.json MCP servers were eagerly connected instead of lazy pending — the mcp_origins "claude" marker wasn't honored.
The checkpoint writer read the Actor service only via the global spawnRef, which goes stale when a memoized Actor layer is torn down by instance disposal in an earlier test. Resolve from the live context via Effect.serviceOption (optional, so no layer cycle).
Workflow cancel could spawn orphans and hang with a wide fan-out: the promise-based agent semaphore kept starting queued agents as reclaim freed permits, and Actor.cancel awaited a fiber unwind that never finishes when stuck on a hung LLM stream. Bail queued spawns once cancelling, and bound the interrupt.

Test isolation:

History backfill inspects all FTS rows but only cleared the shared DB in afterEach; rows from earlier shard files leaked in. Clear in beforeEach too.

How did you verify your code works?

Ran the full suite on Linux across all 4 shards (matching the sharded ubuntu-latest CI) — green, 0 failures. The workflow cancel fix specifically: 20/20 with the concurrency ceiling pinned below the fan-out to force queuing. typecheck clean.

Checklist

I have tested my changes locally
I have not included unrelated changes in this PR

/root was deliberately removed from the directory blocklist in instance.ts (commits 20b79f7, 49c0d9e) because it is the root user's home directory (common in Docker/CI) and is already covered by the workspace-trust prompt. The directory-safety test was left asserting that /root rejects with "Access denied", which fails on the Linux CI runner. Align the test with the intended implementation.

The provider transform was refactored (rolling-tail/head cache breakpoints) after this test was written, so the expected Anthropic payload was stale: the ephemeral cache_control marker now sits on the last assistant tool_use and last user tool_result, not on the first user text block. Align the expectation with the current behavior.

A subtask that failed before the actor signaled onReady (e.g. the actor service is unavailable or the model is unresolvable) lost its metadata on the resulting error tool state, because metadata was only reported from onReady. Seed sessionId/model onto the tool part up-front so it survives a failure at any point during spawn.

Servers imported from .claude.json carry mcp_origins[name].type === 'claude' and are meant to connect lazily, but MCP init eagerly connected every configured server — so a Claude config auto-spawned every local MCP server on startup and the lifecycle test (expecting 'pending' until an explicit connect()) failed. Honor the origin marker: leave Claude servers pending.

…nRef tryStartCheckpointWriter read the Actor service only via the late-bound global spawnRef, which goes stale when a memoized Actor layer is torn down by instance disposal elsewhere in the process (observed as cross-test pollution: the writer skipped with 'Actor service unavailable' depending on test order). Prefer the Actor service from the live Effect context via Effect.serviceOption — always present under AppRuntime and robust to the global going stale — falling back to spawnRef. serviceOption does not add Actor to requirements, so the layer cycle that motivated spawnRef is intact.

…red DB The backfill suite inspects ALL FTS rows and only cleared the global DB in afterEach, so rows left by earlier files in the same shard leaked into its assertions (saw 11 stray part_ids). Clear the DB in beforeEach too.

Cancelling a workflow run with more agents than the concurrency ceiling (globalSem = config.workflow.maxConcurrentAgents ?? min(16, 2x cores)) could spawn orphans and intermittently hang. Two causes, two fixes: 1. Orphan spawns (runtime.ts): the promise-based agent semaphore fires queued callbacks as permits free. While reclaim cancels the started agents and frees their permits, queued siblings would acquire them and spawn NEW actors that escape the cancel sweep. Mark entry.cancelling before reclaim and bail at both spawn sites so a late-acquiring agent never starts. 2. Cancel hang (spawn.ts): Actor.cancel awaited each agent's runner cancel, which Fiber.interrupts the work fiber and awaits its full unwind. A fiber blocked on a hung LLM response stream never finishes unwinding, hanging the whole sweep (and runtime.cancel) indefinitely. Bound that interrupt with a grace period — the signal is still delivered and the registry status still records the cancellation (cancel's contract); we just stop awaiting a stuck unwind. The regression test now pins the ceiling BELOW the fan-out so some agents are always queued (deterministic on any host), exercising both paths. Verified 20/20 with forced queuing and green across all 4 CI shards.

lLukDreams · 2026-06-24T11:46:00Z

@yanyihan-xiaomi

lLukDreams added 7 commits June 24, 2026 09:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: green the failing unit test suite#1296

fix: green the failing unit test suite#1296
lLukDreams wants to merge 7 commits into
XiaomiMiMo:mainfrom
lLukDreams:fix/test-root-not-forbidden

lLukDreams commented Jun 24, 2026

Uh oh!

lLukDreams commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

lLukDreams commented Jun 24, 2026

Issue for this PR

Type of change

What does this PR do?

How did you verify your code works?

Checklist

Uh oh!

lLukDreams commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant