Entity Loom: stale running checkpoint blocks resumed background stages after restart
Summary
If Entity Loom is interrupted while a long background stage is running, the
checkpoint can remain on status: "running" even though the daemon has been
restarted and no in-memory stage lock exists. After resuming the package, the UI
trusts the persisted running status, hides the Start button, and shows the
stage as running without any active work happening.
Observed Case
From a Gemini import package captured after the stall:
- Upload detected as gemini.
- Convert/staging completed successfully.
- 338 conversations / 3452 messages were committed.
- Significant stage started with 338 conversations.
- 101 significant items were checkpointed.
- 21 significant memory files were written.
- Later daemon starts only logged "Resumed package."
checkpoint.json still had:
{
  "currentStage": "significant",
  "stages": {
    "significant": {
      "status": "running",
      "completed": false
    }
  }
}
At that point getRunningStage() was null, so the status was stale.
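The staleness condition described above (persisted status is "running" but the live process holds no matching stage) can be expressed as a small read-only check. This is a minimal sketch, not Entity Loom code: the Checkpoint shape is reduced to the fields shown in the excerpt, and findStaleRunningStages is a hypothetical helper that takes the value getRunningStage() would return as a parameter.

```typescript
// Hypothetical, minimal checkpoint shape for illustration only.
interface StageCheckpoint {
  status: string;
  completed: boolean;
}

interface Checkpoint {
  currentStage: string;
  stages: Record<string, StageCheckpoint>;
}

// Returns the stages persisted as "running" that the live process is not
// actually executing. `runningStage` stands in for getRunningStage().
function findStaleRunningStages(
  checkpoint: Checkpoint,
  runningStage: string | null,
): string[] {
  return Object.entries(checkpoint.stages)
    .filter(([name, stage]) => stage.status === "running" && name !== runningStage)
    .map(([name]) => name);
}

// The checkpoint excerpt from the observed case.
const observed: Checkpoint = {
  currentStage: "significant",
  stages: { significant: { status: "running", completed: false } },
};

// After a restart nothing is running in memory, so "significant" is stale.
const staleAfterRestart = findStaleRunningStages(observed, null);

// While the stage really is running, the same checkpoint is not stale.
const staleWhileRunning = findStaleRunningStages(observed, "significant");
```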
Impact
The package is resumable, but the UI does not offer the user a way to resume it.
This is especially likely on large imports where Significant/Daily/Graph can run
for a long time and the terminal, app, or machine may close before completion.
Proposed Fix
On package resume, if no stage is actually running in memory, normalize stale
running statuses for the resumable background stages:
- significant
- daily
- graph
Set each one to aborted with completed: false, preserve processedItems and
failedItems, save the checkpoint, and let the existing resumable UI path show
the Start/Continue action.
Sketch:
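// Background stages that can safely be resumed after an interrupted run.
const RESUMABLE_BACKGROUND_STAGES: StageName[] = [
  "significant",
  "daily",
  "graph",
];

// Returns true if any checkpoint entry was changed and needs to be saved.
function recoverStaleRunningStages(checkpoint: CheckpointStateV2): boolean {
  // A stage really is running in this process, so nothing is stale.
  if (getRunningStage()) return false;
  let recovered = false;
  for (const stage of RESUMABLE_BACKGROUND_STAGES) {
    const stageCheckpoint = checkpoint.stages[stage];
    // Optional chaining skips stages that have no checkpoint entry yet.
    if (stageCheckpoint?.status === "running") {
      // processedItems and failedItems are deliberately left untouched.
      stageCheckpoint.status = "aborted";
      stageCheckpoint.completed = false;
      recovered = true;
      log(
        "warn",
        `Recovered stale '${stage}' stage as aborted/resumable on package resume`,
      );
    }
  }
  return recovered;
}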
Then call it during loadPackage() after loading/migrating the checkpoint and
save the checkpoint if anything changed.
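The call site described above might look roughly like the following. This is a self-contained sketch under stated assumptions, not the actual setup-stage.ts code: the type shapes are simplified, the CheckpointStore interface and its load/save methods are hypothetical stand-ins for the real loading, migration, and persistence code, and the in-memory runningStage variable stands in for the real stage lock.

```typescript
// Hypothetical, simplified shapes; the real CheckpointStateV2 has more fields.
type StageName = "significant" | "daily" | "graph";
type StageStatus = "pending" | "running" | "aborted" | "completed";

interface StageCheckpoint {
  status: StageStatus;
  completed: boolean;
  processedItems: string[];
  failedItems: string[];
}

interface CheckpointStateV2 {
  currentStage: StageName;
  stages: Partial<Record<StageName, StageCheckpoint>>;
}

// Stand-in for the in-memory stage lock; it is null right after a restart.
let runningStage: StageName | null = null;
const getRunningStage = (): StageName | null => runningStage;

const RESUMABLE_BACKGROUND_STAGES: StageName[] = ["significant", "daily", "graph"];

function recoverStaleRunningStages(checkpoint: CheckpointStateV2): boolean {
  if (getRunningStage()) return false;
  let recovered = false;
  for (const stage of RESUMABLE_BACKGROUND_STAGES) {
    const stageCheckpoint = checkpoint.stages[stage];
    if (stageCheckpoint?.status === "running") {
      stageCheckpoint.status = "aborted";
      stageCheckpoint.completed = false;
      recovered = true;
    }
  }
  return recovered;
}

// Hypothetical persistence hooks, injected so the sketch stays self-contained.
interface CheckpointStore {
  load(): CheckpointStateV2;
  save(checkpoint: CheckpointStateV2): void;
}

function loadPackage(store: CheckpointStore): CheckpointStateV2 {
  // Loading and any migration would happen here.
  const checkpoint = store.load();
  // Write back only when the recovery pass actually changed something.
  if (recoverStaleRunningStages(checkpoint)) {
    store.save(checkpoint);
  }
  return checkpoint;
}
```

Saving only when the function returns true keeps a normal resume free of extra writes, and a second resume of an already repaired package leaves the checkpoint file alone.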
Local Patch / Verification
Patched locally in packages/entity-loom/src/stages/setup-stage.ts.
Verification performed:
- deno check src/main.ts passes in the local Psycheros workspace.
- A copied affected package with significant.status = "running" and 101
processed items was repaired to status = "aborted" while preserving the 101
processed item IDs.
- The repair path is packaged in the community Gemini resume patch as an
explicit modded Entity Loom file set, not silently mixed into the exporter.
Notes
The triggering import happened to be Gemini, but the stale checkpoint behavior is
not Gemini-specific. The same failure mode should apply to any long
Significant/Daily/Graph stage if the process exits mid-run.