Summary
agent-brain index reliably crashes with SIGSEGV (segmentation fault) inside kuzu::storage::NodeTable::lookupPK during sustained GraphRAG indexing, when graphrag.store_type: kuzu and graphrag.doc_extractor: langextract are enabled. Observed three times on the same corpus, each time after 2–3 hours of indexing and when the Kuzu DB has grown past ~1–2 GB. Vector + BM25 indexing is unaffected; the crash is isolated to the Kuzu native worker.
Workaround: switching graphrag.store_type: kuzu → simple eliminates the crash (no native code on the hot path).
Environment
| Component | Version |
| --- | --- |
| agent-brain-cli | 10.0.7 (PyPI, installed via uv tool install --with "agent-brain-rag[graphrag-all]==10.0.7" agent-brain-cli==10.0.7) |
| agent-brain-rag | 10.0.7 |
| kuzu | 0.11.3 |
| llama-index-graph-stores-kuzu | 0.9.1 |
| langextract | 1.5.0 |
| Python | 3.12.9 |
| OS | macOS 26.2 (25C56), arm64 (Mac14,6) |
| RAM | 96 GB (memory pressure was never high; 93% free at crash time) |
Config (relevant subset)
embedding:
  provider: "openai"
  model: "text-embedding-3-large"
  api_key_env: "OPENAI_API_KEY"
summarization:
  provider: "anthropic"
  model: "claude-haiku-4-5-20251001"
  api_key_env: "ANTHROPIC_API_KEY"
storage:
  backend: "chroma"
graphrag:
  enabled: true
  store_type: "kuzu"  # ← the offending setting
  use_code_metadata: true
  doc_extractor: "langextract"
  traversal_depth: 2
  max_triplets_per_chunk: 10
Reproduction
- Configure as above.
- Index a corpus of ~230 markdown documents (~1,984 chunks at chunk_size=1024, chunk_overlap=100):
  agent-brain index ./corpus_dir \
    --chunk-size 1024 \
    --chunk-overlap 100 \
    --exclude-patterns "**/images/**"
- Let it run. The vector + BM25 phase completes quickly (~1 min). The GraphRAG phase then runs at ~17 chunks/min, with LangExtract calling gpt-4o-mini.
- After 2–3 hours, once the Kuzu DB has grown to ~1–2 GB, the server process dies with SIGSEGV (the kernel kills the process).
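As a sanity check on the timings, the throughput above implies the GraphRAG phase needs just under two hours for this corpus. That is roughly when the crashes begin, and it leaves very little headroom under the 7200 s job timeout that the first run hit. A quick calculation, using only the approximate figures quoted in this report:

```python
# Figures quoted in the reproduction steps and the crash table (all approximate).
total_chunks = 1984      # ~1,984 chunks in the corpus
chunks_per_min = 17      # observed GraphRAG throughput
job_timeout_s = 7200     # job timeout hit by PID 90292

expected_min = total_chunks / chunks_per_min
timeout_min = job_timeout_s / 60

print(f"expected GraphRAG phase: {expected_min:.0f} min")               # 117 min
print(f"job timeout:             {timeout_min:.0f} min")                # 120 min
print(f"headroom:                {timeout_min - expected_min:.1f} min")  # 3.3 min
```

Real throughput varies with API latency, so this is only an order-of-magnitude check, not a prediction of when a given run will finish.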
Observed crashes
| Server PID | Started | Died | Lifetime | Kuzu size at death |
| --- | --- | --- | --- | --- |
| 90292 | 14:31 PDT | 16:33 PDT | 2h02m | unknown (hit 7200s job timeout before segfault opportunity) |
| 59334 | 16:44 PDT | 19:25 PDT | 2h41m | ~2.3 GB |
| 92503 | 19:50 PDT | 22:18 PDT | 2h28m | ~1.0 GB (had been auto-recovered from snapshot) |
The crash report (Apple .ips format) is for an interim process pid 88888 that segfaulted within 5 minutes of restart — likely while opening the Kuzu DB that had been left locked by the killed pid 59334.
Crash report
"exception": {
"type": "EXC_BAD_ACCESS",
"signal": "SIGSEGV",
"subtype": "KERN_INVALID_ADDRESS at 0x0000000000000008"
}
"termination": {
"namespace": "SIGNAL",
"indicator": "Segmentation fault: 11"
}
Faulting thread backtrace (thread 15)
#0 _kuzu.cpython-312-darwin.so 0x663814
#1 _kuzu.cpython-312-darwin.so 0x54c704
#2 _kuzu.cpython-312-darwin.so 0x5f4854
#3 _kuzu.cpython-312-darwin.so 0x5fcb74
#4 _kuzu.cpython-312-darwin.so 0x5f8bb0
#5 _kuzu.cpython-312-darwin.so 0x647150 kuzu::storage::NodeTable::lookupPK(
kuzu::transaction::Transaction const*,
kuzu::common::ValueVector*,
unsigned long long,
unsigned long long&) const
#6 _kuzu.cpython-312-darwin.so 0x563604
#7 _kuzu.cpython-312-darwin.so 0x596cc4 kuzu::processor::PhysicalOperator::getNextTuple(
kuzu::processor::ExecutionContext*)
#8 _kuzu.cpython-312-darwin.so 0x59aa58
#9 _kuzu.cpython-312-darwin.so 0x5aca08
#10 _kuzu.cpython-312-darwin.so 0x9edec kuzu::common::TaskScheduler::runWorkerThread()
#11 _kuzu.cpython-312-darwin.so 0x9f5e0
#12 libsystem_pthread.dylib 0x6c08 _pthread_start
#13 libsystem_pthread.dylib 0x1ba8 thread_start
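Frames like the ones above can be pulled out of an .ips file with a short script. This is a minimal sketch that assumes the layout I understand recent macOS releases to use: a one-line JSON header followed by a JSON body containing `threads` (with `triggered` marking the crashed thread), per-frame `imageIndex` / `imageOffset` / `symbol`, and a `usedImages` array. Those key names are an assumption about the file format, and the function names are my own.

```python
import json


def load_ips(text: str) -> dict:
    """Parse .ips text: first line is a JSON header, the rest is the JSON body."""
    header_line, _, body = text.partition("\n")
    report = json.loads(body)
    report["_header"] = json.loads(header_line)
    return report


def faulting_frames(report: dict) -> list[tuple[int, str, str, str]]:
    """Return (frame_no, image_name, hex_offset, symbol) for the crashed thread."""
    images = report.get("usedImages", [])
    # The crashed thread is assumed to be the one flagged "triggered".
    thread = next(t for t in report["threads"] if t.get("triggered"))
    rows = []
    for number, frame in enumerate(thread["frames"]):
        image = images[frame["imageIndex"]]
        rows.append(
            (
                number,
                image.get("name", "?"),
                hex(frame["imageOffset"]),
                frame.get("symbol", ""),  # empty when the frame is unsymbolicated
            )
        )
    return rows
```

Calling `faulting_frames(load_ips(open(path).read()))` on the report should yield one tuple per frame of the faulting thread, in the same order as the listing above.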
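The fault address 0x0000000000000008 is what a read of a field at offset 8 through a null pointer produces. Hitting it beneath NodeTable::lookupPK during a getNextTuple call suggests an internal storage pointer was null or stale at the moment of the read, which would fit a use-after-free or a torn read. The likely cause is a concurrency bug between the LangExtract triplet-writer thread and Kuzu's own query worker.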
Server log around death
Just before the kill, the log shows normal LangExtract activity — successful OpenAI calls every few seconds, periodic graph_snapshot.write: wrote snapshot events. No Python exception, no traceback. The last line in every case is:
/Users/.../python3.12/multiprocessing/resource_tracker.py:255: UserWarning:
resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
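That warning comes from multiprocessing's resource tracker, a helper process that cleans up named semaphores left behind when the main process exits without releasing them. It is what you would expect after the interpreter is killed by a signal, so it is consistent with the kernel-delivered SIGSEGV.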
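Because the log contains no Python traceback, it might help to enable the standard-library `faulthandler` so that the next crash also records which Python call was in flight on each thread. A minimal sketch; the function name and log path are hypothetical, and I have not checked where in agent-brain's startup this would best be called:

```python
import faulthandler
from typing import TextIO


def enable_crash_tracebacks(log_path: str) -> TextIO:
    """Dump Python stacks of all threads to log_path on SIGSEGV and similar signals."""
    # The file must stay open for as long as the handler is enabled,
    # so the caller should keep a reference to the returned handle.
    log_file = open(log_path, "a")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Setting `PYTHONFAULTHANDLER=1` in the server's environment should have the same effect without a code change, with the dump going to stderr instead of a file.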
v10.0.6 / v10.0.7 self-heal observed
After each crash, the restarted server correctly detected the Kuzu DB as corrupted (file lock held by the dead PID), quarantined it, and restored from the latest snapshot:
[WARNING] agent_brain_server.storage.graph_store:
Kuzu graph store at .../kuzu_db appears corrupted (likely from a prior process kill
mid-indexing): IO exception: Could not set lock on file ...
Renaming to .corrupted-<ts> and starting fresh.
[INFO] agent_brain_server.storage.graph_store:
Quarantined corrupted Kuzu files: db=.../kuzu_db.corrupted-20260528T005033Z
[WARNING] agent_brain_server.storage.graph_store:
Restored 110 triplets from snapshot snapshot-2026-05-28T00-50-33Z.json
after recovering corrupted kuzu_db at .../kuzu_db
The self-heal is excellent: it preserved a 2.4 GB corrupted DB for forensics and restored the latest snapshot. But 10.0.6/10.0.7 did not fix the underlying segfault; they only improved recovery from it. Each subsequent indexing run re-creates the conditions for the same crash.
Workaround
Switch graph store to in-memory JSON:
graphrag:
  store_type: "simple"  # was: "kuzu"
This routes around the native code entirely. Trade-off: graph state lives in memory + JSON snapshots, but no SIGSEGV.
Suggestions
- The narrowest likely cause is a lock/refcount issue inside NodeTable::lookupPK when it is called from a query worker while another writer thread is mutating the same node. Possible fixes:
  - Take a read lock on the page in lookupPK before dereferencing the index entry, or
  - Validate the entry pointer before reading offset 0x8.
- Consider adding a config option to bound the Kuzu DB size at which agent-brain index automatically pauses + checkpoints (e.g. graphrag.kuzu_max_db_mb) as a defense-in-depth measure for users hitting this in the wild.
- The Kuzu version pinned in 10.0.7's extras (kuzu==0.11.3) is a few releases old at time of writing, so it is worth checking whether Kuzu has already shipped a fix for this code path in a newer point release before bumping the pin.
Happy to capture and share the full .ips file privately if it would help — it's 122 KB of JSON and contains stack traces for all threads, not just the faulting one.