Instrument recall trace so phase metrics account for total duration (#2361) by nicoloboschi · Pull Request #2371 · vectorize-io/hindsight

nicoloboschi · 2026-06-23T11:59:51Z

Summary

Fixes #2361. SearchTrace.summary.phase_metrics[] only accounted for ~10-15% of total_duration_seconds; the rest sat in un-instrumented blocks of _search_with_retries. This adds a phase metric for each missing block so the named phases now account for ~100% of the wall clock.

New phase metrics

Phase	Covers	Why it was missing
`backend_acquisition`	`_get_read_backend()`	timed inside total, never recorded
`combined_scoring`	`apply_combined_scoring` + boosts + final sort + reranked-entry trace serialization	the `reranking` metric captured `step_duration` before this block ran
`prefer_observations_dedup`	Step 4.8 supersession DB query	uninstrumented `await`
`chunk_fetch`	obs-source + chunk SQL + tiktoken encoding	no metric
`result_serialization`	visit recording (O(n²) entry-point check) + dict conversion	no metric
`source_fact_fetch`	source-fact SQL passes	no metric
`entity_build`	entity SQL + `MemoryFact`/`EntityState` construction	no metric

Diagnostic phases (excluded from the coverage sum)

Flagged details.diagnostic because they overlap or fall outside the timeline:

retrieval_{semantic,bm25,graph,temporal} — children of the parallel_retrieval umbrella (summing both double-counts retrieval)
semaphore_wait / connection_wait — surfaced as the issue requested; these waits precede/overlap other phases
trace_finalize — finalize() snapshots total_duration_seconds at entry, so its own serialization is necessarily after the total; reported for visibility but not part of the sum

On the proposed CI assertion

The issue's assert abs(total - named) < 0.005 can't hold as written: the phase list intentionally keeps an umbrella (parallel_retrieval) plus per-method sub-phases, and trace_finalize runs after the total snapshot. Instead, test_trace_phase_coverage sums only the non-diagnostic partition and asserts it covers total within a small tolerance and never exceeds it — a guard against both un-instrumented regressions and double-counting. Measured 100% coverage locally.

Testing

tests/test_search_trace.py::test_trace_phase_coverage (new) + existing test_search_with_trace pass
./scripts/hooks/lint.sh, ruff, ty all clean

…2361) _search_with_retries only recorded ~10-15% of total_duration_seconds as named phase metrics; the rest sat in un-instrumented blocks (backend acquisition, combined scoring, chunk/source-fact/entity enrichment, result serialization). Add a phase metric for each, and split the combined-scoring work out of the reranking metric (which captured its duration before scoring ran). Mark the per-method retrieval splits, pool waits, and trace_finalize as diagnostic (they overlap parallel_retrieval or fall outside the total window) so they are excluded from the coverage sum. Adds test_trace_phase_coverage asserting the non-diagnostic phases sum to total without exceeding it.

nicoloboschi merged commit 9e06237 into main Jun 23, 2026
92 of 98 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Instrument recall trace so phase metrics account for total duration (#2361)#2371

Instrument recall trace so phase metrics account for total duration (#2361)#2371
nicoloboschi merged 1 commit into
mainfrom
fix/recall-trace-phase-coverage-2361

nicoloboschi commented Jun 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nicoloboschi commented Jun 23, 2026

Summary

New phase metrics

Diagnostic phases (excluded from the coverage sum)

On the proposed CI assertion

Testing

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant