Instrument recall trace so phase metrics account for total duration (#2361)#2371
Merged
Merged
Conversation
…2361) _search_with_retries only recorded ~10-15% of total_duration_seconds as named phase metrics; the rest sat in un-instrumented blocks (backend acquisition, combined scoring, chunk/source-fact/entity enrichment, result serialization). Add a phase metric for each, and split the combined-scoring work out of the reranking metric (which captured its duration before scoring ran). Mark the per-method retrieval splits, pool waits, and trace_finalize as diagnostic (they overlap parallel_retrieval or fall outside the total window) so they are excluded from the coverage sum. Adds test_trace_phase_coverage asserting the non-diagnostic phases sum to total without exceeding it.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #2361.
SearchTrace.summary.phase_metrics[]only accounted for ~10-15% oftotal_duration_seconds; the rest sat in un-instrumented blocks of_search_with_retries. This adds a phase metric for each missing block so the named phases now account for ~100% of the wall clock.New phase metrics
backend_acquisition_get_read_backend()combined_scoringapply_combined_scoring+ boosts + final sort + reranked-entry trace serializationrerankingmetric capturedstep_durationbefore this block ranprefer_observations_dedupawaitchunk_fetchresult_serializationsource_fact_fetchentity_buildMemoryFact/EntityStateconstructionDiagnostic phases (excluded from the coverage sum)
Flagged
details.diagnosticbecause they overlap or fall outside the timeline:retrieval_{semantic,bm25,graph,temporal}— children of theparallel_retrievalumbrella (summing both double-counts retrieval)semaphore_wait/connection_wait— surfaced as the issue requested; these waits precede/overlap other phasestrace_finalize—finalize()snapshotstotal_duration_secondsat entry, so its own serialization is necessarily after the total; reported for visibility but not part of the sumOn the proposed CI assertion
The issue's
assert abs(total - named) < 0.005can't hold as written: the phase list intentionally keeps an umbrella (parallel_retrieval) plus per-method sub-phases, andtrace_finalizeruns after the total snapshot. Instead,test_trace_phase_coveragesums only the non-diagnostic partition and asserts it covers total within a small tolerance and never exceeds it — a guard against both un-instrumented regressions and double-counting. Measured 100% coverage locally.Testing
tests/test_search_trace.py::test_trace_phase_coverage(new) + existingtest_search_with_tracepass./scripts/hooks/lint.sh,ruff,tyall clean