fix: restore over_time time slider behavior in examples #857
Conversation
Over_time examples were creating separate tabs per time snapshot instead of single tabs with interactive time sliders, breaking the expected user experience in published docs.

Changes:
- Add `_is_over_time_series_continuation()` to detect time series continuations
- Modify `append_result()` to only add time labels for non-continuation results
- Preserve single-tab behavior for `plot_sweep()` calls with the same title
- Restore interactive time slider functionality for over_time examples

Fixes issue where commit e033fbe inadvertently changed over_time behavior from aggregated time sliders to separate timestamped tabs.
Reviewer's Guide

Restores the original single-tab over_time time slider behavior by detecting when new over_time results should be aggregated with existing series and only appending time labels for genuinely separate experiments.

Sequence diagram for append_result over_time continuation detection

```mermaid
sequenceDiagram
    participant Caller
    participant BenchReport
    participant BenchResult
    participant BenchCfg
    participant DataSet
    participant Coords
    participant Panel
    Caller->>BenchReport: append_result(bench_res)
    BenchReport->>BenchResult: get bench_cfg
    BenchResult-->>BenchReport: bench_cfg
    BenchReport->>BenchCfg: get title
    BenchCfg-->>BenchReport: title
    BenchReport->>BenchReport: _time_event_label(bench_res)
    BenchReport-->>BenchReport: label or null
    alt label is not null
        BenchReport->>BenchReport: _is_over_time_series_continuation(bench_res)
        BenchReport->>BenchResult: get bench_cfg
        BenchResult-->>BenchReport: bench_cfg
        BenchReport->>BenchCfg: read over_time
        BenchCfg-->>BenchReport: over_time
        BenchReport->>BenchResult: get ds
        BenchResult-->>BenchReport: ds
        BenchReport->>DataSet: get coords
        DataSet-->>BenchReport: coords
        BenchReport->>Coords: contains(over_time)
        Coords-->>BenchReport: bool
        alt over_time and coord present
            BenchReport->>BenchReport: iterate bench_results
            BenchReport-->>BenchReport: found existing_res with same title?
        end
        BenchReport-->>BenchReport: should_add_time_label
        alt should_add_time_label
            BenchReport->>BenchReport: title = title + " [" + label + "]"
        end
    end
    BenchReport->>BenchReport: bench_results.append(bench_res)
    BenchReport->>BenchResult: plot()
    BenchResult-->>BenchReport: pane
    BenchReport->>Panel: append_tab(pane, title)
    Panel-->>Caller: updated UI tab
```
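The decision flow above can be approximated with a short runnable sketch. All names here (`FakeReport`, the dict keys standing in for `bench_cfg` and the time event label) are simplified stand-ins for illustration, not the real bencher implementation:

```python
# Minimal sketch of the append_result flow: a time label is appended to the
# tab title only when the result is NOT a continuation of an existing
# over_time series. FakeReport and the dict-based results are hypothetical
# stand-ins, not bencher's actual classes.

class FakeReport:
    def __init__(self):
        self.bench_results = []
        self.tabs = []  # (title, pane) pairs, standing in for the Panel UI

    def _time_event_label(self, res):
        return res.get("time_label")  # None when there is no time event

    def _is_over_time_series_continuation(self, res):
        if not res.get("over_time"):
            return False
        # Continuation: an existing over_time result already uses this title
        return any(r["title"] == res["title"] and r.get("over_time")
                   for r in self.bench_results)

    def append_result(self, res):
        title = res["title"]
        label = self._time_event_label(res)
        # Only tag the tab with a timestamp for genuinely separate experiments
        if label is not None and not self._is_over_time_series_continuation(res):
            title = f"{title} [{label}]"
        self.bench_results.append(res)
        self.tabs.append((title, "pane"))

report = FakeReport()
report.append_result({"title": "over_time", "over_time": True, "time_label": "t0"})
report.append_result({"title": "over_time", "over_time": True, "time_label": "t1"})
print([t for t, _ in report.tabs])  # → ['over_time [t0]', 'over_time']
```

The second call keeps the plain base title, which is what lets later aggregation logic fold the snapshot into the existing time series instead of opening a new timestamped tab.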
Class diagram for updated BenchReport over_time aggregation logic

```mermaid
classDiagram
    class BenchReport {
        list~BenchResult~ bench_results
        _time_event_label(bench_res: BenchResult) str
        _is_over_time_series_continuation(bench_res: BenchResult) bool
        append_result(bench_res: BenchResult) void
        append_to_result(bench_res: BenchResult, pane: pn_panel) void
        append_tab(pane: pn_panel, title: str) void
    }
    class BenchResult {
        BenchCfg bench_cfg
        DataSet ds
        plot() pn_panel
    }
    class BenchCfg {
        str title
        bool over_time
    }
    class DataSet {
        Coords coords
    }
    class Coords {
        contains(key: str) bool
    }
    BenchReport "1" --> "*" BenchResult : bench_results
    BenchResult "1" --> "1" BenchCfg : bench_cfg
    BenchResult "1" --> "1" DataSet : ds
    DataSet "1" --> "1" Coords : coords
```
Hey - I've found 1 issue, and left some high level feedback:
- The `_is_over_time_series_continuation()` implementation doesn't fully match its docstring: it only checks the incoming `bench_res` for `over_time`/coords but doesn't verify that any matching existing result is itself an `over_time` series, which can lead to unintended aggregation when a non-over_time result happens to share the same title.
- Relying on `bench_cfg.title` equality as the sole key for series continuation may be brittle if titles are reused for logically distinct experiments; consider also keying on a more stable identifier (e.g., a run ID or config hash) to avoid accidental merging.
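The second point, keying continuation on a stable identifier rather than the title alone, could look like the following sketch. The `series_key` helper and the config fields shown are hypothetical, intended only to illustrate the reviewer's suggestion:

```python
# Hypothetical sketch: derive a series key from a hash of the config that
# defines the experiment's identity, so two experiments that merely reuse
# a title are not merged into one time series.
import hashlib
import json

def series_key(cfg: dict) -> str:
    # Serialize deterministically, then hash; the field names are illustrative.
    payload = json.dumps(cfg, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

a = series_key({"title": "sweep", "input_vars": ["x"], "over_time": True})
b = series_key({"title": "sweep", "input_vars": ["y"], "over_time": True})
print(a == b)  # → False: same title, different experiments, distinct series
```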
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_is_over_time_series_continuation()` implementation doesn’t fully match its docstring: it only checks the incoming `bench_res` for `over_time`/coords but doesn’t verify that any matching existing result is itself an `over_time` series, which can lead to unintended aggregation when a non-over_time result happens to share the same title.
- Relying on `bench_cfg.title` equality as the sole key for series continuation may be brittle if titles are reused for logically distinct experiments; consider also keying on a more stable identifier (e.g., a run ID or config hash) to avoid accidental merging.
## Individual Comments
### Comment 1
<location path="bencher/bench_report.py" line_range="85-94" />
<code_context>
label = label[:57] + "..."
return label
+ def _is_over_time_series_continuation(self, bench_res: BenchResult) -> bool:
+ """Check if this over_time result is a continuation of an existing time series.
+
+ Returns True if there's already a result with the same title that has over_time,
+ indicating this should be part of the same aggregated time series.
+ """
+ if not bench_res.bench_cfg.over_time or "over_time" not in bench_res.ds.coords:
+ return False
+
+ base_title = bench_res.bench_cfg.title
+
+ # Check if we already have a result with this exact title (no time label)
+ for existing_res in self.bench_results: # Check all existing results
+ if existing_res.bench_cfg.title == base_title:
+ return True
</code_context>
<issue_to_address>
**issue (bug_risk):** Filter by over_time when deciding if this is a continuation.
The implementation doesn’t match the docstring: it only checks for matching titles and never confirms that any `existing_res` is actually over_time. As a result, a non-over_time result with the same title will cause a later over_time result to be treated as a continuation. In the loop, also check something like `existing_res.bench_cfg.over_time` or `"over_time" in existing_res.ds.coords` so you only aggregate into true over_time series.
</issue_to_address>
```python
def _is_over_time_series_continuation(self, bench_res: BenchResult) -> bool:
    """Check if this over_time result is a continuation of an existing time series.

    Returns True if there's already a result with the same title that has over_time,
    indicating this should be part of the same aggregated time series.
    """
    if not bench_res.bench_cfg.over_time or "over_time" not in bench_res.ds.coords:
        return False

    base_title = bench_res.bench_cfg.title
```
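The fix the review comment asks for can be sketched as below. The classes are simplified stand-ins (a set of coordinate names replaces `ds.coords`), so this mirrors the diff's names but is not the merged implementation:

```python
# Sketch of the suggested fix: in addition to matching titles, require that
# the matching existing result is itself an over_time series before treating
# the incoming result as a continuation. BenchCfg/BenchResult/BenchReport
# here are minimal stand-ins for illustration.

class BenchCfg:
    def __init__(self, title, over_time):
        self.title = title
        self.over_time = over_time

class BenchResult:
    def __init__(self, title, over_time, coords):
        self.bench_cfg = BenchCfg(title, over_time)
        self.coords = set(coords)  # stand-in for bench_res.ds.coords

class BenchReport:
    def __init__(self):
        self.bench_results = []

    def _is_over_time_series_continuation(self, bench_res) -> bool:
        if not bench_res.bench_cfg.over_time or "over_time" not in bench_res.coords:
            return False
        base_title = bench_res.bench_cfg.title
        for existing_res in self.bench_results:
            # Fix: only aggregate into results that are themselves over_time
            if (existing_res.bench_cfg.title == base_title
                    and existing_res.bench_cfg.over_time):
                return True
        return False

report = BenchReport()
# A non-over_time result sharing the title must NOT trigger continuation
report.bench_results.append(BenchResult("sweep", False, []))
new_res = BenchResult("sweep", True, ["over_time"])
print(report._is_over_time_series_continuation(new_res))  # → False with the fix

report.bench_results.append(BenchResult("sweep", True, ["over_time"]))
print(report._is_over_time_series_continuation(new_res))  # → True
```

Without the extra `existing_res.bench_cfg.over_time` check, the first call would return True, which is exactly the accidental-merging bug the reviewer describes.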
Performance Report for
| Metric | Value |
|---|---|
| Total tests | 980 |
| Total time | 89.65s |
| Mean | 0.0915s |
| Median | 0.0020s |
Top 10 slowest tests
| Test | Time (s) |
|---|---|
| test.test_bench_examples.TestBenchExamples::test_example_meta | 22.902 |
| test.test_over_time_save_perf::test_save_faster_without_aggregated_tab | 5.425 |
| test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool] | 3.956 |
| test.test_generated_examples::test_generated_example[result_types/result_image/result_image_to_video.py] | 3.067 |
| test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max | 2.068 |
| test.test_optuna_result.TestOptunaResult::test_collect_optuna_plots_with_repeats | 1.054 |
| test.test_optuna_result.TestOptunaReportRouting::test_optuna_plots_per_sweep_tab | 1.042 |
| test.test_bencher.TestBencher::test_bench_cfg_hash | 0.959 |
| test.test_result_bool.TestVolumeResult::test_volume_3float_multi_repeat | 0.891 |
| test.test_over_time_repeats.TestShowAggregatedTimeTab::test_curve_aggregated_tab_absent_when_disabled | 0.854 |
Updated by Performance Tracking workflow
Apply pre-commit formatting fixes:
- Standardize indentation and whitespace
- Consolidate multi-line assignment
- Remove extra blank lines

These changes ensure CI pre-commit checks pass.
Performance Report for
| Metric | Value |
|---|---|
| Total tests | 980 |
| Total time | 88.75s |
| Mean | 0.0906s |
| Median | 0.0020s |
Top 10 slowest tests
| Test | Time (s) |
|---|---|
| test.test_bench_examples.TestBenchExamples::test_example_meta | 22.871 |
| test.test_over_time_save_perf::test_save_faster_without_aggregated_tab | 4.990 |
| test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool] | 3.836 |
| test.test_generated_examples::test_generated_example[result_types/result_image/result_image_to_video.py] | 2.921 |
| test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max | 2.440 |
| test.test_bench_runner.TestBenchRunner::test_benchrunner_unified_interface | 1.197 |
| test.test_optuna_result.TestOptunaResult::test_collect_optuna_plots_with_repeats | 1.054 |
| test.test_optuna_result.TestOptunaReportRouting::test_optuna_plots_per_sweep_tab | 1.017 |
| test.test_result_bool.TestVolumeResult::test_volume_3float_multi_repeat | 0.860 |
| test.test_bencher.TestBencher::test_combinations | 0.825 |
Updated by Performance Tracking workflow
Performance Report for
| Metric | Value |
|---|---|
| Total tests | 1019 |
| Total time | 102.21s |
| Mean | 0.1003s |
| Median | 0.0010s |
Top 10 slowest tests
| Test | Time (s) |
|---|---|
| test.test_bench_examples.TestBenchExamples::test_example_meta | 22.934 |
| test.test_over_time_save_perf::test_save_faster_without_aggregated_tab | 5.393 |
| test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool] | 3.694 |
| test.test_generated_examples::test_generated_example[advanced/advanced_cartesian_animation.py] | 2.928 |
| test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max | 2.928 |
| test.test_generated_examples::test_generated_example[result_types/result_image/result_image_to_video.py] | 2.864 |
| test.test_bench_runner.TestBenchRunner::test_benchrunner_unified_interface | 1.289 |
| test.test_optuna_result.TestOptunaResult::test_collect_optuna_plots_with_repeats | 1.278 |
| test.test_bencher.TestBencher::test_combinations_over_time | 1.276 |
| test.test_bencher.TestBencher::test_combinations | 1.087 |
Updated by Performance Tracking workflow
Problem
The recent commit e033fbe (which fixed panel labeling) inadvertently broke the expected behavior of over_time examples in published documentation: each `plot_sweep()` call with `time_src` created a separate tab with a timestamp label like `over_time [2000-01-01 00:00:00]`.

This meant that meta-generated examples that were supposed to show interactive time sliders instead showed multiple disconnected tabs, breaking the user experience in published docs.
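The before/after difference in tab titles can be illustrated with a tiny sketch. This is not bencher code, just a toy comparison for three same-title calls over three snapshots:

```python
# Illustration only: tab titles produced by the broken behavior (one
# timestamped tab per snapshot) versus the restored behavior (snapshots
# aggregate under a single base-title tab with a time slider).
snapshots = ["2000-01-01 00:00:00", "2000-01-02 00:00:00", "2000-01-03 00:00:00"]

# Broken behavior after commit e033fbe: a new tab for every snapshot
broken_tabs = [f"over_time [{t}]" for t in snapshots]

# Restored behavior: continuations reuse the base title, so all snapshots
# land in one tab
fixed_tabs = {"over_time"}

print(len(broken_tabs), len(fixed_tabs))  # → 3 1
```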
Solution
This PR restores the original behavior by implementing smart detection of over_time series continuations:
Changes Made
- Added `_is_over_time_series_continuation()` method: detects when an over_time result is part of an existing time series with the same base title
- Modified `append_result()` logic: only adds timestamp labels when results are NOT part of a time series continuation
- Preserved single-tab behavior: multiple `plot_sweep()` calls with the same title now aggregate into one result with time snapshots

Result
Testing
Fixes the issue introduced by commit e033fbe while preserving its intended panel labeling improvements.
Summary by Sourcery
Restore correct aggregation behavior for over_time benchmark results so that multiple snapshots appear under a single tab with a time slider instead of separate timestamped tabs.
Bug Fixes:
Enhancements: