feat(experimentation): experiment results model, task and endpoints #7796
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff            @@
##             main    #7796    +/-   ##
========================================
  Coverage   98.58%   98.59%
========================================
  Files        1466     1467      +1
  Lines       57010    57332    +322
========================================
+ Hits        56203    56525    +322
  Misses        807      807
Playwright Test Results (oss - depot-ubuntu-latest-16)
Playwright Test Results (oss - depot-ubuntu-latest-arm-16)
Playwright Test Results (private-cloud - depot-ubuntu-latest-16)
Failed tests: firefox › tests/project-permission-test.pw.ts › Project Permission Tests › Project-level permissions control access to features, environments, audit logs, and segments @enterprise
Playwright Test Results (private-cloud - depot-ubuntu-latest-arm-16)
Visual Regression: 19 screenshots compared. See report for details.
Zaimwa9 left a comment
Looking good! Thanks. Wish we had the control as an MV option.
If there are some details to fine-tune, I'll take it up from here while implementing the frontend.
Persist per-experiment Bayesian results and expose them. Add ExperimentResults on a shared abstract ExperimentComputation base (with ExperimentExposures), compute_results_summary orchestration (metric specs + expected SRM split, with srm.unkeyed_variant / srm.overallocated skips), the compute_experiment_results task, and the GET/POST results endpoints with refresh-request validation.
Changes
Contributes to the experiment results stats layer (stacked on #7781; merge after it).
So the experiment detail page can show per-metric results — lift, chance-to-win, and a sample-ratio-mismatch check — this computes them from the warehouse on demand and stores one row per experiment, updated in place (mirroring the exposures panel).
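As a rough illustration of the expected split behind the sample-ratio-mismatch check, the sketch below derives per-variant shares from multivariate allocations, gives `control` the unallocated remainder, and skips SRM for the two cases this PR names (`srm.unkeyed_variant`, `srm.overallocated`). The function name, input shape, and return type are assumptions for illustration; they are not the signature of `_expected_variant_shares` in the PR.

```python
from typing import Optional

# Hypothetical input shape: (variant_key, percentage_allocation) pairs.
Allocation = tuple[Optional[str], float]


def expected_shares_sketch(
    allocations: list[Allocation],
) -> tuple[Optional[dict[str, float]], Optional[str]]:
    """Return (shares, skip_reason); exactly one of the two is None.

    Shares are fractions of traffic. "control" receives whatever the
    multivariate options leave unallocated.
    """
    shares: dict[str, float] = {}
    allocated = 0.0
    for key, percentage in allocations:
        if not key:
            # No variant key to attribute this share to, so SRM is skipped.
            return None, "srm.unkeyed_variant"
        shares[key] = shares.get(key, 0.0) + percentage / 100.0
        allocated += percentage
    if allocated > 100.0:
        # Allocations exceed 100%, so no valid remainder exists for control.
        return None, "srm.overallocated"
    shares["control"] = (100.0 - allocated) / 100.0
    return shares, None


shares, reason = expected_shares_sketch([("variant_a", 30.0), ("variant_b", 20.0)])
print(shares, reason)
```

The skip reasons are returned rather than logged here only to keep the sketch self-contained; how the real service treats a fully allocated experiment (a zero share for `control`) is not stated in this description.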
- `ExperimentResults` model (migration `0008`), on a new abstract `ExperimentComputation` base now shared with `ExperimentExposures`: `as_of` / `payload` / `last_error_at` / `refresh_requested_at`, `is_final`, and `record_refresh` / `record_failure` / `record_refresh_request`.
- `compute_results_summary` (`services.py`): derives the metric specs and the expected SRM split from the environment's multivariate allocations (`control` = unallocated remainder), then runs the #7781 aggregation (feat(experimentation): results aggregation query and payload builder) and the #7769 kernel (feat(experimentation): Bayesian stats kernel). SRM is skipped (and `srm.unkeyed_variant` logged) when an option has no variant key to attribute its share to.
- `compute_experiment_results` task: recomputes the full window; on warehouse failure keeps the last good payload and logs `results.compute_failed`.
- `GET …/results/` and `POST …/results/refresh/`: read the row, or enqueue a refresh (`202`). `_validate_refresh_request` raises `ValidationError` before start / once final (`400`) and `Throttled` within the refresh interval (`429` + `Retry-After`).

How did you test this code?
`make test` for the experimentation app: unit tests for the model, task, `compute_results_summary` / `_experiment_metric_specs` / `_expected_variant_shares`, and both endpoints, at 100% diff coverage. `mypy` and `ruff` clean.
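For reviewers who want the refresh-request rules in one place, here is a minimal framework-free sketch of the three outcomes described under Changes: `400` before start or once final, `429` with a `Retry-After` value inside the refresh interval, otherwise `202`. The PR raises `ValidationError` and `Throttled`; this sketch returns status codes instead, and the function name, parameters, and five-minute interval are assumptions, not the PR's actual code.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class RefreshDecision:
    status: int
    retry_after_seconds: Optional[int] = None


def validate_refresh_sketch(
    now: datetime,
    starts_at: datetime,
    is_final: bool,
    refresh_requested_at: Optional[datetime],
    interval: timedelta,
) -> RefreshDecision:
    """Decide how a refresh request would be answered (hypothetical helper)."""
    # Refreshing makes no sense before the experiment starts or once results are final.
    if now < starts_at or is_final:
        return RefreshDecision(400)
    # Throttle when the previous refresh request is still inside the interval.
    if refresh_requested_at is not None:
        remaining = refresh_requested_at + interval - now
        if remaining > timedelta(0):
            return RefreshDecision(429, math.ceil(remaining.total_seconds()))
    # Otherwise the refresh would be enqueued.
    return RefreshDecision(202)


now = datetime(2024, 1, 1, 12, 0, 0)
decision = validate_refresh_sketch(
    now=now,
    starts_at=now - timedelta(days=1),
    is_final=False,
    refresh_requested_at=now - timedelta(minutes=2),
    interval=timedelta(minutes=5),  # assumed value, not taken from the PR
)
print(decision)
```

Checking the start and final conditions before the throttle mirrors the order in which the description lists them; whether the real `_validate_refresh_request` uses the same order is not confirmed here.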