[Draft] feat(vardiff): decline-safe vardiff champion + simulation framework + proof#2154
[Draft] feat(vardiff): decline-safe vardiff champion + simulation framework + proof#2154gimballock wants to merge 6 commits into
Conversation
11b2560 to
88d8d1d
Compare
|
The code is cheap and only meant to demonstrate the feasibility, but the concept ack revolves around these points imo:
|
| | share/min | rate | p10 | p50 | p90 | p99 | | ||
| | --- | --- | --- | --- | --- | --- | | ||
| | 6 | 83.3% | 10m | 12m | 21m | 25m | | ||
| | 12 | 95.4% | 10m | 10m | 20m | 25m | | ||
| | 30 | 99.5% | 10m | 10m | 15m | 25m | | ||
| | 60 | 100.0% | 10m | 10m | 10m | 20m | | ||
| | 120 | 100.0% | 10m | 10m | 10m | 15m | |
There was a problem hiding this comment.
The first row here shows results of the convergence time test for the the default case (6 spm).
The convergence times are between 10 and 25 minutes, with total failures to converge (w/in quiet_window_secs of simulated time) occurring 17% of the time!
the next few rows describe the results for faster share rates. the most extreme times (25m reduces to 15m) and the total failure cases generally disappear around 30 spm.
| ## Settled accuracy (stable load, post-convergence) | ||
|
|
||
| `|final_hashrate / true_hashrate - 1|` at trial end. Smaller is better. | ||
|
|
||
| | share/min | p10 | p50 | p90 | p99 | | ||
| | --- | --- | --- | --- | --- | | ||
| | 6 | 0.0% | 4.9% | 23.6% | 70.3% | | ||
| | 12 | 0.0% | 0.0% | 12.3% | 26.9% | | ||
| | 30 | 0.0% | 0.0% | 0.8% | 15.6% | | ||
| | 60 | 0.0% | 0.0% | 0.0% | 3.1% | | ||
| | 120 | 0.0% | 0.0% | 0.0% | 0.0% | | ||
|
|
||
| ## Steady-state jitter (fires per minute) | ||
|
|
||
| Post-convergence rate of vardiff fires. Smaller is better — ideal is zero under stable load. | ||
|
|
||
| | share/min | p50 | p90 | p99 | mean | | ||
| | --- | --- | --- | --- | --- | | ||
| | 6 | 0.000 | 0.200 | 0.385 | 0.059 | | ||
| | 12 | 0.000 | 0.077 | 0.217 | 0.019 | | ||
| | 30 | 0.000 | 0.000 | 0.067 | 0.002 | | ||
| | 60 | 0.000 | 0.000 | 0.000 | 0.000 | | ||
| | 120 | 0.000 | 0.000 | 0.000 | 0.000 | |
There was a problem hiding this comment.
These two metrics (proximity to true hashrate and post-converged adjustments) show a similar trend,
Lots of undesired behavior in the extreme cases (top 10%, top 1%) of 6 shares/min case that is alleviated at higher share rates.
| ## Reaction time to a 50% drop (step at 15 min) | ||
|
|
||
| | share/min | reacted | p10 | p50 | p90 | p99 | | ||
| | --- | --- | --- | --- | --- | --- | | ||
| | 6 | 69.7% | 1m | 3m | 5m | 5m | | ||
| | 12 | 54.8% | 1m | 3m | 5m | 5m | | ||
| | 30 | 32.6% | 2m | 4m | 5m | 5m | | ||
| | 60 | 16.3% | 3m | 5m | 5m | 5m | | ||
| | 120 | 8.6% | 4m | 5m | 5m | 5m | | ||
|
|
||
| ## Reaction sensitivity (P[fire within 5 min of step change]) | ||
|
|
||
| | Δ% | 6 | 12 | 30 | 60 | 120 | | ||
| | --- | --- | --- | --- | --- | --- | | ||
| | -50% | 0.70 | 0.55 | 0.33 | 0.16 | 0.09 | | ||
| | -25% | 0.44 | 0.23 | 0.08 | 0.00 | 0.00 | | ||
| | -10% | 0.39 | 0.15 | 0.02 | 0.00 | 0.00 | | ||
| | -5% | 0.40 | 0.15 | 0.02 | 0.00 | 0.00 | | ||
| | +5% | 0.39 | 0.13 | 0.02 | 0.00 | 0.00 | | ||
| | +10% | 0.42 | 0.17 | 0.03 | 0.00 | 0.00 | | ||
| | +25% | 0.48 | 0.23 | 0.07 | 0.01 | 0.00 | | ||
| | +50% | 0.64 | 0.47 | 0.32 | 0.22 | 0.29 | |
There was a problem hiding this comment.
These tables show how long it takes for vardiff to respond to an unexpected change in hashrate. Where the changes are to either increase or decrease by proportional amounts anywhere from 5% to 50%.
The first table specifically looks at a 50% draw down showing that a full 30% of the time vardiff fails to adjust after 5 min. The next few rows show that the situation worsens at higher share rates, at 120 spm 91% of the trials failed to adjust after 5m.
The second table shows that this effect is basically the same for hashrate changes in the opposite direction and also that changes of lesser magnitude respond much more quickly.
| //! - **Convergence rate**: `current >= baseline - 0.01` | ||
| //! - **Convergence p90**: `current <= baseline * 1.10` | ||
| //! - **Settled accuracy p50 / p90**: `current <= baseline * 1.15` | ||
| //! - **Jitter p50**: `current <= baseline + 0.02` (absolute; baseline can be near zero) | ||
| //! - **Jitter p95**: `current <= baseline * 1.25` | ||
| //! - **Reaction rate**: `current >= baseline - 0.02` | ||
| //! - **Reaction p50**: `current <= baseline * 1.20` | ||
| //! - **Sensitivity at large |Δ| (|Δ| >= 50%)**: `current >= baseline - 0.02` | ||
| //! - **Sensitivity at small |Δ| (|Δ| <= 5%)**: `current <= baseline + 0.05` |
There was a problem hiding this comment.
Convergence rate: Must be no more than 1% slower than the baseline convergence time
Convergence p90: The slowest 10% convergence times must be within 10% of the baseline's convergence time
Settled accuracy: must be within 15% of baseline's accuracy for the slowest 50% / 10%
Jitter p50/p95: must be within 2% and 25% of baseline
...etc.
You see the pattern, there are lots of magic thresholds in this portion of the code that are arbitrarily chosen at this point and fair game for analysis.
5cbed7c to
85d6f8b
Compare
|
after some optimization I got
|
I'm so excited to see people other people nerding out on vardiff with me! Thank you! A couple things I noticed in your results, the 2m convergence time is impressive but your response to a 50% hashrate drop only succeeds in readjusting 4.4% of the time. I'm not sure how best to balance those two metrics but probably not one at the expense of the other. |
2d10f57 to
414afbb
Compare
211bc98 to
2a88fde
Compare
|
Some learning's I had @gimballock
|
63a19d0 to
a18c3a3
Compare
|
Thanks for these insights @adammwest — especially the point about fitness decomposition and normalization. A lot of what you're describing matches the evolution I've gone through on this PR, so let me give a timeline of Phase 1: Basic metrics + simulation harness Initially I focused on three metrics I thought were important: convergence time, jitter, and accuracy. These were evaluated via a time-compressed simulation that replays a synthetic share stream through the vardiff Phase 2: Decomposed pipeline model I wanted to make algorithm search more systematic, so I decomposed "a vardiff algorithm" into four independent, replaceable components: estimator, statistic, boundary, and decision rule. The idea was to mix-and-match This model worked well for the classic algorithm, the parametric variant, and the EWMA approach. But when I tried to embed a Bayesian model, it broke down — the components aren't truly independent. There's a sequential The resulting three-stage pipeline (Estimator → Boundary → UpdateRule) is what's in this PR. It successfully hosts the classic algorithm, EWMA, AdaCUSUM, and could host a Bayesian approach. Phase 3: Aggregate fitness metric To your point about "how you combine all metrics into a final value" — we now have a configurable aggregate metric that allows weighting across the underlying measurements. This addresses exactly the gaming concern you Your suggestion to separate fitness into improvement vs. regression categories per scenario group (stable, coldstart, reaction) is a good one. Currently the regression test does compare per-cell, so a coldstart regression Phase 4: Realistic operating conditions After discussions with hardware engineers, I retuned the test scenarios to realistic share rates (2–30 spm instead of the earlier 6–120 range). The engineers confirmed that responding to partial hardware failures and Current direction I've backed off from prioritizing convergence speed after seeing overcorrection in practice. The current focus is on:
On your point about normalization: agreed, and the per-metric tolerance budgets in the regression test (absolute slack + optional multiplicative slack) are our current mechanism for this. Open to suggestions on better |
|
Hi, this is quite a detailed analysis and great decomposition of relevant parts. Have you had a look at how ckpool implements this? I think he has quite naturally arrived at a very optimal state, balancing the different metrics. This is the repo, you'll have to grep through to find the vardiff implementation: https://github.com/ckolivas/ckpool I've re-implemented his approach in Rust as well and made it a bit more configurable here: https://github.com/parasitepool/para/blob/master/src/vardiff.rs Would be interesting to see how that algorithm performs in your benchmarks. |
Thanks for the info @paratoxicdev , I will add this to my investigation. I know that is a sv1 native pool but i will see what bits of his research crossover to sv2 context and see if it's competitive! |
|
Here is a breakdown of the calibration comparisons I made between the real hashrate tests and the simulation results confirming the predictions with the understanding that; with this algorithm responsiveness scales with the age of the connection. Suggesting that this detail be included in the simulation so metrics are more directly comparable: |
006363a to
a58132b
Compare
Clean extraction of the best-performing vardiff algorithm from the simulation framework in stratum-mining#2154, with all test scaffolding, traits, and alternative algorithm implementations removed. The previous VardiffState used a fixed time-dependent threshold ladder and full retarget. This produced: - 6.6% median settled error (p99: 30% at low SPM) - 5–9 minute cold-start convergence (p90) - 33% detection rate for 10% hashrate declines (thermal throttle, failing ASICs) - 28% target overshoot during cold-start ramp (p99 at SPM 6) The new algorithm (EWMA + adaptive boundary + accelerating partial retarget): - Settled accuracy: <3% median error across all SPM - Cold-start overshoot bounded to <10% (was 28%) - Jitter: 0.03 fires/min at low SPM (was 0.06) — half the unnecessary retargets - Small-change detection: 85% reaction to -10% steps at SPM 6 (was 33%) - Transient disconnects recover in 1–2 fires rather than requiring a full cold-start ramp (20%/fire partial retarget vs old algo's 50–67% slash) - Asymmetric cost: loosening fires 3x faster than tightening, because loosening is free but tightening rejects in-flight shares Breaking: adds private fields to VardiffState (previously all-pub). Requires channels_sv2 major version bump. Public constructor API (new, new_with_min) and Vardiff trait interface are unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clean extraction of the best-performing vardiff algorithm from the simulation framework in stratum-mining#2154, with all test scaffolding, traits, and alternative algorithm implementations removed. The previous VardiffState used a fixed time-dependent threshold ladder and full retarget. This produced: - 6.6% median settled error (p99: 30% at low SPM) - 5–9 minute cold-start convergence (p90) - 33% detection rate for 10% hashrate declines (thermal throttle, failing ASICs) - 28% target overshoot during cold-start ramp (p99 at SPM 6) The new algorithm (EWMA + adaptive boundary + accelerating partial retarget): - Settled accuracy: <3% median error across all SPM - Cold-start overshoot bounded to <10% (was 28%) - Jitter: 0.03 fires/min at low SPM (was 0.06) — half the unnecessary retargets - Small-change detection: 85% reaction to -10% steps at SPM 6 (was 33%) - Transient disconnects recover in 1–2 fires rather than requiring a full cold-start ramp (20%/fire partial retarget vs old algo's 50–67% slash) - Asymmetric cost: loosening fires 3x faster than tightening, because loosening is free but tightening rejects in-flight shares Breaking: adds private fields to VardiffState (previously all-pub). Requires channels_sv2 major version bump. Public constructor API (new, new_with_min) and Vardiff trait interface are unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
43d973d to
55e5dd0
Compare
Fold in the reviewer's substantive points and the external-validation data from PR stratum-mining#2154 that the reviewer wrote without seeing: - §1 front-matter (why the model): vardiff exists to hold every connection near r* despite orders-of-magnitude hashrate spread; r* bounds per-connection load, estimate variance, and reward variance. Poisson- in-log justified as forced, not chosen. States it as an evaluation (not control-design) model and lists what it omits (within-window steady H, one worker/connection, lossless retargets, continuous uncapped D). - §8 rewritten "three questions, three views": why not one number (a time-integral hides when/what-kind, can't separate slow leak from transient) or an unconstrained radar (axis-order-dependent area; best-in-set rescaling) — only the fixed-reference construction and the trajectory's time axis survive. Rank/characterize/trust hierarchy. - §6 share-volume correction: under-difficulty runs the connection over r* permanently; resource cost is linear in excess volume, hence ~linear in |e| near operating point — a one-sided regret_under bump, no new form. Pricing it can only shrink the cost-optimal offset; magnitude modest for headroom pools (the common case, r*≈4–6 spm), convex only under tight per-connection quotas (older/stressed machines). Deliverable: measure c. - §9 NEW "Validation against real hardware": the shape-proxy tests on an Antminer S21 / testnet4 reproduced the classic mechanics quantitatively (jitter, −16.7%/fire, 300s cadence, overshoot, ±50% symmetry) and the counter-age dependence (5-min counter → 4.4min, 51-min → 51.8min); and the champion beat classic live on a halved-hashrate drop (classic: hours, wrong direction; champion: minutes, correct). Scopes the claim honestly: behaviorally validated on hardware, weights remain a calibrated judgment. Lists open HW tests (slow decline, multi-connection, measure c). - Errata from the prior pass retained: Theorem/Lemma reserved for proved results, §5/§6 → Argument, offset recast as the asymmetry-optimal quantile ≈−0.67·σ_eff, "unbiased" restored in falsifier (b), accuracy ceiling clarified as a noise band not a bias, Σs² cadence-cap assumption, scalar demoted with commensurability caveat, regret=tracking-loss disclaimer. Sections renumbered 1–11; status table gains HW + share- volume rows. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ac21e77 to
dc70e2e
Compare
c66784a to
f3ba47b
Compare
…erified deltas, body rewrite, vnprc reply (NOT pushed) PR stratum-mining#2188 (the clean upstream production extraction, distinct from stratum-mining#2154) was opened before the decline-safety arc and ships the SUPERSEDED contender, not the champion. Verified against source: stratum-mining#2188 = EWMA120/AsymCUSUM-at-10/0.2→0.4 (full_remedy- adjacent); champion (champion_composed, composed.rs:261) = EWMA360/AdaptiveSignPersist (seam 6)/AccelRetarget(0.2,0.6,0.05). All three stages differ. Also verified the vnprc review thread: his June-11 objection cites the EXACT job_id_to_target lines that became the corpus's §6(ii) no-lost-work retraction — he was right, independently, three weeks early. Two stale justifications now sit in the thread: vnprc killed stratum-mining#1 (in-flight shares); the arc killed stratum-mining#2 (the fitness sweep, my June-12 defense) by replacing fitness-selection with the decline-safety constraint. Prepared (held for owner timing): the champion extraction spec + cross-check, the PR-body rewrite (360 not 120, AdaptiveSignPersist not CUSUM-at-10, selection = decline-gate not fitness, remove the in-flight-shares framing), and a draft vnprc reply that closes the loop with the current (stronger) justification. The force-push to the public PR + the posted reply are the loud event — explicitly NOT done here; they go together on the owner's go. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Clean extraction of the best-performing vardiff algorithm from the simulation framework in stratum-mining#2154, with all test scaffolding, traits, and alternative algorithm implementations removed. The previous VardiffState used a fixed time-dependent threshold ladder and full retarget. This produced: - 6.6% median settled error (p99: 30% at low SPM) - 5–9 minute cold-start convergence (p90) - 33% detection rate for 10% hashrate declines (thermal throttle, failing ASICs) - 28% target overshoot during cold-start ramp (p99 at SPM 6) The new algorithm (EWMA + adaptive boundary + accelerating partial retarget): - Settled accuracy: <3% median error across all SPM - Cold-start overshoot bounded to <10% (was 28%) - Jitter: 0.03 fires/min at low SPM (was 0.06) — half the unnecessary retargets - Small-change detection: 85% reaction to -10% steps at SPM 6 (was 33%) - Transient disconnects recover in 1–2 fires rather than requiring a full cold-start ramp (20%/fire partial retarget vs old algo's 50–67% slash) - Asymmetric cost: loosening fires 3x faster than tightening, because loosening is free but tightening rejects in-flight shares Breaking: adds private fields to VardiffState (previously all-pub). Requires channels_sv2 major version bump. Public constructor API (new, new_with_min) and Vardiff trait interface are unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…itness-pick contender) This PR's flat-struct extraction shipped the mid-arc fitness-selected contender (EWMA 120 / AsymmetricCUSUM-at-10 / retarget 0.2→0.4). The arc's final selection criterion is decline-safety (a hard no-spiral constraint), not scalar fitness — and the gate-passing champion differs on every stage. Replace the params with the champion, verified parameter-for-parameter against champion_composed (the exact constructor the decline-safety gate guard builds and validates in stratum-mining#2154): - Estimator: EWMA 120s → 360s (the τ-safety-valley floor; shorter fails the gate). - Boundary: seam 10 → 6; high boundary AsymmetricCUSUM → sign-persistence CUSUM (adds the consecutive-tick discount); tighten_multiplier 3.0 → 8.0. The stronger asymmetry is the selection-criterion change made concrete: a hard decline-safety constraint demands more tightening-reluctance than a stability-weighted fitness score did. - Update: accelerating retarget eta_max 0.4 → 0.6, acceleration 0.2 → 0.05. Doc comment rewritten: the asymmetry is dangerous-direction (decline-safety) protection, NOT "tightening rejects in-flight shares" (which it doesn't — the pool validates each share against the per-job target snapshotted at job creation; this was a reviewer's correct catch, now retracted in the corpus). Tests updated to the champion's contract (they encoded the contender's looser single-tick behavior): the champion requires SUSTAINED evidence to tighten (8× multiplier + slow EWMA), so asymmetry is now asserted as a sustained-evidence property — deep loosening fires within minutes, symmetric tightening is the deliberately-reluctant direction. Stale comments (3× threshold, eta 0.4, seam 10) corrected; the low-spm PoissonCI test moved to spm=4 (genuinely below seam 6) so it exercises the branch it names. 14 vardiff tests pass; fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the all-pub threshold-ladder VardiffState with a flat-struct adaptive EWMA controller — the decline-safety-selected champion, the clean production extraction of the algorithm derived in stratum-mining#2154. Single-struct replacement: no trait scaffolding, no alternative implementations. Three inline stages (behaviorally identical to the champion_composed reference that passes the decline-safety gate in stratum-mining#2154): - Estimator: EWMA, tau=360s (the τ-safety-valley floor; shorter windows fail the slow-decline gate at sparse rates). - Boundary: adaptive at a seam of 6 spm — PoissonCI below (sparse-data conservatism), sign-persistence CUSUM at/above. The boundary protects the dangerous (tightening) direction two ways: an 8× asymmetric multiplier and a sign-persistence discount that relaxes only after consecutive same-direction ticks. This is decline-safety / dangerous-direction protection — NOT a "tightening rejects in-flight shares" cost (it doesn't: the pool validates each share against the per-job target snapshotted at job creation, which the vardiff path never mutates). - Update: accelerating partial retarget, eta 0.2 → 0.6. Selected by a decline-safety minimax (a hard no-spiral constraint), not a scalar fitness score — which is why the asymmetry is strong (8×): a safety constraint demands more tightening-reluctance than a stability-weighted fitness metric. Breaking: adds private fields to VardiffState (requires channels_sv2 major bump); shares_since_last_update now means "shares since last evaluation tick." Public constructors (new, new_with_min) and the Vardiff trait are unchanged. Tests: implementation-specific suite in test/classic.rs asserting the champion's contract (sustained-evidence tightening reluctance, the asymmetry, the boundary seam, partial-retarget damping), plus a teeth-bearing reset-cleanliness invariant for the sign-persistence state. 15 vardiff tests; fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
25b2fed to
e7817b0
Compare
… pipeline) The shipped variable-difficulty controller for channels-sv2, built as a three-stage decomposition (Estimator / Boundary / UpdateRule) composed via Composed<E,B,U>. The champion is EwmaEstimator(360) / AdaptiveSignPersist(spm6) / AcceleratingPartialRetarget — selected by a decline-safety minimax (the death-spiral gate), not by scalar tracking fitness. Includes the estimator family (EWMA, Bayesian, ckpool decay, Holt), the boundary family (PoissonCI, SignPersistenceCusum, adaptive), the update rules (partial/accelerating retarget), and the contrast controllers (classic SMA, pow2-PID). Controller-layer tests live in src/vardiff/test/ and exercise the pipeline independently of the simulation framework. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
558c037 to
cc9834f
Compare
…ards Deterministic in-process framework for characterizing any Vardiff implementation. Decomposes a controller into the three orthogonal stages and characterizes each in isolation, so a metric change is attributable to the stage that changed. Provides the scenario DSL, the Grid sweep harness, the Metric trait, ~75 analysis/figure bins, and pinned baselines (champion/classic) with CI regression guards — including the load-bearing decline-safety champion gate and the classic↔champion behavioral equivalence test. Cargo.lock + rust-toolchain pinned for reproducibility. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
92491a4 to
bfca487
Compare
…records, essays, figures The full proof and reader-facing layer behind the controller and framework: - information-floor.md — the paper: the closed information-floor theory, the death-spiral mechanism, the decline-safety selection criterion, and the metric it implies, each result labelled theory / simulation-only / hardware. - THEORY.md — the derivation-and-falsification notebook (reasoning-in-progress). - docs/records/ — the supporting record (architecture, findings, the per-controller investigations, test specs, and status registers), with README.md as the index. - docs/essays/ — the two-part plain-language article (why / how), self-contained (vendored fonts + fallback stacks), byline included. - docs/figures/ — the generated SVG figures the paper and essays embed. - docs/claims/ — the claim-warrant validator and its fixture. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
bfca487 to
d9d8619
Compare
…ning#2221 class) Records the bounded result of a parallel source-audit of channels_sv2's public input surface for the stratum-mining#2221 bug class (functions accepting non-finite/out-of-range numeric input that produce garbage/dangerous control values). Confirmed independent findings: A (vardiff new_with_min NaN-disables-clamp) + B (try_vardiff divisor finiteness), both fixed on branch fix/vardiff-reject-non-finite-hashrate; C (server set_nominal_hashrate unvalidated store) as a defense-in-depth note. Six garbage-target caller sites are the blast radius of stratum-mining#2221 (closed on its merge), not new findings. Cleared the false suspects (hash_rate_from_target, client setters, share-accounting sums) — the cleared set outnumbers the confirmed. Sequencing decided: let stratum-mining#2221 merge first, prepare A+B ready-but-unopened, then let maintainers' convention choose the tracking construct. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The plain-language 'why'/'how' essays (tier 1) assume no statistics; the technical report information-floor.md (tier 3) is full rigor. This primer is the bridge between them — a ground-up, interactive walkthrough of the estimation theory (the precision floor, the collapse) with no prior stats assumed, ending by pointing the reader on to the report. Placed at the docs root beside information-floor.md (NOT in essays/, which would mis-signal its depth). Standalone artifact: it does not yet wire into the reading path (no essay->primer pointer added, to avoid editing the review-branch essays), and references the report in prose rather than as a hyperlink — both deferred until a whole-document read confirms it's ready to advertise. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the primer as the middle tier of the reading path: both essays now
point to information-floor-primer.html as the on-ramp ('the reasoning behind
the numbers, no prior statistics assumed') alongside the existing report
link, and the primer's closing now links onward to information-floor.md.
Full chain essays -> primer -> report, all relative paths verified to resolve.
Also folds in the earlier essay edit: the square-root formula (and its two
callbacks) removed from better-vardiff.html in favor of the plain
'diminishing returns' intuition — the formal law stays in the report tier.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>



Adds a deterministic in-process simulation framework that characterizes any
Vardiffimplementation, the decline-safe champion controller it selected,and the proof that argues both. Read the three commits in order — controller,
framework, docs — each is a coherent review unit on its own.
What this is
A vardiff controller's only observable is its own tracking error, and that one
fact bounds what any controller can do. The framework measures it, the proof
derives the bound, and the champion is the existence proof that the safe corner
of the achievable frontier is occupiable. The headline is structural, not a
horse race: across the operating band every reasonable controller is pinned to
the same information floor (~12% best-to-worst spread), so the residual axis is
safety on a decline, not agility — and that is the one axis controllers
genuinely differ on.
This is the research/proof branch. The clean production extraction that ships
the champion is #2188 (a single flat-struct replacement of
VardiffState,no scaffolding); this branch is the framework that selected it and the proof that
argues it.
The three commits, in reading order
feat(vardiff): decline-safe champion controller— the thing thatships. The three-stage
Composed<Estimator, Boundary, UpdateRule>pipeline;the champion is
EwmaEstimator(360) / AdaptiveSignPersist(spm6) / AcceleratingPartialRetarget, selected by minimax over share rate under adecline-safety constraint (the kills are the selection criterion — the
alternatives that died on a sustained decline are why this one ships).
feat(vardiff/sim): in-process simulation framework + CI regression guards— the apparatus that selected it. Deterministic per-tick Poisson sim, the
metric, the grid, the decline-safety gate, and three CI-guarded regression
tests that re-derive the selection criteria on every run. Carries the pinned
reproduction environment (
Cargo.lock+ toolchain) and the frozenchampion/classic baselines the guards assert against.
docs(vardiff): the proof corpus— the proof and how it's argued.information-floor.md(the closed theory, each result labelled theory /simulation-only / hardware),
THEORY.md(the derivation-and-falsificationnotebook), the two plain-language essays (why → how), the investigations
(incl. the source-verified finding that a deployed pow2-in-loop deadband
construction is structurally unable to ease on a decline), and the
claim-warrant validator. The proof corpus also includes a source-verified
survey of deployed open-source vardiff controllers and their
decline-safety failure modes (
docs/records/VARDIFF_SURVEY.md) — NOMP,ckpool, MiningCore — finding the failure axis is a structural property
(idle-path + trigger-reachability), not algorithm sophistication.
Reproducibility (stated at the scope it is true)
simulation-result claim in
information-floor.mdmaps, via the READMEreproduction index, to a deterministic binary that regenerates the number
(not a frozen value that could drift from the code). The selection criteria
are CI-guarded.
per-job target snapshot in
src/server/extended.rs(verified; the protocolvalidates each share against the target snapshotted at job creation, which the
vardiff path never mutates) — a protocol fact, reproduced by reading the
pinned source, not by a sim.
Cargo.lockhash + pinnedtoolchain), so a recorded seed reproduces deterministically.
registered and re-runnable; tagging the remaining exploration binaries
(which back no proof claim beyond what the index already covers) is a typed
standing debt the validator surfaces, not part of this PR's reproducibility
guarantee.
Known-open items (disclosed, not surprises)
The claim validator surfaces two standing
REVISITflags by design:pow2-in-loop deadband construction is currently a date + line numbers; should
strengthen to a commit SHA (line numbers drift).
public source and names no operator; naming is a deliberate, separate
decision, not made here.
Scope honesty (from the proof): the decline response is hardware-confirmed in
direction on a sustained 50% drop (eased the safe way, no rejection spike); the
settled over-difficulty figure and the slow-moderate-decline gate remain
simulation results; the cost-model weights are a calibrated judgment.