Verified against origin/test @ 1c813f5 (2026-06-15). The two reproductions below run against the real code; the live evidence is publicly verifiable in the validator telemetry at wandb.ai/entrius-gittensor/gittensor-validators.
Summary
The documented miner workflow is "run gitt miner post once; validators store your PAT and score your merged PRs." In practice a miner's PAT coverage monotonically decays: each time a validator loses its on-disk PAT store, that validator scores the miner 0 every round thereafter, and nothing restores it — gitt miner post is a one-shot best-effort broadcast, and miners run no axon, so there is no re-broadcast and no way for a validator to re-request. The miner gets no error, no log, no signal; coverage is only re-established by manually re-running gitt miner post.
Net effect: a valid, registered, actively-contributing miner is silently and indefinitely de-scored by a growing subset of validators. This is live and systemic (receipts below), not theoretical — it has dropped my own miner (UID 29) out of eligibility twice on a multi-week cadence.
Mechanism (verbatim @ origin/test)
A validator loses its store (data/miner_pats.json, gittensor/validator/pat_storage.py:18) through any of three loss vectors:
1. a restart without a persistent ./data, after which ensure_pats_file() starts an empty store (neurons/validator.py:62);
2. a crash/restart loop;
3. a single failed or corrupt read: _read_file() (pat_storage.py:68) catches (json.JSONDecodeError, OSError) and returns [], so the next save_pat() upserts one entry into that [] and atomically overwrites the file, erasing all other entries.

On the standard docker-compose.vali.yml (./data:/app/data, atomic writes) the store does survive a clean restart, so this is not about per-operator volume config.
Downstream, reward.py:100 snapshots load_all_pats() per round; any miner without a stored PAT is logged as `No stored PAT for miner {uid} — miner must run gitt miner post` (oss_contributions/inspections.py:134) and scored 0.
Nothing recovers it: gitt miner post issues a single dendrite(..., timeout=30.0) broadcast with no retry/resend (cli/miner_commands/post.py:133-142) then exits; miners serve no axon (README: "No miner neuron required"), so a validator cannot re-request; and no miner-facing signal exists (the guide implies one post suffices).
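Loss vector 3 is easy to miss because, to the writer, an unreadable file and an empty store look identical. The sketch below is a simplified, hypothetical model of that pattern, not the real pat_storage.py: read_entries stands in for _read_file(), save_entry for save_pat(), and the entry shape {"uid": ..., "pat": ...} is an assumption made for illustration.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical model of the pattern described above; NOT the real pat_storage.py.
# The entry shape {"uid": int, "pat": str} is an assumption for illustration.


def read_entries(path: Path) -> list:
    """Like the described _read_file(): any parse or I/O error becomes []."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (json.JSONDecodeError, OSError):
        return []  # indistinguishable from "no miners stored yet"


def save_entry(path: Path, uid: int, pat: str) -> None:
    """Like the described save_pat(): upsert into whatever was read, then atomically replace."""
    entries = [e for e in read_entries(path) if e["uid"] != uid]
    entries.append({"uid": uid, "pat": pat})
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(entries, f)
    os.replace(tmp, path)  # atomic, so the wipe is committed atomically too


def stored_uids(path: Path) -> list:
    return sorted(e["uid"] for e in read_entries(path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = Path(d) / "miner_pats.json"
        for uid in (10, 20, 30, 40):
            save_entry(store, uid, f"pat-{uid}")
        print("before:", stored_uids(store))  # before: [10, 20, 30, 40]
        store.write_text("{not json")  # one corrupt read
        save_entry(store, 99, "pat-99")  # an unrelated miner's broadcast
        print("after:", stored_uids(store))  # after: [99]
```

The atomic write does not help here: it guarantees the file is never half-written, but it commits the wrong (emptied) contents just as reliably.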
Proof 1 — gitt miner post silently reports success and never retries a missed validator
Runs the real gitt miner post (Click CliRunner); only the network/identity boundaries are stubbed — the broadcast-result handling, success flag, exit code, and retry behavior are the real code. Validator UID 30 does not respond:
```
REPRODUCTION: `gitt miner post` — silent partial success + no retry
3 validators; validator UID 30 did NOT respond (status 408).
command exit code : 0
reported success : True
accepted / total : 2 / 3
no_response (silent miss): 1
broadcast attempts made : 1
PROVEN: exited 0 and reported success=True while UID 30 got nothing, with exactly
ONE broadcast (no retry). A transient failure to any validator is a silent, un-
retried, un-surfaced coverage gap — the miner is never told they are uncovered.
```
So a brief blip during the single post (or a validator that is mid-restart) drops the miner from that validator indefinitely, until a manual re-post, and post still reports success.
Proof 2 — a single failed read of miner_pats.json erases every stored PAT
Runs the real pat_storage.save_pat/_read_file; only PATS_FILE is redirected to a scratch path:
```
[setup] 3 miners stored normally -> file UIDs = [10, 20, 30]
[control] 4th miner, file readable -> file UIDs = [10, 20, 30, 40] (correct)
[trigger] file unreadable (corrupt / I/O error) before the next broadcast
[result] one miner (UID 99) broadcasts -> file UIDs = [99]
PROVEN: PATs for UIDs [10,20,30,40] silently and permanently erased by one unrelated
broadcast after a single failed read. Those miners are scored 0 until each re-posts.
```
(Both scripts are self-contained and deterministic; available on request.)
Proof 3 — it is happening live (publicly verifiable)
From wandb.ai/entrius-gittensor/gittensor-validators (anonymously readable; console telemetry; display names are vali-{uid}-{version}):
- gittensor-181 (run bbd93w0w) restarted at the release (new run created 2026-06-15T23:51Z, ~4 min after the release-20260615-234738 tag) and kept its PATs (`PAT check result — UID: 29: valid`). This confirms the store persists across an upgrade-restart on a well-configured validator.
- vali-116 (run u1ropz51, running since 2026-06-13) was missing my miner's (UID 29) PAT (4/7 coverage on my gitt miner check) despite not restarting at this release, i.e. the PAT was lost on an earlier restart and never recovered. Within one ~110-min window it also logged a second miner cycling through loss and re-post (`UID: 32: no PAT stored` → `PAT broadcast accepted — UID: 32`), so this is systemic across miners.
- rt21-64 (UID 64) is in a restart/crash loop (10+ runs over Jun 8–12) and is unreachable for re-broadcast.
Impact
A transient/operational event (a validator restart) produces a lasting 0-score for a valid miner across every validator that lost its PAT; those validators set weights from the zeros, so it reduces real emission. Coverage only erodes (no mechanism ever restores it), the miner is never told, and it recurs as validators restart (upgrades/crashes).
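Relationship to prior reports
- … miner_pats.json → empty → overwrite) and their fixes, #829 (fix(validator): refuse to overwrite corrupt PATS_FILE in save_pat / remove_pat) and #1081 (fix: fail closed on corrupt PAT storage writes), addressed only loss-vector 3 (the read-wipe). This issue is scoped to the missing recovery path, which is independent of how the store was lost and has not been filed.
- … /user failure is treated as inconclusive (cache fallback) rather than a hard failure. A validator restart is the same shape of event, yet today it yields a permanent, silent de-score. The proposed fix applies the same principle to PAT coverage.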
Proposed fix
Recovery must be miner-initiated (validators cannot pull — no miner axon). Make the reference tooling keep coverage asserted instead of one-shot:
- Add a re-assert mode to gitt miner (e.g. gitt miner post --watch / gitt miner ensure) that, on an interval, runs the existing check and re-broadcasts only to validators reporting has_pat=False / pat_valid != True. Cheap (check transmits no PAT), non-spammy (posts only where missing), and makes the existing "miner must re-post" remedy reliable instead of manual-and-silent.
- Surgical first step if a daemon is out of scope: have gitt miner post retry validators that return no response, and surface partial coverage as a non-zero exit / explicit warning so it is not silent (Proof 1).
- Document that PAT coverage can be lost on validator restarts and should be re-asserted.
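A minimal sketch of the first two bullets, under stated assumptions. The names CheckResult, needs_post, post_until_covered, exit_code and the send callable are hypothetical and do not exist in the gitt CLI; has_pat and pat_valid mirror the check fields named above, and send(uid) is assumed to return True when a validator accepts the PAT and False on no response. The exit-code convention (0 only at full coverage) is a suggestion, not current behavior.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class CheckResult:
    # Assumed shape of one validator's answer to the existing coverage check.
    uid: int
    has_pat: bool
    pat_valid: Optional[bool]  # None if the validator did not (or could not) validate


def needs_post(results: Iterable[CheckResult]) -> List[int]:
    """Re-assert targets: validators where the PAT is missing or not confirmed valid."""
    return [r.uid for r in results if not r.has_pat or r.pat_valid is not True]


def post_until_covered(
    send: Callable[[int], bool],
    uids: List[int],
    attempts: int = 3,
    backoff_s: float = 2.0,
) -> Dict[int, bool]:
    """Broadcast to `uids`, retrying only validators that have not accepted yet."""
    accepted = {uid: False for uid in uids}
    for attempt in range(attempts):
        pending = [uid for uid, ok in accepted.items() if not ok]
        if not pending:
            break
        if attempt:
            time.sleep(backoff_s * attempt)  # linear backoff between rounds
        for uid in pending:
            accepted[uid] = send(uid)
    return accepted


def exit_code(accepted: Dict[int, bool]) -> int:
    """0 only at full coverage; otherwise warn and return 1 so partial coverage is never silent."""
    missing = sorted(uid for uid, ok in accepted.items() if not ok)
    if missing:
        print(f"WARNING: PAT not accepted by validator UIDs {missing}; re-run to retry")
        return 1
    return 0


if __name__ == "__main__":
    checks = [
        CheckResult(uid=10, has_pat=True, pat_valid=True),   # covered: skipped
        CheckResult(uid=20, has_pat=False, pat_valid=None),  # lost its store
        CheckResult(uid=30, has_pat=True, pat_valid=False),  # stored but invalid
    ]
    targets = needs_post(checks)  # [20, 30]
    # Fake transport: UID 20 misses the first round, UID 30 never answers.
    replies = {20: iter([False, True]), 30: iter([False, False, False])}
    result = post_until_covered(lambda uid: next(replies[uid]), targets, backoff_s=0)
    print("exit code:", exit_code(result))  # warns about [30], then prints: exit code: 1
```

In the proposed --watch / ensure mode, the same two calls would simply run on an interval against fresh check results; because the check transmits no PAT, an idle loop costs only the check itself.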
Separate, optional hardening for loss-vector 3 (Proof 2): make save_pat() fail closed when the prior read could not be parsed instead of overwriting (as #829 / #1081 proposed).
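A minimal sketch of the fail-closed variant, again against a hypothetical model rather than the real pat_storage.py (the entry shape and the PatStoreUnreadable exception are assumptions; #829 / #1081 may implement this differently). The key change is distinguishing "file absent" from "file unreadable": only the former is treated as an empty store, while the latter aborts before any write, so the existing bytes survive for recovery.

```python
import json
import os
import tempfile
from pathlib import Path


class PatStoreUnreadable(RuntimeError):
    """Raised instead of silently treating a corrupt store as empty (hypothetical name)."""


def read_entries_strict(path: Path) -> list:
    if not path.exists():
        return []  # genuinely empty store: safe to start fresh
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (json.JSONDecodeError, OSError) as exc:
        raise PatStoreUnreadable(f"refusing to overwrite {path}: {exc}") from exc
    if not isinstance(data, list):
        raise PatStoreUnreadable(f"refusing to overwrite {path}: unexpected top-level type")
    return data


def save_entry_fail_closed(path: Path, uid: int, pat: str) -> None:
    # Any read failure raises here, before a temp file is even created.
    entries = [e for e in read_entries_strict(path) if e["uid"] != uid]
    entries.append({"uid": uid, "pat": pat})
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(entries, f)
    os.replace(tmp, path)  # atomic replace only after a successful read
```

The cost is that one miner's broadcast is rejected while the file is unreadable; that miner is told (the call fails) instead of every other miner being erased silently.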
Suggested tests
- gitt miner post retries a non-responding validator and reports full coverage only when it is actually achieved; persistent partial coverage → non-zero exit / warning (Proof 1 becomes a regression test).
- The re-assert path posts to exactly the validators whose check returns missing/invalid and skips the valid ones.
- save_pat() raises and leaves the existing entries intact when _read_file() cannot parse the file (Proof 2 becomes a regression test).
Footer — verification provenance
- Live on origin/test @ 1c813f5 (2026-06-15). Code: pat_storage.py:18/23/36/68/78; neurons/validator.py:62; oss_contributions/reward.py:100, inspections.py:134; cli/miner_commands/post.py:133-142; README miner section. Deploy: docker-compose.vali.yml (./data:/app/data).
- Telemetry (public, verifiable) at wandb.ai/entrius-gittensor/gittensor-validators: gittensor-181/bbd93w0w (restarted at release, kept PATs), vali-116/u1ropz51 (lost a PAT, no recovery; UID 32 systemic), rt21-64 (crash loop).
- Both reproductions run against current origin/test. The fix is Python-side (CLI / validator); no contract changes.