[Bug] `gitt miner post` coverage silently and permanently erodes after validator restarts — valid miners are de-scored with no signal and no recovery #1481

> Verified against `origin/test` @ `1c813f5` (2026-06-15). The two reproductions below run against the real code; the live evidence is publicly verifiable in the validator telemetry at `wandb.ai/entrius-gittensor/gittensor-validators`.

## Summary

The documented miner workflow is "run `gitt miner post` once; validators store your PAT and score your merged PRs." In practice a miner's PAT coverage **monotonically decays**: each time a validator loses its on-disk PAT store, that validator scores the miner **0** every round thereafter, and **nothing restores it** — `gitt miner post` is a one-shot best-effort broadcast, and miners run no axon, so there is no re-broadcast and no way for a validator to re-request. The miner gets **no error, no log, no signal**; coverage is only re-established by manually re-running `gitt miner post`.

Net effect: a valid, registered, actively-contributing miner is silently and indefinitely de-scored by a growing subset of validators. This is live and systemic (receipts below), not theoretical — it has dropped my own miner (UID 29) out of eligibility twice on a multi-week cadence.

## Mechanism (verbatim @ `origin/test`)

**A validator loses its store** (`data/miner_pats.json`, `gittensor/validator/pat_storage.py:18`) through any of three loss vectors: (1) a restart without a persistent `./data`, after which `ensure_pats_file()` starts empty (`neurons/validator.py:62`); (2) a crash/restart loop; or (3) a single failed or corrupt read, where `_read_file()` (`pat_storage.py:68`) catches `(json.JSONDecodeError, OSError)` and returns `[]`, so the next `save_pat()` upserts one entry into that `[]` and atomically overwrites the file, erasing all others. (With the standard `docker-compose.vali.yml` mount `./data:/app/data` and atomic writes, the store *does* survive a clean restart, so this is **not** about per-operator volume config.)
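The read-wipe path can be modelled in a few lines. This is a simplified sketch, not the repository code: `read_entries`, `upsert_entry`, and `demo` are hypothetical stand-ins for the behaviour described above (lenient read, upsert, atomic overwrite), and the entry shape `{"uid", "pat"}` is an assumption.

```python
import json
import os
import tempfile
from pathlib import Path


def read_entries(path: Path) -> list[dict]:
    """Lenient read (assumed model): any parse or I/O error yields an empty list."""
    try:
        return json.loads(path.read_text())
    except (json.JSONDecodeError, OSError):
        return []


def upsert_entry(path: Path, uid: int, pat: str) -> None:
    """Read, replace-or-append one entry, then atomically overwrite the file."""
    entries = [e for e in read_entries(path) if e["uid"] != uid]
    entries.append({"uid": uid, "pat": pat})
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(entries))
    os.replace(tmp, path)  # atomic rename; the old contents are gone after this


def demo() -> tuple[list[int], list[int]]:
    """Store three entries, corrupt the file, then upsert one unrelated entry."""
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "miner_pats.json"
        for uid in (10, 20, 30):
            upsert_entry(path, uid, f"pat-{uid}")
        before = [e["uid"] for e in read_entries(path)]
        path.write_text("{not json")  # simulate a corrupt file
        upsert_entry(path, 99, "pat-99")
        after = [e["uid"] for e in read_entries(path)]
        return before, after
```

In this model the atomic write does not help: the write itself succeeds, but it persists a list that was built from a failed read.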

**Downstream**, `reward.py:100` snapshots `load_all_pats()` per round; any miner without a stored PAT is logged `No stored PAT for miner {uid} — miner must run gitt miner post` (`oss_contributions/inspections.py:134`) and scored 0.

**Nothing recovers it:** `gitt miner post` issues a single `dendrite(..., timeout=30.0)` broadcast with no retry/resend (`cli/miner_commands/post.py:133-142`) then exits; miners serve no axon (README: "No miner neuron required"), so a validator cannot re-request; and no miner-facing signal exists (the guide implies one post suffices).

## Proof 1 — `gitt miner post` silently reports success and never retries a missed validator

Runs the real `gitt miner post` (Click `CliRunner`); only the network/identity boundaries are stubbed — the broadcast-result handling, success flag, exit code, and retry behavior are the real code. Validator UID 30 does not respond:

```text
REPRODUCTION: `gitt miner post` — silent partial success + no retry
  3 validators; validator UID 30 did NOT respond (status 408).
  command exit code        : 0
  reported success         : True
  accepted / total         : 2 / 3
  no_response (silent miss): 1
  broadcast attempts made  : 1
PROVEN: exited 0 and reported success=True while UID 30 got nothing, with exactly
ONE broadcast (no retry). A transient failure to any validator is a silent, un-
retried, un-surfaced coverage gap — the miner is never told they are uncovered.
```

So a brief blip during the single post (or a validator that is mid-restart) drops the miner from that validator until a future manual re-post, and `post` reports success anyway.

## Proof 2 — a single failed read of `miner_pats.json` erases every stored PAT

Runs the real `pat_storage.save_pat`/`_read_file`; only `PATS_FILE` is redirected to a scratch path:

```text
[setup]   3 miners stored normally   -> file UIDs = [10, 20, 30]
[control] 4th miner, file readable   -> file UIDs = [10, 20, 30, 40]   (correct)
[trigger] file unreadable (corrupt / I/O error) before the next broadcast
[result]  one miner (UID 99) broadcasts -> file UIDs = [99]
PROVEN: PATs for UIDs [10,20,30,40] silently and permanently erased by one unrelated
broadcast after a single failed read. Those miners are scored 0 until each re-posts.
```

(Both scripts are self-contained and deterministic; available on request.)

## Proof 3 — it is happening live (publicly verifiable)

From `wandb.ai/entrius-gittensor/gittensor-validators` (anonymously readable; console telemetry; display names are `vali-{uid}-{version}`):

- **`gittensor-181`** (run `bbd93w0w`) restarted **at the release** — new run created `2026-06-15T23:51Z`, ~4 min after the `release-20260615-234738` tag — and **kept** its PATs (`PAT check result — UID: 29: valid`). Confirms the store persists on a well-configured validator across an upgrade-restart.
- **`vali-116`** (run `u1ropz51`, running since `2026-06-13`) was **missing** that same miner's PAT (4/7 coverage on my `gitt miner check`) despite no restart at this release — i.e. lost on an earlier restart, never recovered. In one ~110-min window it logged a *second* miner cycling through loss → re-post (`UID: 32: no PAT stored` → `PAT broadcast accepted — UID: 32`), so this is systemic across miners.
- **`rt21-64`** (UID 64) is in a restart/crash loop (10+ runs over Jun 8–12) and is unreachable for re-broadcast.

## Impact

A transient operational event (a validator restart) produces a **lasting** 0-score for a valid miner on every validator that lost its PAT; those validators set weights from the zeros, so real emission is reduced. Coverage only erodes (nothing restores it short of a manual re-post), the miner is never told, and the loss recurs whenever validators restart (upgrades or crashes).

## Relationship to prior reports

- **#781 / #1079** (corrupt `miner_pats.json` → empty → overwrite) and their fixes **#829 / #1081** addressed only loss-vector 3 (the read-wipe). This issue is scoped to the **missing recovery path**, which is independent of *how* the store was lost and has not been filed.
- **#931 / #932 / #1107** (merged) established that a *transient* condition must not cause a *permanent wrong outcome* — there, a transient GitHub `/user` failure is treated as **inconclusive** (cache fallback) rather than a hard failure. A validator restart is the same shape of event; today it yields a permanent silent de-score. The fix applies the same principle to PAT coverage.

## Proposed fix

Recovery must be miner-initiated (validators cannot pull — no miner axon). Make the reference tooling keep coverage *asserted* instead of one-shot:

- Add a re-assert mode to `gitt miner` (e.g. `gitt miner post --watch` / `gitt miner ensure`) that, on an interval, runs the existing **check** and re-broadcasts **only** to validators reporting `has_pat=False` / `pat_valid != True`. Cheap (check transmits no PAT), non-spammy (posts only where missing), and makes the existing "miner must re-post" remedy reliable instead of manual-and-silent.
- Surgical first step if a daemon is out of scope: have `gitt miner post` **retry** validators that return no response, and surface partial coverage as a non-zero exit / explicit warning so it is not silent (Proof 1).
- Document that PAT coverage can be lost on validator restarts and should be re-asserted.
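A minimal sketch of the first two items, under stated assumptions: the check reply is modelled as a dict carrying the `has_pat` / `pat_valid` flags named above, and `select_targets`, `post_with_retry`, `exit_code`, and the `send` callback are hypothetical names, not existing `gitt` functions. The real broadcast and check calls are left out.

```python
from typing import Callable, Iterable

# Assumed shape of one validator's check reply:
# {"uid": int, "has_pat": bool, "pat_valid": bool | None}
CheckResult = dict


def select_targets(results: Iterable[CheckResult]) -> list[int]:
    """UIDs of validators reporting a missing or not-confirmed-valid PAT."""
    return [
        r["uid"]
        for r in results
        if not r.get("has_pat") or r.get("pat_valid") is not True
    ]


def post_with_retry(
    send: Callable[[int], bool], uids: Iterable[int], attempts: int = 3
) -> list[int]:
    """Call send(uid) per target, retrying only those that did not accept.

    Returns the UIDs still uncovered after the last attempt.
    """
    pending = list(uids)
    for _ in range(attempts):
        if not pending:
            break
        pending = [uid for uid in pending if not send(uid)]
    return pending


def exit_code(uncovered: list[int]) -> int:
    """Non-zero when any validator is still uncovered, so the gap is not silent."""
    return 1 if uncovered else 0
```

A watch mode would run the check, pass its results through `select_targets`, and call `post_with_retry` on an interval; validators that already hold a valid PAT are never re-sent one.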

Separate, optional hardening for loss-vector 3 (Proof 2): make `save_pat()` fail closed when the prior read could not be parsed instead of overwriting (as #829 / #1081 proposed).
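One way the fail-closed behaviour could look, as a sketch only: `StoreUnreadableError`, `read_entries_strict`, and `save_entry_fail_closed` are hypothetical names, the entry shape is assumed, and this is not the change proposed in #829 / #1081. The idea is that a missing file is a legitimate empty store, while an existing file that cannot be read aborts the write.

```python
import json
import os
from pathlib import Path


class StoreUnreadableError(RuntimeError):
    """The store exists but could not be read, so it must not be overwritten."""


def read_entries_strict(path: Path) -> list[dict]:
    """Return [] only when the file is absent; raise if it exists but is unreadable."""
    if not path.exists():
        return []
    try:
        return json.loads(path.read_text())
    except (json.JSONDecodeError, OSError) as exc:
        raise StoreUnreadableError(
            f"refusing to overwrite unreadable store: {path}"
        ) from exc


def save_entry_fail_closed(path: Path, uid: int, pat: str) -> None:
    """Upsert one entry; the overwrite is reached only after a successful read."""
    entries = [e for e in read_entries_strict(path) if e["uid"] != uid]
    entries.append({"uid": uid, "pat": pat})
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(entries))
    os.replace(tmp, path)
```

The trade-off is that a corrupt file now blocks new PATs until an operator intervenes, which is loud where the current behaviour is silent.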

## Suggested tests

- `gitt miner post` retries a non-responding validator and reports full coverage only when actually achieved; persistent partial coverage → non-zero exit / warning (Proof 1 becomes a regression test).
- The re-assert path posts to exactly the validators whose check returns missing/invalid and skips the valid ones.
- `save_pat()` raises instead of overwriting, leaving the existing file untouched, when `_read_file()` cannot parse the file (Proof 2 becomes a regression test).

## Footer — verification provenance

- Live on `origin/test` @ `1c813f5` (2026-06-15). Code: `pat_storage.py:18/23/36/68/78`; `neurons/validator.py:62`; `oss_contributions/reward.py:100`, `inspections.py:134`; `cli/miner_commands/post.py:133-142`; README miner section. Deploy: `docker-compose.vali.yml` (`./data:/app/data`).
- Telemetry (public, verifiable): `wandb.ai/entrius-gittensor/gittensor-validators` — `gittensor-181`/`bbd93w0w` (restarted at release, kept PATs), `vali-116`/`u1ropz51` (lost a PAT, no recovery; UID 32 systemic), `rt21-64` (crash loop).
- Both reproductions run against current `origin/test`. Fix is Python-side (CLI / validator); no contract changes.

