Trace L4 M7: schedule the outer harness-improvement loop (event-driven driver, no worker integration)

## Summary

Implement Milestone 7 of the trace-driven harness improvement loop: add a
scheduled / event-driven **outer driver** that runs the existing report →
proposal → follow-through chain (M1 #938 / M2 #939 / M3 #940) on a cadence
**without a human prompting each run**, and **without integrating into the
worker binary**.

**Goal alignment (read first):** per #937's North Star the goal is *durable
harness learning that makes future runs fail less* — **not more automation or
more reports**. This milestone is a *means*, not the goal: it earns its place
only if it raises the rate at which confirmed failures become durable harness
fixes. It is sequenced **after** M5 (#952, durable evidence) and M6 (#953, the
goal-bearing milestone that closes the ratchet), and **may be deferred or
dropped** — a driver that only schedules more reports/issues without raising the
failure→durable-fix conversion rate is the North Star's named anti-pattern.

The driver is **operator-side product tooling**: it runs in the environment
that runs the aiops worker, consumes **that deployment's** evidence, and
proposes changes to **the operator's target repo**. It is not part of, and does
not act on, the aiops-platform source repo's own development harness.

Part of #937. This is the concrete realization of the "Outer-agent automation
later" step already foreseen in #931's recommended sequence (step 3). Design
source of truth: `docs/design/trace-driven-harness-improvement.md`.
Depends on #940 (M3) and #941 (M4), both merged, and on **M5** (#952, durable
evidence) and **M6** (#953, ratchet closure), which is why it is sequenced last.
Follow-up to #931 / PR #936.

## Problem — operator toil, not a missing goal

M1–M4 delivered the loop's *artifact-production* half: given evidence,
`scripts/trace-harness-report.py` produces grouped clusters, issue/draft-PR
proposals (schema `trace-harness-report/v3`), and advisory evaluator candidates.
Every connective step is manual today: capture worker stderr, run the report,
run `jq`, open the issue, dispatch the follow-through agent. That manual toil is
real friction, but reducing it is a convenience, not the loop's purpose — the
purpose (durable learning that closes the ratchet) is M5 (#952) + M6 (#953). The
loop-engineering sources name automation as a separate loop level, not the goal:

- LangChain L3 event-driven loop: "Schedules, webhooks, cron jobs… trigger
  agent runs **without a human manually prompting each one**."
  (`docs/research/2026-06-16-langchain-art-of-loop-engineering.md`)
- Zach Lloyd's outer loop: "**A scheduled agent** reviews those records… and
  opens a diff." (`docs/research/2026-06-16-zach-lloyd-self-improving-skills.md`)

Pursue this milestone only after M5/M6 prove the ratchet turns, and only if
scheduling measurably increases the failure→durable-fix rate.
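
One way to make "measurably increases" concrete is to compare conversion rates for a manual baseline period and a scheduled period. The sketch below assumes a hypothetical per-cluster record with a `durable_fix_merged` flag, set once a reviewed harness-change PR for that cluster merges. Neither the record shape nor the `min_gain` threshold comes from the design doc.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureCluster:
    """Hypothetical record: one confirmed, recurring failure cluster in a period."""
    cluster_id: str
    durable_fix_merged: bool  # True once a reviewed harness-change PR for it merged


def conversion_rate(clusters: list[FailureCluster]) -> float:
    """Fraction of confirmed failure clusters that became a merged, durable fix."""
    if not clusters:
        return 0.0
    fixed = sum(1 for c in clusters if c.durable_fix_merged)
    return fixed / len(clusters)


def scheduling_earned(
    manual: list[FailureCluster],
    scheduled: list[FailureCluster],
    min_gain: float = 0.0,
) -> bool:
    """True only if the scheduled period converts failures better than the manual baseline."""
    return conversion_rate(scheduled) - conversion_rate(manual) > min_gain
```

The denominator is confirmed failure clusters, not issues opened, so a driver that only opens more issues cannot raise the rate. With the small samples a single deployment produces, a raw difference is noisy; `min_gain` is a crude guard, not a statistical test.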

## Required behavior

Add a scheduled or triggered driver that runs wherever it can reach the target
deployment's evidence. The venue is the operator's choice: a scheduled CI
workflow (the same scheduled-workflow mechanism this repo already ships for
`ruleset-drift.yml` / `capture-unresolved-reviews.yml`), an external cron on the
worker host, or a coding-agent workflow. The driver:

1. Collects already-available evidence for a bounded recent window (retained
   worker logs and/or CI log artifacts; the **Trace L4 M5** (#952) durable
   manifest when available).
2. Runs `scripts/trace-harness-report.py` to produce the grouped report.
3. Selects only **actionable, recurring** clusters using the design's
   recurrence rule (normally ≥2 independent occurrences; a single severe
   safety / data-loss finding may qualify).
4. Promotes each selected cluster into a **tracking issue** carrying the M2
   proposal body, **through ordinary forge / agent tooling** (`gh` or a
   coding-agent workflow), never the worker. The driver does **not** open a
   draft PR itself: a draft PR contains a diff, which is the M3 follow-through
   output produced only after operator approval (see Approval model).
5. Is **idempotent**: before creating, it searches for an existing open
   tracking issue for the same cluster id (a provenance marker or a
   `trace-harness` label namespaced with the cluster id, e.g.
   `trace-harness:runner-timeout`) and updates or skips instead of duplicating.
6. Records each run's outcome (clusters seen / promoted / skipped-as-duplicate /
   below-threshold) so the loop is auditable, and tracks how many promoted
   clusters became durable fixes — the metric that justifies this milestone.
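
Steps 3, 5 and 6 can be planned as a pure function before any forge call is made, which keeps the threshold and dedup logic testable. A minimal sketch, assuming a hypothetical cluster shape (`id`, `actionable`, `severity`, and `occurrences` entries carrying a `run_id`); the real `trace-harness-report/v3` fields may differ, and "independent" is approximated here as distinct run ids.

```python
from dataclasses import dataclass, field

# Hypothetical severity labels for the single-occurrence exception.
SEVERE_KINDS = {"safety", "data-loss"}


def is_recurring(cluster: dict, min_runs: int = 2) -> bool:
    """Recurrence rule: >= min_runs independent occurrences, or one severe finding."""
    # "Independent" is approximated as distinct run ids (an assumption).
    runs = {occ["run_id"] for occ in cluster.get("occurrences", [])}
    if len(runs) >= min_runs:
        return True
    return len(runs) == 1 and cluster.get("severity") in SEVERE_KINDS


@dataclass
class RunOutcome:
    """Per-run audit record (step 6)."""
    seen: int = 0
    promoted: list[str] = field(default_factory=list)
    skipped_duplicate: list[str] = field(default_factory=list)
    below_threshold: list[str] = field(default_factory=list)


def plan_run(clusters: list[dict], open_cluster_ids: set[str]) -> RunOutcome:
    """Decide, without side effects, what this run would promote.

    open_cluster_ids: cluster ids that already have an open tracking issue.
    """
    outcome = RunOutcome()
    already_open = set(open_cluster_ids)  # copy; also dedups repeats within one report
    for cluster in clusters:
        outcome.seen += 1
        cid = cluster["id"]
        # Non-actionable clusters are bucketed with below-threshold in this sketch.
        if not cluster.get("actionable", False) or not is_recurring(cluster):
            outcome.below_threshold.append(cid)
        elif cid in already_open:
            outcome.skipped_duplicate.append(cid)  # step 5: skip, never duplicate
        else:
            outcome.promoted.append(cid)
            already_open.add(cid)
    return outcome
```

The driver would then create issues only for `outcome.promoted` and persist the whole `RunOutcome` as the run's audit entry.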

## Approval model (decision to lock in this issue)

Reconcile the articles' "scheduled agent opens a diff" with the design's
"operator approves intent, not the body" (M2/M3). Recommended default:

- The driver may **auto-create the proposal artifact** — a **tracking issue**
  carrying the M2 proposal body — for a recurring cluster. This is transport,
  not a harness change.
- The **M3 follow-through implementation only starts after explicit operator
  approval** (label flip or comment naming the cluster). The draft PR and its
  diff are that approved M3 output, so a human approves *intent* before any
  harness code is written.
- **Merging / shipping the harness change always stays behind normal review**
  (human or reviewer-agent). No unattended merge.
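
The approval gate reduces to a check the M3 dispatcher runs before doing anything. A sketch, assuming a hypothetical `trace-harness:approved` label and a hypothetical `/approve <cluster-id>` comment convention from an allowlisted operator; neither name is fixed by the design.

```python
import re

# Hypothetical label an operator flips on the tracking issue to approve M3.
APPROVAL_LABEL = "trace-harness:approved"


def is_approved(
    cluster_id: str,
    labels: list[str],
    comments: list[dict],
    operators: set[str],
) -> bool:
    """Return True if the operator approved M3 follow-through for this cluster.

    Approval is either the label flip or a comment from an allowlisted operator
    with a line reading exactly "/approve <cluster-id>".
    """
    if APPROVAL_LABEL in labels:
        return True
    # Anchor on the whole line so "runner-timeout-2" cannot approve "runner-timeout".
    pattern = re.compile(
        r"^/approve[ \t]+" + re.escape(cluster_id) + r"[ \t]*$", re.MULTILINE
    )
    return any(
        c["author"] in operators and pattern.search(c["body"]) is not None
        for c in comments
    )
```

Forge permissions decide who may set the label; the comment path needs its own allowlist so a drive-by commenter cannot start harness work.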

Alternatives, both explicit opt-in and never the default: (a) a notify-only mode
that posts the generated proposal to a single rolling issue and leaves creation
manual; (b) an advanced mode where the driver also dispatches the M3
follow-through agent to open a draft PR directly — still gated by the same
reviewed-merge rule. Default is the dedup-guarded auto-create-issue +
approve-before-implement behavior above; expose the mode as config.
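
Exposing the mode as config could look like the following. The mode strings and helper names are hypothetical; the invariants they encode come from the text: only the opt-in draft-PR mode starts M3 without approval, notify-only never creates tracking issues, and the sketch has no merge path in any mode.

```python
from enum import Enum
from typing import Optional


class DriverMode(Enum):
    """Hypothetical config values for the driver's approval mode."""
    NOTIFY_ONLY = "notify-only"              # (a) post to one rolling issue only
    AUTO_ISSUE = "auto-issue"                # default: dedup-guarded tracking issues
    DISPATCH_DRAFT_PR = "dispatch-draft-pr"  # (b) also dispatch M3 to open a draft PR


def load_mode(raw: Optional[str]) -> DriverMode:
    """Parse the configured mode; an unset or empty value means the default."""
    if raw is None or not raw.strip():
        return DriverMode.AUTO_ISSUE
    try:
        return DriverMode(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(m.value for m in DriverMode)
        raise ValueError(f"unknown driver mode {raw!r}; expected one of: {allowed}") from None


def creates_tracking_issues(mode: DriverMode) -> bool:
    # Notify-only leaves issue creation to a human.
    return mode is not DriverMode.NOTIFY_ONLY


def may_start_follow_through(mode: DriverMode, approved: bool) -> bool:
    # Only the opt-in advanced mode skips the approval gate. Merging is never
    # decided here; it stays behind normal review in every mode.
    return approved or mode is DriverMode.DISPATCH_DRAFT_PR
```

Failing loudly on an unknown value, rather than falling back to the default, avoids silently running a mode the operator did not choose.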

## Boundary requirements (SPEC alignment)

- The driver is **not** the worker/orchestrator and adds **no** worker-side
  code path, phase, gate, tracker write, or PR write. SPEC §1 keeps the worker
  a scheduler/runner/tracker reader; #76 closed orchestrator-owned PR/tracker
  writes.
- All issue/PR/tracker writes happen through the coding agent's / operator's
  ordinary forge tools — the same surface M3 already uses.
- No unattended **merge**; no automatic prompt/rubric/skill/`LEARNINGS.md`
  rewrite without a reviewed PR; no evaluator promoted to a required gate.
- Redaction / byte bounds from the design and M1 are preserved end-to-end: the
  driver must not widen what the report embeds and must mask clone-URL userinfo
  (`workflow.MaskCloneURL`) and tokens in anything it posts.
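
If the driver is a Python script alongside `scripts/trace-harness-report.py`, it likely cannot reuse the worker's `workflow.MaskCloneURL` in-process and needs an equivalent pass over everything it posts. A minimal sketch of that pass follows; the regexes are illustrative assumptions (GitHub token prefixes only, not an exhaustive secret scanner), and `max_bytes` is a hypothetical parameter, not the design's actual bound.

```python
import re

# Userinfo in http(s) URLs, e.g. https://x-access-token:SECRET@github.com/o/r.git.
# Greedy [^/\s]+ backtracks to the last "@" before the path, masking all of it.
USERINFO_RE = re.compile(r"(https?://)[^/\s]+@")
# Illustrative only: GitHub token prefixes, not an exhaustive secret scanner.
TOKEN_RE = re.compile(r"\b(?:gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})")
TRUNCATION_MARKER = "\n[truncated]"


def redact(text: str) -> str:
    text = USERINFO_RE.sub(r"\1***@", text)
    return TOKEN_RE.sub("***", text)


def bound_bytes(text: str, max_bytes: int) -> str:
    """Cap UTF-8 size, including the marker, without splitting a character."""
    marker_len = len(TRUNCATION_MARKER.encode("utf-8"))
    if max_bytes < marker_len:
        raise ValueError("max_bytes too small for the truncation marker")
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial multi-byte sequence.
    head = data[: max_bytes - marker_len].decode("utf-8", errors="ignore")
    return head + TRUNCATION_MARKER


def sanitize_for_post(text: str, max_bytes: int) -> str:
    # Redact before truncating so a cut cannot leave an unmatched token prefix.
    return bound_bytes(redact(text), max_bytes)
```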

## Non-goals

- Not a worker subcommand and not wired into `cmd/worker`.
- Not the goal: automation/scheduling is a means; producing more reports/issues
  that do not raise the failure→durable-fix rate is the anti-pattern, not a win.
- No durable scheduler trace DB (rejected in the design); evidence input is
  bounded and comes from M5 or ordinary log retention.
- No auto-merge, no required CI/runtime gate, no worker post-turn verifier.
- Not the evidence-capture mechanism itself (that is **Trace L4 M5**, #952) nor
  the evaluator-result feedback closure (that is **Trace L4 M6**, #953).

## Open considerations to resolve during design

- **Is it earned yet?** Confirm M5/M6 have shown the ratchet turns and that
  manual promotion is the bottleneck before building any scheduler.
- **Evidence availability**: the aiops worker runs in the operator's
  environment, which a CI runner cannot see by default. Specify how
  logs/artifacts reach the driver (operator-provided artifact path, a
  self-hosted runner on the worker host, or the M5 manifest). A v1 may accept an
  explicit input path; the robust unattended path depends on M5.
- **Cadence & window**: pick a default cron and a bounded look-back; do not
  re-scan the whole history each run.
- **Dedup key**: define the cluster-id provenance marker (hidden body marker or
  a cluster-id-namespaced label) used to find existing issues.
- **Noise control**: recurrence threshold plus a per-run cap on how many
  proposals it may open.
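
The dedup key, look-back window and per-run cap can be sketched together. This assumes a hypothetical hidden body marker `<!-- trace-harness-cluster:<id> -->` (the first of the two dedup options above) and evidence records carrying a comparable `captured_at` datetime. Open issue bodies could be fetched with `gh issue list --state open --json number,body`; that command returns 30 results by default, so a real driver would raise `--limit` or narrow with `--label`.

```python
import re
from datetime import datetime, timedelta


def provenance_marker(cluster_id: str) -> str:
    """Hypothetical hidden marker appended to each tracking-issue body."""
    return f"<!-- trace-harness-cluster:{cluster_id} -->"


MARKER_RE = re.compile(r"<!-- trace-harness-cluster:([A-Za-z0-9._-]+) -->")


def existing_cluster_ids(open_issue_bodies: list[str]) -> set[str]:
    """Cluster ids that already have an open tracking issue (the dedup key)."""
    return {m.group(1) for body in open_issue_bodies for m in MARKER_RE.finditer(body)}


def within_window(records: list[dict], now: datetime, lookback: timedelta) -> list[dict]:
    """Keep only evidence captured in [now - lookback, now]; never rescan history."""
    cutoff = now - lookback
    return [r for r in records if cutoff <= r["captured_at"] <= now]


def cap_promotions(
    candidates: list[tuple[str, int]], per_run_cap: int
) -> tuple[list[str], list[str]]:
    """Split (cluster_id, independent_runs) pairs into promote-now and deferred.

    Most-recurring first, ties broken by id so reruns pick the same clusters.
    """
    ranked = sorted(candidates, key=lambda c: (-c[1], c[0]))
    return [cid for cid, _ in ranked[:per_run_cap]], [cid for cid, _ in ranked[per_run_cap:]]
```

Deferred clusters need no queue: if they still recur inside the next window, the next run picks them up.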

## Acceptance criteria

- [ ] A scheduled / triggered workflow runs the report and promotes recurring
  clusters without a human prompting the run.
- [ ] Promotion goes through forge/agent tooling only; no worker/orchestrator
  code path is added (diff shows no `internal/worker` or `internal/orchestrator`
  change).
- [ ] Re-running on unchanged evidence does not create duplicate tracking issues
  (idempotency covered by a test/fixture).
- [ ] Only clusters meeting the recurrence rule are promoted; below-threshold
  clusters are skipped.
- [ ] Operator approval gates the M3 implementation step; the approved M3 PR
  stops at draft / ready-for-review and merge stays manual/reviewed.
- [ ] Redaction and byte bounds are preserved in everything the driver posts.
- [ ] Recorded evidence shows that scheduling raised the failure→durable-fix
  conversion rate; otherwise the milestone is deferred.
- [ ] A runbook explains the schedule, evidence input, approval model, dedup,
  and how to disable it.
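
The idempotency criterion can be covered without network access by running the promotion step twice against an in-memory fake forge. In this sketch `FakeForge`, `promote_all` and the marker format are hypothetical stand-ins; a real driver would shell out to `gh` instead.

```python
class FakeForge:
    """In-memory stand-in for the forge; a real driver would shell out to `gh`."""

    def __init__(self) -> None:
        self.issues: list[dict] = []

    def open_issue_bodies(self) -> list[str]:
        return [i["body"] for i in self.issues if i["state"] == "open"]

    def create_issue(self, title: str, body: str) -> None:
        self.issues.append({"title": title, "body": body, "state": "open"})


def marker(cluster_id: str) -> str:
    # Hypothetical hidden provenance marker used as the dedup key.
    return f"<!-- trace-harness-cluster:{cluster_id} -->"


def promote_all(forge: FakeForge, proposals: dict[str, str]) -> list[str]:
    """Create one tracking issue per cluster unless an open one already exists."""
    created = []
    for cluster_id, body in sorted(proposals.items()):
        tag = marker(cluster_id)
        if any(tag in existing for existing in forge.open_issue_bodies()):
            continue  # skipped-as-duplicate
        forge.create_issue(f"trace-harness: {cluster_id}", f"{body}\n\n{tag}")
        created.append(cluster_id)
    return created


def test_rerun_on_unchanged_evidence_creates_no_duplicates() -> None:
    forge = FakeForge()
    proposals = {"runner-timeout": "M2 proposal body", "clone-auth": "M2 proposal body"}
    assert promote_all(forge, proposals) == ["clone-auth", "runner-timeout"]
    assert promote_all(forge, proposals) == []  # second run on unchanged evidence
    assert len(forge.issues) == 2
```

Because the marker includes its closing ` -->`, the marker for `runner-timeout` is not a substring of the one for `runner-timeout-2`, so near-identical ids do not suppress each other.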


Trace L4 M7: schedule the outer harness-improvement loop (event-driven driver, no worker integration) #951
