Discussion: Output Format section after the classifier hook — still all load-bearing?

# Discussion: Output Format section after the classifier hook — still all load-bearing?

**Repo:** `danielmiessler/PAI` (or current public fork)
**File:** `PAI/PAI_SYSTEM_PROMPT.md` → section `## Output Format (CRITICAL — MANDATORY — ZERO EXCEPTIONS)`
**Type:** Question / discussion (not a bug report, not a hard PR proposal)

## Context

Running a private fork of PAI 5.0 with the Sonnet classifier hook (`hooks/PromptProcessing.hook.ts`) actively writing `MODE: MINIMAL|NATIVE|ALGORITHM` and `TIER: E1..E5` to additionalContext on every prompt. The classifier-driven mode-selection seems to be doing the heavy lifting structurally — the model reads `MODE`/`TIER` and picks the template.

That made me look hard at the existing `## Output Format` section in `PAI_SYSTEM_PROMPT.md`. It's about 15 lines and reads (paraphrased):

1. Authority statement: "highest enforcement priority", "CRITICAL FAILURE regardless of underlying correctness"
2. Every-response statement: "every response uses exactly one output format from CLAUDE.md"
3. Eight "Hard requirements" bullets (header first, fields populated, closing-line last, no freeform prose around/between fields, no conversational sentences as content, exploratory/recommendations/opinions/plan-presentations/acknowledgments all use format, answering a question is not a license to drop format, apologies/errors also use format)
4. "Self-check before emitting any response" line
5. "Recurring failure pattern" reflection about freeform-prose-drift on interesting topics

## Observation

Many of the 8 Hard-requirement bullets restate the same underlying principle: *use the format, all the time, no exceptions*. The "Recurring failure pattern" paragraph reads as post-hoc reflection rather than enforceable rule — it tells the model what mistake it tends to make, without specifying an action different from "use the format".

In my fork, with the classifier writing the explicit `MODE` line, I notice:
- The mode-selection itself is no longer ambiguous (the classifier decided)
- Format-compliance failures I've seen this session were drift inside a chosen mode (e.g., freeform prose before the header), which is mostly addressed by ONE bullet: "Mode header is the first visible token, closing line is the last"
- The "every-response-including-follow-ups" qualifiers feel like they were added against specific past failures

## Two readable failure-history hypotheses

A. **Pre-classifier history.** Before the classifier hook existed, the section had to do double duty: (1) tell the model to choose a mode, (2) tell it to obey the chosen mode's template. The 8-bullet enumeration was the model's only enforcement. Now with the classifier, half of that work is structurally upstream — but the section was never retrenched.

B. **Failure-pattern accretion.** Each bullet corresponds to a specific mistake observed across versions: "answering a question is not a license to drop format" → seen models drop format when answering direct questions. "Apologies use the format too" → seen apologies revert to prose. Removing any one bullet risks re-introducing that mistake on the next model version.

If (B) is the actual reason, the section is doing more than it looks — it's a checklist of failure-modes-seen-in-the-wild and each bullet is load-bearing against a specific drift mode. Trimming it would be a regression.

## What I would love to know

1. Is hypothesis (A), (B), neither, or both?
2. For each Hard-requirements bullet, is there a remembered failure case that drove it? If yes, the bullet earns its weight independent of the others. If no, it's restating defaults.
3. With the classifier hook now writing `MODE`/`TIER`, does the Output Format section want to evolve — or does experience say "leave it, the classifier is one more belt, not a replacement for any suspender"?

## Sketch of a leaner alternative (for discussion, not a PR)

Not proposing this — putting it here only so the question has something concrete to argue against:

```
**This rule has the highest enforcement priority. Every response uses exactly one output format: ALGORITHM, NATIVE, or MINIMAL (templates in CLAUDE.md). Violating is CRITICAL FAILURE regardless of underlying correctness.**

- Mode header is the first visible token, closing line is the last.
- All content lives inside template fields — no freeform prose around or between them.
- Follow-ups, questions, apologies, errors — same rule, no exceptions.
- Self-check pre-emit: header first, closing line last, fields populated.
```

That removes the explicit enumerations and the recurring-failure-pattern paragraph. My instinct says "smaller, sharper". Your data may say "the long version is what stops the recurring drift".

## What I'm doing in the meantime

Keeping the section untouched in my fork. If the data on this side starts to suggest the long version is doing work, I'll know.

If you have a moment to share the failure history, the call gets a lot easier for everyone running a PAI fork.

---

*Draft authored 2026-05-18 during a BitterPillEngineering audit on a private fork. Not yet posted upstream — staged for review before submitting.*
*Draft reviewed 2026-05-18 by me (Arne Frick). Please bear with me, as this is my first Github Issue / Question report at all :-) *


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Discussion: Output Format section after the classifier hook — still all load-bearing? #1287

Discussion: Output Format section after the classifier hook — still all load-bearing?

Context

Observation

Two readable failure-history hypotheses

What I would love to know

Sketch of a leaner alternative (for discussion, not a PR)

What I'm doing in the meantime

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Discussion: Output Format section after the classifier hook — still all load-bearing? #1287

Description

Discussion: Output Format section after the classifier hook — still all load-bearing?

Context

Observation

Two readable failure-history hypotheses

What I would love to know

Sketch of a leaner alternative (for discussion, not a PR)

What I'm doing in the meantime

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions