Purpose
Create a reproducible paper-faithful baseline package that all downstream tickets must compare against.
Mandatory Reading (blocking)
Before writing code, the agent must read and summarize in a first issue comment:
- reports/NL_IMPLEMENTATION_ORACLE.md (sections: 3, 4.1-4.7, 5, 6.1-6.4)
- reports/paper/NL-print.extracted.clean.txt (Eq. 21-24, 28-31 and Table 1 text)
- docs/PAPER_COMPLIANCE.md
The first comment must include:
- 5-10 bullet summary of the current implementation state.
- Exact list of code paths used for baseline runs.
- Confirmed gaps that this ticket does not change.
Required Code Anchors
- src/nested_learning/training.py
- src/nested_learning/model.py
- configs/pilot_paper_faithful.yaml
- configs/pilot_selfmod_paper_faithful.yaml
- scripts/eval/zeroshot.py
- scripts/eval/niah.py
- scripts/eval/continual.py
- scripts/eval/passkey.py
- scripts/eval/pg19_perplexity.py
Scope
- Define canonical baseline run commands for:
  - hope_selfmod paper-faithful
  - hope_attention comparison
- Produce frozen baseline artifact bundle with checksums + evals.
- Add reports/ baseline table (short + explicit command provenance).
Runbook
uv run python train.py --config-name pilot_selfmod_paper_faithful train.steps=2000
uv run python train.py --config-name pilot_paper_faithful model.block_variant=hope_attention train.steps=2000
- Run full eval suite for both checkpoints.
Deliverables
- Baseline artifact bundle under artifacts/.
- Eval JSON set under eval/ for both baselines.
- Report file documenting exact commands + hashes + metrics.
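The hashes called for above could be produced with a small script along these lines. This is only a sketch: the choice of SHA-256, the helper names `sha256_file` and `build_manifest`, and the `sha256sum`-style output are assumptions, not something the ticket or the repository prescribes. Only the artifacts/ directory name comes from the deliverables list.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (POSIX-style relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


if __name__ == "__main__":
    # Assumption: the bundle lives under artifacts/, as named in the deliverables.
    for rel_path, digest in build_manifest(Path("artifacts")).items():
        print(f"{digest}  {rel_path}")
```

Sorting the paths keeps the manifest order stable across runs, so two manifests of the same bundle can be compared with a plain diff.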
Acceptance Criteria
- No NaN/Inf in training logs.
- All eval outputs exist for both baselines.
- scripts/checkpoint/verify.py passes on all published checkpoints.
- First issue comment includes mandatory reading summary.
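The NaN/Inf criterion above could be checked mechanically with something like the following. It is a rough sketch that assumes the training logs are plain text with metrics printed as tokens such as `loss=nan` or `grad_norm=inf`. The actual log format is not specified in this ticket, and `find_nonfinite_lines` is a hypothetical helper, not part of the repository.

```python
import re

# Match standalone nan / inf / infinity tokens (any case, optional sign),
# while skipping words that merely contain them, e.g. "INFO" or "inference".
NONFINITE = re.compile(
    r"(?<![A-Za-z0-9_])[-+]?(?:nan|inf(?:inity)?)(?![A-Za-z0-9_])",
    re.IGNORECASE,
)


def find_nonfinite_lines(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines containing a NaN/Inf token."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if NONFINITE.search(line):
            hits.append((number, line))
    return hits
```

An empty result from `find_nonfinite_lines(Path(log_path).read_text())` would satisfy the criterion under these assumptions. If the logs are structured (JSON lines, for example), parsing the values and testing them with `math.isfinite` would be more reliable than a regex.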