Purpose
Bring distributed training (starting with FSDP) to paper-faithful parity with single-GPU online/per-layer behavior.
Mandatory Reading (blocking)
First comment must summarize:
- reports/NL_IMPLEMENTATION_ORACLE.md, sections 5.3 and 6.1.4
- docs/PAPER_COMPLIANCE.md, distributed caveats
- train_fsdp.py
- src/nested_learning/training.py, distributed guards and online loop
Required Code Anchors
- train_fsdp.py
- train_dist.py
- src/nested_learning/training.py
- parity tests in tests/
Scope
- Add FSDP support for:
  - online chunk updates
  - per-layer teach signals
- Keep fail-fast behavior explicit for unsupported combinations.
- Add single-GPU vs FSDP parity harness.
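One way the explicit fail-fast requirement could look is sketched below. This is only an illustration: `TrainMode`, `SUPPORTED`, and `check_supported` are hypothetical names, and the support matrix is a placeholder, not the actual state of train_fsdp.py, train_dist.py, or src/nested_learning/training.py.

```python
from dataclasses import dataclass

# Hypothetical feature names for the two behaviors listed in Scope.
FEATURES = ("online_updates", "per_layer_teach")

# Placeholder support matrix (an assumption for illustration only).
SUPPORTED = {
    "single": {"online_updates", "per_layer_teach"},
    "fsdp": {"online_updates", "per_layer_teach"},
    "ddp": set(),
}


@dataclass(frozen=True)
class TrainMode:
    strategy: str            # one of the keys in SUPPORTED
    online_updates: bool
    per_layer_teach: bool


def check_supported(mode: TrainMode) -> None:
    """Raise immediately if the requested combination is not supported."""
    if mode.strategy not in SUPPORTED:
        raise ValueError(f"unknown strategy: {mode.strategy!r}")
    requested = {name for name in FEATURES if getattr(mode, name)}
    unsupported = sorted(requested - SUPPORTED[mode.strategy])
    if unsupported:
        raise NotImplementedError(
            f"{mode.strategy} does not support: {', '.join(unsupported)}"
        )
```

Calling a guard like this once at startup, before any process group or model wrapping is set up, keeps unsupported combinations from silently falling back to non-faithful behavior.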
Deliverables
- Updated FSDP path.
- Parity test script + report template.
Acceptance Criteria
- A 1k-step paper-faithful FSDP run completes.
- Parity drift against single-GPU baseline is within defined tolerance.
- First issue comment contains mandatory reading summary.
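The parity drift criterion could be checked along the following lines. This is a minimal sketch that assumes parity is measured on per-step loss traces using maximum relative drift; the metric, the function names, and the `rtol` parameter are assumptions, and the tolerance value itself is left to the caller because this issue does not define it.

```python
import math
from typing import Sequence


def max_relative_drift(
    baseline: Sequence[float], candidate: Sequence[float], eps: float = 1e-12
) -> float:
    """Largest per-step relative difference between two loss traces."""
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("loss traces must be non-empty and equally long")
    if not all(math.isfinite(v) for v in list(baseline) + list(candidate)):
        raise ValueError("loss traces contain NaN or inf")
    # eps guards against division by zero when a baseline loss is exactly 0.
    return max(
        abs(b - c) / max(abs(b), eps) for b, c in zip(baseline, candidate)
    )


def assert_parity(
    baseline: Sequence[float], candidate: Sequence[float], rtol: float
) -> float:
    """Return the drift, or raise AssertionError if it exceeds rtol."""
    drift = max_relative_drift(baseline, candidate, eps=1e-12)
    if drift > rtol:
        raise AssertionError(f"parity drift {drift:.3e} exceeds rtol {rtol:.3e}")
    return drift
```

Here `baseline` would hold the single-GPU losses and `candidate` the FSDP losses for the same steps, seeds, and data order; the returned drift can be recorded in the parity report.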