autoresearch-continual-learning

autoresearch-continual-learning is a public fork of karpathy/autoresearch that pivots the design toward continual learning and knowledge editing.

This repository is building a constrained autonomous research harness for:

continual learning
knowledge editing
interference-sensitive updates
multi-metric evaluation with regression risk
single-GPU workflows where heavyweight runs must be serialized
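
As an illustration of the last point, one common way to serialize heavyweight runs on a single GPU is an exclusive lock file. The sketch below is not taken from this repository's scripts: `acquire`, `release`, `run_serialized`, and the lock path are hypothetical names, and it assumes all runs are launched from the same machine and filesystem.

```python
import os

def acquire(lock_path):
    """Try to create the lock file atomically; return True on success."""
    try:
        # O_CREAT | O_EXCL fails with FileExistsError if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record the holder for debugging
    return True

def release(lock_path):
    """Remove the lock file so the next run can start."""
    os.remove(lock_path)

def run_serialized(lock_path, job):
    """Run job() only if no other run holds the lock.

    Returns (started, result); started is False when the GPU is busy.
    """
    if not acquire(lock_path):
        return False, None
    try:
        return True, job()
    finally:
        release(lock_path)
```

A caller that gets `(False, None)` back would typically queue the run and retry later rather than start a second heavyweight job. A crashed holder leaves a stale lock behind, so a real harness would also need a recovery rule for that case.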

The repository is focused on the research loop itself:

bounded editable surfaces
bounded runs
structured artifacts
anti-shim constraints
promotion and discard rules
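
A bounded editable surface can be enforced with a simple path check before a run is accepted. The sketch below is a minimal illustration under assumptions: the function name `outside_surface` is hypothetical, and treating `method/` as the only editable root is an example choice, not a statement of this repository's actual policy.

```python
from pathlib import PurePosixPath

def outside_surface(changed_paths, editable_roots):
    """Return the changed paths that fall outside every editable root."""
    roots = [PurePosixPath(root) for root in editable_roots]
    violations = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # Reject parent-directory traversal outright, since it can escape a root.
        if ".." in path.parts:
            violations.append(raw)
            continue
        allowed = any(path == root or root in path.parents for root in roots)
        if not allowed:
            violations.append(raw)
    return violations
```

For example, `outside_surface(["method/trainer.py", "eval/metrics.py"], ["method"])` flags only `eval/metrics.py`. A preflight step could refuse to launch a run whenever the returned list is non-empty.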

What This Repo Is Today

This is no longer just a design-doc repo, but it is also not yet a fully self-running autonomous experimentation system.

Current state:

a real protocol/ constitution layer exists
preflight, schema, parsing, submission, and decision scripts exist
the 3090 pilot benchmark is complete
the active implementation family is now Qwen 3.5:
- surrogate lane: Qwen/Qwen3.5-0.8B-Base
- champion lane: Qwen/Qwen3.5-4B-Base
- editable surface: qwen35_top8_hybrid_attention_mlp
baseline and method scaffolding exist
Qwen-family surrogate, champion, and protected-confirmation run classes are frozen for the current launch envelope
a 12-case CounterFact smoke lane is available only for plumbing checks
the active v4 lane now uses a 96-case CounterFact standard substrate for in-loop development and protected confirmation
baseline-20260510T133850Z is the current accepted Qwen-family baseline champion on the corrected active v4 standard substrate

Not done yet:

copying a launch-clean sealed workspace onto the 3090
hardening monitor/restart behavior for longer unattended operation
accumulating valid mainline method comparisons against the accepted Qwen baseline champion

So the honest description is:

this repo is a constitution-first continual-learning harness with an accepted Qwen-family baseline champion and a first method branch, Qwen-wide HyperLoRA, ready for sealed in-loop iteration.

Why This Exists

autoresearch is powerful because it makes autonomous research tractable by constraining the loop:

narrow editable surface
fixed evaluation surface
fixed compute budget
machine-readable result
keep-or-discard discipline
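
To make "fixed compute budget" and "machine-readable result" concrete, here is a minimal sketch of a bounded run. It assumes the run is a subprocess that prints a JSON object as its last line of stdout. The function name `bounded_run`, the status strings, and that output convention are illustrative assumptions, not this repository's actual contract.

```python
import json
import subprocess
import sys

def bounded_run(cmd, budget_seconds):
    """Run cmd under a wall-clock budget and parse its last stdout line as JSON."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=budget_seconds
        )
    except subprocess.TimeoutExpired:
        # The child is killed once the budget is exceeded.
        return {"status": "timeout", "result": None}
    if proc.returncode != 0:
        return {"status": "crashed", "result": None}
    lines = proc.stdout.strip().splitlines()
    try:
        result = json.loads(lines[-1]) if lines else None
    except json.JSONDecodeError:
        result = None
    if not isinstance(result, dict):
        return {"status": "unparseable", "result": None}
    return {"status": "ok", "result": result}
```

Any status other than `"ok"` gives the keep-or-discard step an unambiguous reason to reject the run without inspecting free-form logs.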

For continual learning and knowledge editing, those same ideas still matter, but the problem is harder:

success is not one scalar metric
regressions matter as much as improvements
evaluation can be gamed more easily
hidden capacity and shims are real failure modes
heavyweight runs are expensive and must be serialized
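
The first two points can be sketched as a decision rule that looks at a target metric and a set of guarded metrics together. Everything specific here is an assumption for illustration: the function name `decide`, the metric names in the usage note, the tolerance value, and the policy that a gain accompanied by a guarded regression is escalated rather than discarded.

```python
def decide(champion, candidate, target, guarded, tolerance=0.01, min_gain=0.0):
    """Return 'promote', 'discard', or 'escalate' for a candidate run.

    champion and candidate map metric names to scores (higher is better).
    target is the metric the method is trying to improve.
    guarded lists the metrics that must not regress by more than tolerance.
    """
    gain = candidate[target] - champion[target]
    regressions = [
        name for name in guarded if champion[name] - candidate[name] > tolerance
    ]
    if gain <= min_gain:
        return "discard"   # no improvement on the target metric
    if regressions:
        return "escalate"  # improvement, but at the cost of a guarded metric
    return "promote"
```

With a champion of `{"edit_success": 0.70, "locality": 0.90}`, a candidate scoring `{"edit_success": 0.80, "locality": 0.80}` is escalated under this rule: the target improved, but the guarded metric dropped by more than the tolerance.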

This repo adapts the autoresearch mentality to that harder setting.

Current 3090 Stack Decision

The first 3090 pilot benchmark compared:

Qwen/Qwen3.5-4B-Base
meta-llama/Llama-3.1-8B
google/gemma-3-4b-pt

Original pilot outcome:

all three fit on the observed 3090
all three reached 1.0 on the repaired bounded visible-dev smoke pack
Gemma initially won the implementation pilot because it matched the legitimacy checks and had the best throughput on the fixed probe

Current launch outcome:

the harness pivoted to Qwen-family surrogate/champion lanes
Qwen/Qwen3.5-0.8B-Base is the fast surrogate lane for method development
Qwen/Qwen3.5-4B-Base is the champion lane for accepted comparisons
baseline_seq_lora_ft_v11_qwen35_wide_fact_replay cleared protected confirmation as the accepted baseline champion

Active launch pair:

model: Qwen/Qwen3.5-4B-Base
editable surface: qwen35_top8_hybrid_attention_mlp

First Case Study

The first target repo is:

conflict_aware_editing

That case study is useful because it already has:

explicit quality gates
artifact contracts
competing method families
evaluation slices with regression risk
a strong anti-cheating stance

The design principle here is:

fit conflict_aware_editing to the best constrained autonomous research loop, not the other way around.

Project Stance

This repo is intentionally closer to autoresearch than to a general-purpose software-engineering orchestrator.

Its center of gravity is:

hypothesis
bounded change
bounded run
structured result
promote, discard, or escalate

not:

issue
agent swarm
PR theater

Current Repository Shape

The main layers now present are:

protocol/
- loop contract
- immutable/editable surfaces
- anti-shim policy
- promotion rules
- run-class policy
- pilot model/surface selection
scripts/
- workspace preflight
- spec freezing
- 3090 submission
- artifact parsing
- decision logic
- pilot calibration and visible-dev profiling
eval/
- schema validation
- metrics and aggregation
- protected confirmation normalization
- sentinel checks
method/
- editable-surface definitions
- trainer shell
- method scaffolding
- baseline wrappers
experiments/
- champion state
- append-only ledgers
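
An append-only ledger is commonly kept as a JSON Lines file, one record per line. The sketch below shows that general pattern. The helper names `append_entry` and `read_entries`, the file name, and the record fields are hypothetical and do not describe the format actually used under experiments/.

```python
import json

def append_entry(ledger_path, entry):
    """Append one JSON record as a single line; existing lines are never rewritten."""
    line = json.dumps(entry, sort_keys=True)
    # Mode "a" only ever writes at the end of the file.
    with open(ledger_path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")

def read_entries(ledger_path):
    """Load every record in order, skipping blank lines."""
    with open(ledger_path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because corrections are written as new records instead of edits to old ones, the file keeps a full history of promote and discard decisions.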

Attribution And Fork Boundary

This repository is a public fork of karpathy/autoresearch, and that inspiration should remain explicit.

What is preserved:

tight loop constraints
fixed-budget thinking
structured result parsing
human ownership of research organization

What is adapted:

the target domain
the evaluation complexity
the decision logic
the anti-shim and anti-cheat requirements

Because this is a public fork, some upstream prototype files still exist in the tree. Treat those as inherited reference material, not as the implementation of the current continual-learning harness.
