ATLAS - Self-Improving AI Trading Agents

Built by General Intelligence Capital

Karpathy's autoresearch + Soros's reflexivity + MiroFish swarm simulation applied to financial markets.

The agent prompts are the weights. Sharpe ratio is the loss function. No GPU needed.

⭐ 944 stars · 202 forks

What Is This?

ATLAS is a framework for autonomous AI trading agents that improve their own prompts through market feedback, train on different market regimes, create new agents when they detect knowledge gaps, and simulate reflexive futures to prepare for what's coming.

25+ agents debate markets daily across 4 layers. Every recommendation is scored against real outcomes. The worst-performing agent gets its prompt rewritten. If performance improves, the git commit survives. If not, git revert.

Now running live with real capital.

Architecture

Layer 1 - Macro (10 agents)

Central bank, geopolitical, China, dollar, yield curve, commodities, volatility, emerging markets, news sentiment, institutional flow.

These agents set the regime. Risk on or risk off? What's the macro backdrop?

Layer 2 - Sector Desks (7 agents)

Semiconductor, energy, biotech, consumer, industrials, financials, plus a Bloomberg-style relationship mapper that tracks supply chains, ownership, analyst coverage, and competitive dynamics.

Layer 3 - Superinvestors (4 agents)

Druckenmiller - macro/momentum: what's the big asymmetric trade?
Aschenbrenner - AI/compute: who benefits from the capex cycle?
Baker - deep tech/biotech: who has real IP moats?
Ackman - quality compounder: pricing power + FCF + catalyst?

Layer 4 - Decision (4 agents)

CRO - adversarial risk officer: attacks every idea, finds correlated risks
Alpha Discovery - finds names nobody else mentioned
Autonomous Execution - converts signals to sized trades
CIO - synthesises all prior layers, weighted by Darwinian agent scores, makes the final call
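
A minimal sketch of how the four layers could be laid out as data, assuming a simple ordered pipeline. The agent names are copied from the descriptions above; the `Layer` dataclass, `LAYERS` and `total_agents` are hypothetical names used only for illustration, not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    agents: tuple[str, ...]


# Layer order mirrors the README; the structure itself is an assumption.
LAYERS = (
    Layer("Macro", (
        "central bank", "geopolitical", "China", "dollar", "yield curve",
        "commodities", "volatility", "emerging markets", "news sentiment",
        "institutional flow",
    )),
    Layer("Sector Desks", (
        "semiconductor", "energy", "biotech", "consumer", "industrials",
        "financials", "relationship mapper",
    )),
    Layer("Superinvestors", (
        "Druckenmiller", "Aschenbrenner", "Baker", "Ackman",
    )),
    Layer("Decision", (
        "CRO", "Alpha Discovery", "Autonomous Execution", "CIO",
    )),
)


def total_agents(layers) -> int:
    """Count the agents across all layers."""
    return sum(len(layer.agents) for layer in layers)


print(total_agents(LAYERS))  # 25
```

The per-layer counts (10, 7, 4, 4) add up to the 25 starting agents mentioned elsewhere in this README.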

The Autoresearch Loop

Inspired by Karpathy's autoresearch. Same pattern, different domain.

1. System identifies worst agent by rolling Sharpe
2. Generates one targeted prompt modification
3. Runs for 5 trading days
4. Checks if agent's Sharpe improved
5. Keep (git commit) or revert (git reset)

The agent prompts are the weights being optimised. Each trading day is one training iteration. A $20/month VM replaces the H100.
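
The selection and keep-or-revert steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the ATLAS implementation: the Sharpe ratio is annualised over 252 trading days with a zero risk-free rate, and `sharpe`, `worst_agent` and `keep_or_revert` are hypothetical helper names. The git step is represented only by the returned decision string.

```python
import math
from statistics import mean, stdev


def sharpe(daily_returns, periods_per_year=252):
    """Annualised Sharpe ratio, assuming a zero risk-free rate."""
    if len(daily_returns) < 2:
        return 0.0
    sd = stdev(daily_returns)
    if sd == 0:
        return 0.0
    return mean(daily_returns) / sd * math.sqrt(periods_per_year)


def worst_agent(returns_by_agent):
    """Name of the agent with the lowest Sharpe over its return window."""
    return min(returns_by_agent, key=lambda name: sharpe(returns_by_agent[name]))


def keep_or_revert(before, after):
    """Keep the prompt change only if the trial-window Sharpe improved."""
    return "keep" if sharpe(after) > sharpe(before) else "revert"


# Made-up daily returns, purely for illustration.
history = {
    "volatility": [0.01, -0.02, 0.00, -0.01, 0.01],
    "energy": [0.01, 0.02, 0.00, 0.01, 0.02],
}
target = worst_agent(history)                  # agent whose prompt gets modified
trial = [0.01, 0.01, 0.02, 0.00, 0.01]         # 5 trading days after the change
decision = keep_or_revert(history[target], trial)
print(target, decision)  # volatility keep
```

A five-day window gives a very noisy Sharpe estimate, so a real system would probably want a longer baseline window than the one shown here.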

Darwinian Weights: Each agent has a weight between 0.3 (minimum) and 2.5 (maximum). Top quartile agents get weight × 1.05 daily. Bottom quartile get × 0.95. The CIO proportionally weights input by these scores. Good agents get louder. Bad agents get quieter.
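
A minimal sketch of the daily weight update and the CIO's proportional blend. The multipliers and bounds come from the paragraph above; how quartiles are cut (rounding down), the signal scale, and the names `update_weights` and `cio_blend` are assumptions made for the example.

```python
MIN_WEIGHT, MAX_WEIGHT = 0.3, 2.5


def update_weights(weights, scores):
    """Apply one daily update: top quartile x1.05, bottom quartile x0.95."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    q = len(ranked) // 4  # quartile size; rounding down is an assumption
    top = set(ranked[:q])
    bottom = set(ranked[len(ranked) - q:])
    updated = {}
    for name, weight in weights.items():
        if name in top:
            weight *= 1.05
        elif name in bottom:
            weight *= 0.95
        # Clamp to the allowed range.
        updated[name] = min(MAX_WEIGHT, max(MIN_WEIGHT, weight))
    return updated


def cio_blend(signals, weights):
    """Weighted average of agent signals, proportional to agent weight."""
    total = sum(weights[name] for name in signals)
    return sum(signals[name] * weights[name] for name in signals) / total


weights = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
scores = {"a": 2.0, "b": 1.0, "c": 0.5, "d": -1.0}
weights = update_weights(weights, scores)
print(weights)  # "a" moves up by 5%, "d" moves down by 5%
```

Assuming a neutral starting weight of 1.0, an agent that lands in the top quartile every day reaches the 2.5 cap after 19 updates, and one that lands in the bottom quartile every day reaches the 0.3 floor after 24.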

18-Month Backtest Results

Period: September 2024 - March 2026 (378 trading days)

Prompt modifications attempted: 54
Survived (kept): 16 (30%)
Reverted: 37 (70%)
Deployment phase return: +22% in 173 days
Best individual pick: AVGO at $152, held for +128%

The system independently discovered its own portfolio manager (CIO) was its weakest component — downweighting it to minimum before we diagnosed the same issue manually.

Equity Curve

[Figure: equity curve for the backtest period]

Agent Spawning

The system detects recurring knowledge gaps in its own debates. When the same blind spot appears 3+ times in 5 days, it autonomously creates a new specialist agent at neutral weight.

During a 6-month spawning test (Jul-Dec 2024):

9 agents spawned autonomously — credit markets, earnings calendar, options flow, liquidity conditions, positioning data, earnings guidance, retail sentiment, technical levels
3 went extinct (stuck at minimum weight for 20+ days)
6 survived Darwinian selection and reached maximum weight
Zero human involvement in deciding what to create or when

The system grew its own team from 25 to 31 agents based on what it learned it didn't know.
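
The spawn and extinction rules can be sketched as two small checks. This is an illustration only: it assumes each day's debate yields a list of flagged gap topics, counts a topic at most once per day, and uses the 0.3 minimum weight from the Darwinian weights section. `gaps_to_spawn` and `is_extinct` are hypothetical names.

```python
from collections import Counter


def gaps_to_spawn(daily_gaps, window=5, threshold=3):
    """Topics flagged on at least `threshold` of the last `window` days.

    daily_gaps: list of per-day topic lists, oldest first.
    """
    recent = daily_gaps[-window:]
    counts = Counter(topic for day in recent for topic in set(day))
    return sorted(topic for topic, n in counts.items() if n >= threshold)


def is_extinct(weight_history, min_weight=0.3, days=20):
    """True if the agent sat at minimum weight for the last `days` days."""
    tail = weight_history[-days:]
    return len(tail) == days and all(w <= min_weight for w in tail)


daily_gaps = [
    ["credit markets"],
    ["options flow", "credit markets"],
    [],
    ["credit markets"],
    ["options flow"],
]
print(gaps_to_spawn(daily_gaps))  # ['credit markets']
print(is_extinct([0.3] * 20))     # True
```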

All Seasons (PRISM) — Regime-Specific Training

We don't train one set of agents and hope they work everywhere. We train separate cohorts on distinct market regimes.

Same starting agents. Same vanilla prompts. Different evolutionary environments. Completely different survival instincts.

| Cohort | Period | Return | Modifications Kept | Key Learning |
| --- | --- | --- | --- | --- |
| Bull/Low Vol | 2016-2018 | +7.7% | 180/509 (35%) | Exit vol longs when events resolve peacefully |
| Crisis (COVID) | 2020 Q1-Q2 | -13.1% | 0/3 (0%) | Crashes too fast for autoresearch; agents need to arrive pre-trained |
| Rate Tightening | 2022-2023 | -30.2% | 38/89 (43%) | Don't flip-flop during Fed weeks; 15-day minimum between reversals |
| Recovery | 2020 Q2-Q4 | -29.0% | 0/1 (0%) | Same problem as crisis; too fast for the feedback loop |
| Euphoria | 2021 | +14.3% | 119/243 (49%) | Momentum confirmation before shorts, cap conviction during political crises |

Convergent Evolution: All five cohorts independently discovered the same meta-rules — cap conviction, use VIX as regime filter, enforce hard position limits, never override risk management. Nobody programmed caution. Every cohort learned it from losing money when overconfident.

Divergent Evolution: The same volatility agent started at 844 bytes in every cohort:

Bull markets: grew to 121,260 bytes (143x). Learned "exit vol longs immediately when events resolve peacefully."
Rate tightening: grew to 10,354 bytes. Learned "NEVER buy VXX when VIX is 15-25."
Euphoria: grew to 1,998 bytes. Learned "require VIX above 30 before going long vol."

Same agent. Same starting prompt. Three completely different survival strategies shaped by three different markets.
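
For scale, the growth factors implied by the byte counts above can be worked out directly. Only the byte counts come from this README; the variable names are illustrative.

```python
START_BYTES = 844  # starting prompt size in every cohort

final_bytes = {
    "bull": 121_260,
    "rate_tightening": 10_354,
    "euphoria": 1_998,
}

# Growth factor = final prompt size / starting prompt size.
growth = {cohort: size / START_BYTES for cohort, size in final_bytes.items()}

for cohort, factor in growth.items():
    print(f"{cohort}: {factor:.1f}x")
# bull: 143.7x
# rate_tightening: 12.3x
# euphoria: 2.4x
```

The bull-market prompt grew roughly 60 times more than the euphoria prompt from the same starting point.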

JANUS Meta-Layer

Multiple trained cohorts produce different recommendations. JANUS sits above all cohorts and algorithmically weights them by recent accuracy.

The weight differential between cohorts is an emergent regime detector:

When short-window agents outperform → NOVEL REGIME
When long-window agents outperform → HISTORICAL REGIME
When they're roughly equal → MIXED

We didn't build a regime detector. It emerged from tracking which cohort gets things right.
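
One way to picture the mechanism, as a sketch rather than the actual JANUS code: cohorts are weighted in proportion to recent accuracy, and the regime label falls out of the gap between short-window and long-window accuracy. The `tolerance` band that defines "roughly equal", and the names `cohort_weights` and `regime_label`, are assumptions.

```python
def cohort_weights(recent_accuracy):
    """Normalise recent accuracy into weights that sum to 1."""
    total = sum(recent_accuracy.values())
    if total == 0:
        # No cohort has been right recently: fall back to equal weights.
        return {name: 1 / len(recent_accuracy) for name in recent_accuracy}
    return {name: acc / total for name, acc in recent_accuracy.items()}


def regime_label(short_window_acc, long_window_acc, tolerance=0.05):
    """Label the regime from which group of cohorts is currently more accurate."""
    diff = short_window_acc - long_window_acc
    if diff > tolerance:
        return "NOVEL REGIME"
    if diff < -tolerance:
        return "HISTORICAL REGIME"
    return "MIXED"


print(cohort_weights({"short": 0.6, "long": 0.2}))
print(regime_label(0.70, 0.50))  # NOVEL REGIME
print(regime_label(0.60, 0.58))  # MIXED
```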

Soros Reflexivity Engine

Markets don't just reflect reality — they change it. We built reflexive feedback loops into the simulation framework.

Five feedback loops modelled:

Price → Fundamentals: Stock drops >15% trigger credit downgrades, talent flight, capex cuts. Rises >20% trigger cheap capital, talent attraction, customer confidence.
P&L → Behaviour: Fund drawdown >10% → forced selling cascade. Gains >15% → increased position sizes and concentrated bets.
Narrative → Flows: 3+ analysts converge on thesis → retail flow follows. Contrarian narratives emerge after extended consensus.
Market → Policy: Equity drawdown >15% → central bank signals easing. Oil >$130 → strategic reserve releases.
Reflexive Reversal Detection: Feedback loop running 5+ rounds in one direction → flagged as reflexive extreme. Maximum consensus = maximum fragility.

First detection: Gold bullish consensus appeared in 4 of 5 simulation rounds — flagged as a crowded trade with 32% reversal probability. That's Soros in code.
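
Two of the rules above lend themselves to a short sketch: the Price → Fundamentals thresholds and the "5+ rounds in one direction" reversal flag. Everything here is illustrative. Directions are encoded as +1, -1 or 0 per simulation round, and `price_feedback`, `trailing_run` and `is_reflexive_extreme` are hypothetical names. The gold example is described as a share of rounds rather than a consecutive run, so the trigger ATLAS actually uses may differ.

```python
def price_feedback(pct_change):
    """Map a stock move to the Price -> Fundamentals loop it would trigger."""
    if pct_change < -0.15:
        return "negative_loop"  # credit downgrades, talent flight, capex cuts
    if pct_change > 0.20:
        return "positive_loop"  # cheap capital, talent attraction, confidence
    return None


def trailing_run(directions):
    """Length of the current run of identical non-zero directions."""
    if not directions or directions[-1] == 0:
        return 0
    last = directions[-1]
    run = 0
    for direction in reversed(directions):
        if direction != last:
            break
        run += 1
    return run


def is_reflexive_extreme(directions, min_rounds=5):
    """Flag a loop that has run `min_rounds` or more rounds in one direction."""
    return trailing_run(directions) >= min_rounds


rounds = [1, -1, 1, 1, 1, 1, 1]
print(trailing_run(rounds))          # 5
print(is_reflexive_extreme(rounds))  # True
print(price_feedback(-0.18))         # negative_loop
```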

MiroFish Swarm Simulation Integration

ATLAS is integrated with MiroFish, a swarm intelligence engine that generates parallel digital worlds populated by thousands of AI agents.

Our trading agents don't just learn from the past — they train on simulated futures:

Overnight, the system generates branching scenarios (geopolitical escalation, Fed policy, earnings shocks, black swans)
Thousands of simulated agents (fund managers, central bankers, retail traders, corporate executives) interact with reflexive feedback
ATLAS trading agents are trained inside these simulated futures using the same Darwinian loop
Agents that navigate simulated futures well get upweighted
Predictions are scored against actual outcomes to improve simulation accuracy

First result: the Druckenmiller-style agent scores 1.0 in simulated crashes but 0.22 in melt-ups, while the quality compounder agent shows the opposite pattern. The system knows which instincts to trust before the regime arrives, not after.
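
A rough sketch of how per-scenario scores like these might feed into how much an agent is trusted ahead of time. The two Druckenmiller-style scores are taken from the result above; the scenario probabilities are invented for the example, and `expected_score` is a hypothetical helper, not part of the MiroFish bridge.

```python
def expected_score(scores_by_scenario, scenario_probs):
    """Probability-weighted score of one agent across simulated scenarios."""
    return sum(scores_by_scenario[name] * p for name, p in scenario_probs.items())


# Scores reported above for the Druckenmiller-style agent.
druckenmiller = {"crash": 1.0, "melt_up": 0.22}

# Hypothetical scenario mixes, not simulation output.
crash_heavy = {"crash": 0.7, "melt_up": 0.3}
melt_up_heavy = {"crash": 0.3, "melt_up": 0.7}

print(round(expected_score(druckenmiller, crash_heavy), 3))    # 0.766
print(round(expected_score(druckenmiller, melt_up_heavy), 3))  # 0.454
```

Under these assumed probabilities the same agent deserves noticeably more weight when the simulated futures lean towards a crash.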

Key Insight

The orchestration layer matters as much as the intelligence layer.

Individual agents improved measurably through autoresearch. But portfolio returns depend on how agent signals are converted to sized positions. The synthesis/decision layer is the bottleneck. Improving individual agent intelligence without improving orchestration yields diminishing returns.

What's Included

Framework architecture and pipeline structure
Autoresearch loop design
Backtest results and equity curve
All Seasons (PRISM) methodology and results
Agent spawning mechanism
JANUS meta-layer design
Soros reflexivity engine
MiroFish integration bridge
Example placeholder prompts (generic, not trained)

What's NOT Included

Trained agent prompts (proprietary — evolutionary products of market feedback)
PRISM evolved prompts per regime
CIO active management rules
Agent scorecard data
Live portfolio positions
Darwinian weight values
MiroFish simulation outputs

The trained prompts are the core IP. A competitor starting today is hundreds of iterations behind. That gap widens every day.

Tech Stack

Agents: Claude Sonnet (Anthropic API)
Simulation: MiroFish swarm engine
Data: FMP, Finnhub, Polygon, FRED
Infrastructure: Azure VM ($20/month)
Version Control: Git feature branches for autoresearch tracking
Cost: ~$50-80 for full 18-month backtest, ~$30 for all five PRISM cohorts

Contact

Chris Worsey — CEO & Technical Founder, General Intelligence Capital

chris@generalintelligencecapital.com

generalintelligencecapital.com

