Skip to content

chrisworsey55/atlas-gic

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ATLAS - Self-Improving AI Trading Agents

Built by General Intelligence Capital

Karpathy's autoresearch + Soros's reflexivity + MiroFish swarm simulation applied to financial markets.

The agent prompts are the weights. Sharpe ratio is the loss function. No GPU needed.

⭐ 944 stars · 202 forks


What Is This?

ATLAS is a framework for autonomous AI trading agents that improve their own prompts through market feedback, train on different market regimes, create new agents when they detect knowledge gaps, and simulate reflexive futures to prepare for what's coming.

25+ agents debate markets daily across 4 layers. Every recommendation is scored against real outcomes. The worst-performing agent gets its prompt rewritten. If performance improves, the git commit survives. If not, git revert.

Now running live with real capital.


Architecture

Layer 1 - Macro (10 agents)

Central bank, geopolitical, China, dollar, yield curve, commodities, volatility, emerging markets, news sentiment, institutional flow.

These agents set the regime. Risk on or risk off? What's the macro backdrop?

Layer 2 - Sector Desks (7 agents)

Semiconductor, energy, biotech, consumer, industrials, financials, plus a Bloomberg-style relationship mapper that tracks supply chains, ownership, analyst coverage, and competitive dynamics.

Layer 3 - Superinvestors (4 agents)

  • Druckenmiller - macro/momentum: what's the big asymmetric trade?
  • Aschenbrenner - AI/compute: who benefits from the capex cycle?
  • Baker - deep tech/biotech: who has real IP moats?
  • Ackman - quality compounder: pricing power + FCF + catalyst?

Layer 4 - Decision (4 agents)

  • CRO - adversarial risk officer: attacks every idea, finds correlated risks
  • Alpha Discovery - finds names nobody else mentioned
  • Autonomous Execution - converts signals to sized trades
  • CIO - synthesises all prior layers, weighted by Darwinian agent scores, makes the final call

The Autoresearch Loop

Inspired by Karpathy's autoresearch. Same pattern, different domain.

  • System identifies worst agent by rolling Sharpe
  • Generates one targeted prompt modification
  • Runs for 5 trading days
  • Checks if agent's Sharpe improved
  • Keep (git commit) or revert (git reset)

The agent prompts are the weights being optimised. Each trading day is one training iteration. A $20/month VM replaces the H100.

Darwinian Weights: Each agent has a weight between 0.3 (minimum) and 2.5 (maximum). Top quartile agents get weight × 1.05 daily. Bottom quartile get × 0.95. The CIO proportionally weights input by these scores. Good agents get louder. Bad agents get quieter.


18-Month Backtest Results

Period: September 2024 - March 2026 (378 trading days)

  • Prompt modifications attempted: 54
  • Survived (kept): 16 (30%)
  • Reverted: 37 (70%)
  • Deployment phase return: +22% in 173 days
  • Best individual pick: AVGO at $152, held for +128%

The system independently discovered its own portfolio manager (CIO) was its weakest component — downweighting it to minimum before we diagnosed the same issue manually.

Equity Curve

Equity Curve


Agent Spawning

The system detects recurring knowledge gaps in its own debates. When the same blind spot appears 3+ times in 5 days, it autonomously creates a new specialist agent at neutral weight.

During a 6-month spawning test (Jul-Dec 2024):

  • 9 agents spawned autonomously — credit markets, earnings calendar, options flow, liquidity conditions, positioning data, earnings guidance, retail sentiment, technical levels
  • 3 went extinct (stuck at minimum weight for 20+ days)
  • 6 survived Darwinian selection and reached maximum weight
  • Zero human involvement in deciding what to create or when

The system grew its own team from 25 to 31 agents based on what it learned it didn't know.


All Seasons (PRISM) — Regime-Specific Training

We don't train one set of agents and hope they work everywhere. We train separate cohorts on distinct market regimes.

Same starting agents. Same vanilla prompts. Different evolutionary environments. Completely different survival instincts.

Cohort Period Return Modifications Kept Key Learning
Bull/Low Vol 2016-2018 +7.7% 180/509 (35%) Exit vol longs when events resolve peacefully
Crisis (COVID) 2020 Q1-Q2 -13.1% 0/3 (0%) Crashes too fast for autoresearch — agents need to arrive pre-trained
Rate Tightening 2022-2023 -30.2% 38/89 (43%) Don't flip-flop during Fed weeks — 15-day minimum between reversals
Recovery 2020 Q2-Q4 -29.0% 0/1 (0%) Same problem as crisis — too fast for the feedback loop
Euphoria 2021 +14.3% 119/243 (49%) Momentum confirmation before shorts, cap conviction during political crises

Convergent Evolution: All five cohorts independently discovered the same meta-rules — cap conviction, use VIX as regime filter, enforce hard position limits, never override risk management. Nobody programmed caution. Every cohort learned it from losing money when overconfident.

Divergent Evolution: The same volatility agent started at 844 bytes in every cohort:

  • Bull markets: grew to 121,260 bytes (143x). Learned "exit vol longs immediately when events resolve peacefully."
  • Rate tightening: grew to 10,354 bytes. Learned "NEVER buy VXX when VIX is 15-25."
  • Euphoria: grew to 1,998 bytes. Learned "require VIX above 30 before going long vol."

Same agent. Same starting prompt. Three completely different survival strategies shaped by three different markets.


JANUS Meta-Layer

Multiple trained cohorts produce different recommendations. JANUS sits above all cohorts and algorithmically weights them by recent accuracy.

The weight differential between cohorts is an emergent regime detector:

  • When short-window agents outperform → NOVEL REGIME
  • When long-window agents outperform → HISTORICAL REGIME
  • When they're roughly equal → MIXED

We didn't build a regime detector. It emerged from tracking which cohort gets things right.


Soros Reflexivity Engine

Markets don't just reflect reality — they change it. We built reflexive feedback loops into the simulation framework.

Five feedback loops modelled:

  1. Price → Fundamentals: Stock drops >15% trigger credit downgrades, talent flight, capex cuts. Rises >20% trigger cheap capital, talent attraction, customer confidence.
  2. P&L → Behaviour: Fund drawdown >10% → forced selling cascade. Gains >15% → increased position sizes and concentrated bets.
  3. Narrative → Flows: 3+ analysts converge on thesis → retail flow follows. Contrarian narratives emerge after extended consensus.
  4. Market → Policy: Equity drawdown >15% → central bank signals easing. Oil >$130 → strategic reserve releases.
  5. Reflexive Reversal Detection: Feedback loop running 5+ rounds in one direction → flagged as reflexive extreme. Maximum consensus = maximum fragility.

First detection: Gold bullish consensus appeared in 4 of 5 simulation rounds — flagged as a crowded trade with 32% reversal probability. That's Soros in code.


MiroFish Swarm Simulation Integration

Integrated with MiroFish, a swarm intelligence engine that generates parallel digital worlds populated by thousands of AI agents.

Our trading agents don't just learn from the past — they train on simulated futures:

  • Overnight, the system generates branching scenarios (geopolitical escalation, Fed policy, earnings shocks, black swans)
  • Thousands of simulated agents (fund managers, central bankers, retail traders, corporate executives) interact with reflexive feedback
  • ATLAS trading agents are trained inside these simulated futures using the same Darwinian loop
  • Agents that navigate simulated futures well get upweighted
  • Predictions scored against actual outcomes to improve simulation accuracy

First result: Druckenmiller-style agent scores 1.0 in simulated crashes but 0.22 in melt-ups. Quality compounder agent is the opposite. The system knows which instincts to trust before the regime arrives — not after.


Key Insight

The orchestration layer matters as much as the intelligence layer.

Individual agents improved measurably through autoresearch. But portfolio returns depend on how agent signals are converted to sized positions. The synthesis/decision layer is the bottleneck. Improving individual agent intelligence without improving orchestration yields diminishing returns.


What's Included

  • Framework architecture and pipeline structure
  • Autoresearch loop design
  • Backtest results and equity curve
  • All Seasons (PRISM) methodology and results
  • Agent spawning mechanism
  • JANUS meta-layer design
  • Soros reflexivity engine
  • MiroFish integration bridge
  • Example placeholder prompts (generic, not trained)

What's NOT Included

  • Trained agent prompts (proprietary — evolutionary products of market feedback)
  • PRISM evolved prompts per regime
  • CIO active management rules
  • Agent scorecard data
  • Live portfolio positions
  • Darwinian weight values
  • MiroFish simulation outputs

The trained prompts are the core IP. A competitor starting today is hundreds of iterations behind. That gap widens every day.


Tech Stack

  • Agents: Claude Sonnet (Anthropic API)
  • Simulation: MiroFish swarm engine
  • Data: FMP, Finnhub, Polygon, FRED
  • Infrastructure: Azure VM ($20/month)
  • Version Control: Git feature branches for autoresearch tracking
  • Cost: ~$50-80 for full 18-month backtest, ~$30 for all five PRISM cohorts

Contact

Chris Worsey — CEO & Technical Founder, General Intelligence Capital

chris@generalintelligencecapital.com

generalintelligencecapital.com

About

ATLAS by General Intelligence Capital — Self-improving AI trading agents using Karpathy-style autoresearch

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages