
fix(aa-index): scrape App Router RSC payload instead of __NEXT_DATA__ #97

Open
xack20 wants to merge 1 commit into Andyyyy64:main from xack20:fix/aa-index-rsc-scraper

xack20 commented Jun 9, 2026

PR: fix(aa-index): scrape App Router RSC payload instead of __NEXT_DATA__

Base: Andyyyy64/whichllm:main · Head: fix/aa-index-rsc-scraper


Summary

The Artificial Analysis Intelligence Index source has been silently failing on
every run. artificialanalysis.ai migrated its leaderboard to the Next.js App
Router, which no longer embeds a <script id="__NEXT_DATA__"> blob; model
data now streams via self.__next_f.push([n, "…"]) (RSC) chunks. The scraper's
_NEXT_DATA_RE regex never matches, so fetch_aa_index_scores raises
ExtractionFailed("__NEXT_DATA__ payload not found"), and build_scores()
logs:

AA Index fetch failed, will use fallback: __NEXT_DATA__ payload not found

then falls back to the frozen AA_INDEX_FALLBACK_2026_05_14 snapshot. The CLI
still prints its output (the failure is caught), but the "current" AA tier is
stale on every invocation.
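
To make the failure mode concrete, here is a minimal sketch of why a `__NEXT_DATA__`-style pattern finds nothing on an App Router page. `NEXT_DATA_RE` is a hypothetical stand-in (the real `_NEXT_DATA_RE` is not shown in this PR), and both HTML strings are invented samples.

```python
import re

# Hypothetical stand-in for the scraper's _NEXT_DATA_RE (the real pattern
# is not quoted in this PR).
NEXT_DATA_RE = re.compile(
    r'<script id="__NEXT_DATA__"[^>]*>(?P<json>.*?)</script>', re.DOTALL
)

# Pages Router style: one JSON blob in a dedicated script tag.
pages_router_html = (
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props":{"models":[]}}</script>'
)

# App Router style: data arrives as string chunks pushed onto self.__next_f.
app_router_html = (
    '<script>self.__next_f.push([1,"{\\"name\\":\\"GLM-5\\"}"])</script>'
)

legacy_match = NEXT_DATA_RE.search(pages_router_html)
rsc_match = NEXT_DATA_RE.search(app_router_html)

print(legacy_match is not None)  # the legacy page still matches
print(rsc_match is None)         # the App Router page never does
```

With no match, the only signal is the warning quoted above, which is easy to miss because the CLI output still looks normal.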

Changes

All in src/whichllm/models/benchmark_sources/aa_index.py:

  1. Parse the App Router RSC stream. _decode_rsc_blob() concatenates and
    unescapes the self.__next_f.push([n, "…"]) chunks; _extract_aa_pairs_from_html()
    pulls every {"name", …, "intelligenceIndex"} record out with a bounded
    regex (the payload is a flat RSC stream, not one parseable JSON document, so
    the middle of the record regex forbids a second "name":" to avoid leaking
    across records).
  2. Canonicalize variant-suffixed names. AA now labels models like
    "Qwen3 14B (Reasoning)", "GLM-5 (Non-reasoning)", "gpt-oss-20B (high)".
    _canonical_name() strips parentheticals and normalizes separators/case so
    they map back onto the existing AA_NAME_TO_HF_IDS table. This lifts live
    name→HF coverage from 8 to ~46 models without enlarging the table.
  3. Overlay live over the curated fallback. A successful live fetch now
    merges on top of get_aa_curated_fallback() (live wins where both exist),
    so a fetch can only add coverage — it can never shrink the AA tier below
    the snapshot. Previously, replacing the ~72-entry snapshot with ~8 exact
    matches would have regressed rankings.
  4. Keep __NEXT_DATA__ as a secondary fallback in case the site format
    changes again.
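
As an illustration of item 2, the canonicalization could look roughly like the sketch below. This is not the PR's `_canonical_name()`; the function name `canonical_name`, both regexes, and the choice to keep dots (so version numbers such as 4.7 survive) are assumptions made for the example.

```python
import re

# Assumption: variant tags are always parenthesized, e.g. "(Reasoning)".
_PAREN_RE = re.compile(r"\s*\([^)]*\)")
# Assumption: spaces, hyphens and underscores are interchangeable separators.
_SEP_RE = re.compile(r"[\s\-_]+")


def canonical_name(display_name: str) -> str:
    """Strip parenthetical variant tags, unify separators, lowercase."""
    without_variant = _PAREN_RE.sub("", display_name)
    collapsed = _SEP_RE.sub(" ", without_variant)
    return collapsed.strip().lower()


for name in ("Qwen3 14B (Reasoning)", "GLM-5 (Non-reasoning)", "gpt-oss-20B (high)"):
    print(name, "->", canonical_name(name))
```

One consequence of this kind of scheme is that reasoning and non-reasoning variants of a model collapse to the same key, so the caller needs a rule for which variant's score to keep.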

Verification

Against the live page (2026-06-09): live fetch returns 78 merged scores
(≥72 fallback baseline), with 12 models refreshed by live data and 6 new
ones not in the snapshot (GLM-4.7, MiniMax-M2, MiMo-V2-Flash, …). The
AA Index fetch failed warning no longer appears.

Tests

Adds tests/test_aa_index.py — fully offline (httpx.MockTransport), covering:

  • name canonicalization (variant + separator stripping),
  • RSC chunk decoding and record extraction (incl. the no-leak boundary),
  • canonical-name → HF mapping through fetch_aa_index_scores,
  • the merge-over-fallback coverage guarantee,
  • the ExtractionFailed path when no records are found.

uv run pytest          # 298 passed

🤖 Generated with Claude Code

artificialanalysis.ai migrated to the Next.js App Router, which no longer
embeds a `<script id="__NEXT_DATA__">` blob. The AA Intelligence Index
fetcher's regex never matched, so every run logged
`AA Index fetch failed ... __NEXT_DATA__ payload not found` and silently
fell back to the frozen 2026-05-14 snapshot — live scores stopped flowing
into the rankings.

Changes (src/whichllm/models/benchmark_sources/aa_index.py):
- Parse the App Router RSC stream: concatenate + unescape the
  `self.__next_f.push([n, "..."])` chunks and pull every
  `{"name", ..., "intelligenceIndex"}` record with a bounded regex.
- Canonicalize AA's variant-suffixed display names (`(Reasoning)`,
  `(Non-reasoning)`, `(high)`, effort/date tags) before mapping to
  HuggingFace ids — lifts live name->HF coverage from 8 to ~46 models.
- Overlay live scores on top of the curated fallback so a successful live
  fetch can only add coverage, never shrink it below the snapshot.
- Keep the legacy `__NEXT_DATA__` extraction as a secondary fallback.

Adds tests/test_aa_index.py (offline, httpx.MockTransport) covering name
canonicalization, RSC decoding/extraction, canonical-name mapping, the
merge-over-fallback guarantee, and the no-records error path.

Full suite: 298 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 9, 2026 09:46

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates the Artificial Analysis Intelligence Index integration to work with the site’s newer Next.js App Router (RSC) payload format, restoring live fetching while retaining a curated fallback.

Changes:

  • Add RSC scraping via self.__next_f.push(...) chunk decoding + bounded record extraction.
  • Add canonicalization for AA display names to improve mapping to HF IDs.
  • Merge live scores over a curated snapshot fallback, and add offline tests for the new behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/whichllm/models/benchmark_sources/aa_index.py | Implements RSC scraper, canonical-name matching, and live+fallback merge logic. |
| tests/test_aa_index.py | Adds offline unit tests for RSC decoding/extraction, name canonicalization, and merging behavior. |
| CHANGELOG.md | Documents the fix restoring live AA Index fetching and the new RSC parsing approach. |


Comment on lines +350 to +357
# Overlay live scores on top of the curated snapshot so a successful live
# fetch can only ADD coverage, never shrink it below the fallback. Live
# numbers win wherever both exist; the snapshot fills the long tail of
# models AA labels in a way we can't map (or no longer tracks).
scores = get_aa_curated_fallback()
for hf_id, normalized in live.items():
    if normalized > scores.get(hf_id, 0.0):
        scores[hf_id] = normalized
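
For comparison, here is a small sketch of two overlay strategies. As quoted, the loop keeps the larger of the two values, which differs from a plain "live wins" overwrite whenever a live score is lower than the snapshot. The function names, the HF ids, and the scores below are all hypothetical.

```python
def overlay_max(fallback: dict[str, float], live: dict[str, float]) -> dict[str, float]:
    """Mirror of the quoted loop: a live score replaces the snapshot only if larger."""
    merged = dict(fallback)
    for hf_id, score in live.items():
        if score > merged.get(hf_id, 0.0):
            merged[hf_id] = score
    return merged


def overlay_live_wins(fallback: dict[str, float], live: dict[str, float]) -> dict[str, float]:
    """Alternative: a live score always replaces the snapshot value."""
    merged = dict(fallback)
    merged.update(live)
    return merged


# Invented ids and scores, for illustration only.
fallback = {"org/model-a": 0.60, "org/model-b": 0.40}
live = {"org/model-a": 0.55, "org/model-c": 0.70}

print(overlay_max(fallback, live))        # model-a stays at the snapshot's 0.60
print(overlay_live_wins(fallback, live))  # model-a drops to the live 0.55
```

Both variants preserve every key from the fallback, so the coverage guarantee holds either way; they differ only in how a downward revision on the live page is treated.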
Comment on lines +212 to +217
_AA_RECORD_RE = re.compile(
    r'"name":"(?P<name>(?:[^"\\]|\\.)*)"'
    r'(?:(?!"name":").)*?'
    r'"intelligenceIndex":(?P<idx>-?\d+(?:\.\d+)?)',
    re.DOTALL,
)
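
The no-leak boundary can be checked with a short sketch that runs the pattern above against an invented payload. The three records are made-up sample data; the first one deliberately has no `intelligenceIndex`.

```python
import re

# Same pattern as quoted above, renamed for the example.
AA_RECORD_RE = re.compile(
    r'"name":"(?P<name>(?:[^"\\]|\\.)*)"'
    r'(?:(?!"name":").)*?'
    r'"intelligenceIndex":(?P<idx>-?\d+(?:\.\d+)?)',
    re.DOTALL,
)

# Invented sample: "Model A" has no score, so it must not borrow Model B's.
payload = (
    '{"name":"Model A","slug":"model-a"},'
    '{"name":"Model B","slug":"model-b","intelligenceIndex":41.5},'
    '{"name":"Model C","intelligenceIndex":30}'
)

pairs = [
    (m.group("name"), float(m.group("idx")))
    for m in AA_RECORD_RE.finditer(payload)
]
print(pairs)  # [('Model B', 41.5), ('Model C', 30.0)]
```

The negative lookahead stops the lazy middle section at the next `"name":"`, so a record without a score is skipped rather than paired with a neighbour's value.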
Comment on lines +235 to +239
# Canonical-name -> HF ids, derived once from AA_NAME_TO_HF_IDS. Several display
# names can collapse to one canonical key; we union their HF ids.
_AA_CANON_TO_HF_IDS: dict[str, list[str]] = {}
for _disp, _ids in AA_NAME_TO_HF_IDS.items():
    _AA_CANON_TO_HF_IDS.setdefault(_canonical_name(_disp), []).extend(_ids)
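
One property of the quoted loop is that `extend` keeps an id twice when two display names that collapse to the same key both list it. Whether that matters depends on how callers consume the list. The sketch below shows an order-preserving union instead; `canonical_key` is a simplified stand-in for `_canonical_name`, and the table entries are invented.

```python
def canonical_key(display_name: str) -> str:
    """Simplified stand-in for the PR's _canonical_name (assumption)."""
    return display_name.split("(")[0].strip().lower()


# Invented display names and HF ids, for illustration only.
DISPLAY_TO_HF = {
    "Example 7B": ["example-org/example-7b"],
    "Example 7B (Reasoning)": [
        "example-org/example-7b",
        "example-org/example-7b-thinking",
    ],
}

canon_to_hf: dict[str, list[str]] = {}
for display, ids in DISPLAY_TO_HF.items():
    bucket = canon_to_hf.setdefault(canonical_key(display), [])
    for hf_id in ids:
        if hf_id not in bucket:  # skip ids already contributed by another name
            bucket.append(hf_id)

print(canon_to_hf)
```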
Comment on lines +246 to +249
try:
    parts.append(json.loads(m.group("s")))
except (ValueError, json.JSONDecodeError):
    continue
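
For context, the surrounding chunk decoding might look roughly like this sketch. `PUSH_RE` and `decode_rsc_blob` are assumptions (the PR's `_decode_rsc_blob()` and its pattern are not quoted in full), and the HTML is an invented two-chunk sample in which one record is split across chunks.

```python
import json
import re

# Assumed chunk shape: self.__next_f.push([<int>, "<JSON string literal>"])
PUSH_RE = re.compile(
    r'self\.__next_f\.push\(\[\d+,\s*(?P<s>"(?:[^"\\]|\\.)*")\]\)'
)


def decode_rsc_blob(html: str) -> str:
    """Unescape each pushed string chunk and concatenate them in page order."""
    parts = []
    for m in PUSH_RE.finditer(html):
        try:
            # Each chunk is a JSON string literal, so json.loads unescapes it.
            parts.append(json.loads(m.group("s")))
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            continue
    return "".join(parts)


# Invented sample: a single record split across two chunks.
html = (
    '<script>self.__next_f.push([1,"{\\"name\\":\\"GLM-5\\","])</script>'
    '<script>self.__next_f.push([1,"\\"intelligenceIndex\\":50}"])</script>'
)

blob = decode_rsc_blob(html)
print(blob)  # {"name":"GLM-5","intelligenceIndex":50}
```

Because `json.JSONDecodeError` derives from `ValueError`, catching `ValueError` alone covers both cases in the quoted `except` clause.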