fix(aa-index): scrape App Router RSC payload instead of `__NEXT_DATA__` #97
Open
xack20 wants to merge 1 commit into
artificialanalysis.ai migrated to the Next.js App Router, which no longer
embeds a `<script id="__NEXT_DATA__">` blob. The AA Intelligence Index
fetcher's regex never matched, so every run logged
`AA Index fetch failed ... __NEXT_DATA__ payload not found` and silently
fell back to the frozen 2026-05-14 snapshot — live scores stopped flowing
into the rankings.
Changes (src/whichllm/models/benchmark_sources/aa_index.py):
- Parse the App Router RSC stream: concatenate + unescape the
`self.__next_f.push([n, "..."])` chunks and pull every
`{"name", ..., "intelligenceIndex"}` record with a bounded regex.
- Canonicalize AA's variant-suffixed display names (`(Reasoning)`,
`(Non-reasoning)`, `(high)`, effort/date tags) before mapping to
HuggingFace ids — lifts live name->HF coverage from 8 to ~46 models.
- Overlay live scores on top of the curated fallback so a successful live
fetch can only add coverage, never shrink it below the snapshot.
- Keep the legacy `__NEXT_DATA__` extraction as a secondary fallback.
Adds tests/test_aa_index.py (offline, httpx.MockTransport) covering name
canonicalization, RSC decoding/extraction, canonical-name mapping, the
merge-over-fallback guarantee, and the no-records error path.
Full suite: 298 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Updates the Artificial Analysis Intelligence Index integration to work with the site’s newer Next.js App Router (RSC) payload format, restoring live fetching while retaining a curated fallback.
Changes:
- Add RSC scraping via `self.__next_f.push(...)` chunk decoding + bounded record extraction.
- Add canonicalization for AA display names to improve mapping to HF IDs.
- Merge live scores over a curated snapshot fallback, and add offline tests for the new behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/whichllm/models/benchmark_sources/aa_index.py` | Implements RSC scraper, canonical-name matching, and live+fallback merge logic. |
| `tests/test_aa_index.py` | Adds offline unit tests for RSC decoding/extraction, name canonicalization, and merging behavior. |
| `CHANGELOG.md` | Documents the fix restoring live AA Index fetching and the new RSC parsing approach. |
Comment on lines +350 to +357

    # Overlay live scores on top of the curated snapshot so a successful live
    # fetch can only ADD coverage, never shrink it below the fallback. Live
    # numbers win wherever both exist; the snapshot fills the long tail of
    # models AA labels in a way we can't map (or no longer tracks).
    scores = get_aa_curated_fallback()
    for hf_id, normalized in live.items():
        if normalized > scores.get(hf_id, 0.0):
            scores[hf_id] = normalized

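A minimal sketch of how the guard in this excerpt behaves, using made-up ids and scores. The names `overlay_max` and `overlay_live_wins` and the `org/model-*` ids are illustrative assumptions, not code from the PR. With the `>` comparison, a live score replaces a snapshot score only when it is higher, so the merged value is effectively the maximum of the two; an unconditional assignment is shown next to it for contrast. In both cases no snapshot key is dropped.

```python
def overlay_max(fallback: dict[str, float], live: dict[str, float]) -> dict[str, float]:
    """Mirror the excerpt: live replaces the snapshot only when it is higher."""
    scores = dict(fallback)  # copy so the snapshot itself is not mutated
    for hf_id, normalized in live.items():
        if normalized > scores.get(hf_id, 0.0):
            scores[hf_id] = normalized
    return scores


def overlay_live_wins(fallback: dict[str, float], live: dict[str, float]) -> dict[str, float]:
    """Contrasting policy: a live value always replaces the snapshot value."""
    return {**fallback, **live}


# Hypothetical data for illustration only.
fallback = {"org/model-a": 40.0, "org/model-b": 55.0}
live = {"org/model-a": 35.0, "org/model-c": 60.0}

merged_max = overlay_max(fallback, live)
merged_live = overlay_live_wins(fallback, live)
```

The two policies differ only for `org/model-a`, where the live score is lower than the snapshot score.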
Comment on lines +212 to +217

    _AA_RECORD_RE = re.compile(
        r'"name":"(?P<name>(?:[^"\\]|\\.)*)"'
        r'(?:(?!"name":").)*?'
        r'"intelligenceIndex":(?P<idx>-?\d+(?:\.\d+)?)',
        re.DOTALL,
    )

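To see what the middle clause of this pattern buys, here is a small sketch that runs the regex from the excerpt against a toy payload. The payload is invented for illustration and is not real AA data. The second record has a `"name"` but no `"intelligenceIndex"`, so without the negative lookahead it could be paired with the index of the record that follows it.

```python
import re

# Pattern copied from the excerpt above.
_AA_RECORD_RE = re.compile(
    r'"name":"(?P<name>(?:[^"\\]|\\.)*)"'
    r'(?:(?!"name":").)*?'
    r'"intelligenceIndex":(?P<idx>-?\d+(?:\.\d+)?)',
    re.DOTALL,
)

# Toy payload (hypothetical): three flat records, the middle one has no index.
payload = (
    '{"name":"Model A (Reasoning)","slug":"model-a","intelligenceIndex":41.5},'
    '{"name":"Creator X"},'
    '{"name":"Model B","intelligenceIndex":12}'
)

# Each match yields one (display name, score) pair.
pairs = [
    (m.group("name"), float(m.group("idx")))
    for m in _AA_RECORD_RE.finditer(payload)
]
```

Here `"Creator X"` yields no pair: the tempered `(?:(?!"name":").)*?` cannot cross into the next record's `"name":"`, so the engine abandons that start position and matches `"Model B"` on its own.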
Comment on lines +235 to +239

    # Canonical-name -> HF ids, derived once from AA_NAME_TO_HF_IDS. Several display
    # names can collapse to one canonical key; we union their HF ids.
    _AA_CANON_TO_HF_IDS: dict[str, list[str]] = {}
    for _disp, _ids in AA_NAME_TO_HF_IDS.items():
        _AA_CANON_TO_HF_IDS.setdefault(_canonical_name(_disp), []).extend(_ids)

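The canonical-key idea can be sketched as follows. `canonical_name` here is a hypothetical stand-in for the PR's `_canonical_name()`, whose body is not shown on this page, and the `example-org/...` table is made up. Unlike the plain `.extend()` in the excerpt, this sketch skips ids already present in a bucket, which is one possible way to make "union" literal when two display names that share an HF id collapse to the same key.

```python
import re


def canonical_name(display: str) -> str:
    """Hypothetical canonicalizer: drop parentheticals, unify separators, lowercase."""
    no_parens = re.sub(r"\s*\([^)]*\)", "", display)  # "(Reasoning)", "(high)", ...
    collapsed = re.sub(r"[\s_\-]+", " ", no_parens)    # spaces, underscores, hyphens
    return collapsed.strip().lower()


# Made-up table for illustration; not the real AA_NAME_TO_HF_IDS.
name_to_hf_ids = {
    "Example 7B": ["example-org/Example-7B"],
    "Example-7B (Reasoning)": [
        "example-org/Example-7B",
        "example-org/Example-7B-Thinking",
    ],
}

canon_to_hf_ids: dict[str, list[str]] = {}
for disp, ids in name_to_hf_ids.items():
    bucket = canon_to_hf_ids.setdefault(canonical_name(disp), [])
    for hf_id in ids:
        if hf_id not in bucket:  # keep first-seen order, skip repeats
            bucket.append(hf_id)
```

Both display names collapse to the key `example 7b`, and the shared id appears once in the resulting list.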
Comment on lines +246 to +249

    try:
        parts.append(json.loads(m.group("s")))
    except (ValueError, json.JSONDecodeError):
        continue

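For context, a rough sketch of the decode loop this excerpt belongs to. `_CHUNK_RE` and `decode_rsc_blob` are assumptions made for illustration: the PR's actual chunk regex is not shown on this page, and the HTML below is a toy input. Each `self.__next_f.push([n, "..."])` argument is treated as a JSON string literal, decoded with `json.loads`, and concatenated; chunks that fail to decode are skipped.

```python
import json
import re

# Hypothetical pattern: captures the quoted string argument as group "s".
_CHUNK_RE = re.compile(
    r'self\.__next_f\.push\(\[\d+,\s*(?P<s>"(?:[^"\\]|\\.)*")\]\)'
)


def decode_rsc_blob(html: str) -> str:
    """Concatenate the unescaped string payload of every push() chunk."""
    parts: list[str] = []
    for m in _CHUNK_RE.finditer(html):
        try:
            parts.append(json.loads(m.group("s")))
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            continue
    return "".join(parts)


# Toy HTML: one record split across two chunks, plus one undecodable chunk.
html = (
    '<script>self.__next_f.push([1,"{\\"name\\":\\"Model A\\","])</script>'
    '<script>self.__next_f.push([1,"\\"intelligenceIndex\\":41.5}"])</script>'
    '<script>self.__next_f.push([1,"bad \\x escape"])</script>'
)
decoded = decode_rsc_blob(html)
```

Because `json.JSONDecodeError` subclasses `ValueError`, catching `ValueError` alone covers both exception types named in the excerpt.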
PR: fix(aa-index): scrape App Router RSC payload instead of `__NEXT_DATA__`
Base: `Andyyyy64/whichllm:main` · Head: `fix/aa-index-rsc-scraper`

Summary
The Artificial Analysis Intelligence Index source has been silently failing on every run. `artificialanalysis.ai` migrated its leaderboard to the Next.js App Router, which no longer embeds a `<script id="__NEXT_DATA__">` blob; model data now streams via `self.__next_f.push([n, "…"])` (RSC) chunks. The scraper's `_NEXT_DATA_RE` regex never matches, so `fetch_aa_index_scores` raises `ExtractionFailed("__NEXT_DATA__ payload not found")`, and `build_scores()` logs an `AA Index fetch failed ... __NEXT_DATA__ payload not found` warning, then falls back to the frozen `AA_INDEX_FALLBACK_2026_05_14` snapshot. The CLI still prints (the failure is caught), but the "current" AA tier is stale on every invocation.

Changes
All in `src/whichllm/models/benchmark_sources/aa_index.py`:
- `_decode_rsc_blob()` concatenates and unescapes the `self.__next_f.push([n, "…"])` chunks; `_extract_aa_pairs_from_html()` pulls every `{"name", …, "intelligenceIndex"}` record out with a bounded regex (the payload is a flat RSC stream, not one parseable JSON document, so the middle of the record regex forbids a second `"name":"` to avoid leaking across records).
- AA display names carry variant suffixes, e.g. `"Qwen3 14B (Reasoning)"`, `"GLM-5 (Non-reasoning)"`, `"gpt-oss-20B (high)"`. `_canonical_name()` strips parentheticals and normalizes separators/case so they map back onto the existing `AA_NAME_TO_HF_IDS` table. This lifts live name→HF coverage from 8 to ~46 models without enlarging the table.
- The live result merges on top of `get_aa_curated_fallback()` (live wins where both exist), so a fetch can only add coverage; it can never shrink the AA tier below the snapshot. Previously, replacing the ~72-entry snapshot with ~8 exact matches would have regressed rankings.
- Keeps the legacy `__NEXT_DATA__` extraction as a secondary fallback in case the site format changes again.

Verification
Against the live page (2026-06-09): live fetch returns 78 merged scores (≥72 fallback baseline), with 12 models refreshed by live data and 6 new ones not in the snapshot (GLM-4.7, MiniMax-M2, MiMo-V2-Flash, …). The `AA Index fetch failed` warning no longer appears.

Tests
Adds `tests/test_aa_index.py`, fully offline (`httpx.MockTransport`), covering: name canonicalization, RSC decoding/extraction, canonical-name mapping, the merge-over-fallback guarantee in `fetch_aa_index_scores`, and the `ExtractionFailed` path when no records are found.

🤖 Generated with Claude Code