Skip to content

fix(registry): derive freshness from source history#2121

Open
JSONbored wants to merge 1 commit into
mainfrom
codex/propose-fix-for-content-freshness-spoofing
Open

fix(registry): derive freshness from source history#2121
JSONbored wants to merge 1 commit into
mainfrom
codex/propose-fix-for-content-freshness-spoofing

Conversation

@JSONbored

Copy link
Copy Markdown
Owner

Motivation

  • Prevent generated JSON artifacts from being treated as authoritative sources of contentUpdatedAt, which allowed committed generated files to spoof public freshness/trust metadata.

Description

  • Replace the loader that read contentUpdatedAt from checked-in apps/web/public/data/entries/*.json with a Git-history-derived map of source content/*.mdx timestamps computed via git log in scripts/build-content-index.mjs.
  • Preserve explicit frontmatter contentUpdatedAt when present and otherwise fall back to the source file’s Git commit timestamp and then dateAdded, so generated artifacts are no longer an input to freshness metadata.
  • Add a churn-policy regression test in tests/generated-churn-policy.test.ts to assert the new code path (loadGitContentUpdatedAt) is present and the old artifact-based loader is not reintroduced.
  • Minor runtime import change: add execFileSync usage to safely run Git commands from within the build script.

Testing

  • Ran pnpm exec vitest run tests/generated-churn-policy.test.ts and it passed (all tests green).
  • Ran node scripts/build-content-index.mjs to exercise full generation and it completed successfully and wrote registry artifacts.
  • Ran pnpm validate:content:strict and content validation passed.
  • Ran pnpm test:registry-artifacts and observed two tests in tests/registry-artifacts.test.ts time out under the environment default test timeout (these timed out during long-running aggregate-assertion checks and are reported here as failures in this run).

Codex Task

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Warning

Review limit reached

@JSONbored, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 12 minutes and 4 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8c00a9cd-b888-4746-af53-5e60ba845775

📥 Commits

Reviewing files that changed from the base of the PR and between 60c0a31 and 3bcd750.

📒 Files selected for processing (2)
  • scripts/build-content-index.mjs
  • tests/generated-churn-policy.test.ts
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/propose-fix-for-content-freshness-spoofing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@superagent-security

Copy link
Copy Markdown

Superagent didn't find any vulnerabilities or security issues in this PR.

@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jun 11, 2026
@gittensory

gittensory Bot commented Jun 11, 2026

Copy link
Copy Markdown

Note

Gittensory Gate skipped

PR closed before full evaluation. No late first comment was created.

Signal Result Evidence Action
Gate result ⚠️ Skipped #2121 is no longer open. No action.

Checked by Gittensory, a quiet PR intelligence layer for OSS maintainers.

@gittensory gittensory Bot added the gittensory:reviewed Gittensor contributor context label Jun 11, 2026
@JSONbored JSONbored self-assigned this Jun 11, 2026
@JSONbored JSONbored closed this Jun 11, 2026
@JSONbored JSONbored reopened this Jun 11, 2026
@reviewwed

reviewwed Bot commented Jun 11, 2026

Copy link
Copy Markdown

🔍 Maintainer advisory review

Reviewed 2 changed file(s) — two independent AI reviewers. This is an advisory review — it does not merge or close the PR.

Suggested action: 🛠️ Request changes. Address the items below before merging.

Reviewer A · gpt-oss-120b — recommends 🛠️ request changes
The change attempts to improve freshness detection by reading Git history, but it introduces functional regressions and insufficient testing. The new loadGitContentUpdatedAt builds a map keyed by relative file paths (e.g., "content/foo/bar.mdx"), yet later code looks up freshness using path.relative(repoRoot, filePath) where filePath points to the generated JSON artifact (e.g., "entry-data/foo/bar.json"). This key mismatch means the map will never be hit, so contentUpdatedAt will always fall back to entry.dateAdded, breaking the intended freshness logic and potentially causing unnecessary rebuilds. It also removes the previous fallback that read existing JSON artifacts, which may affect incremental builds. Security-wise, invoking git via execFileSync with static arguments is safe, but the lack of handling for a missing .git directory could silently hide errors. Tests only verify the presence of the new function name, not its correctness. Overall, the change is not ready to merge.

Suggestions

  • Align the key format of loadGitContentUpdatedAt with the lookup performed later (e.g., store ${category}:${slug} derived from the MDX path).
  • Retain the previous fallback that reads existing JSON artifacts for entries that lack a Git timestamp.
  • Add unit tests that simulate a small Git repository (or mock execFileSync) and verify that contentUpdatedAt is correctly populated from the Git commit timestamp.
  • Handle the case where the repository is not a Git checkout more explicitly, perhaps logging a warning.
  • Consider caching the Git log output or limiting its scope to improve performance.

Worth double-checking

  • Regression: contentUpdatedAt will always be missing, causing stale content to be treated as new and triggering unnecessary rebuilds or PRs.
  • Potential loss of incremental build optimization, leading to longer CI times.
  • Silent failures in environments without a Git history (e.g., CI containers without .git).

Reviewer B · mistral-small-3.1-24b-instruct — recommends 🛠️ request changes
The change aims to derive content freshness from the git history instead of generated artifacts, which is a reasonable goal. However, there are several issues that need to be addressed.

Suggestions

  • Add error handling for the execFileSync call to ensure that failures are logged and handled gracefully.
  • Consider using a more secure method to execute shell commands, such as util.promisify with child_process.execFile to handle errors asynchronously.
  • Add tests to cover the new loadGitContentUpdatedAt function, especially edge cases and error conditions.
  • Ensure that the git command is available in the environment where the script runs, or handle the case where it is not available.
  • Review the regular expression and string manipulations to ensure they are robust and do not introduce security risks like ReDoS.

Worth double-checking

  • Potential command injection if the git command or its arguments are not properly sanitized.
  • The script may fail if the git command is not available in the environment.
  • The new logic may introduce regressions if the git history is not as expected.
  • The change introduces a dependency on the git command, which may not be present in all environments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

aardvark codex gittensory:reviewed Gittensor contributor context size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant