Public JSON feeds for community FAQ and citations #331
Merged
* feat(api): public FAQ JSON feed gated by public_feeds config
Add a top-level public_feeds config block (faq/citations flags, off by
default) and a read-only GET /{community_id}/faq endpoint that serves
generated FAQ entries from the knowledge database.
- New PublicFeedsConfig model on CommunityConfig
- list_faq_entries browse helper (no FTS query required) with pagination
- Endpoint supports q/category/min_quality/limit/offset filters
- Email addresses redacted from public output (privacy mitigation)
- Returns 404 unless public_feeds.faq is enabled
Tests: list helper (ordering, filters, pagination) and endpoint
(gate, fields, redaction, filters, validation) against real SQLite data.
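A minimal sketch of the opt-in gate and the email redaction described in this commit. The dataclass is a stand-in for the PR's PublicFeedsConfig model, and the regex, the placeholder text, and the helper names redact_emails and public_faq_entry are assumptions for illustration; the PR does not show its actual pattern or field list.

```python
import re
from dataclasses import dataclass


@dataclass
class PublicFeedsConfig:
    """Stand-in for the PR's config model: both feeds are off by default."""
    faq: bool = False
    citations: bool = False


# Hypothetical pattern; deliberately simple, not a full RFC 5322 matcher.
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str, placeholder: str = "[email redacted]") -> str:
    """Replace anything that looks like an email address."""
    return _EMAIL_RE.sub(placeholder, text)


def public_faq_entry(row: dict) -> dict:
    """Build the public view of one FAQ row with emails removed."""
    return {
        "question": redact_emails(row["question"]),
        "answer": redact_emails(row["answer"]),
        # Tags are redacted too, as the follow-up review fix requires.
        "tags": [redact_emails(tag) for tag in row.get("tags", [])],
    }
```

An endpoint would check the faq flag first and return 404 when it is off, then pass each row through public_faq_entry before serializing.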
* fix(faq): address PR review findings
- Unify browse + search in list_faq_entries via optional query param so
total is the real pre-LIMIT count and offset is honored in both modes
(fixes broken pagination on the ?q= path).
- Redact emails in tags, not just question/answer.
- Guard json.loads(tags) against malformed JSON (shared _parse_faq_tags
helper, applied to search_faq_entries too) so a corrupt row degrades to
empty tags instead of an unlogged 500.
- Add a broad logged 500 fallback in the endpoint alongside the 503 path.
- Set Cache-Control: public, max-age=3600, matching /metrics/public.
- Include limit/offset in the list_faq_entries sqlite error log.
Tests: project-consistent fixture, faq=False gate, 503 browse+search,
redaction across question/answer/tags, list_name filter, real search
total vs page size, Cache-Control header.
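The two central fixes here, a total computed before LIMIT and tags that degrade to an empty list, can be sketched as follows. This is an illustration under assumptions, not the project's code: the faq table layout is invented, LIKE stands in for the full-text search the real helper uses, and parse_tags and list_entries are hypothetical names mirroring _parse_faq_tags and list_faq_entries.

```python
import json
import sqlite3


def parse_tags(raw):
    """Return tags as a list; malformed or non-list JSON degrades to []."""
    if not raw:
        return []
    try:
        tags = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []
    return tags if isinstance(tags, list) else []


def list_entries(conn, query=None, limit=20, offset=0):
    """Browse (query=None) or search through one code path."""
    where, params = "", []
    if query:
        # LIKE is a stand-in for the real full-text match.
        where = "WHERE question LIKE ?"
        params.append(f"%{query}%")
    # Count with the same filter but without LIMIT/OFFSET, so `total`
    # is the number of matches rather than the size of the page.
    total = conn.execute(f"SELECT COUNT(*) FROM faq {where}", params).fetchone()[0]
    rows = conn.execute(
        f"SELECT question, tags FROM faq {where} ORDER BY id LIMIT ? OFFSET ?",
        [*params, limit, offset],
    ).fetchall()
    return total, [{"question": q, "tags": parse_tags(t)} for q, t in rows]
```

Sharing one WHERE clause between the count and the page query is what keeps pagination consistent in both browse and search modes.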
* feat(api): public citations dashboard with cites_doi linkage
Record which canonical DOI each citing paper references and expose a
per-year + stacked-by-paper citation feed, opt-in per community.
- papers.cites_doi column (CREATE TABLE + _migrate_db ALTER for existing
DBs); index created in _migrate_db so init_db stays safe on databases
predating the column.
- upsert_paper records cites_doi; on conflict COALESCE keeps the first
link, so a keyword sync (None) never erases it and a re-sync backfills
legacy NULL rows.
- sync_citing_papers threads the canonical DOI through _store_papers.
- get_citation_stats aggregates total/per_year/by_paper (4-digit-year
GLOB guard drops undated rows).
- GET /{community_id}/citations gated by public_feeds.citations, returns
per_year, stacked by_paper, and canonical_dois from config, with
Cache-Control and 503/500 handling matching the FAQ feed.
Backfill on deploy: run a full citation re-sync to populate cites_doi on
existing rows.
Tests: stats aggregation, COALESCE link semantics (backfill/first-wins/
no-clobber), legacy-table migration, endpoint gate/content/cache/503.
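The first-wins COALESCE upsert and the year-guarded aggregation can be sketched against an in-memory database. The table layout (a text year column, doi as primary key), the function signatures, and the sample DOIs are assumptions for illustration; only the COALESCE and GLOB ideas come from the commit message. The upsert syntax needs SQLite 3.24 or newer.

```python
import sqlite3


def make_db():
    """Create a throwaway papers table (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE papers (doi TEXT PRIMARY KEY, title TEXT, year TEXT, cites_doi TEXT)"
    )
    return conn


def upsert_paper(conn, doi, title, year, cites_doi=None):
    # COALESCE keeps an existing link and only fills a NULL one, so a
    # sync that passes None never erases a recorded cites_doi.
    conn.execute(
        """
        INSERT INTO papers (doi, title, year, cites_doi) VALUES (?, ?, ?, ?)
        ON CONFLICT(doi) DO UPDATE SET
            title = excluded.title,
            cites_doi = COALESCE(papers.cites_doi, excluded.cites_doi)
        """,
        (doi, title, year, cites_doi),
    )


def citation_stats(conn):
    # The GLOB guard keeps only rows whose year starts with four digits.
    rows = conn.execute(
        """
        SELECT cites_doi, substr(year, 1, 4) AS y, COUNT(*)
        FROM papers
        WHERE cites_doi IS NOT NULL AND year GLOB '[0-9][0-9][0-9][0-9]*'
        GROUP BY cites_doi, y
        ORDER BY y, cites_doi
        """
    ).fetchall()
    per_year, by_paper = {}, {}
    for canonical, y, n in rows:
        per_year[y] = per_year.get(y, 0) + n
        by_paper.setdefault(canonical, {})[y] = n
    return {"total": sum(per_year.values()), "per_year": per_year, "by_paper": by_paper}
```

Because the column holds a single DOI, a paper that cites two canonical DOIs is attributed only to the first one recorded.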
* fix(citations): address PR review findings
- Narrow the _migrate_db try to the PRAGMA only so a DDL failure (locked
DB, I/O error) on an existing papers table propagates instead of being
swallowed at DEBUG with a misleading 'table not found' message.
- Document the single-column cross-DOI attribution limitation on
upsert_paper.
- Cover _store_papers threading cites_doi onto each stored row.
- Cover the canonical_dois=[] branch (feed enabled, no citations config)
and the unexpected-error 500 path; correct the test module docstring.
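The narrowed try block can be sketched like this: only the PRAGMA probe is guarded, so a failing ALTER or CREATE INDEX raises to the caller. The function name migrate_db, the index name, and the log wording are hypothetical; the real _migrate_db is not shown in the PR text.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def migrate_db(conn):
    """Add papers.cites_doi and its index to a database that predates them."""
    try:
        # Guard only the probe. Column names are in position 1 of each row.
        columns = {row[1] for row in conn.execute("PRAGMA table_info(papers)")}
    except sqlite3.Error:
        logger.debug("could not inspect papers table; skipping migration")
        return
    if not columns:
        # No papers table yet: table creation will include the column.
        return
    # DDL runs outside the try, so a locked database or I/O error propagates.
    if "cites_doi" not in columns:
        conn.execute("ALTER TABLE papers ADD COLUMN cites_doi TEXT")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_papers_cites_doi ON papers(cites_doi)"
    )
```

Running it twice is harmless: the column check skips the ALTER and the index uses IF NOT EXISTS.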
This was referenced Jun 9, 2026
Summary
Exposes two already-populated community datasets as public, read-only JSON feeds that communities can build their own frontends on. Both are opt-in per community via a new top-level public_feeds config block (off by default).
Phases completed
- GET /{community_id}/faq over generated FAQ entries, with q/category/min_quality filters, pagination, email redaction (question/answer/tags), and Cache-Control. Gated by public_feeds.faq.
- papers.cites_doi linkage column (+ in-place migration), recorded by the citation sync; GET /{community_id}/citations returns per-year counts and a stacked-by-canonical-paper breakdown plus the configured canonical DOIs. Gated by public_feeds.citations.
Both endpoints are unauthenticated and read-only, mirror the existing /metrics/public pattern, and share the same 404-gate / 503 / 500 / cache handling.
Deploy follow-up
The cites_doi column is populated going forward by the scheduled citation sync. To backfill existing rows once, run a full citation re-sync after the schema migration runs (cheap).
Test plan
Closes #321
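For reference, the 404-gate / 503 / 500 / cache handling shared by both feeds can be sketched without a web framework. Everything here is an assumption made for illustration: the serve_feed name, the tuple return shape, the error bodies, and the mapping of sqlite3 errors to 503. Only the status codes and the Cache-Control value come from the PR text.

```python
import logging
import sqlite3
from typing import Callable

logger = logging.getLogger(__name__)

# Header value taken from the commit message.
CACHE_HEADERS = {"Cache-Control": "public, max-age=3600"}


def serve_feed(enabled: bool, load: Callable[[], dict]):
    """Return (status, headers, body) for a gated, read-only feed."""
    if not enabled:
        # A disabled feed looks the same as a missing route.
        return 404, {}, {"detail": "Not found"}
    try:
        body = load()
    except sqlite3.Error:
        # Assumed mapping: database trouble is reported as 503.
        logger.exception("feed database unavailable")
        return 503, {}, {"detail": "Service unavailable"}
    except Exception:
        # Broad fallback so unexpected failures are logged, not silent.
        logger.exception("unexpected error while building feed")
        return 500, {}, {"detail": "Internal server error"}
    return 200, dict(CACHE_HEADERS), body
```

Only successful responses carry the cache header in this sketch, so a transient error is not cached for an hour.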