
Public JSON feeds for community FAQ and citations #331

Merged
neuromechanist merged 2 commits into develop from feature/issue-321-epic-public-feeds
Jun 9, 2026


@neuromechanist


Summary

Exposes two already-populated community datasets (generated FAQ entries and citing papers) as public, read-only JSON feeds on which communities can build their own frontends. Both feeds are opt-in per community via a new top-level public_feeds config block and are off by default:

public_feeds:
  faq: true
  citations: true
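A minimal sketch of how the public_feeds block might be modeled, assuming a plain dataclass rather than whatever model library the project actually uses for PublicFeedsConfig. The from_mapping helper and its strict key and type checks are hypothetical; only the two flags and their off-by-default behavior come from this PR.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class PublicFeedsConfig:
    """Per-community opt-in flags for the public JSON feeds (both off by default)."""

    faq: bool = False
    citations: bool = False

    @classmethod
    def from_mapping(cls, raw: Optional[Mapping[str, Any]]) -> "PublicFeedsConfig":
        # Hypothetical loader: a missing public_feeds block keeps every feed disabled.
        if raw is None:
            return cls()
        unknown = set(raw) - {"faq", "citations"}
        if unknown:
            raise ValueError(f"unknown public_feeds keys: {sorted(unknown)}")
        for key, value in raw.items():
            # Reject "yes"/1 etc. so a typo cannot silently publish a feed.
            if not isinstance(value, bool):
                raise TypeError(f"public_feeds.{key} must be a boolean, got {value!r}")
        return cls(**raw)
```

Rejecting non-boolean values is a deliberately conservative choice for a flag that makes data public: a malformed config fails loudly instead of being coerced into "enabled".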

Phases completed

  • Phase 1: FAQ JSON endpoint (#324). GET /{community_id}/faq serves generated FAQ entries with q/category/min_quality filters, pagination, email redaction (question/answer/tags), and a Cache-Control header. Gated by public_feeds.faq.
  • Phase 2: Citation dashboard endpoint (#330). Adds a papers.cites_doi linkage column (with an in-place migration) that the citation sync records. GET /{community_id}/citations returns per-year counts, a per-year breakdown stacked by canonical paper, and the configured canonical DOIs. Gated by public_feeds.citations.

Both endpoints are unauthenticated and read-only, mirror the existing /metrics/public pattern, and share the same handling: a 404 when the feed is disabled, 503 and logged 500 error paths, and Cache-Control: public, max-age=3600.
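The shared handling could look roughly like the framework-agnostic sketch below. It assumes database failures surface as sqlite3.Error and map to 503, with anything else logged and mapped to 500; serve_public_feed, the JSON error bodies, and the logger name are illustrative rather than the project's code. The Cache-Control value is the one the PR states.

```python
import json
import logging
import sqlite3
from typing import Any, Callable

logger = logging.getLogger("public_feeds")  # hypothetical logger name

CACHE_HEADERS = {"Cache-Control": "public, max-age=3600"}


def serve_public_feed(
    enabled: bool, load: Callable[[], Any]
) -> tuple[int, dict[str, str], str]:
    """Sketch of the shared gate/error/cache logic; returns (status, headers, body)."""
    if not enabled:
        # A disabled feed looks like a missing route, not a forbidden one.
        return 404, {}, json.dumps({"detail": "Not found"})
    try:
        payload = load()
    except sqlite3.Error:
        # Assumption: database problems are treated as temporary unavailability.
        logger.exception("public feed: database unavailable")
        return 503, {}, json.dumps({"detail": "Service unavailable"})
    except Exception:
        # Broad, logged fallback so unexpected failures never leak as unlogged 500s.
        logger.exception("public feed: unexpected error")
        return 500, {}, json.dumps({"detail": "Internal server error"})
    # Only successful responses are marked cacheable.
    return 200, dict(CACHE_HEADERS), json.dumps(payload)
```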

Deploy follow-up

The scheduled citation sync populates cites_doi going forward. To backfill existing rows, run one full citation re-sync after the schema migration has run; this is a cheap, one-time step.

Test plan

  • Phase 1: 25 tests (FAQ list helper and endpoint). Phase 2: 18 new tests plus a papers_sync threading test.
  • Integration run on the epic branch: 275 passed, 1 skipped (covering FAQ, citations, papers_sync, DB migration, community router, and core config).
  • Real temporary SQLite databases throughout; no business logic mocked.
  • Each phase PR was reviewed individually (code, tests, error handling), and all critical, important, and suggestion-level findings were addressed.

Closes #321

* feat(api): public FAQ JSON feed gated by public_feeds config

Add a top-level public_feeds config block (faq/citations flags, off by
default) and a read-only GET /{community_id}/faq endpoint that serves
generated FAQ entries from the knowledge database.

- New PublicFeedsConfig model on CommunityConfig
- list_faq_entries browse helper (no FTS query required) with pagination
- Endpoint supports q/category/min_quality/limit/offset filters
- Email addresses redacted from public output (privacy mitigation)
- Returns 404 unless public_feeds.faq is enabled

Tests: list helper (ordering, filters, pagination) and endpoint
(gate, fields, redaction, filters, validation) against real SQLite data.
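A sketch of a unified browse/search helper in the spirit of list_faq_entries, including the later review fix that makes total the pre-LIMIT count in both modes. The faq_entries schema, column names, and ordering are assumptions, and the case-insensitive LIKE is only a stand-in for the project's FTS search.

```python
import sqlite3
from typing import Any, Optional


def list_faq_entries(
    conn: sqlite3.Connection,
    query: Optional[str] = None,
    category: Optional[str] = None,
    min_quality: float = 0.0,
    limit: int = 20,
    offset: int = 0,
) -> tuple[list[dict[str, Any]], int]:
    """Sketch: one code path for browse (query=None) and search.

    Returns (page, total), where total counts every match before LIMIT/OFFSET.
    """
    clauses = ["quality_score >= ?"]
    params: list[Any] = [min_quality]
    if category is not None:
        clauses.append("category = ?")
        params.append(category)
    if query:
        # Stand-in for full-text search: SQLite LIKE is case-insensitive for ASCII.
        clauses.append("(question LIKE ? OR answer LIKE ?)")
        params.extend([f"%{query}%"] * 2)
    where = " AND ".join(clauses)

    # Same WHERE clause without LIMIT, so pagination metadata reflects all
    # matches rather than the size of the current page.
    (total,) = conn.execute(
        f"SELECT COUNT(*) FROM faq_entries WHERE {where}", params
    ).fetchone()
    rows = conn.execute(
        "SELECT id, question, answer, category, quality_score FROM faq_entries "
        f"WHERE {where} ORDER BY quality_score DESC, id LIMIT ? OFFSET ?",
        [*params, limit, offset],
    ).fetchall()
    keys = ("id", "question", "answer", "category", "quality_score")
    return [dict(zip(keys, row)) for row in rows], total
```

Building the WHERE clause once and reusing it for both the count and the page query is what keeps total and offset consistent between the browse path and the ?q= path.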

* fix(faq): address PR review findings

- Unify browse + search in list_faq_entries via optional query param so
  total is the real pre-LIMIT count and offset is honored in both modes
  (fixes broken pagination on the ?q= path).
- Redact emails in tags, not just question/answer.
- Guard json.loads(tags) against malformed JSON (shared _parse_faq_tags
  helper, applied to search_faq_entries too) so a corrupt row degrades to
  empty tags instead of an unlogged 500.
- Add a broad logged 500 fallback in the endpoint alongside the 503 path.
- Set Cache-Control: public, max-age=3600, matching /metrics/public.
- Include limit/offset in the list_faq_entries sqlite error log.

Tests: project-consistent fixture, faq=False gate, 503 browse+search,
redaction across question/answer/tags, list_name filter, real search
total vs page size, Cache-Control header.

* feat(api): public citations dashboard with cites_doi linkage

Record which canonical DOI each citing paper references and expose a
per-year + stacked-by-paper citation feed, opt-in per community.

- papers.cites_doi column (CREATE TABLE + _migrate_db ALTER for existing
  DBs); index created in _migrate_db so init_db stays safe on databases
  predating the column.
- upsert_paper records cites_doi; on conflict COALESCE keeps the first
  link, so a keyword sync (None) never erases it and a re-sync backfills
  legacy NULL rows.
- sync_citing_papers threads the canonical DOI through _store_papers.
- get_citation_stats aggregates total/per_year/by_paper (4-digit-year
  GLOB guard drops undated rows).
- GET /{community_id}/citations gated by public_feeds.citations, returns
  per_year, stacked by_paper, and canonical_dois from config, with
  Cache-Control and 503/500 handling matching the FAQ feed.
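A sketch of the aggregation behind get_citation_stats. It assumes each papers row stores a text pub_date whose first four characters are the year; that column name, the choice to count only rows that carry a cites_doi link, and the exact output shape are assumptions, while the 4-digit-year GLOB guard follows the PR.

```python
import sqlite3
from typing import Any


def get_citation_stats(conn: sqlite3.Connection) -> dict[str, Any]:
    """Sketch: aggregate citing papers per year and per canonical DOI."""
    year_expr = "substr(pub_date, 1, 4)"
    # Assumption: only linked rows count. The GLOB guard drops rows whose date
    # is NULL or does not start with four digits.
    where = f"cites_doi IS NOT NULL AND {year_expr} GLOB '[0-9][0-9][0-9][0-9]'"

    per_year = {
        year: count
        for year, count in conn.execute(
            f"SELECT {year_expr}, COUNT(*) FROM papers WHERE {where} "
            "GROUP BY 1 ORDER BY 1"
        )
    }
    # Stacked view: for each canonical DOI, a year -> count mapping.
    by_paper: dict[str, dict[str, int]] = {}
    for doi, year, count in conn.execute(
        f"SELECT cites_doi, {year_expr}, COUNT(*) FROM papers WHERE {where} "
        "GROUP BY 1, 2 ORDER BY 1, 2"
    ):
        by_paper.setdefault(doi, {})[year] = count
    return {"total": sum(per_year.values()), "per_year": per_year, "by_paper": by_paper}
```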

Backfill on deploy: run a full citation re-sync to populate cites_doi on
existing rows.

Tests: stats aggregation, COALESCE link semantics (backfill/first-wins/
no-clobber), legacy-table migration, endpoint gate/content/cache/503.
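The first-link-wins rule maps naturally onto SQLite's upsert (SQLite 3.24 or later) with COALESCE. This sketch assumes a papers table keyed by doi with a title column; only the cites_doi handling reflects the PR. Because there is a single cites_doi column, a paper citing two canonical DOIs stays attributed to whichever link was recorded first, which is the limitation the follow-up commit documents.

```python
import sqlite3
from typing import Optional


def upsert_paper(
    conn: sqlite3.Connection, doi: str, title: str, cites_doi: Optional[str] = None
) -> None:
    """Sketch of the first-link-wins upsert; the title column is an assumption."""
    conn.execute(
        """
        INSERT INTO papers (doi, title, cites_doi) VALUES (?, ?, ?)
        ON CONFLICT(doi) DO UPDATE SET
            title = excluded.title,
            -- Keep an existing link; fill it only while it is still NULL.
            cites_doi = COALESCE(papers.cites_doi, excluded.cites_doi)
        """,
        (doi, title, cites_doi),
    )
```

The three test scenarios named above fall out of the single COALESCE: a NULL legacy row is backfilled, a later link does not replace the first, and a keyword sync passing None leaves the link untouched.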

* fix(citations): address PR review findings

- Narrow the _migrate_db try to the PRAGMA only so a DDL failure (locked
  DB, I/O error) on an existing papers table propagates instead of being
  swallowed at DEBUG with a misleading 'table not found' message.
- Document the single-column cross-DOI attribution limitation on
  upsert_paper.
- Cover _store_papers threading cites_doi onto each stored row.
- Cover the canonical_dois=[] branch (feed enabled, no citations config)
  and the unexpected-error 500 path; correct the test module docstring.
neuromechanist merged commit ebd9cb2 into develop on Jun 9, 2026
6 checks passed
neuromechanist deleted the feature/issue-321-epic-public-feeds branch on June 9, 2026 23:44