Phase 2: Citation dashboard endpoint #330
Merged
neuromechanist merged 2 commits into feature/issue-321-epic-public-feeds on Jun 9, 2026
Record which canonical DOI each citing paper references and expose a
per-year + stacked-by-paper citation feed, opt-in per community.
- papers.cites_doi column (CREATE TABLE + _migrate_db ALTER for existing
DBs); index created in _migrate_db so init_db stays safe on databases
predating the column.
- upsert_paper records cites_doi; on conflict COALESCE keeps the first
link, so a keyword sync (None) never erases it and a re-sync backfills
legacy NULL rows.
- sync_citing_papers threads the canonical DOI through _store_papers.
- get_citation_stats aggregates total/per_year/by_paper (4-digit-year
GLOB guard drops undated rows).
- GET /{community_id}/citations gated by public_feeds.citations, returns
per_year, stacked by_paper, and canonical_dois from config, with
Cache-Control and 503/500 handling matching the FAQ feed.
Backfill on deploy: run a full citation re-sync to populate cites_doi on
existing rows.
Tests: stats aggregation, COALESCE link semantics (backfill/first-wins/
no-clobber), legacy-table migration, endpoint gate/content/cache/503.
- Narrow the _migrate_db try to the PRAGMA only so a DDL failure (locked
DB, I/O error) on an existing papers table propagates instead of being
swallowed at DEBUG with a misleading 'table not found' message.
- Document the single-column cross-DOI attribution limitation on
upsert_paper.
- Cover _store_papers threading cites_doi onto each stored row.
- Cover the canonical_dois=[] branch (feed enabled, no citations config)
and the unexpected-error 500 path; correct the test module docstring.
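The narrowed guard described in the first review fix could look roughly like the sketch below. It is an illustration under assumptions, not the project's `_migrate_db`: the function name `migrate_cites_doi_sketch`, the `TEXT` column type, and the log wording are hypothetical; only the `papers` table, the `cites_doi` column, and the `idx_papers_cites_doi` index name come from this PR.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def migrate_cites_doi_sketch(conn: sqlite3.Connection) -> None:
    """Hypothetical stand-in for the cites_doi step of _migrate_db."""
    # Only the schema probe is guarded: if it fails, the table cannot be inspected.
    try:
        columns = {row[1] for row in conn.execute("PRAGMA table_info(papers)")}
    except sqlite3.OperationalError:
        logger.debug("Could not inspect papers table; skipping migration")
        return

    if not columns:
        # PRAGMA table_info yields no rows for a missing table, so there is
        # nothing to migrate; a fresh CREATE TABLE would include the column.
        return

    # DDL runs outside the try, so a locked DB or I/O error reaches the caller.
    if "cites_doi" not in columns:
        conn.execute("ALTER TABLE papers ADD COLUMN cites_doi TEXT")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_papers_cites_doi ON papers(cites_doi)"
    )
    conn.commit()
```

Because the column check and `CREATE INDEX IF NOT EXISTS` are both conditional, calling the function repeatedly on the same database is harmless.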
30f3e82
Merged into feature/issue-321-epic-public-feeds; 4 checks passed.
neuromechanist added a commit that referenced this pull request on Jun 9, 2026
* Phase 1: FAQ JSON endpoint (#324)

* feat(api): public FAQ JSON feed gated by public_feeds config

Add a top-level public_feeds config block (faq/citations flags, off by
default) and a read-only GET /{community_id}/faq endpoint that serves
generated FAQ entries from the knowledge database.
- New PublicFeedsConfig model on CommunityConfig
- list_faq_entries browse helper (no FTS query required) with pagination
- Endpoint supports q/category/min_quality/limit/offset filters
- Email addresses redacted from public output (privacy mitigation)
- Returns 404 unless public_feeds.faq is enabled
Tests: list helper (ordering, filters, pagination) and endpoint (gate,
fields, redaction, filters, validation) against real SQLite data.

* fix(faq): address PR review findings

- Unify browse + search in list_faq_entries via optional query param so
total is the real pre-LIMIT count and offset is honored in both modes
(fixes broken pagination on the ?q= path).
- Redact emails in tags, not just question/answer.
- Guard json.loads(tags) against malformed JSON (shared _parse_faq_tags
helper, applied to search_faq_entries too) so a corrupt row degrades to
empty tags instead of an unlogged 500.
- Add a broad logged 500 fallback in the endpoint alongside the 503 path.
- Set Cache-Control: public, max-age=3600, matching /metrics/public.
- Include limit/offset in the list_faq_entries sqlite error log.
Tests: project-consistent fixture, faq=False gate, 503 browse+search,
redaction across question/answer/tags, list_name filter, real search
total vs page size, Cache-Control header.

* Phase 2: Citation dashboard endpoint (#330)

* feat(api): public citations dashboard with cites_doi linkage

Record which canonical DOI each citing paper references and expose a
per-year + stacked-by-paper citation feed, opt-in per community.
- papers.cites_doi column (CREATE TABLE + _migrate_db ALTER for existing
DBs); index created in _migrate_db so init_db stays safe on databases
predating the column.
- upsert_paper records cites_doi; on conflict COALESCE keeps the first
link, so a keyword sync (None) never erases it and a re-sync backfills
legacy NULL rows.
- sync_citing_papers threads the canonical DOI through _store_papers.
- get_citation_stats aggregates total/per_year/by_paper (4-digit-year
GLOB guard drops undated rows).
- GET /{community_id}/citations gated by public_feeds.citations, returns
per_year, stacked by_paper, and canonical_dois from config, with
Cache-Control and 503/500 handling matching the FAQ feed.
Backfill on deploy: run a full citation re-sync to populate cites_doi on
existing rows.
Tests: stats aggregation, COALESCE link semantics (backfill/first-wins/
no-clobber), legacy-table migration, endpoint gate/content/cache/503.

* fix(citations): address PR review findings

- Narrow the _migrate_db try to the PRAGMA only so a DDL failure (locked
DB, I/O error) on an existing papers table propagates instead of being
swallowed at DEBUG with a misleading 'table not found' message.
- Document the single-column cross-DOI attribution limitation on
upsert_paper.
- Cover _store_papers threading cites_doi onto each stored row.
- Cover the canonical_dois=[] branch (feed enabled, no citations config)
and the unexpected-error 500 path; correct the test module docstring.
Summary
Exposes canonical-paper citation tracking as a public, read-only JSON feed with a per-year and stacked-by-canonical-paper breakdown, opt-in per community.
- `papers.cites_doi` column. Added to `CREATE TABLE papers` for new DBs and via `_migrate_db` `ALTER TABLE` for existing ones. The `idx_papers_cites_doi` index is created in `_migrate_db` (not `SCHEMA_SQL`) so `init_db` stays safe on databases predating the column; this was caught by a migration test.
- `upsert_paper` records `cites_doi`; on conflict `cites_doi = COALESCE(papers.cites_doi, excluded.cites_doi)` so the first citation link wins, a later keyword sync (`None`) never erases it, and a re-sync backfills legacy `NULL` rows.
- `sync_citing_papers` threads the canonical DOI through `_store_papers`.
- `get_citation_stats(project)` returns `total`, `per_year`, and `by_paper` `{doi: {year: count}}`. A 4-digit-year `GLOB` guard drops rows whose `created_at` is missing or malformed so no citation lands in a bogus year bucket.
- `GET /{community_id}/citations` gated by `public_feeds.citations`; returns `per_year`, stacked `by_paper`, and `canonical_dois` from config. Same `Cache-Control: public, max-age=3600` and 503/500 handling as the FAQ feed.

Deploy note
`cites_doi` is populated going forward by the scheduled citation sync. To backfill existing rows, run a full citation re-sync once (cheap) after this merges and the schema migration runs.

Limitation
A paper citing more than one canonical DOI records only the first (single column, by design). Re-sync is idempotent under COALESCE.
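The first-link-wins behaviour and the idempotent re-sync can be seen in a small SQLite sketch. The schema, the `doi` conflict key, and the name `upsert_paper_sketch` are assumptions for illustration (the real `papers` table and `upsert_paper` signature are not shown in this PR); the `COALESCE` expression is the one quoted in the summary.

```python
import sqlite3

# Hypothetical minimal schema; the real papers table has more columns and
# may use a different conflict key.
SCHEMA = "CREATE TABLE papers (doi TEXT PRIMARY KEY, title TEXT, cites_doi TEXT)"


def upsert_paper_sketch(conn, doi, title, cites_doi=None):
    """Insert a paper, or update it while keeping any existing cites_doi."""
    conn.execute(
        """
        INSERT INTO papers (doi, title, cites_doi) VALUES (?, ?, ?)
        ON CONFLICT(doi) DO UPDATE SET
            title = excluded.title,
            cites_doi = COALESCE(papers.cites_doi, excluded.cites_doi)
        """,
        (doi, title, cites_doi),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Legacy row written before the column carried any link.
conn.execute("INSERT INTO papers (doi, title) VALUES ('10.9/x', 'Citing paper')")

upsert_paper_sketch(conn, "10.9/x", "Citing paper", "10.1/a")  # backfills NULL
upsert_paper_sketch(conn, "10.9/x", "Citing paper", "10.1/b")  # first link wins
upsert_paper_sketch(conn, "10.9/x", "Citing paper", None)      # keyword sync, no clobber

link = conn.execute("SELECT cites_doi FROM papers WHERE doi = '10.9/x'").fetchone()[0]
print(link)  # 10.1/a
```

The second call also shows the limitation: the link to `10.1/b` is silently dropped because the single column is already populated.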
Test plan
- `tests/test_knowledge/test_citation_stats.py` (10): stats aggregation (total/per_year/by_paper, year sorting, undated exclusion, empty DB), COALESCE link semantics (backfill, first-link-wins, keyword-sync-no-clobber), legacy-table migration.
- `tests/test_api/test_citations_feed.py` (8): gate 404 (None + flag false) / 200, total+per_year, stacked by_paper, canonical_dois from config, Cache-Control, 503.

Closes #323
Part of epic #321
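For readers who want to see the aggregation shape, here is a minimal sketch of a per-year and by-paper rollup with a 4-digit-year `GLOB` guard. It is not the project's `get_citation_stats`: the function name, the reduced schema, and the choice to count only dated rows with a non-NULL `cites_doi` in `total` are assumptions made for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical reduced schema; only the columns the aggregation needs.
SCHEMA = "CREATE TABLE papers (id INTEGER PRIMARY KEY, created_at TEXT, cites_doi TEXT)"

# Matches values that start with four digits, e.g. "2024-05-01".
# NULL and malformed values fail the match and are dropped.
YEAR_GUARD = "created_at GLOB '[0-9][0-9][0-9][0-9]*'"


def citation_stats_sketch(conn):
    """Return total, per_year, and by_paper {doi: {year: count}}."""
    rows = conn.execute(
        f"""
        SELECT cites_doi, substr(created_at, 1, 4) AS year, COUNT(*) AS n
        FROM papers
        WHERE cites_doi IS NOT NULL AND {YEAR_GUARD}
        GROUP BY cites_doi, year
        ORDER BY year
        """
    ).fetchall()

    per_year = defaultdict(int)
    by_paper = defaultdict(dict)
    for doi, year, n in rows:
        per_year[year] += n      # all canonical papers combined
        by_paper[doi][year] = n  # one stack per canonical DOI
    return {
        "total": sum(per_year.values()),
        "per_year": dict(per_year),
        "by_paper": dict(by_paper),
    }
```

Summing each year across the `by_paper` entries reproduces `per_year`, which is what allows a dashboard to draw the stacked series and the overall series from the same payload.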