Community Dashboard: metrics, auth, budget alerting#148

Merged
neuromechanist merged 8 commits into develop from feature/issue-132-epic-community-dashboard
Feb 4, 2026

@neuromechanist

Member

Summary

Epic PR for the Community Dashboard feature (#132), merging all 4 phases into develop.

Key additions

  • src/metrics/ module: SQLite-based request logging, quality queries, cost estimation, budget checking, GitHub issue alerts
  • src/api/security.py: AuthScope with admin/community roles, BYOK support, scoped auth dependencies
  • src/api/routers/metrics.py + metrics_public.py: Global and per-community metrics endpoints (authenticated + public)
  • src/api/scheduler.py: APScheduler jobs for budget checking every 15 minutes
  • src/core/config/community.py: BudgetConfig, maintainers field with GitHub username validation
  • dashboard/osa/index.html: Single-file dashboard frontend
  • 15 new test files, 1175 tests passing

Related issues: #132, #137, #138, #140, #143, #144, #145, #146, #147

Test plan

  • 1175 tests passing (6 pre-existing async failures unrelated)
  • All lint checks pass (ruff check + ruff format)
  • CI green on Python 3.11 and 3.12
  • Two full rounds of PR review (6 specialized agents each round)
  • All critical and important review findings addressed

* feat: add backend metrics collection and request logging

- Add src/metrics/ package with SQLite storage (WAL mode),
  aggregation queries, and request timing middleware
- Add global /metrics/overview and /metrics/tokens endpoints
- Add per-community /{id}/metrics and /{id}/metrics/usage
- Log token usage, model, key_source, tools for ask/chat
- Streaming handlers log metrics at end of generator
- Middleware captures timing for all requests
- All metrics endpoints require admin auth

Closes #134

* Address PR review: error handling and type safety fixes

- Wrap middleware dispatch in try/except so metrics never crash requests
- Wrap init_metrics_db() in try/except for graceful degradation
- Always return AssistantWithMetrics (remove return_metrics flag)
- Narrow log_request except to sqlite3.Error
- Add try/except to _log_streaming_metrics
- Log metrics on streaming error paths (400/500)
- Add sqlite3 error handling to all metrics endpoints (503)
- Add logger + warning for malformed JSON in queries.py
- Fix middleware ordering comment
- Remove redundant inline import uuid
- Move get_metrics_connection to top-level import
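
A minimal sketch of the storage and error-handling pattern described in the two commits above, using only the standard library. The table name `request_log`, its columns, and the boolean return value of `log_request` are assumptions for illustration; only WAL mode, `init_metrics_db`, `log_request`, and the narrowed `sqlite3.Error` handling come from the commit messages.

```python
import sqlite3
from typing import Optional


def init_metrics_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the metrics database and create a (hypothetical) request_log table."""
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a writer appends rows.
    # For an in-memory database SQLite silently keeps its "memory" journal.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS request_log (
               id INTEGER PRIMARY KEY,
               community_id TEXT,
               endpoint TEXT NOT NULL,
               status_code INTEGER NOT NULL,
               duration_ms REAL NOT NULL,
               total_tokens INTEGER
           )"""
    )
    conn.commit()
    return conn


def log_request(
    conn: sqlite3.Connection,
    *,
    community_id: Optional[str],
    endpoint: str,
    status_code: int,
    duration_ms: float,
    total_tokens: Optional[int] = None,
) -> bool:
    """Insert one row; report failure instead of raising so requests never crash."""
    try:
        conn.execute(
            "INSERT INTO request_log "
            "(community_id, endpoint, status_code, duration_ms, total_tokens) "
            "VALUES (?, ?, ?, ?, ?)",
            (community_id, endpoint, status_code, duration_ms, total_tokens),
        )
        conn.commit()
        return True
    except sqlite3.Error:
        # Narrow except: only database errors are swallowed here.
        return False
```

In this sketch a caller can ignore the return value entirely; a failed write only means one missing metrics row.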

* CI: run lint and tests on all PRs

Remove branch filter from pull_request trigger in tests.yml
so lint and unit tests run on PRs targeting any branch, not
just main/develop. This ensures feature branch PRs to epic
branches still get CI coverage.

* Disable broken URL test until upstream fix

Skip test_documentation_urls_accessible; HED docs URL
returns 404 due to upstream repo change. See #139.

* Add dashboard frontend with public metrics endpoints

- Add public query functions (no tokens/costs/models exposed)
- Create /metrics/public/* endpoints (no auth required)
- Build /dashboard page with Chart.js, community tabs, admin unlock
- Register new routers in main.py
- Add tests for public endpoints and dashboard page (28 new tests)

* Restructure dashboard as standalone static site

- Move per-community public metrics to community router
  (/{community_id}/metrics/public, /{community_id}/metrics/public/usage)
- Keep only global /metrics/public/overview in metrics_public router
- Remove FastAPI dashboard router
- Add dashboard/ as standalone static site for Cloudflare Pages:
  / = aggregate overview, /{community} = community detail
- Client-side routing with configurable API base URL
- Add _redirects for Cloudflare Pages SPA routing
- Update tests for new route structure

* Add CI workflow for dashboard Cloudflare Pages deploy

Deploys dashboard/ to osa-dash.pages.dev via wrangler.
Same pattern as existing deploy-pages.yml for the demo widget:
- main -> osa-dash.pages.dev (production)
- develop -> develop.osa-dash.pages.dev
- PRs -> {branch}.osa-dash.pages.dev with preview URL comment

* Add dynamic community tab bar to dashboard

Tabs are populated from /metrics/public/overview API so new
communities appear automatically. Navigation uses simple links
(All -> /, community -> /{id}) with active tab highlighting.

* Address PR review findings: XSS, error handling, tests

- Fix XSS: add escapeHtml() helper, sanitize all innerHTML interpolations,
  use encodeURIComponent() for URL path segments
- Move get_metrics_connection() inside try blocks in all metrics endpoints
- Add console.error/warn to all JavaScript catch blocks (no silent failures)
- Improve admin section UX: defer visibility until data loads successfully
- Extract shared helpers in queries.py (_count_tools, _validate_period)
- Add test classes: TestPublicAdminBoundary, TestEmptyDatabase,
  TestCommunityMetricsValues with dynamic community cross-checks
- Fix admin boundary tests to use auth_env fixture with test API key

* Address round-2 review: XSS, error logging, auth tests, simplify

- Fix single-quote XSS in onclick handlers: use encodeURIComponent
  for communityId in changePeriod calls, decode in changePeriod
- Validate health status against known values instead of escapeHtml
- Add console.warn to sync/health .catch() blocks
- Add console.error to loadCommunityView catch block
- Add auth-enabled tests proving public endpoints stay accessible
  when REQUIRE_API_AUTH=true (core security contract)
- Add metrics_connection() context manager in db.py; simplify all
  endpoint handlers from nested try/try/finally to with-statement
- Use tuple unpacking in _count_tools for clarity
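
The `metrics_connection()` context manager mentioned above could look roughly like the following. This is a sketch under assumptions: the `METRICS_DB_PATH` constant, the `overview_handler` function, and its `(status, body)` return shape are hypothetical stand-ins for the real FastAPI handlers; the 503 on `sqlite3.Error` follows the earlier review commit.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator

METRICS_DB_PATH = ":memory:"  # hypothetical; the real path comes from app config


@contextmanager
def metrics_connection(path: str = METRICS_DB_PATH) -> Iterator[sqlite3.Connection]:
    """Yield a connection and always close it, even if the body raises."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    try:
        yield conn
    finally:
        conn.close()


def overview_handler() -> tuple[int, dict]:
    """Hypothetical handler: one with-statement replaces nested try/try/finally."""
    try:
        with metrics_connection() as conn:
            row = conn.execute("SELECT 1 AS ok").fetchone()
            return 200, {"ok": row["ok"]}
    except sqlite3.Error:
        # Covers failures while opening the connection as well as while querying.
        return 503, {"detail": "Metrics database unavailable"}
```

Because `sqlite3.connect` runs inside the context manager, a failure to open the database is caught by the same `except` as a failed query.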

* Serve dashboard from /osa/ base path for status.osc.earth

Move dashboard/index.html to dashboard/osa/index.html and add
BASE_PATH constant to strip /osa prefix in client-side router.
Update all internal links (tabs, community cards) to use the
/osa/ prefix. Update _redirects for SPA routing under /osa/.
Update dashboard tests for new file location.

* Handle /osa without trailing slash in _redirects

Add explicit /osa rule alongside /osa/* to ensure the path
without trailing slash also serves the SPA index.

* Add per-community auth, quality metrics, and budget alerting

Phase 4 of the community dashboard: per-community scoped
authentication (AuthScope + community admin keys), LangFuse
observability wiring, quality metrics (error rates, latency
percentiles, tool call tracking), cost estimation with model
pricing table, budget checking with configurable limits, and
automated GitHub issue alerting when spend thresholds are
exceeded. Includes scheduled budget check job (every 15 min)
and sample budget configs for HED and EEGLAB communities.

Tested: 1152 passed, 68% coverage.
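
To illustrate cost estimation with a model pricing table, here is a small sketch. The model names and every price below are invented placeholders, not the values used in the PR, and `estimate_cost` is a hypothetical function name. The warning on an unknown model mirrors the later review item about fallback logging.

```python
import logging

logger = logging.getLogger("metrics.cost")

# HYPOTHETICAL prices in USD per 1M tokens: (input, output). Not real rates.
MODEL_PRICING = {
    "example/model-small": (0.50, 1.50),
    "example/model-large": (3.00, 15.00),
}
# Hypothetical fallback applied when a model is missing from the table.
FALLBACK_PRICING = (1.00, 3.00)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    pricing = MODEL_PRICING.get(model)
    if pricing is None:
        logger.warning("Unknown model %r, using fallback pricing", model)
        pricing = FALLBACK_PRICING
    input_rate, output_rate = pricing
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```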

* Address PR review: simplify code and fix error handling

- Fix _issue_exists to return True on error (prevent duplicate spam)
- Simplify redundant exception tuples in alerts.py
- Extract _require_community_access helper (4 duplicated blocks)
- Extract parse_admin_keys method on Settings (3 duplicated parsers)
- Strengthen AuthScope with Literal type, frozen, validation
- Make BudgetStatus frozen (immutable snapshot)
- Add BudgetConfig cross-field validation (daily <= monthly)
- Share single DB connection in budget check loop
- Add budget check failure escalation (matching sync pattern)
- Split LangFuse except into ImportError vs Exception
- Improve _migrate_columns to re-raise unexpected errors
- Log warnings for malformed community_admin_keys entries
- Bump unknown model fallback logging from debug to warning

* Fix streaming metrics fields and add quality endpoint tests

Add tool_call_count and langfuse_trace_id to streaming metrics
logging so streaming requests capture the same quality data as
non-streaming. Add 14 endpoint tests covering community quality,
quality summary, and global quality API routes.

* Address round 2 review: fix alerts, docstrings, tests, simplify

- Fix _issue_exists to return None on failure instead of True,
  with warning-level logging when dedup check fails
- Fix stale pricing date and deduplicate fallback branches
- Extract shared _fetch_latency_percentiles helper in queries
- Make BudgetConfig frozen (immutable after parsing)
- Fix inaccurate docstrings: maintainers usage, _percentile
  method name, _migrate_columns idempotency, regex claims
- Fix get_quality_summary docstring key name mismatch
- Track per-community scheduler failures for critical alerting
- Upgrade malformed config entry log from WARNING to ERROR
- Add tests: AuthScope validation, BudgetConfig daily>monthly,
  community-scoped keys on global endpoints, dedup failure
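
The change to `_issue_exists` makes the dedup check tri-state: `True`, `False`, or `None` when the check itself failed. A hedged sketch of that idea follows. The injected `search` and `create` callables stand in for the GitHub calls, and skipping the alert when the result is `None` is an assumption about the caller, not something the commit message states.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger("metrics.alerts")


def issue_exists(title: str, search: Callable[[str], Sequence]) -> Optional[bool]:
    """Return True/False when the lookup works, None when it fails."""
    try:
        return len(search(title)) > 0
    except Exception as exc:  # broad on purpose in this sketch
        logger.warning("Dedup check failed for %r: %s", title, exc)
        return None


def maybe_create_alert(
    title: str,
    search: Callable[[str], Sequence],
    create: Callable[[str], None],
) -> str:
    """Create an alert issue unless one exists or dedup could not be verified."""
    exists = issue_exists(title, search)
    if exists is None:
        return "skipped"  # assumption: do not risk a duplicate, retry next run
    if exists:
        return "duplicate"
    create(title)
    return "created"
```

Returning `None` lets the caller tell "an issue is already open" apart from "the lookup broke", which a plain `True` on error would hide.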

@github-actions

github-actions Bot commented Feb 4, 2026

Contributor

Dashboard Preview

Name Link
Preview URL https://feature-issue-132-epic-commu.osa-dash.pages.dev
Branch feature/issue-132-epic-community-dashboard
Commit f8e1cee

This preview will be updated automatically when you push new commits.

Code fixes:
- Handle HTTPException in streaming generators as SSE error events
  (cannot re-raise after response headers sent)
- Extract _match_wildcard_origin helper, AgentResult dataclass,
  _extract_agent_result/_set_metrics_on_request to deduplicate
  ask/chat endpoints
- Use metrics_connection() context manager in metrics router
- Add failure counting with escalation logging to log_request()
- Refactor check_budget() to accept BudgetConfig instead of 3 floats
- Add __post_init__ validation to BudgetStatus for non-negative spend
- Simplify list_sessions to reuse _evict_expired_sessions
- Move inline imports (re, os) to top-level
- Simplify _get_communities_with_sync to list comprehension
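
As a rough picture of `check_budget()` taking a `BudgetConfig` and returning a validated `BudgetStatus`, consider the sketch below. The table and column names (`request_log`, `estimated_cost`, `timestamp`), the field names, and the strict `>` comparison are all assumptions; the `date('now')` filter and the non-negative spend check come from the lists in this commit message.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetConfig:
    daily_limit_usd: float
    monthly_limit_usd: float


@dataclass(frozen=True)
class BudgetStatus:
    daily_spend_usd: float
    monthly_spend_usd: float
    daily_exceeded: bool
    monthly_exceeded: bool

    def __post_init__(self) -> None:
        if self.daily_spend_usd < 0 or self.monthly_spend_usd < 0:
            raise ValueError("spend values must be non-negative")


def check_budget(
    conn: sqlite3.Connection, community_id: str, config: BudgetConfig
) -> BudgetStatus:
    """Sum today's and this month's spend (UTC) and compare with the limits."""
    daily = conn.execute(
        "SELECT COALESCE(SUM(estimated_cost), 0) FROM request_log "
        "WHERE community_id = ? AND date(timestamp) = date('now')",
        (community_id,),
    ).fetchone()[0]
    monthly = conn.execute(
        "SELECT COALESCE(SUM(estimated_cost), 0) FROM request_log "
        "WHERE community_id = ? "
        "AND strftime('%Y-%m', timestamp) = strftime('%Y-%m', 'now')",
        (community_id,),
    ).fetchone()[0]
    return BudgetStatus(
        daily_spend_usd=float(daily),
        monthly_spend_usd=float(monthly),
        daily_exceeded=daily > config.daily_limit_usd,
        monthly_exceeded=monthly > config.monthly_limit_usd,
    )
```

Passing one config object instead of three floats keeps the call site short and lets the limits be validated once, where the config is parsed.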

Docstring fixes:
- Clarify verify_api_key handles only global admin keys
- Update scheduler module docstring to mention budget checks

New tests:
- check_budget with today's timestamps (exercises date('now') SQL)
- Budget alert trigger with current-day spend
- BudgetStatus rejects negative spend values
- _percentile edge cases (single element, two elements, empty list)
- _count_tools with malformed JSON
- _extract_community_id documents intentional None for metrics paths
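
The `_percentile` edge cases named in the test list (single element, two elements, empty list) can be illustrated with a small helper. Linear interpolation and returning `None` for an empty list are assumptions of this sketch; the PR does not say which percentile method or empty-input behavior the real helper uses.

```python
from typing import Optional, Sequence


def percentile(values: Sequence[float], pct: float) -> Optional[float]:
    """Linear-interpolated percentile (0 to 100); None for an empty input."""
    if not values:
        return None
    ordered = sorted(values)
    if len(ordered) == 1:
        return float(ordered[0])
    # Fractional position of the requested percentile in the sorted list.
    rank = (pct / 100.0) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction
```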

@neuromechanist neuromechanist merged commit af48f4b into develop Feb 4, 2026
7 checks passed
@neuromechanist neuromechanist deleted the feature/issue-132-epic-community-dashboard branch February 4, 2026 11:48
@neuromechanist neuromechanist linked an issue Feb 5, 2026 that may be closed by this pull request
4 tasks
@neuromechanist neuromechanist mentioned this pull request Feb 7, 2026
4 tasks
Development

Successfully merging this pull request may close these issues.

Community dashboard with metrics, usage stats, and sync status

1 participant