release/v1.12.1 by MBombeck · Pull Request #258 · MBombeck/HealthLog

MBombeck · 2026-06-05T01:58:53Z

v1.12.1 — security, data-integrity, and insight-quality hardening (a fast follow-up to v1.12.0).

Security

Server-side consent gate before any health data egresses to a server-managed AI provider (insights, coach, narrative, comprehensive, and medication free-text extraction); BYOK/local unaffected; fails closed.
Auth: logout revokes the bearer token + refresh sibling; refresh-token issuance never alongside a Set-Cookie session; device-mismatch reuse escalates to a user-wide revoke; tighter auth:refresh limit.
Fitbit sync rate-limited per user; OAuth redirect_uri pinned same-origin.

Data integrity

Partial unique index for live medication-intake rows (re-take after delete no longer 500s; tombstones don't block).
MoodEntry.externalId + NULL-distinct unique, honoured on the native single + bulk routes for idempotent re-import.
Cross-source double-count fixed in the tag×metric crosstab (canonical-source pick).
Composite perf indexes (AuditLog, MoodEntry).
Fitbit sync no longer resurrects deleted readings; batched backfill; bounded hourly-poll concurrency.

Insight quality

Per-metric assessments surface FDR-significant correlations, vary across repeated steady periods, hedge on thin data, and skip a forced "step" when nothing is actionable — grounding/safety contracts unchanged.

Performance

Integrations status in one consolidated query; mood-insights stale-while-revalidate (no evict-on-write); below-fold Recharts deferred.

Polish

Settings standardisation: shared card headers, unified integration verbs + test buttons, 44px tap targets, contrast fix.

Docs

README presentation (TOC, comparison matrix, security posture, CI badge, FUNDING.yml); docs-site pages for Google Health/Fitbit, FHIR, and derived-metrics methodology (on a branch).

Migrations

0121 (intake live-row partial unique), 0122 (MoodEntry.externalId), 0123 (perf indexes).

Gate

typecheck · lint (one documented allowed warning) · knip · openapi in sync · 7338 unit · build · 352 integration. Pre-ship security re-verify + e2e-risk analysis reconciled (one consent-gate gap closed; no e2e assertion breaks predicted).

The .planning directory holds internal per-release working notes and the iOS coordination channel, none of it user-facing. Ignore the whole tree so the public repo carries only the curated surfaces (README, CHANGELOG, docs).

…n device-mismatch revoke

Harden the native auth-token surface against four confirmed findings.

- /api/auth/logout was a no-op for a bearer credential: destroySession only clears the cookie, leaving an Authorization: Bearer hlk_ token live until expiry. When the request carries a bearer, also flip the matching ApiToken.revoked and revoke its paired refresh-token sibling. Cookie path unchanged.
- A browser spoofing X-Client-Type: native was handed a 60-day refresh token in the JSON body. Gate refresh-token issuance on a new isCookielessNativeCaller check (no Mozilla UA, no inbound session cookie), so the secret reaches genuine cookie-less native callers only and never alongside a browser session. The iOS native path is unaffected; a spoofing browser falls back to the short-lived access token with no refresh.
- Refresh-reuse revoke trusted the stored row's deviceId, so a stolen token replayed under a fabricated X-Device-Id confined the revoke to the attacker's own id and left the victim's family live. On a present-but-mismatched deviceId, escalate to the user-wide family revoke (a real device never changes its id mid-family). The matching-id and null-id cases keep their existing scope.
- Tighten the auth:refresh rate limit from 60/15min to 10/15min, aligning the high-value rotation endpoint with the passkey-verify tier.

Add coverage for each: logout revokes bearer + sibling, a cookie/Mozilla caller never receives a refresh token, a device-mismatch replay triggers the wide revoke, and the cookie-less native path still gets its refresh token.

(cherry picked from commit 14dcdefc5f40855305d8aaf573d073faebc61476)
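
The issuance gate and the reuse-revoke rule from the findings above can be sketched as two pure functions. This is a minimal sketch under stated assumptions, not the project's code: only the name isCookielessNativeCaller and the rules themselves come from the commit message. The HeaderMap shape, the cookie name `session`, the RevokeScope labels, and the treatment of a null stored id as "not a mismatch" are hypothetical.

```typescript
// Hypothetical request shape: lower-cased header names mapped to values.
type HeaderMap = Record<string, string | undefined>;

// A caller counts as cookie-less native only when it claims to be native,
// sends no Mozilla user agent, and carries no inbound session cookie.
// The cookie name "session" is an assumption for illustration.
function isCookielessNativeCaller(headers: HeaderMap): boolean {
  const claimsNative = headers["x-client-type"] === "native";
  const userAgent = headers["user-agent"] ?? "";
  const cookie = headers["cookie"] ?? "";
  const hasSessionCookie = cookie
    .split(";")
    .some((part) => part.trim().startsWith("session="));
  return claimsNative && !userAgent.includes("Mozilla") && !hasSessionCookie;
}

// "existing" stands for whatever scope the matching-id and null-id cases
// already used; the commit message does not spell it out.
type RevokeScope = "existing" | "user-wide";

// On refresh-token reuse, a present-but-mismatched device id escalates to
// the user-wide revoke; every other case keeps its existing scope.
function revokeScopeOnReuse(
  storedDeviceId: string | null,
  presentedDeviceId: string | null,
): RevokeScope {
  const mismatch =
    storedDeviceId !== null &&
    presentedDeviceId !== null &&
    storedDeviceId !== presentedDeviceId;
  return mismatch ? "user-wide" : "existing";
}
```

Keeping both decisions free of I/O makes each of the four findings testable without a database or a live session.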

Add a table of contents to the README, extend the comparison matrix with WHOOP and Google Health/Fitbit columns plus rows for multi-source dedup, provider aggregation, and encryption at rest, and add a CI status badge. Reconcile two stale claims against the current code: the Prisma model count (now 60) and the AI architecture note (add Codex). Strengthen the Security and Privacy section with the bring-your-own-key / local-endpoint AI posture. Add .github/FUNDING.yml so the GitHub Sponsor button renders, and fix the contributor prerequisite to Node.js 22 to match the Dockerfile base image. (cherry picked from commit 7cfe326709814dd0ce16d1554d5afd5933de448c)

…ll concurrency

Three hardening + performance fixes to the Fitbit/Google Health sync.

Stop soft-delete resurrection. The measurement write probed the existing row by the `(userId, type, source, externalId)` unique key without excluding soft-deleted rows, so a reading the user deleted came back on the next hourly sync (the upsert matched the tombstone and took the update branch, silently undoing the delete). The write now probes only live rows (`deletedAt: null`) and treats a tombstone as absent, minting a fresh insert instead. The fresh insert relies on the partial unique index over `deleted_at IS NULL` to keep live-row uniqueness; the tombstone sits outside it, and `skipDuplicates` guards a live-row race.

Batch the backfill writes + collapse the rollup recompute. The write loop was one upsert per reading (N+1) followed by a per-(type,day) DAY-rollup recompute, so a multi-year backfill paid thousands of serial round-trips on a concurrency-1 worker. The existence probe is now a single `findMany` over the batch; fresh rows go through chunked `createMany`; only already-live rows take a per-row update for their differing values. On a `fullSync` the per-write rollup hook is deferred and the touched type-days collapse into one `recomputeUserRollups(from, to)` pass at the end of the cycle. The incremental path keeps the inline per-day hook. The `stats:` overwrite contract, the single-watermark `markSynced`, and the all-403 soft-skip guard are unchanged.

Bound the hourly poll concurrency. The poll synced every connection in one serial loop, so a single slow Google response stalled the whole cohort. The cohort now fans out through a bounded `p-limit` pool with per-user error isolation, extracted into `runFitbitPollCohort` so the contract is unit testable without exporting worker internals. The accumulators are read after the await to avoid a compound-assignment lost-update across overlapping pool tasks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 8369af2471fb8da04a938b5a784433318d81a248)
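
The bounded fan-out with per-user error isolation can be illustrated without the `p-limit` dependency. This is a sketch, not the extracted `runFitbitPollCohort`: runBounded, pollCohort, PollSummary, and the syncOne callback are hypothetical names, and the hand-rolled pool merely stands in for `p-limit`. It also shows why the accumulator is read after the await: `total += await f()` reads `total` before suspending, so overlapping tasks can overwrite each other's updates.

```typescript
// Minimal stand-in for a p-limit style pool: at most `limit` tasks in flight.
async function runBounded<T>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<void>,
): Promise<void> {
  let next = 0;
  async function worker(): Promise<void> {
    // The check and the increment run synchronously, so no index is taken twice.
    while (next < items.length) {
      const index = next++;
      await task(items[index]);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
}

interface PollSummary {
  synced: number;
  failed: number;
  readings: number;
}

// Hypothetical cohort runner: one failing user never aborts the others.
async function pollCohort(
  userIds: string[],
  syncOne: (userId: string) => Promise<number>,
  limit = 4,
): Promise<PollSummary> {
  const summary: PollSummary = { synced: 0, failed: 0, readings: 0 };
  await runBounded(userIds, limit, async (userId) => {
    try {
      const count = await syncOne(userId);
      // Read the accumulator only after the await has resolved.
      summary.readings = summary.readings + count;
      summary.synced += 1;
    } catch {
      summary.failed += 1;
    }
  });
  return summary;
}
```

Passing syncOne in as a parameter is what lets the contract be exercised with a fake in a unit test.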

…-limit and harden Fitbit sync

Close the launch-blocking gap where the ConsentReceipt infrastructure was built but never enforced as a precondition: a direct API caller with a valid token could forward a health snapshot to a server-managed external LLM with no receipt on file. Consent was enforced client-side only.

Add a consent guard (src/lib/ai/consent-guard.ts) and call it before the first external-provider call on the server-managed path. The gate fires only when the resolved chain could egress via the operator's global key (admin-openai); a user's own BYOK key (openai/anthropic), their own ChatGPT OAuth account, and the self-hosted local provider stay ungated: that egress is the user's own act and the settings flow is its consent. The check fails closed: a BYOK-primary chain with an admin-openai fallback still requires a receipt, since the runner may cascade to the operator key. The surface maps to the consent kind (coach -> ai_coach, insights -> ai_insights_only; master ai_full satisfies either).

- Interactive routes (insights/chat, insights/generate POST) throw ConsentRequiredError; api-handler renders a 403 with meta.errorCode = "consent.ai.required", mirroring assistant.disabled.* so the client can render an inline grant-consent notice.
- Off-request pipelines fail closed quietly: comprehensive generation returns a typed skipped:no-consent outcome, and the per-metric status family (runStatusCompletion, gating every status card + the period-narrative warm + the off-budget Coach memory workers) surfaces the no-key fallback instead of egressing.

Also rate-limit POST /api/fitbit/sync (the one Fitbit route without a limiter): a 5/60s baseline plus a tighter 1/hour bucket on the expensive fullSync walk, keyed by user id, matching the sibling Fitbit routes.
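
The fail-closed consent check described above might look roughly like the following. It is a sketch, not the contents of src/lib/ai/consent-guard.ts: the function name assertConsentForChain, the Provider identifiers other than admin-openai, openai, and anthropic, and the way receipts are passed in as a set are all assumptions. The surface-to-kind mapping, the error class name, and the error code are taken from the commit message.

```typescript
// "chatgpt-oauth" and "local" are hypothetical identifiers for the user's own
// ChatGPT OAuth account and the self-hosted provider.
type Provider = "admin-openai" | "openai" | "anthropic" | "chatgpt-oauth" | "local";
type Surface = "coach" | "insights";
type ConsentKind = "ai_coach" | "ai_insights_only" | "ai_full";

class ConsentRequiredError extends Error {
  readonly errorCode = "consent.ai.required";
  constructor(surface: Surface) {
    super(`AI consent required for surface: ${surface}`);
    this.name = "ConsentRequiredError";
  }
}

const KIND_FOR_SURFACE: Record<Surface, ConsentKind> = {
  coach: "ai_coach",
  insights: "ai_insights_only",
};

// Throws unless the chain can never reach the operator key, or the user has
// an active receipt for the surface (or the master ai_full receipt).
function assertConsentForChain(
  chain: Provider[],
  surface: Surface,
  activeReceipts: ReadonlySet<ConsentKind>,
): void {
  // Fail closed: any position in the chain counts, including a fallback,
  // because the runner may cascade to the operator key.
  const couldUseOperatorKey = chain.includes("admin-openai");
  if (!couldUseOperatorKey) return;
  const satisfied =
    activeReceipts.has("ai_full") || activeReceipts.has(KIND_FOR_SURFACE[surface]);
  if (!satisfied) throw new ConsentRequiredError(surface);
}
```

An interactive route would let the error propagate to the handler that renders the 403, while an off-request pipeline would catch it and return its typed skipped outcome.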

Harden the Fitbit OAuth redirect_uri: assert it is an absolute https origin (or http on localhost), lands on /api/fitbit/callback, and stays same-origin with NEXT_PUBLIC_APP_URL. This is defence-in-depth behind Google's registered-URI check so a Host-coerced or misconfigured app URL cannot redirect the authorization code off-origin.

(cherry picked from commit 46b5fc2cea1b6e80bee0445bfdacf5790442c67d)
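
The three redirect_uri assertions map directly onto the standard URL class. The sketch below is an assumption about shape only: assertSafeRedirectUri and parseAbsoluteUrl are hypothetical names, and the app URL is passed in as a parameter rather than read from NEXT_PUBLIC_APP_URL.

```typescript
// Parse an absolute URL or throw a labelled error (relative input is rejected
// because the URL constructor is called without a base).
function parseAbsoluteUrl(value: string, label: string): URL {
  try {
    return new URL(value);
  } catch {
    throw new Error(`${label} is not an absolute URL`);
  }
}

// Hypothetical validator for the three conditions named in the commit message.
function assertSafeRedirectUri(redirectUri: string, appUrl: string): void {
  const redirect = parseAbsoluteUrl(redirectUri, "redirect_uri");
  const app = parseAbsoluteUrl(appUrl, "app URL");

  // 1. https everywhere, http only on localhost.
  const schemeOk =
    redirect.protocol === "https:" ||
    (redirect.protocol === "http:" && redirect.hostname === "localhost");
  if (!schemeOk) throw new Error("redirect_uri must be https (or http on localhost)");

  // 2. Must land on the callback route.
  if (redirect.pathname !== "/api/fitbit/callback") {
    throw new Error("redirect_uri must point at /api/fitbit/callback");
  }

  // 3. Must share scheme, host, and port with the configured app URL.
  if (redirect.origin !== app.origin) {
    throw new Error("redirect_uri must be same-origin with the app URL");
  }
}
```

Comparing `origin` rather than `hostname` also catches a scheme or port mismatch.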

… dedup key, perf indexes

Migration 0121 replaces the MedicationIntakeEvent unique constraint with a partial unique index `WHERE deleted_at IS NULL`. A slot the user previously deleted (tombstoned) no longer occupies the unique slot, so a re-take re-creates the row cleanly instead of P2002-ing against the tombstone and 500-ing. The migration dedups any pre-existing duplicate live rows first (keep newest, tombstone the rest) so the unique build can't fail on real data. Measurement and Workout deliberately stay full uniques: Measurement's compound-key writes use prisma.upsert, which Prisma 7 compiles to native INSERT ... ON CONFLICT and Postgres cannot arbiter against a partial unique (and consolidate-daily-mean intentionally resurrects a tombstoned canonical row, which needs the full unique); Workout has no deleted_at column. The rationale is documented in the migration header and the schema comments.

Migration 0122 adds MoodEntry.externalId plus a NULL-distinct `@@unique([userId, source, externalId])`. The moodLog webhook, the pull sync, and the JSON import now carry the source's stable id into externalId and upsert on the new key when present, so a re-import is idempotent even when moodLoggedAt re-rounds or re-zones. Native/MANUAL entries keep their NULL externalId and stay on the legacy (userId, date, moodLoggedAt) path.

Migration 0123 adds AuditLog(userId, action, createdAt desc) and MoodEntry(userId, moodLoggedAt) composite indexes for the status-cache and mood-insights read paths.

Also corrects the stale "v1.6.0 drops this column" note on MedicationSchedule.daysOfWeek (the column is still dual-written and read by live paths, so it is retained) and documents the UTC-only timezone storage invariant in a schema header comment.

Integration tests cover the intake re-take after delete (no resurrection / no 500) and mood re-import idempotency (with and without an external id) on real Postgres.

(cherry picked from commit 9a6723f469bdec063a99cd08df132e07ec3c95a5)
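
The semantics of migration 0121 can be modelled in memory to show why a tombstone no longer blocks a re-take. This is an illustrative sketch only: the real change is a SQL migration, and IntakeRow, slotKey (standing in for whatever columns the unique covers), and both function names are hypothetical.

```typescript
// Hypothetical row shape; slotKey stands in for the unique columns.
interface IntakeRow {
  id: string;
  slotKey: string;
  createdAt: number;
  deletedAt: number | null;
}

// Pre-migration dedup: per slot, keep the newest live row and tombstone the
// rest, so the partial unique index can be built on real data.
function tombstoneDuplicateLiveRows(rows: IntakeRow[], now: number): IntakeRow[] {
  const newestLive = new Map<string, IntakeRow>();
  for (const row of rows) {
    if (row.deletedAt !== null) continue;
    const current = newestLive.get(row.slotKey);
    if (current === undefined || row.createdAt > current.createdAt) {
      newestLive.set(row.slotKey, row);
    }
  }
  return rows.map((row) =>
    row.deletedAt === null && newestLive.get(row.slotKey) !== row
      ? { ...row, deletedAt: now }
      : row,
  );
}

// Partial-unique rule: only live rows occupy the slot, tombstones do not.
function canInsertLive(rows: IntakeRow[], slotKey: string): boolean {
  return !rows.some((row) => row.slotKey === slotKey && row.deletedAt === null);
}
```

Under the old full unique, the second function would have had to consider tombstones as well, which is exactly the case that produced the P2002 and the 500.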

…in mood crosstab

The tag × metric crosstab summed cumulative metrics (steps, active energy, sleep duration) per day across every source with no canonical pick, so once two sources reported the same day (Fitbit + Apple steps, Fitbit + WHOOP sleep) the per-day total double-counted and inflated the with/without averages and the Welch delta the surface ranks on. Run the metric rows through the same per-day source-priority + device-type picker the analytics steps/sleep path uses before bucketing, keyed by the metric's priority ladder and the user's Berlin calendar day. Same-source per-stage rows still sum into the night total; only cross-source twins collapse. Thread the user's source-priority blob through the read so the pick honours the configured ladder.

The cross-metric read was a flat `take: 5000` over the 365-day window ordered `measuredAt desc`, unioned across per-stage sleep + per-sample pulse/HRV; a multi-source, high-frequency user blew past it and silently dropped the oldest months, biasing the crosstab and the long-window correlations toward recent data. Lift the cap to cover the realistic worst case for the window and annotate a wide event when the cap is hit so a truncated read is observable rather than silently wrong. The read stays raw rather than moving to the DAY rollup tier because that tier buckets on UTC midnight while these aggregates key on the user's Berlin day, which would skew the day pairing at the boundary.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit aa72ffba6e3bcac9f97be4adc26d689d4f352e98)
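
The canonical-source pick can be sketched as a two-step reduction: sum rows per (day, source), then keep only the highest-priority source for each day. This is a simplified illustration, not the analytics picker itself: MetricRow and dailyTotalsWithCanonicalSource are hypothetical names, the device-type dimension is omitted, and `day` is assumed to be already resolved to the user's calendar day.

```typescript
// Hypothetical row shape for a cumulative metric sample.
interface MetricRow {
  day: string; // calendar day, already resolved to the user's timezone
  source: string;
  value: number;
}

// Returns one total per day, taken from the highest-priority source that
// reported that day. Sources missing from the ladder rank last.
function dailyTotalsWithCanonicalSource(
  rows: MetricRow[],
  priority: string[],
): Map<string, number> {
  const rank = (source: string): number => {
    const index = priority.indexOf(source);
    return index === -1 ? priority.length : index;
  };

  // Step 1: same-source rows (for example per-stage sleep) sum together.
  const perDay = new Map<string, Map<string, number>>();
  for (const row of rows) {
    const bySource = perDay.get(row.day) ?? new Map<string, number>();
    bySource.set(row.source, (bySource.get(row.source) ?? 0) + row.value);
    perDay.set(row.day, bySource);
  }

  // Step 2: cross-source twins collapse to the canonical source.
  const totals = new Map<string, number>();
  perDay.forEach((bySource, day) => {
    let best: string | undefined;
    bySource.forEach((_sum, source) => {
      if (best === undefined || rank(source) < rank(best)) best = source;
    });
    if (best !== undefined) totals.set(day, bySource.get(best) ?? 0);
  });
  return totals;
}
```

With Fitbit and Apple both reporting steps for one day, the result carries a single source's total rather than their sum, which is the double-count the commit removes.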

…trolled variety to assessments

The per-metric assessment cards are tightly grounded but their form had converged: every card opened the same way, forced a "one doable step" even when nothing was actionable, and a metric that stayed steady for weeks produced near-identical paragraphs. Squeeze more value from data already computed server-side, without touching the grounding floor or any safety contract.

- Surface FDR-surviving cross-metric correlations on the per-metric card. The discovery engine already screens behaviour×outcome pairs with a real Pearson + exact p-value + Benjamini-Hochberg FDR control; that intelligence only reached the period narrative. A new read-only consumer runs the same full-matrix discovery and filters to the pairs that involve the current metric, then feeds the engine's own conservative, descriptive, never-causal interpretation strings into the prompt as grounded relations. Best-effort: a correlation hiccup can never block the generation it only decorates.
- Streak-aware repetition. Derive a steady-run length from the graded series (consecutive recent weeks within a band of the user's own baseline) and tell the model when it has already reported "no material change" N times running, so it acknowledges the continuity in one clause and pivots to a different facet instead of restating the same level.
- Stop forcing a step; add controlled variety. The closing step is now conditional: when nothing is genuinely actionable the model affirms and names one thing to watch rather than manufacturing the filler that lets platitudes back in. A deterministic per-render variety token (seeded from user + metric + day, never Math.random / Date.now) rotates the opening angle so consecutive cards and days don't read identically, and the phrasing temperature lifts 0.3 → 0.45 while the facts stay pinned by the snapshot and the forbidden-phrase guards.
- Surface the computed data strength (n + recency) into the prose prompt so it hedges on thin data instead of guessing what "few" means, matching the UI confidence badge.
- Add locale-matched few-shot examples (a grounded assessment + a labelled banned-filler counter-example) to the assessment system prompt, which most helps weaker local-provider models follow the contract.

The own-baseline grounding, computed-not-hallucinated stats, schema-enforced output, filler-phrase ban, and correlation-as-association discipline are all preserved and re-pinned by new guard tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit c9d04cab1523e0e57a3f50c87c9f88684b01b26f)
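
Two of the mechanisms above lend themselves to small pure functions: the deterministic variety token and the steady-run length. The sketch below is an assumption about how they could be built, not the shipped code. The angle names, the FNV-1a style hash (applied to UTF-16 code units here), and the fixed-width band are illustrative choices; the commit message only fixes the seed inputs (user + metric + day) and the ban on Math.random / Date.now.

```typescript
// Hypothetical opening angles the prompt could rotate through.
const OPENING_ANGLES = ["level", "trend", "consistency", "context"] as const;

// 32-bit FNV-1a style hash over UTF-16 code units: deterministic, no clock,
// no randomness.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Same user + metric + day always yields the same angle, so a re-render is
// stable while different days or metrics can land on different angles.
function varietyAngle(userId: string, metric: string, day: string): string {
  const seed = `${userId}|${metric}|${day}`;
  return OPENING_ANGLES[fnv1a(seed) % OPENING_ANGLES.length];
}

// Count consecutive most-recent weeks whose value stays within `band` of the
// user's own baseline; the run stops at the first week outside the band.
function steadyRunLength(weeklyValues: number[], baseline: number, band: number): number {
  let run = 0;
  for (let i = weeklyValues.length - 1; i >= 0; i--) {
    if (Math.abs(weeklyValues[i] - baseline) > band) break;
    run++;
  }
  return run;
}
```

For example, `steadyRunLength([70, 80, 61, 60, 59], 60, 2)` counts the last three weeks as steady and stops at the 80.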

…fer below-fold charts

Settings → Integrations fired five status round-trips on every mount: the consolidated /api/integrations/status alongside the four per-provider /api/<provider>/status queries. Extend the consolidated endpoint to carry every field the four cards render (Withings activity scope, WHOOP/Fitbit backfill state, moodLog webhook secret + entry count) and drive all four cards off the single envelope. The legacy per-provider routes stay for the iOS/test callers; the web cards no longer hit them.

Mood-insights cache was a plain 60 s TTL that was hard-evicted on every mood write, so an active logger re-paid the multi-second cold compute on every entry. Add stale-while-revalidate to ServerCache (wrapSwr / cachedSwr + a markStaleByPrefix that collapses the fresh TTL without dropping the value), give the moodInsights bucket a 10-minute stale window, route the read through cachedSwr, and switch the mood-write invalidation from a hard evict to a mark-stale so the prior aggregate serves immediately while a single background recompute warms a fresh one.

Defer the eager Recharts on the mood-insights surface. The trajectory forecast card pulled Recharts into the initial chunk of every trajectory-eligible metric page though the main chart is already deferred; split its fan band into trajectory-fan.tsx behind next/dynamic. The three below-fold mood mini-charts (distribution / weekday / time-of-day) were static imports under an already-dynamic hero line chart; defer them with next/dynamic. Each loader paints a skeleton sized to the chart's own band so the chunk arrives without a layout shift; the charts stay Recharts and visually identical.

(cherry picked from commit 2b627893ab6b4d7bc457fc18e6e5585b048190e1)
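
The stale-while-revalidate behaviour can be sketched as a small cache class. This is not the project's ServerCache: the class name SwrCache, its constructor parameters, the injectable clock, and the `state` helper are assumptions, and only the method names cachedSwr and markStaleByPrefix echo the commit message.

```typescript
interface SwrEntry<T> {
  value: T;
  freshUntil: number;
  staleUntil: number;
}

class SwrCache<T> {
  private entries = new Map<string, SwrEntry<T>>();
  private inFlight = new Map<string, Promise<T>>();

  constructor(
    private freshMs: number,
    private staleMs: number,
    private now: () => number = Date.now,
  ) {}

  set(key: string, value: T): void {
    const t = this.now();
    this.entries.set(key, {
      value,
      freshUntil: t + this.freshMs,
      staleUntil: t + this.freshMs + this.staleMs,
    });
  }

  // Collapse the fresh TTL without dropping the value.
  markStaleByPrefix(prefix: string): void {
    const t = this.now();
    this.entries.forEach((entry, key) => {
      if (key.startsWith(prefix)) entry.freshUntil = Math.min(entry.freshUntil, t);
    });
  }

  state(key: string): "fresh" | "stale" | "miss" {
    const entry = this.entries.get(key);
    const t = this.now();
    if (entry === undefined || t >= entry.staleUntil) return "miss";
    return t < entry.freshUntil ? "fresh" : "stale";
  }

  async cachedSwr(key: string, compute: () => Promise<T>): Promise<T> {
    const current = this.state(key);
    const entry = this.entries.get(key);
    if (entry !== undefined && current === "fresh") return entry.value;
    if (entry !== undefined && current === "stale") {
      // Serve the prior value now; warm a fresh one in the background.
      // A failed background recompute is swallowed and the stale value stays.
      void this.refresh(key, compute).catch(() => undefined);
      return entry.value;
    }
    return this.refresh(key, compute);
  }

  // Single-flight: concurrent callers share one recompute per key.
  private refresh(key: string, compute: () => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key);
    if (existing !== undefined) return existing;
    const pending = compute()
      .then((value) => {
        this.set(key, value);
        return value;
      })
      .finally(() => {
        this.inFlight.delete(key);
      });
    this.inFlight.set(key, pending);
    return pending;
  }
}
```

A mood write would then call `markStaleByPrefix` on the user's keys instead of deleting them, so the next read returns the prior aggregate immediately.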

…ntry routes

Migration 0122 added MoodEntry.externalId plus the NULL-distinct @@unique([userId, source, externalId]) and wired the moodLog webhook and sync importer onto it, but the iOS-facing native routes never used it: POST /api/mood-entries dropped externalId entirely (the schema did not declare it) and POST /api/mood-entries/bulk accepted it in its Zod schema yet parsed-and-ignored it, still upserting on the legacy (userId, date, moodLoggedAt) key. The dedup index existed but nothing on the native write path reached it, so the idempotent re-import the client needs was not functional.

Declare externalId on createMoodEntrySchema (bounded to match the bulk schema), persist it, and, when present, upsert the single-entry write on (userId, source, externalId) so a re-post with the same id updates the existing row in place instead of 409-ing or minting a duplicate. When absent, keep the legacy first-write create and its 409-on-conflict behaviour untouched.

On the bulk path, branch the per-entry probe and upsert onto the same compound key when an entry carries an externalId, refreshing date and moodLoggedAt in the update so a re-zoned re-import lands the corrected wall-clock on the same row; absent ids keep the legacy wall-clock key. Both paths resolve source once so the dedup key and the row write agree.

Echo externalId back on the single response (via the row) and on each bulk per-entry result so the client can map server ids onto its local rows.

Cover the present-vs-absent upsert key, cross-user isolation, and the echo with unit tests on both routes plus a real-Postgres integration test proving a re-posted externalId resolves to one row updated in place, a null externalId keeps wall-clock behaviour, and the same id under two users stays isolated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 6a65de22bc436daeeb73dd2a1cb612336c74a928)
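
The present-vs-absent key choice is the core of this change and can be shown against an in-memory store. The sketch is illustrative only: MoodWrite, UpsertKey, resolveUpsertKey, and upsertMood are hypothetical names, the Map stands in for Postgres, and the single route's 409-on-conflict behaviour for the legacy key is not modelled (the legacy branch here simply upserts, as the bulk path does).

```typescript
// Hypothetical write payload; field names follow the commit message.
interface MoodWrite {
  userId: string;
  source: string;
  date: string;
  moodLoggedAt: string;
  externalId?: string | null;
  mood: number;
}

type UpsertKey =
  | { kind: "external"; userId: string; source: string; externalId: string }
  | { kind: "legacy"; userId: string; date: string; moodLoggedAt: string };

// With an externalId the row is keyed on (userId, source, externalId);
// without one it stays on the legacy wall-clock key.
function resolveUpsertKey(entry: MoodWrite): UpsertKey {
  if (entry.externalId !== undefined && entry.externalId !== null) {
    return {
      kind: "external",
      userId: entry.userId,
      source: entry.source,
      externalId: entry.externalId,
    };
  }
  return {
    kind: "legacy",
    userId: entry.userId,
    date: entry.date,
    moodLoggedAt: entry.moodLoggedAt,
  };
}

function keyString(key: UpsertKey): string {
  return key.kind === "external"
    ? JSON.stringify(["external", key.userId, key.source, key.externalId])
    : JSON.stringify(["legacy", key.userId, key.date, key.moodLoggedAt]);
}

// In-memory stand-in for the database upsert: the update branch replaces the
// whole row, so a re-zoned date and moodLoggedAt land on the same row.
function upsertMood(store: Map<string, MoodWrite>, entry: MoodWrite): "created" | "updated" {
  const key = keyString(resolveUpsertKey(entry));
  const outcome = store.has(key) ? "updated" : "created";
  store.set(key, entry);
  return outcome;
}
```

Because userId is part of both keys, the same externalId posted by two users resolves to two separate rows.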

…ns, and tap targets

Holistic consistency pass over the settings and integration surfaces so the same concept reads and behaves the same way everywhere.

- Card headers: migrate the four integration cards (Withings, WHOOP, Fitbit/Google Health, Mood Log) plus the profile, passkeys, avatar, about, api-endpoints, api-tokens, and sharing cards onto the shared SettingsCardHeader primitive. Extend the primitive with an inline title accessory slot (Fitbit tag + experimental badge) and a multi-line description slot (overlap / experimental / deprecation sub-notes), keeping the rendered result identical.
- Integration verbs: normalise Mood Log onto the canonical "Sync now" / "Sync all data" / "Synchronize" set used by the other integrations, across all six locales. Swap the full-sync icon from Download to RefreshCw and align its action-row icon metrics with the sibling cards.
- Test buttons: migrate Telegram and ntfy off their hand-rolled "Test message" buttons onto the shared TestConnectionButton, so every channel and integration test control shares one icon, label, latency readout, and error taxonomy. Retire the now-unused testMessage / testSent strings.
- ntfy auth-token placeholder no longer reuses the "Saved" status string; it now reads "Saved — enter new to replace" like the other secret fields.
- Save-success copy: collapse the moodLog and Telegram bespoke strings onto the shared settings.saved string.
- Tap targets: floor every size="sm" action-row button (sync / full-sync / test / disconnect / save / connect-secret copy) at min-h-11, fixed at the TestConnectionButton source and across the cards, so they clear the 44px touch minimum.
- Contrast: the onboarding source-card badge moves from muted-on-muted to text-foreground; web-push card body rhythm aligns on space-y-4; Mood Log's deprecation note drops its bespoke 11px-italic treatment for the shared sub-note style.

(cherry picked from commit 11931d14b7875b7e2184a3b373d7a34b842f8eae)

The medication NL-extraction route egresses user-typed free text (PHI) to the operator's server-managed provider key; require an active consent receipt for the coach surface before the chain runs, matching the other server-managed egress sites.

…elope

The settings cards now read connection state from the consolidated /api/integrations/status envelope (the per-provider status routes were dropped from the section), so the mobile-layout fixture must supply the per-integration connected/configured fields the pills key on.

MBombeck and others added 14 commits June 5, 2026 02:21

chore: stop tracking internal planning notes

a8cc8b8

chore(release): v1.12.1 — security, data-integrity, and insight-quali…

f531494

…ty hardening

MBombeck merged commit 5793313 into main Jun 5, 2026
13 checks passed

MBombeck deleted the release/v1.12.1 branch June 5, 2026 02:13
