
release/v1.12.1 #258

Merged
MBombeck merged 14 commits into main from release/v1.12.1 on Jun 5, 2026


MBombeck (owner) commented Jun 5, 2026

v1.12.1 — security, data-integrity, and insight-quality hardening (a fast follow-up to v1.12.0).

Security

  • Server-side consent gate before any health data egresses to a server-managed AI provider (insights, coach, narrative, comprehensive, and medication free-text extraction); BYOK/local unaffected; fails closed.
  • Auth: logout revokes the bearer token + refresh sibling; refresh tokens are never issued alongside a Set-Cookie session; device-mismatch reuse escalates to a user-wide revoke; tighter auth:refresh limit.
  • Fitbit sync rate-limited per user; OAuth redirect_uri pinned same-origin.

Data integrity

  • Partial unique index for live medication-intake rows (re-take after delete no longer 500s; tombstones don't block).
  • MoodEntry.externalId + NULL-distinct unique, honoured on the native single + bulk routes for idempotent re-import.
  • Cross-source double-count fixed in the tag×metric crosstab (canonical-source pick).
  • Composite perf indexes (AuditLog, MoodEntry).
  • Fitbit sync no longer resurrects deleted readings; batched backfill; bounded hourly-poll concurrency.

Insight quality

  • Per-metric assessments surface FDR-significant correlations, vary across repeated steady periods, hedge on thin data, and skip a forced "step" when nothing is actionable — grounding/safety contracts unchanged.

Performance

  • Integrations status in one consolidated query; mood-insights stale-while-revalidate (no evict-on-write); below-fold Recharts deferred.

Polish

  • Settings standardisation: shared card headers, unified integration verbs + test buttons, 44px tap targets, contrast fix.

Docs

  • README presentation (TOC, comparison matrix, security posture, CI badge, FUNDING.yml); docs-site pages for Google Health/Fitbit, FHIR, and derived-metrics methodology (on a branch).

Migrations

0121 (intake live-row partial unique), 0122 (MoodEntry.externalId), 0123 (perf indexes).

Gate

typecheck · lint (one documented allowed warning) · knip · openapi in sync · 7338 unit · build · 352 integration. Pre-ship security re-verify + e2e-risk analysis reconciled (one consent-gate gap closed; no e2e assertion breaks predicted).

MBombeck and others added 14 commits June 5, 2026 02:21
The .planning directory holds internal per-release working notes and the
iOS coordination channel, none of it user-facing. Ignore the whole tree so
the public repo carries only the curated surfaces (README, CHANGELOG, docs).
…n device-mismatch revoke

Harden the native auth-token surface against four confirmed findings.

- /api/auth/logout was a no-op for a bearer credential: destroySession only
  clears the cookie, leaving an Authorization: Bearer hlk_ token live until
  expiry. When the request carries a bearer, also flip the matching
  ApiToken.revoked and revoke its paired refresh-token sibling. Cookie path
  unchanged.
- A browser spoofing X-Client-Type: native was handed a 60-day refresh token
  in the JSON body. Gate refresh-token issuance on a new
  isCookielessNativeCaller check (no Mozilla UA, no inbound session cookie),
  so the secret reaches genuine cookie-less native callers only and never
  alongside a browser session. The iOS native path is unaffected; a spoofing
  browser falls back to the short-lived access token with no refresh.
- Refresh-reuse revoke trusted the stored row's deviceId, so a stolen token
  replayed under a fabricated X-Device-Id confined the revoke to the
  attacker's own id and left the victim's family live. On a present-but-
  mismatched deviceId, escalate to the user-wide family revoke (a real device
  never changes its id mid-family). The matching-id and null-id cases keep
  their existing scope.
- Tighten the auth:refresh rate limit from 60/15min to 10/15min, aligning the
  high-value rotation endpoint with the passkey-verify tier.
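The issuance gate from the second bullet can be sketched as a pure predicate over request headers. Only the name isCookielessNativeCaller and its two conditions (no Mozilla user agent, no inbound session cookie) come from the commit; the `CallerHeaders` shape, the `session` cookie name, and the explicit X-Client-Type check are assumptions for illustration.

```typescript
// Hypothetical header shape; a real handler would read these off the incoming request.
interface CallerHeaders {
  clientType?: string; // X-Client-Type
  userAgent?: string; // User-Agent
  cookie?: string; // raw Cookie header
}

// Assumed session cookie name, for illustration only.
const SESSION_COOKIE = "session";

function hasCookie(rawCookie: string | undefined, name: string): boolean {
  if (!rawCookie) return false;
  return rawCookie.split(";").some((part) => part.trim().split("=")[0] === name);
}

// A refresh token is issued only to a caller that claims to be native AND looks
// cookie-less: no browser (Mozilla) user agent and no inbound session cookie.
// A spoofing browser fails at least one check and gets the short-lived access token only.
function isCookielessNativeCaller(h: CallerHeaders): boolean {
  const claimsNative = h.clientType === "native";
  const browserUa = (h.userAgent ?? "").includes("Mozilla");
  const sessionCookie = hasCookie(h.cookie, SESSION_COOKIE);
  return claimsNative && !browserUa && !sessionCookie;
}
```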

Add coverage for each: logout revokes bearer + sibling, a cookie/Mozilla
caller never receives a refresh token, a device-mismatch replay triggers the
wide revoke, and the cookie-less native path still gets its refresh token.
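The device-mismatch rule these tests cover reduces to a small decision. This is a hypothetical sketch (`reuseRevokeScope` and its return labels are not names from the codebase), and it assumes "null-id" means either side carried no device id.

```typescript
type ReuseRevoke = "keep-existing-scope" | "user-wide";

// Decide how far to widen the revoke when an already-rotated refresh token is replayed.
// storedDeviceId: deviceId recorded on the stored refresh-token row (null if none).
// presentedDeviceId: X-Device-Id on the replaying request (null if absent).
function reuseRevokeScope(storedDeviceId: string | null, presentedDeviceId: string | null): ReuseRevoke {
  // A real device never changes its id mid-family, so a present-but-different id is
  // treated as a replay under a fabricated X-Device-Id: revoke every family for the user.
  if (storedDeviceId !== null && presentedDeviceId !== null && storedDeviceId !== presentedDeviceId) {
    return "user-wide";
  }
  // Matching id, or no id on one side: keep the pre-existing revoke scope.
  return "keep-existing-scope";
}
```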

(cherry picked from commit 14dcdefc5f40855305d8aaf573d073faebc61476)
Add a table of contents to the README, extend the comparison matrix with
WHOOP and Google Health/Fitbit columns plus rows for multi-source dedup,
provider aggregation, and encryption at rest, and add a CI status badge.
Reconcile two stale claims against the current code: the Prisma model
count (now 60) and the AI architecture note (add Codex). Strengthen the
Security and Privacy section with the bring-your-own-key / local-endpoint
AI posture.

Add .github/FUNDING.yml so the GitHub Sponsor button renders, and fix the
contributor prerequisite to Node.js 22 to match the Dockerfile base image.

(cherry picked from commit 7cfe326709814dd0ce16d1554d5afd5933de448c)
…ll concurrency

Three hardening + performance fixes to the Fitbit/Google Health sync.

Stop soft-delete resurrection. The measurement write probed the existing
row by the `(userId, type, source, externalId)` unique key without excluding
soft-deleted rows, so a reading the user deleted came back on the next hourly
sync (the upsert matched the tombstone and took the update branch — silently
undoing the delete). The write now probes only live rows (`deletedAt: null`)
and treats a tombstone as absent, minting a fresh insert instead. The fresh
insert relies on the partial unique index over `deleted_at IS NULL` to keep
live-row uniqueness; the tombstone sits outside it, and `skipDuplicates` guards
a live-row race.
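A minimal in-memory sketch of the live-row probe, assuming the batch is already scoped to one (userId, type, source) so `externalId` alone identifies a reading. The real write filters `deletedAt: null` in the query; here the filter runs in memory, and `planWrites` and the row shapes are hypothetical. It also previews the batched split from the next paragraph (one probe per batch, inserts for absent keys, per-row updates only where a live value differs); chunking for `createMany` is left out.

```typescript
interface ExistingRow {
  id: string;
  externalId: string;
  value: number;
  deletedAt: Date | null; // non-null = tombstone
}

interface IncomingReading {
  externalId: string;
  value: number;
}

// One probe result in, inserts and updates out. Tombstones never match the probe,
// so a tombstoned row is never taken down the update branch and flipped back to live.
function planWrites(incoming: IncomingReading[], existing: ExistingRow[]) {
  const live = new Map<string, ExistingRow>();
  for (const row of existing) {
    if (row.deletedAt === null) live.set(row.externalId, row); // mirrors `deletedAt: null`
  }
  const inserts: IncomingReading[] = [];
  const updates: { id: string; value: number }[] = [];
  for (const reading of incoming) {
    const hit = live.get(reading.externalId);
    if (!hit) {
      inserts.push(reading); // absent or tombstoned: mint a fresh row
    } else if (hit.value !== reading.value) {
      updates.push({ id: hit.id, value: reading.value }); // live and changed
    }
    // live and identical: nothing to write
  }
  return { inserts, updates };
}
```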

Batch the backfill writes + collapse the rollup recompute. The write loop
was one upsert per reading (N+1) followed by a per-(type,day) DAY-rollup
recompute, so a multi-year backfill paid thousands of serial round-trips on a
concurrency-1 worker. The existence probe is now a single `findMany` over the
batch; fresh rows go through chunked `createMany`; only already-live rows take a
per-row update for their differing values. On a `fullSync` the per-write rollup
hook is deferred and the touched type-days collapse into one
`recomputeUserRollups(from, to)` pass at the end of the cycle. The incremental
path keeps the inline per-day hook. The `stats:` overwrite contract, the
single-watermark `markSynced`, and the all-403 soft-skip guard are unchanged.
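Collapsing the touched type-days into one recompute window amounts to taking the minimum and maximum day. A hypothetical helper, assuming days are carried as `YYYY-MM-DD` strings (which sort lexicographically in date order); the result would then feed a single `recomputeUserRollups(from, to)` call.

```typescript
interface TouchedDay {
  type: string; // metric type, e.g. "steps"
  day: string; // calendar day as "YYYY-MM-DD"
}

// Reduce every (type, day) touched during a fullSync to one inclusive [from, to] window,
// so the rollup recompute runs once per cycle instead of once per type-day.
function rollupWindow(touched: TouchedDay[]): { from: string; to: string } | null {
  if (touched.length === 0) return null; // nothing written, nothing to recompute
  let from = touched[0].day;
  let to = touched[0].day;
  for (const t of touched) {
    if (t.day < from) from = t.day; // ISO dates compare correctly as strings
    if (t.day > to) to = t.day;
  }
  return { from, to };
}
```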

Bound the hourly poll concurrency. The poll synced every connection in one
serial loop, so a single slow Google response stalled the whole cohort. The
cohort now fans out through a bounded `p-limit` pool with per-user error
isolation, extracted into `runFitbitPollCohort` so the contract is unit testable
without exporting worker internals. The accumulators are read after the await to
avoid a compound-assignment lost-update across overlapping pool tasks.
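A dependency-free sketch of the bounded pool: a fixed number of workers pull users from a shared cursor, standing in for the `p-limit` pool the commit uses. `runPollCohort` and `CohortResult` are hypothetical stand-ins for `runFitbitPollCohort`. Note where the total is updated: `synced += await syncUser(id)` would read `synced` before the await and could overwrite another task's update, so each result is awaited into a local first.

```typescript
interface CohortResult {
  synced: number; // total readings written across successful users
  failed: string[]; // users whose sync threw
}

// Sync every user with at most `limit` syncs in flight; one user's error never aborts the rest.
async function runPollCohort(
  userIds: string[],
  limit: number,
  syncUser: (userId: string) => Promise<number>,
): Promise<CohortResult> {
  let cursor = 0;
  let synced = 0;
  const failed: string[] = [];

  async function worker(): Promise<void> {
    // JS runs one task at a time, so reading and bumping the cursor between awaits is race-free.
    while (cursor < userIds.length) {
      const userId = userIds[cursor++];
      try {
        const n = await syncUser(userId); // await into a local first...
        synced += n; // ...then read-modify-write the shared total
      } catch {
        failed.push(userId); // per-user isolation: record and move on
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, userIds.length));
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) workers.push(worker());
  await Promise.all(workers);
  return { synced, failed };
}
```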

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 8369af2471fb8da04a938b5a784433318d81a248)
…-limit and harden Fitbit sync

Close the launch-blocking gap where the ConsentReceipt infrastructure was
built but never enforced as a precondition: a direct API caller with a valid
token could forward a health snapshot to a server-managed external LLM with no
receipt on file. Consent was enforced client-side only.

Add a consent guard (src/lib/ai/consent-guard.ts) and call it before the first
external-provider call on the server-managed path. The gate fires only when the
resolved chain could egress via the operator's global key (admin-openai); a
user's own BYOK key (openai/anthropic), their own ChatGPT OAuth account, and
the self-hosted local provider stay ungated — that egress is the user's own act
and the settings flow is its consent. The check fails closed: a BYOK-primary
chain with an admin-openai fallback still requires a receipt, since the runner
may cascade to the operator key. The surface maps to the consent kind (coach ->
ai_coach, insights -> ai_insights_only; master ai_full satisfies either).
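The gating rule in this paragraph is small enough to state as code. A hedged sketch: the provider ids `chatgpt-oauth` and `local` and the functions `chainRequiresReceipt`, `hasReceipt`, and `mayEgress` are hypothetical; the consent kinds, the surface mapping, and the fail-closed treatment of an admin-openai fallback come from the commit.

```typescript
// Providers a resolved chain may contain; "chatgpt-oauth" and "local" are illustrative ids.
type ProviderId = "admin-openai" | "openai" | "anthropic" | "chatgpt-oauth" | "local";
type Surface = "coach" | "insights";
type ConsentKind = "ai_coach" | "ai_insights_only" | "ai_full";

// Surface -> consent kind it requires; the master kind ai_full satisfies either.
const REQUIRED_KIND: Record<Surface, ConsentKind> = {
  coach: "ai_coach",
  insights: "ai_insights_only",
};

// Only the operator's server-managed key is gated. Scanning the whole chain, not just the
// primary, is what makes the gate fail closed: a BYOK primary with an admin-openai
// fallback could still cascade to the operator key.
function chainRequiresReceipt(chain: ProviderId[]): boolean {
  return chain.indexOf("admin-openai") !== -1;
}

function hasReceipt(surface: Surface, receipts: ReadonlySet<ConsentKind>): boolean {
  return receipts.has("ai_full") || receipts.has(REQUIRED_KIND[surface]);
}

function mayEgress(surface: Surface, chain: ProviderId[], receipts: ReadonlySet<ConsentKind>): boolean {
  return !chainRequiresReceipt(chain) || hasReceipt(surface, receipts);
}
```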

- Interactive routes (insights/chat, insights/generate POST) throw
  ConsentRequiredError; api-handler renders a 403 with
  meta.errorCode = "consent.ai.required", mirroring assistant.disabled.* so
  the client can render an inline grant-consent notice.
- Off-request pipelines fail closed quietly: comprehensive generation returns a
  typed skipped:no-consent outcome, and the per-metric status family
  (runStatusCompletion, gating every status card + the period-narrative warm +
  the off-budget Coach memory workers) surfaces the no-key fallback instead of
  egressing.
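For the interactive routes, the error-to-response mapping might look like the following. Only `ConsentRequiredError`, the 403 status, and `meta.errorCode = "consent.ai.required"` come from the commit; the envelope shape, `toErrorResponse`, and the fallback 500 branch are assumptions.

```typescript
class ConsentRequiredError extends Error {
  constructor(public readonly surface: "coach" | "insights") {
    super(`AI consent required for ${surface}`);
    this.name = "ConsentRequiredError";
    // Keep instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, ConsentRequiredError.prototype);
  }
}

interface ErrorResponse {
  status: number;
  body: { error: string; meta: { errorCode: string } };
}

// Map a thrown error onto an HTTP response; the client keys its inline
// grant-consent notice on meta.errorCode.
function toErrorResponse(err: unknown): ErrorResponse {
  if (err instanceof ConsentRequiredError) {
    return { status: 403, body: { error: err.message, meta: { errorCode: "consent.ai.required" } } };
  }
  // Hypothetical catch-all; the real api-handler has its own error taxonomy.
  return { status: 500, body: { error: "Internal error", meta: { errorCode: "internal" } } };
}
```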

Also rate-limit POST /api/fitbit/sync (the one Fitbit route without a limiter):
a 5/60s baseline plus a tighter 1/hour bucket on the expensive fullSync walk,
keyed by user id, matching the sibling Fitbit routes.
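A minimal sketch of the two-bucket limit, assuming a simple in-memory fixed-window counter; the real limiter's algorithm and storage are not described here, and `FixedWindowLimiter` and `allowFitbitSync` are hypothetical. Every call spends the 5/60s baseline; a fullSync must additionally win the 1/hour bucket.

```typescript
// In-memory fixed-window counter; one window per key.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(
    private readonly limit: number,
    private readonly windowMs: number,
  ) {}

  // Returns true and spends one unit if the key is under its limit for the current window.
  tryTake(key: string, now: number): boolean {
    const w = this.windows.get(key);
    if (!w || now - w.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 }); // open a new window
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}

const syncBaseline = new FixedWindowLimiter(5, 60_000); // 5 per 60 s
const fullSyncBucket = new FixedWindowLimiter(1, 60 * 60_000); // 1 per hour

// Keyed by user id, so one user's traffic never spends another user's budget.
function allowFitbitSync(userId: string, fullSync: boolean, now: number): boolean {
  if (!syncBaseline.tryTake(`fitbit-sync:${userId}`, now)) return false;
  if (fullSync && !fullSyncBucket.tryTake(`fitbit-fullsync:${userId}`, now)) return false;
  return true;
}
```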

Harden the Fitbit OAuth redirect_uri: assert it is an absolute https origin (or
http on localhost), lands on /api/fitbit/callback, and stays same-origin with
NEXT_PUBLIC_APP_URL — defence-in-depth behind Google's registered-URI check so
a Host-coerced or misconfigured app URL cannot redirect the authorization code
off-origin.
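The three redirect_uri assertions map directly onto fields of the WHATWG `URL` class. A sketch with `assertSafeRedirectUri` as a hypothetical name; the real check may report failures differently.

```typescript
// Validate an OAuth redirect_uri before it is sent to the authorization server.
// Throws on any violation; returns the parsed URL otherwise.
function assertSafeRedirectUri(redirectUri: string, appUrl: string): URL {
  const target = new URL(redirectUri); // throws on relative or malformed input, so it must be absolute
  const app = new URL(appUrl);

  const httpOnLocalhost = target.protocol === "http:" && target.hostname === "localhost";
  if (target.protocol !== "https:" && !httpOnLocalhost) {
    throw new Error("redirect_uri must use https (http only on localhost)");
  }
  if (target.pathname !== "/api/fitbit/callback") {
    throw new Error("redirect_uri must land on /api/fitbit/callback");
  }
  // origin = scheme + host + port, so a different subdomain or port is rejected too.
  if (target.origin !== app.origin) {
    throw new Error("redirect_uri must be same-origin with NEXT_PUBLIC_APP_URL");
  }
  return target;
}
```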

(cherry picked from commit 46b5fc2cea1b6e80bee0445bfdacf5790442c67d)
… dedup key, perf indexes

Migration 0121 replaces the MedicationIntakeEvent unique constraint with a
partial unique index `WHERE deleted_at IS NULL`. A slot the user previously
deleted (tombstoned) no longer occupies the unique slot, so a re-take
re-creates the row cleanly instead of P2002-ing against the tombstone and
500-ing. The migration dedups any pre-existing duplicate live rows first
(keep newest, tombstone the rest) so the unique build can't fail on real
data.
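The migration performs its pre-index dedup in SQL; the same keep-newest rule is sketched here in TypeScript for readability. `slotKey` is a hypothetical stand-in for the columns the partial unique index covers, and ordering by `createdAt` is an assumption about what "newest" means.

```typescript
interface IntakeRow {
  id: string;
  slotKey: string; // stand-in for the columns the partial unique index covers
  createdAt: number; // epoch ms; "newest" = largest createdAt (assumption)
  deletedAt: number | null; // null = live row
}

// Return ids of live rows to tombstone so each slot keeps exactly one live row (the newest).
// Existing tombstones are left alone: they sit outside a `WHERE deleted_at IS NULL` index.
function idsToTombstone(rows: IntakeRow[]): string[] {
  const keep = new Map<string, IntakeRow>();
  for (const row of rows) {
    if (row.deletedAt !== null) continue;
    const current = keep.get(row.slotKey);
    if (!current || row.createdAt > current.createdAt) keep.set(row.slotKey, row);
  }
  return rows
    .filter((row) => row.deletedAt === null && keep.get(row.slotKey)?.id !== row.id)
    .map((row) => row.id);
}
```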

Measurement and Workout deliberately stay full uniques: Measurement's
compound-key writes use prisma.upsert, which Prisma 7 compiles to a native
INSERT ... ON CONFLICT with no index predicate on the conflict target, so
Postgres cannot infer a partial unique index as its arbiter (and
consolidate-daily-mean intentionally resurrects a tombstoned canonical
row, which needs the full unique); Workout has no deleted_at column. The
rationale is documented in the migration header and the schema comments.

Migration 0122 adds MoodEntry.externalId plus a NULL-distinct
`@@unique([userId, source, externalId])`. The moodLog webhook, the pull sync,
and the JSON import now carry the source's stable id into externalId and
upsert on the new key when present, so a re-import is idempotent even when
moodLoggedAt re-rounds or re-zones. Native/MANUAL entries keep their NULL
externalId and stay on the legacy (userId, date, moodLoggedAt) path.

Migration 0123 adds AuditLog(userId, action, createdAt desc) and
MoodEntry(userId, moodLoggedAt) composite indexes for the status-cache and
mood-insights read paths.

Also corrects the stale "v1.6.0 drops this column" note on
MedicationSchedule.daysOfWeek (the column is still dual-written and read by
live paths, so it is retained) and documents the UTC-only timezone storage
invariant in a schema header comment.

Integration tests cover the intake re-take after delete (no resurrection /
no 500) and mood re-import idempotency (with and without an external id) on
real Postgres.

(cherry picked from commit 9a6723f469bdec063a99cd08df132e07ec3c95a5)
…in mood crosstab

The tag × metric crosstab summed cumulative metrics (steps, active
energy, sleep duration) per day across every source with no canonical
pick, so once two sources reported the same day — Fitbit + Apple steps,
Fitbit + WHOOP sleep — the per-day total double-counted and inflated the
with/without averages and the Welch delta the surface ranks on. Run the
metric rows through the same per-day source-priority + device-type picker
the analytics steps/sleep path uses before bucketing, keyed by the
metric's priority ladder and the user's Berlin calendar day. Same-source
per-stage rows still sum into the night total; only cross-source twins
collapse. Thread the user's source-priority blob through the read so the
pick honours the configured ladder.
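A simplified sketch of the per-day canonical pick, assuming samples are already keyed by the user's local day and ignoring the device-type tiebreak the real picker also applies. `dailyTotals` and the `Sample` shape are hypothetical; sources missing from the priority ladder rank below every listed one.

```typescript
interface Sample {
  day: string; // user's local calendar day, "YYYY-MM-DD"
  source: string; // e.g. "fitbit", "whoop"
  value: number; // one row's contribution, e.g. minutes of one sleep stage
}

// Per day, keep only the highest-priority source present, then sum that source's rows.
// Same-source per-stage rows still add up; cross-source twins for the day collapse.
function dailyTotals(samples: Sample[], priority: string[]): Map<string, number> {
  const rank = (source: string): number => {
    const i = priority.indexOf(source);
    return i === -1 ? priority.length : i;
  };
  const best = new Map<string, string>();
  for (const s of samples) {
    const current = best.get(s.day);
    if (current === undefined || rank(s.source) < rank(current)) best.set(s.day, s.source);
  }
  const totals = new Map<string, number>();
  for (const s of samples) {
    if (best.get(s.day) !== s.source) continue; // drop the non-canonical twin
    totals.set(s.day, (totals.get(s.day) ?? 0) + s.value);
  }
  return totals;
}
```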

The cross-metric read was a flat `take: 5000` over the 365-day window
ordered `measuredAt desc`, unioned across per-stage sleep + per-sample
pulse/HRV; a multi-source, high-frequency user blew past it and silently
dropped the oldest months, biasing the crosstab and the long-window
correlations toward recent data. Lift the cap to cover the realistic
worst case for the window and annotate a wide event when the cap is hit
so a truncated read is observable rather than silently wrong. The read
stays raw rather than moving to the DAY rollup tier because that tier
buckets on UTC midnight while these aggregates key on the user's Berlin
day, which would skew the day pairing at the boundary.
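The UTC-vs-Berlin mismatch is easy to see on a single timestamp. A sketch using the standard `Intl.DateTimeFormat` API; `localDayKey` and `utcDayKey` are hypothetical helpers, and the example assumes a runtime with full ICU time-zone data (the default in official Node.js builds).

```typescript
// Calendar day ("YYYY-MM-DD") of an instant in a given IANA time zone.
function localDayKey(at: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(at);
  const part = (type: string): string => parts.find((p) => p.type === type)?.value ?? "";
  return `${part("year")}-${part("month")}-${part("day")}`;
}

// The day a UTC-midnight rollup tier would key the same instant on.
function utcDayKey(at: Date): string {
  return at.toISOString().slice(0, 10);
}
```

For a reading at 23:30 UTC in June (01:30 CEST), the two keys land on different days, which is the boundary skew that keeping the read raw avoids.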

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit aa72ffba6e3bcac9f97be4adc26d689d4f352e98)
…trolled variety to assessments

The per-metric assessment cards are tightly grounded but their form had
converged: every card opened the same way, forced a "one doable step"
even when nothing was actionable, and a metric that stayed steady for
weeks produced near-identical paragraphs. Squeeze more value from data
already computed server-side, without touching the grounding floor or any
safety contract.

- Surface FDR-surviving cross-metric correlations on the per-metric card.
  The discovery engine already screens behaviour×outcome pairs with a real
  Pearson + exact p-value + Benjamini-Hochberg FDR control; that
  intelligence only reached the period narrative. A new read-only consumer
  runs the same full-matrix discovery and filters to the pairs that
  involve the current metric, then feeds the engine's own conservative,
  descriptive, never-causal interpretation strings into the prompt as
  grounded relations. Best-effort: a correlation hiccup can never block the
  generation it only decorates.

- Streak-aware repetition. Derive a steady-run length from the graded
  series (consecutive recent weeks within a band of the user's own
  baseline) and tell the model when it has already reported "no material
  change" N times running, so it acknowledges the continuity in one clause
  and pivots to a different facet instead of restating the same level.

- Stop forcing a step; add controlled variety. The closing step is now
  conditional — when nothing is genuinely actionable the model affirms and
  names one thing to watch rather than manufacturing the filler that lets
  platitudes back in. A deterministic per-render variety token (seeded from
  user + metric + day, never Math.random / Date.now) rotates the opening
  angle so consecutive cards and days don't read identically, and the
  phrasing temperature lifts 0.3 → 0.45 while the facts stay pinned by the
  snapshot and the forbidden-phrase guards.

- Surface the computed data strength (n + recency) into the prose prompt so
  it hedges on thin data instead of guessing what "few" means, matching the
  UI confidence badge.

- Add locale-matched few-shot examples (a grounded assessment + a labelled
  banned-filler counter-example) to the assessment system prompt, which
  most helps weaker local-provider models follow the contract.
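The first bullet's ordering matters: FDR control runs across the full behaviour×outcome matrix and only then is filtered to the current metric, so the card shows only pairs that survived correction over the whole matrix rather than a re-run of the procedure on a smaller subset. A sketch of Benjamini-Hochberg step-up under that ordering; the `Pair` shape, `survivorsForMetric`, and the default q = 0.05 are assumptions, and the Pearson/exact-p computation is taken as given.

```typescript
interface Pair {
  behaviour: string;
  outcome: string;
  p: number; // exact p-value from the Pearson test
}

// Benjamini-Hochberg step-up at level q: with p-values sorted ascending, find the largest
// rank k such that p_(k) <= (k / m) * q and accept the k smallest.
function benjaminiHochberg(pairs: Pair[], q: number): Pair[] {
  const m = pairs.length;
  const sorted = pairs.slice().sort((a, b) => a.p - b.p);
  let k = 0;
  for (let i = 0; i < m; i++) {
    if (sorted[i].p <= ((i + 1) / m) * q) k = i + 1;
  }
  return sorted.slice(0, k);
}

// Correct over the full matrix first, then keep only survivors that involve this metric.
function survivorsForMetric(pairs: Pair[], metric: string, q = 0.05): Pair[] {
  return benjaminiHochberg(pairs, q).filter((pr) => pr.behaviour === metric || pr.outcome === metric);
}
```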

The own-baseline grounding, computed-not-hallucinated stats, schema-
enforced output, filler-phrase ban, and correlation-as-association
discipline are all preserved and re-pinned by new guard tests.
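The deterministic per-render variety token from the third bullet could be derived from any stable string hash. This sketch uses 32-bit FNV-1a over the UTF-16 code units of `userId|metric|day`; the hash choice, the `OPENING_ANGLES` list, and `varietyToken` are illustrative assumptions, not the shipped implementation.

```typescript
// 32-bit FNV-1a over UTF-16 code units: stable across processes, no Math.random / Date.now.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, multiplied modulo 2^32
  }
  return hash >>> 0; // reinterpret as unsigned
}

// Illustrative opening angles; the real prompt's rotation set is not described in the commit.
const OPENING_ANGLES = ["trend", "baseline-comparison", "recent-change", "consistency"] as const;
type OpeningAngle = (typeof OPENING_ANGLES)[number];

// Same user + metric + day always yields the same angle; a different day may rotate it.
function varietyToken(userId: string, metric: string, day: string): OpeningAngle {
  return OPENING_ANGLES[fnv1a(`${userId}|${metric}|${day}`) % OPENING_ANGLES.length];
}
```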

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit c9d04cab1523e0e57a3f50c87c9f88684b01b26f)
…fer below-fold charts

Settings → Integrations fired five status round-trips on every mount —
the consolidated /api/integrations/status alongside the four per-provider
/api/<provider>/status queries. Extend the consolidated endpoint to carry
every field the four cards render (Withings activity scope, WHOOP/Fitbit
backfill state, moodLog webhook secret + entry count) and drive all four
cards off the single envelope. The legacy per-provider routes stay for the
iOS/test callers; the web cards no longer hit them.

Mood-insights cache was a plain 60 s TTL that was hard-evicted on every
mood write, so an active logger re-paid the multi-second cold compute on
every entry. Add stale-while-revalidate to ServerCache (wrapSwr / cachedSwr
+ a markStaleByPrefix that collapses the fresh TTL without dropping the
value), give the moodInsights bucket a 10-minute stale window, route the
read through cachedSwr, and switch the mood-write invalidation from a hard
evict to a mark-stale so the prior aggregate serves immediately while a
single background recompute warms a fresh one.
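A compact sketch of the stale-while-revalidate behaviour, assuming an injectable clock and a single-flight map for background recomputes. The method names echo `cachedSwr` and `markStaleByPrefix` from the commit, but the signatures here are assumptions, not ServerCache's real API.

```typescript
interface Entry<T> {
  value: T;
  freshUntil: number; // served as-is before this instant
  staleUntil: number; // served while a background recompute runs, until this instant
}

class SwrCache {
  private readonly entries = new Map<string, Entry<unknown>>();
  private readonly inflight = new Map<string, Promise<unknown>>();

  constructor(private readonly now: () => number = () => Date.now()) {}

  async cachedSwr<T>(key: string, freshMs: number, staleMs: number, load: () => Promise<T>): Promise<T> {
    const t = this.now();
    const hit = this.entries.get(key) as Entry<T> | undefined;
    if (hit && t < hit.freshUntil) return hit.value;
    if (hit && t < hit.staleUntil) {
      // Serve the prior value now; a failed background recompute keeps serving it.
      this.revalidate(key, freshMs, staleMs, load).catch(() => undefined);
      return hit.value;
    }
    return this.revalidate(key, freshMs, staleMs, load); // cold or fully expired
  }

  // Collapse the fresh TTL of matching keys without dropping their values.
  markStaleByPrefix(prefix: string): void {
    const t = this.now();
    this.entries.forEach((entry, key) => {
      if (key.startsWith(prefix)) entry.freshUntil = Math.min(entry.freshUntil, t);
    });
  }

  // Single-flight: concurrent callers share one recompute per key.
  private revalidate<T>(key: string, freshMs: number, staleMs: number, load: () => Promise<T>): Promise<T> {
    const pending = this.inflight.get(key);
    if (pending) return pending as Promise<T>;
    const p = load().then(
      (value) => {
        const t = this.now();
        this.entries.set(key, { value, freshUntil: t + freshMs, staleUntil: t + freshMs + staleMs });
        this.inflight.delete(key);
        return value;
      },
      (err: unknown) => {
        this.inflight.delete(key);
        throw err;
      },
    );
    this.inflight.set(key, p);
    return p;
  }
}
```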

Defer the eager Recharts on the mood-insights surface. The trajectory
forecast card pulled Recharts into the initial chunk of every
trajectory-eligible metric page though the main chart is already deferred;
split its fan band into trajectory-fan.tsx behind next/dynamic. The three
below-fold mood mini-charts (distribution / weekday / time-of-day) were
static imports under an already-dynamic hero line chart; defer them with
next/dynamic. Each loader paints a skeleton sized to the chart's own band
so the chunk arrives without a layout shift; the charts stay Recharts and
visually identical.

(cherry picked from commit 2b627893ab6b4d7bc457fc18e6e5585b048190e1)
…ntry routes

Migration 0122 added MoodEntry.externalId plus the NULL-distinct
@@unique([userId, source, externalId]) and wired the moodLog webhook
and sync importer onto it, but the iOS-facing native routes never used
it: POST /api/mood-entries dropped externalId entirely (the schema did
not declare it) and POST /api/mood-entries/bulk accepted it in its Zod
schema yet parsed-and-ignored it, still upserting on the legacy
(userId, date, moodLoggedAt) key. The dedup index existed but nothing
on the native write path reached it, so the idempotent re-import the
client needs was not functional.

Declare externalId on createMoodEntrySchema (bounded to match the bulk
schema), persist it, and — when present — upsert the single-entry write
on (userId, source, externalId) so a re-post with the same id updates
the existing row in place instead of 409-ing or minting a duplicate.
When absent, keep the legacy first-write create and its 409-on-conflict
behaviour untouched. On the bulk path, branch the per-entry probe and
upsert onto the same compound key when an entry carries an externalId,
refreshing date and moodLoggedAt in the update so a re-zoned re-import
lands the corrected wall-clock on the same row; absent ids keep the
legacy wall-clock key. Both paths resolve source once so the dedup key
and the row write agree. Echo externalId back on the single response
(via the row) and on each bulk per-entry result so the client can map
server ids onto its local rows.
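The key selection for the single-entry route can be sketched as a discriminated union plus an in-memory store. Everything here (`moodDedupKey`, `InMemoryMoodStore`, the field types, the `moodLog` source string in the test) is hypothetical, and it models only the single-entry behaviour (external-id upsert, legacy create with 409 on conflict), not the bulk route, which upserts on the legacy key when the id is absent.

```typescript
interface MoodInput {
  userId: string;
  source: string;
  date: string; // local calendar day
  moodLoggedAt: string; // wall-clock time as stored
  mood: number;
  externalId?: string | null;
}

type MoodKey =
  | { kind: "external"; userId: string; source: string; externalId: string }
  | { kind: "wallClock"; userId: string; date: string; moodLoggedAt: string };

// An external id, when present, wins; otherwise fall back to the legacy wall-clock key.
function moodDedupKey(e: MoodInput): MoodKey {
  if (e.externalId) {
    return { kind: "external", userId: e.userId, source: e.source, externalId: e.externalId };
  }
  return { kind: "wallClock", userId: e.userId, date: e.date, moodLoggedAt: e.moodLoggedAt };
}

function keyString(k: MoodKey): string {
  return k.kind === "external"
    ? `ext|${k.userId}|${k.source}|${k.externalId}`
    : `wall|${k.userId}|${k.date}|${k.moodLoggedAt}`;
}

class InMemoryMoodStore {
  readonly rows = new Map<string, MoodInput>();

  write(e: MoodInput): "created" | "updated" | "conflict" {
    const key = moodDedupKey(e);
    const id = keyString(key);
    const existing = this.rows.get(id);
    if (!existing) {
      this.rows.set(id, { ...e });
      return "created";
    }
    if (key.kind === "external") {
      // Same external id re-posted: update in place, refreshing date and moodLoggedAt
      // so a re-zoned re-import lands the corrected wall-clock on the same row.
      this.rows.set(id, { ...existing, ...e });
      return "updated";
    }
    return "conflict"; // legacy path keeps its 409-on-conflict behaviour
  }
}
```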

Cover the present-vs-absent upsert key, cross-user isolation, and the
echo with unit tests on both routes plus a real-Postgres integration
test proving a re-posted externalId resolves to one row updated in
place, a null externalId keeps wall-clock behaviour, and the same id
under two users stays isolated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 6a65de22bc436daeeb73dd2a1cb612336c74a928)
…ns, and tap targets

Holistic consistency pass over the settings and integration surfaces so the
same concept reads and behaves the same way everywhere.

- Card headers: migrate the four integration cards (Withings, WHOOP,
  Fitbit/Google Health, Mood Log) plus the profile, passkeys, avatar, about,
  api-endpoints, api-tokens, and sharing cards onto the shared
  SettingsCardHeader primitive. Extend the primitive with an inline title
  accessory slot (Fitbit tag + experimental badge) and a multi-line
  description slot (overlap / experimental / deprecation sub-notes), keeping
  the rendered result identical.
- Integration verbs: normalise Mood Log onto the canonical "Sync now" /
  "Sync all data" / "Synchronize" set used by the other integrations, across
  all six locales. Swap the full-sync icon from Download to RefreshCw and
  align its action-row icon metrics with the sibling cards.
- Test buttons: migrate Telegram and ntfy off their hand-rolled "Test
  message" buttons onto the shared TestConnectionButton, so every channel and
  integration test control shares one icon, label, latency readout, and error
  taxonomy. Retire the now-unused testMessage / testSent strings.
- ntfy auth-token placeholder no longer reuses the "Saved" status string;
  it now reads "Saved — enter new to replace" like the other secret fields.
- Save-success copy: collapse the moodLog and Telegram bespoke strings onto
  the shared settings.saved string.
- Tap targets: floor every size="sm" action-row button (sync / full-sync /
  test / disconnect / save / connect-secret copy) at min-h-11, fixed at the
  TestConnectionButton source and across the cards, so they clear the 44px
  touch minimum.
- Contrast: the onboarding source-card badge moves from muted-on-muted to
  text-foreground; web-push card body rhythm aligns on space-y-4; Mood Log's
  deprecation note drops its bespoke 11px-italic treatment for the shared
  sub-note style.

(cherry picked from commit 11931d14b7875b7e2184a3b373d7a34b842f8eae)
The medication NL-extraction route egresses user-typed free text (PHI) to the
operator's server-managed provider key; require an active consent receipt for
the coach surface before the chain runs, matching the other server-managed
egress sites.
…elope

The settings cards now read connection state from the consolidated
/api/integrations/status envelope (the per-provider status routes were dropped
from the section), so the mobile-layout fixture must supply the per-integration
connected/configured fields the pills key on.
MBombeck merged commit 5793313 into main on Jun 5, 2026
13 checks passed
MBombeck deleted the release/v1.12.1 branch June 5, 2026 02:13