
feat(pwa): migrate auth.js to better-auth #614

Merged
vincentchalamon merged 34 commits into 4.3 from feat/better-auth
Mar 27, 2026


Conversation

@vincentchalamon (Contributor) commented Mar 27, 2026

Summary

  • Replace next-auth v5 beta with better-auth for frontend authentication
  • Change Keycloak api-platform-pwa client from public to confidential (BFF pattern)
  • Add keycloak-config-cli Helm Job (post-upgrade) and nightly CronJob for automated realm reset
  • Update all components, server pages, admin panel, and E2E tests

Changes

New files

  • pwa/lib/auth.ts — better-auth server config (genericOAuth + Keycloak, PostgreSQL, nextCookies)
  • pwa/lib/auth-client.ts — React client with genericOAuthClient plugin
  • pwa/lib/auth-helpers.ts — getServerSession() / getServerAccessToken() for server components
  • pwa/hooks/useAuth.ts — useAccessToken() hook, signInWithKeycloak(), signOutWithKeycloak()
  • pwa/app/api/auth/[...all]/route.ts — better-auth route handler
  • helm/api-platform/templates/keycloak-realm-job.yaml — post-upgrade realm sync (weight 5, before fixtures at 6)
  • helm/api-platform/templates/keycloak-realm-cronjob.yaml — nightly realm reset at 00:00

Key design decisions

  • Database: reuses existing PostgreSQL with ba_-prefixed tables
  • Dual URLs: explicit authorizationUrl (external) / tokenUrl (internal) instead of OIDC discovery
  • Access token: fetched via getAccessToken({ providerId: "keycloak" }) with auto-refresh (replaces manual JWT callback)
  • Logout: RP-initiated via Keycloak logout endpoint (without id_token_hint for simplicity)
  • Realm reload: keycloak-config-cli Job + CronJob (no KC_SPI_IMPORT_IMPORTER_STRATEGY)
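
The dual-URL decision can be pictured as a helper that derives browser-facing and server-facing endpoints from two base URLs. This is only a sketch: `keycloakEndpoints`, the host names, and the realm name `demo` are hypothetical and not taken from the PR. The `/realms/{realm}/protocol/openid-connect/...` paths are Keycloak's standard OIDC endpoints.

```typescript
// Hypothetical helper, not part of this PR's code.
interface KeycloakEndpoints {
  authorizationUrl: string; // opened by the browser, so it needs the external host
  tokenUrl: string;         // called server-to-server, so it can use the internal host
  userInfoUrl: string;      // also called from the server
}

function keycloakEndpoints(
  externalBase: string,
  internalBase: string,
  realm: string,
): KeycloakEndpoints {
  const oidcPath = `/realms/${encodeURIComponent(realm)}/protocol/openid-connect`;
  // Drop trailing slashes so the joined URL never contains "//".
  const trim = (base: string): string => base.replace(/\/+$/, "");
  return {
    authorizationUrl: `${trim(externalBase)}${oidcPath}/auth`,
    tokenUrl: `${trim(internalBase)}${oidcPath}/token`,
    userInfoUrl: `${trim(internalBase)}${oidcPath}/userinfo`,
  };
}

// Example hosts are assumptions for illustration only.
const endpoints = keycloakEndpoints(
  "https://example.test/oidc/",
  "http://keycloak:8080/oidc",
  "demo",
);
console.log(endpoints.authorizationUrl);
console.log(endpoints.tokenUrl);
```

Keeping the two base URLs separate avoids OIDC discovery returning a single issuer host that is reachable from only one side of the network.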

Test plan

  • pnpm lint — passes
  • tsc --noEmit — passes
  • helm lint — passes
  • hadolint — passes
  • PHPUnit (CI)
  • PHPStan (CI)
  • E2E Playwright (CI)
  • Full Docker Compose stack manual test

Closes #572
Closes #505

🤖 Generated with Claude Code

vincentchalamon and others added 13 commits March 27, 2026 14:12
Replace next-auth v5 beta with better-auth for frontend authentication.
Keycloak remains the OIDC identity provider; the Symfony backend is unchanged.

- Add better-auth server config with genericOAuth plugin (dual internal/external
  Keycloak URLs) and PostgreSQL storage (ba_-prefixed tables)
- Add better-auth React client with genericOAuthClient plugin
- Add server-side helpers (getServerSession, getServerAccessToken) for
  Next.js server components
- Add useAccessToken hook and signInWithKeycloak/signOutWithKeycloak helpers
- Change Keycloak api-platform-pwa client from public to confidential (BFF)
- Update all components to use better-auth session/token APIs
- Update Helm chart: new secrets, configmap entries, pwa-deployment env vars
- Add keycloak-config-cli Job (post-upgrade, weight 5) and nightly CronJob
  to ensure realm changes are applied on every deployment
- Shift fixtures CronJob to 00:10 (after realm reset at 00:00)
- Update CI/CD workflows (BETTER_AUTH_SECRET replaces AUTH_SECRET)
- Update E2E tests: remove next-auth intermediary page step from login flow

Closes #572
Closes #505

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The openid scope was missing from the realm's clientScopes list and from
all clients' defaultClientScopes. Keycloak was implicitly handling OIDC
requests without it, but this violates the OIDC spec, which requires the
openid scope in authorization requests.

Add the openid scope definition to the realm and include it in
defaultClientScopes for all OIDC clients and in defaultDefaultClientScopes.

Closes #583

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
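
To illustrate the `openid` scope requirement from the commit above, here is a sketch of building an authorization-code request whose `scope` parameter always contains `openid`. The function name, endpoint, client ID usage, and redirect URI are hypothetical; the query parameter names are the standard OAuth 2.0 / OIDC ones.

```typescript
// Hypothetical helper: builds an OIDC authorization-code request URL.
function buildAuthorizationUrl(
  authorizationEndpoint: string,
  clientId: string,
  redirectUri: string,
  scopes: string[],
  state: string,
): string {
  // OIDC authorization requests must carry the "openid" scope; add it if missing.
  const requested = scopes.indexOf("openid") !== -1 ? scopes : ["openid", ...scopes];
  const url = new URL(authorizationEndpoint);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("redirect_uri", redirectUri);
  url.searchParams.set("scope", requested.join(" ")); // space-delimited list
  url.searchParams.set("state", state);
  return url.toString();
}

// Illustrative values only.
console.log(
  buildAuthorizationUrl(
    "https://example.test/oidc/realms/demo/protocol/openid-connect/auth",
    "api-platform-pwa",
    "https://example.test/callback",
    ["profile", "email"],
    "random-state",
  ),
);
```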
- Delete [...nextauth]/route.ts which conflicted with [...all]/route.ts
  causing Next.js build to fail with "Ambiguous app routes detected"
- Rename AUTH_SECRET to BETTER_AUTH_SECRET in Dockerfile and compose.prod.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docker/setup-buildx-action: v3 -> v4
- docker/bake-action: v6 -> v7
- docker/login-action: v3 -> v4
- pnpm/action-setup: v4 -> v5
- aquasecurity/trivy-action: master -> 0.35.0 (pin to release tag)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix FromAsCasing: 'as' -> 'AS' for dev stage (line 25)
- Fix SecretsUsedInArgOrEnv: remove BETTER_AUTH_SECRET from build ARG
  (it's a runtime secret, not needed at build time — only NEXT_PUBLIC_*
  vars need to be ARGs for Next.js bundling)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keycloak needs more time to start on first deployment: DB schema init
(Liquibase ~40s) + Infinispan cluster setup (~50s) + realm import
exceeds the previous 150s budget (30 retries × 5s).

Increase failureThreshold from 30 to 60 (300s / 5 min) to accommodate
the larger realm-demo.json with the added openid scope.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
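
The probe budget arithmetic in the commit above can be checked with a short sketch. The helper name is hypothetical, and the calculation ignores any `initialDelaySeconds`, which the commit message does not mention.

```typescript
// Time a Kubernetes probe tolerates before giving up:
// failureThreshold attempts, one every periodSeconds.
function probeBudgetSeconds(failureThreshold: number, periodSeconds: number): number {
  return failureThreshold * periodSeconds;
}

// Approximate durations quoted in the commit message (realm import time is not stated).
const knownStartupSeconds = 40 + 50; // Liquibase schema init + Infinispan cluster setup

const previousBudget = probeBudgetSeconds(30, 5);
const newBudget = probeBudgetSeconds(60, 5);

console.log(previousBudget); // 150
console.log(newBudget);      // 300
// Seconds left for the realm import under each budget.
console.log(previousBudget - knownStartupSeconds); // 60
console.log(newBudget - knownStartupSeconds);      // 210
```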
The tag 6.2.1-26.0.0 does not exist on Docker Hub. Use 6.5.0-26 which
matches the keycloak-config-cli 6.5.0 release for Keycloak 26.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Keycloak K8s Service exposes port 80 (targeting container port 8080).
The keycloak-config-cli Job and CronJob were using port 8080 directly,
which is only accessible inside the pod, not via the Service DNS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When KC_BOOTSTRAP_ADMIN_PASSWORD secret is not set in GitHub (e.g. on
feature-deploy PRs), the Helm value is empty, which causes the
keycloak-config-cli Job to fail with HTTP 401.

Add a default fallback to "!ChangeMe!" matching the values.yaml default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The secret is named KEYCLOAK_ADMIN_PASSWORD in GitHub, not
KC_BOOTSTRAP_ADMIN_PASSWORD. This caused an empty password to be passed
to the Helm chart, making keycloak-config-cli fail with HTTP 401.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
external-dns rejects zones that don't match its domainFilter. When the
filter was set to the full FQDN (e.g. pr-614-demo.api-platform.com),
external-dns found the Cloudflare zone "api-platform.com" but rejected
it with "zone not in domain filter", preventing DNS record creation.

Extract the zone (last two domain segments) from the URL so external-dns
can match the Cloudflare zone correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
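
The "last two domain segments" rule from the commit above can be sketched as follows. This is a TypeScript illustration with a hypothetical function name; the PR's workflow may implement the extraction differently.

```typescript
// Hypothetical helper: keeps the last two labels of the deployment host name.
function zoneFromUrl(deploymentUrl: string): string {
  const hostname = new URL(deploymentUrl).hostname;
  return hostname.split(".").slice(-2).join(".");
}

console.log(zoneFromUrl("https://pr-614-demo.api-platform.com")); // api-platform.com
```

Note that this rule assumes a single-label public suffix; a host under a suffix such as `co.uk` would yield the suffix itself rather than the zone.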
@vincentchalamon removed the deploy label (Deploys Pull Request) on Mar 27, 2026
vincentchalamon and others added 4 commits March 27, 2026 14:43
Using the zone name (api-platform.com) as domainFilter was too broad and
could interfere with other projects on the same domain. Restore the
pre-migration approach: use the deployment URL as domainFilter (scoped)
and explicitly pass the Cloudflare zone ID via zoneIdFilters so
external-dns can resolve the zone without relying on name matching.

Also align secret names to CF_API_TOKEN and CF_ZONE_ID.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use pg Pool instance instead of { type, url } config to fix
  "Failed to initialize database adapter" error
- Add @types/pg dev dependency for TypeScript
- Add better-auth migration step to CI before E2E tests
- Add waitForURL in E2E login() to wait for Keycloak page before
  filling credentials (better-auth redirects client-side via JS,
  unlike next-auth which redirected server-side)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vincentchalamon added the deploy label (Deploys Pull Request) on Mar 27, 2026
The pwa prod container runs as the nextjs user whose home is
/nonexistent. npx fails with EACCES when trying to cache packages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vincentchalamon and others added 2 commits March 27, 2026 15:04
Add a standalone migrate.mjs script that creates better-auth tables
using only pg (no better-auth CLI needed at runtime). Uses CREATE TABLE
IF NOT EXISTS for idempotence.

Integrated into the Dockerfile CMD for both dev and prod stages:
- dev: pnpm install; node migrate.mjs; pnpm dev
- prod: node migrate.mjs && node server.js

This removes the need for a separate CI migration step, and ensures
tables exist in all environments (local dev, CI/E2E, K8s prod).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
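
The idempotent migration described above relies on `CREATE TABLE IF NOT EXISTS`, which is a no-op when the table already exists. A sketch of generating such statements follows. The helper names, the `ba_example` table, and its columns are hypothetical; the actual migrate.mjs and its schema are not shown in this PR conversation.

```typescript
// Hypothetical column description used only for this sketch.
interface ColumnDef {
  name: string;
  type: string; // raw SQL type and constraints, e.g. "TEXT PRIMARY KEY"
}

// Quote an SQL identifier, doubling embedded double quotes.
function quoteIdent(identifier: string): string {
  return `"${identifier.replace(/"/g, '""')}"`;
}

// Build a statement that is safe to run on every container start.
function createTableIfNotExists(table: string, columns: ColumnDef[]): string {
  const body = columns
    .map((column) => `  ${quoteIdent(column.name)} ${column.type}`)
    .join(",\n");
  return `CREATE TABLE IF NOT EXISTS ${quoteIdent(table)} (\n${body}\n);`;
}

// Illustrative table only; not the real better-auth schema.
console.log(
  createTableIfNotExists("ba_example", [
    { name: "id", type: "TEXT PRIMARY KEY" },
    { name: "createdAt", type: "TIMESTAMPTZ NOT NULL" },
  ]),
);
```

`IF NOT EXISTS` only guards table creation: it does not alter an existing table whose columns differ, so later schema changes would need their own statements.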
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keycloak requires id_token_hint for RP-initiated logout. The idToken is
stored in the ba_account table and not accessible client-side.

Add /api/auth/keycloak-logout server route that:
1. Retrieves the user's Keycloak account to get idToken
2. Revokes the better-auth session
3. Redirects to Keycloak logout endpoint with id_token_hint

Update signOutWithKeycloak to navigate to this route instead of
calling authClient.signOut() + direct Keycloak redirect.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
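
Step 3 of the logout route above amounts to building a redirect URL for Keycloak's end-session endpoint. A sketch, assuming a hypothetical helper name and example hosts; `id_token_hint` and `post_logout_redirect_uri` are the parameter names defined by OpenID Connect RP-Initiated Logout.

```typescript
// Hypothetical helper: builds the RP-initiated logout redirect target.
function keycloakLogoutUrl(
  externalBase: string,
  realm: string,
  idToken: string,
  postLogoutRedirectUri: string,
): string {
  const base = externalBase.replace(/\/+$/, "");
  const url = new URL(
    `${base}/realms/${encodeURIComponent(realm)}/protocol/openid-connect/logout`,
  );
  // The ID token identifies the session to end, so Keycloak can skip a confirmation step.
  url.searchParams.set("id_token_hint", idToken);
  // Where the browser lands after logout; it must be allowed for the client in Keycloak.
  url.searchParams.set("post_logout_redirect_uri", postLogoutRedirectUri);
  return url.toString();
}

// Illustrative values only.
console.log(
  keycloakLogoutUrl("https://example.test/oidc", "demo", "header.payload.signature", "https://example.test/"),
);
```

Because the ID token must be read on the server, the browser only ever navigates to the application's own route and never sees the token value in client-side code.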
@vincentchalamon removed the deploy label (Deploys Pull Request) on Mar 27, 2026
The listUserAccounts API filters out sensitive fields like idToken.
Query the ba_account table directly via pg Pool to retrieve it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the Keycloak login page form with two submit buttons styled
as the default Keycloak primary buttons. Each button contains a hidden
form that POSTs credentials directly — one click to log in.

Remove the socialProviders section ("Or sign in with") and all custom
CSS/assets (use default Keycloak theme).

Update E2E tests to assert on the new buttons.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vincentchalamon added the deploy label (Deploys Pull Request) on Mar 27, 2026
vincentchalamon and others added 2 commits March 27, 2026 15:47
…close

external-dns pod was not restarting between deploys (Helm showed "no
changes"), so DNS records were never created/updated. Add a deploy
timestamp annotation to force a rolling restart on each helm upgrade.

Enable triggerLoopOnEvent so external-dns reacts immediately to ingress
changes. In the cleanup workflow, delete the ingress before the namespace
so external-dns can remove DNS records from Cloudflare before being killed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Keycloak login page no longer has email/password fields — only
"Log in as user" and "Log in as admin" submit buttons. Update E2E
login methods to click the appropriate button.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vincentchalamon and others added 3 commits March 27, 2026 16:02
Kubernetes annotations must be strings. $(date +%s) produces a number
that --set passes as an integer, causing "cannot unmarshal number into
Go struct field ObjectMeta.spec.template.metadata.annotations of type
string". Use --set-string to force string type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Using a well-known default password ("!ChangeMe!") is a security risk.
If the secret is not provided, the deployment should fail rather than
silently use an insecure default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
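
The fail-fast principle in the commit above (no insecure fallback when a secret is missing) can be sketched in TypeScript. This is an illustration of the idea only: the PR applies it in the Helm chart and workflow, and `requireSecret` is a hypothetical name.

```typescript
// Hypothetical helper: returns the secret or throws, never a default value.
function requireSecret(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required secret: ${name}`);
  }
  return value;
}

// Illustrative environment; in a real deployment this would come from the runtime.
const exampleEnv: Record<string, string | undefined> = { KEYCLOAK_ADMIN_PASSWORD: "" };

try {
  requireSecret(exampleEnv, "KEYCLOAK_ADMIN_PASSWORD");
} catch (error) {
  // An empty value is treated the same as an absent one.
  console.log((error as Error).message);
}
```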
- Remove prompt=consent from better-auth config (caused login loops)
- Fix AdminWithOIDC: use useEffect+useRef to prevent signIn loop when
  session hasn't loaded yet after OAuth callback
- E2E login: wait for Keycloak button visibility, then click and await
  navigation away from /oidc/ (handles multi-redirect chain)
- E2E admin pages: wait for React Admin sidebar instead of waitForURL
- E2E BookmarkPage: login first via header, then navigate to bookmarks
- E2E User.spec: wait for Keycloak buttons instead of text state changes

All 40 E2E tests pass locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…chart

The official kubernetes-sigs/external-dns chart (unlike Bitnami's) does
not support zoneIdFilters as a values key — it was silently ignored.
Without --zone-id-filter, external-dns cannot match the Cloudflare zone
"api-platform.com" when domainFilter is set to a subdomain FQDN.

Use extraArgs to pass --zone-id-filter directly to the external-dns
binary, restoring the ability to scope domainFilter to the deployment
URL while explicitly resolving the Cloudflare zone by ID.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add /login client page that auto-triggers Keycloak signIn with
  callbackURL support — used by private server-side routes
- Redirect /bookmarks to /login?callbackURL=/bookmarks when not
  authenticated (instead of /books)
- E2E: use input[value="..."] locators instead of getByRole("button")
  for Keycloak submit inputs (more reliable across environments)
- E2E: simplify BookmarkPage flow (navigate to /bookmarks directly,
  login is triggered automatically via redirect)
- E2E: wait for sidebar visibility instead of URL patterns in admin

37+ tests pass locally. Remaining failures are pre-existing flaky tests
(MUI Rating timing, review data count).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
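
The `/login?callbackURL=...` redirect described above can be sketched with two small helpers. Both names are hypothetical. The relative-path check in `safeCallbackPath` is an added assumption, since the PR conversation does not say how the callback value is validated.

```typescript
// Build the login redirect for a protected path.
function loginRedirectPath(target: string): string {
  // Encoding keeps any query string inside `target` from leaking into the outer URL.
  return `/login?callbackURL=${encodeURIComponent(target)}`;
}

// Assumption: accept only same-site relative paths to avoid open redirects.
function safeCallbackPath(raw: string | null, fallback: string): string {
  // Must start with a single "/" (rejects "//host", "/\host" and absolute URLs).
  return raw !== null && /^\/(?![\/\\])/.test(raw) ? raw : fallback;
}

console.log(loginRedirectPath("/bookmarks"));             // /login?callbackURL=%2Fbookmarks
console.log(safeCallbackPath("/bookmarks", "/"));         // /bookmarks
console.log(safeCallbackPath("https://evil.example", "/")); // /
```

The encoded form `%2Fbookmarks` decodes to the same `/bookmarks` value that the commit message shows unencoded.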
@vincentchalamon removed the deploy label (Deploys Pull Request) on Mar 27, 2026
vincentchalamon and others added 3 commits March 27, 2026 18:37
Next.js cannot prerender pages that use useSearchParams without a
Suspense boundary. Extract the hook into a child component wrapped
in Suspense to fix the build.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
chmod 644 on directories removes the execute bit, preventing Keycloak
from traversing the theme directory. This caused the custom login
template to not load in CI, falling back to the default Keycloak form.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
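
The permission problem above comes down to one bit test: a directory can only be traversed when an execute (search) bit is set, and mode 644 has none. A sketch with hypothetical helper names; the 755/644 split is a common convention, not necessarily the fix applied in this PR.

```typescript
// True if any execute (search) bit is set in a POSIX permission mode.
function isTraversable(mode: number): boolean {
  return (mode & 0o111) !== 0;
}

// Common convention (assumption): directories get 755, regular files 644.
function conventionalMode(isDirectory: boolean): number {
  return isDirectory ? 0o755 : 0o644;
}

console.log(isTraversable(0o644)); // false
console.log(isTraversable(0o755)); // true
console.log(conventionalMode(true).toString(8)); // 755
```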
COPY --link --chown=keycloak:keycloak fails with "invalid user index: -1"
in Buildx when the named user doesn't exist in the builder context.
Use numeric 1000:0 (keycloak user UID/GID) instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vincentchalamon linked an issue that may be closed by this pull request on Mar 27, 2026
…ed page

Non-admin users (e.g. john.doe) now see a full-page "Access Denied" screen
instead of a broken admin UI with 403 errors. The access token JWT is decoded
client-side to check for the "admin" realm role before rendering react-admin.

Closes #427

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
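
The client-side role check described above can be sketched as follows. Function names are hypothetical and the PR's actual component code is not shown here. `realm_access.roles` is where Keycloak places realm roles in access tokens by default. The sketch decodes the payload without verifying the signature, so it is suitable only for choosing what to render; the API still has to enforce access.

```typescript
interface KeycloakAccessTokenPayload {
  realm_access?: { roles?: string[] };
}

// Decode the (unverified) payload segment of a JWT.
function decodeJwtPayload(token: string): KeycloakAccessTokenPayload {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("Not a JWT");
  }
  // base64url -> base64, with padding restored.
  let base64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  while (base64.length % 4 !== 0) {
    base64 += "=";
  }
  const binary = atob(base64);
  // Re-interpret the decoded bytes as UTF-8 via percent-encoding.
  let percentEncoded = "";
  for (let i = 0; i < binary.length; i++) {
    percentEncoded += "%" + ("0" + binary.charCodeAt(i).toString(16)).slice(-2);
  }
  return JSON.parse(decodeURIComponent(percentEncoded)) as KeycloakAccessTokenPayload;
}

function hasRealmRole(token: string, role: string): boolean {
  const roles = decodeJwtPayload(token).realm_access?.roles ?? [];
  return roles.indexOf(role) !== -1;
}

// Demo only: builds an unsigned token so the example runs without Keycloak.
function unsignedDemoToken(payload: object): string {
  const b64url = (value: string): string =>
    btoa(value).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
  return [b64url(JSON.stringify({ alg: "none" })), b64url(JSON.stringify(payload)), "sig"].join(".");
}

const adminToken = unsignedDemoToken({ realm_access: { roles: ["user", "admin"] } });
const userToken = unsignedDemoToken({ realm_access: { roles: ["user"] } });
console.log(hasRealmRole(adminToken, "admin")); // true
console.log(hasRealmRole(userToken, "admin"));  // false
```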
@vincentchalamon merged commit d70fbac into 4.3 on Mar 27, 2026
7 checks passed
@vincentchalamon deleted the feat/better-auth branch on March 27, 2026 18:14
