Skip to content

[hermes] Add log-router for data plane audit events#11330

Open
notque wants to merge 24 commits into
masterfrom
hermes_log-router-integration
Open

[hermes] Add log-router for data plane audit events#11330
notque wants to merge 24 commits into
masterfrom
hermes_log-router-integration

Conversation

@notque

@notque notque commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Adds log-router as a new opt-in component (logRouter.enabled: false by default)
    to the hermes chart for routing data plane audit events to customer Ceph/S3 buckets
  • Consumes from a dedicated dataplane.audit RabbitMQ queue (separate from
    the existing notifications.info control plane queue)
  • PostgreSQL (postgresql-ng) dependency for dynamic per-tenant routing configuration
  • New audit_admin Keystone role
  • Phase 1: QA/testing only — validates end-to-end event flow

New Files

  • templates/log-router-statefulset.yaml — StatefulSet with WAL PVC
  • templates/log-router-configmap.yaml — Non-secret config + policy.json
  • templates/log-router-secret.yaml — RabbitMQ, S3, Keystone credentials
  • templates/log-router-service.yaml — ClusterIP + headless service
  • templates/log-router-pdb.yaml — PodDisruptionBudget
  • templates/_log-router-utils.tpl — Image helper + common env vars

Modified Files

  • Chart.yaml — postgresql-ng dependency, version bump to 0.2.0
  • values.yaml — logRouter + log-router-postgresql sections
  • keystone-seed.yaml — audit_admin role (conditional on logRouter.enabled)
  • ci/test-values.yaml — log-router test overrides

Test plan

  • helm template renders cleanly with logRouter.enabled: false (no log-router resources)
  • helm template renders cleanly with logRouter.enabled: true and test values
  • Verify postgresql-ng secret/service naming matches _utils.tpl references
  • Deploy to QA and validate end-to-end event flow
  • Confirm existing hermes components are unaffected

Adds log-router as an opt-in component (disabled by default) for routing
data plane audit events from RabbitMQ to customer Ceph/S3 buckets.

- StatefulSet with WAL persistence for crash recovery
- PostgreSQL via postgresql-ng dependency for tenant configuration
- Keystone audit_admin role for config API RBAC
- Prometheus metrics scraping on port 8080
- PodDisruptionBudget and headless service for StatefulSet
- CI test values with minimal resources
joluc
joluc previously approved these changes Apr 9, 2026

@joluc joluc left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bbobrov bbobrov left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New audit_admin Keystone role scoped to ccadmin/cloud_admin project

Sorry, but this makes no sense. A role is not scoped to anything - it exists by itself. It can be assigned to a user and project, and then the user can get a bearer token on the project with this role, and go to hermes with it.

What you do is probably a rule that utilizes this combination... but to that i have comments inline.

Comment thread openstack/hermes/templates/keystone-seed.yaml Outdated
Comment thread openstack/hermes/templates/keystone-seed.yaml
Comment thread openstack/hermes/templates/keystone-seed.yaml Outdated
Comment thread openstack/hermes/templates/keystone-seed.yaml Outdated
Comment thread openstack/hermes/templates/log-router-configmap.yaml Outdated
@sapcc-bot

Copy link
Copy Markdown
Contributor

Failed to validate the helm chart. Details. Readme.

The original policy.json restricted all config operations to
ccadmin/cloud_admin scope, preventing customers from managing
their own data plane event routing. Add project-scoped and
domain-scoped rules matching the existing hermes audit_viewer
pattern so customers with audit_admin can manage their own config.

Remove the cloud_admin project block and all role assignments
from the seed — cloud admins get access through cluster policy
rules, and the service user doesn't need audit_admin.
@notque notque force-pushed the hermes_log-router-integration branch from acf130d to 5bc3899 Compare April 14, 2026 18:34
@notque

notque commented Apr 14, 2026

Copy link
Copy Markdown
Contributor Author

@bbobrov I've attempted to change this to what I've actually intended as we discussed on the call. Tomorrow if you have time, I'd appreciate your consideration on it, and again thank you for giving me a proper check.

bbobrov
bbobrov previously approved these changes Apr 17, 2026

@bbobrov bbobrov left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from authorization pov this looks now better, thanks

@LeoZhangZXZ LeoZhangZXZ force-pushed the hermes_log-router-integration branch from 81c4a0f to fc5d471 Compare May 12, 2026 11:21
@LeoZhangZXZ LeoZhangZXZ force-pushed the hermes_log-router-integration branch from 2d60324 to 575a584 Compare May 15, 2026 07:35
@sapcc-bot

Copy link
Copy Markdown
Contributor

Failed to validate the helm chart. Details. Readme.

@LeoZhangZXZ LeoZhangZXZ force-pushed the hermes_log-router-integration branch 2 times, most recently from 89fedec to 3ffb488 Compare May 18, 2026 07:20
@LeoZhangZXZ LeoZhangZXZ force-pushed the hermes_log-router-integration branch 2 times, most recently from ea450ab to e59eb6e Compare May 19, 2026 12:07
LeoZhangZXZ
LeoZhangZXZ previously approved these changes Jun 16, 2026
@sapcc-bot

Copy link
Copy Markdown
Contributor

Failed to validate the helm chart. Details. Readme.

LeoZhangZXZ
LeoZhangZXZ previously approved these changes Jun 17, 2026
LeoZhangZXZ
LeoZhangZXZ previously approved these changes Jun 17, 2026

@viennaa viennaa left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for providing the log-router PR. Couple of remarks before it can go live.

Comment thread openstack/hermes/ci/test-values.yaml Outdated
Comment on lines +129 to +131
log_router: {}
users:
log_router: {}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to postgresql-ng they shall not include _. will fail

Suggested change
log_router: {}
users:
log_router: {}
log-router: {}
users:
log-router: {}

Comment on lines +110 to +113
name: '{{ $.Release.Name }}-pguser-log-router'
key: postgres-password
- name: LOG_ROUTER_DB_URL
value: "postgres://log-router:$(LOG_ROUTER_DB_PASSWORD)@{{ $.Release.Name }}-postgresql.{{ $.Release.Namespace }}.svc:5432/log-router?sslmode=disable"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally the log-router occurrences of these strings go into values.yaml. if log-router-postgresql values change, this fails silently.

values.yaml:
logRouter:
  db:
    user: log-router
    name: log-router

Comment thread openstack/hermes/values.yaml
emptyDir: {}
initContainers:
- name: fix-permissions
image: {{ required ".Values.init_image is missing" .Values.logRouter.init_image }}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different values are checked here, is this on purpose?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Comment thread openstack/hermes/values.yaml Outdated
Comment thread openstack/hermes/values.yaml Outdated
@LeoZhangZXZ

Copy link
Copy Markdown
Contributor

@viennaa thanks for the review, good catch

@sapcc-bot

Copy link
Copy Markdown
Contributor

Failed to validate the helm chart. Details. Readme.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants