diff --git a/charts/tenant/templates/agentfleet.yaml b/charts/tenant/templates/agentfleet.yaml
index 075f8f3..674f52e 100644
--- a/charts/tenant/templates/agentfleet.yaml
+++ b/charts/tenant/templates/agentfleet.yaml
@@ -2,7 +2,7 @@ apiVersion: agents.nanohype.dev/v1alpha1
 kind: AgentFleet
 metadata:
   name: {{ .Values.platform.name }}-fleet
-  namespace: {{ .Release.Namespace }}
+  namespace: {{ .Values.controlPlaneNamespace | default .Release.Namespace }}
   labels:
     app.kubernetes.io/part-of: eks-agent-platform
     eks-agent-platform/tenant: {{ .Values.platform.tenant }}
diff --git a/charts/tenant/templates/budgetpolicy.yaml b/charts/tenant/templates/budgetpolicy.yaml
index 0b1c90c..dc3b09e 100644
--- a/charts/tenant/templates/budgetpolicy.yaml
+++ b/charts/tenant/templates/budgetpolicy.yaml
@@ -5,7 +5,7 @@ apiVersion: governance.nanohype.dev/v1alpha1
 kind: BudgetPolicy
 metadata:
   name: {{ .Values.platform.name }}-budget
-  namespace: {{ .Release.Namespace }}
+  namespace: {{ .Values.controlPlaneNamespace | default .Release.Namespace }}
   labels:
     app.kubernetes.io/part-of: eks-agent-platform
     eks-agent-platform/tenant: {{ .Values.platform.tenant }}
diff --git a/charts/tenant/templates/evalsuite.yaml b/charts/tenant/templates/evalsuite.yaml
index 3f3f01a..6f54190 100644
--- a/charts/tenant/templates/evalsuite.yaml
+++ b/charts/tenant/templates/evalsuite.yaml
@@ -3,7 +3,7 @@ apiVersion: governance.nanohype.dev/v1alpha1
 kind: EvalSuite
 metadata:
   name: {{ .Values.platform.name }}-eval
-  namespace: {{ .Release.Namespace }}
+  namespace: {{ .Values.controlPlaneNamespace | default .Release.Namespace }}
   labels:
     app.kubernetes.io/part-of: eks-agent-platform
     eks-agent-platform/tenant: {{ .Values.platform.tenant }}
diff --git a/charts/tenant/templates/modelgateway.yaml b/charts/tenant/templates/modelgateway.yaml
index f551fb6..0984682 100644
--- a/charts/tenant/templates/modelgateway.yaml
+++ b/charts/tenant/templates/modelgateway.yaml
@@ -2,7 +2,7 @@ apiVersion: agents.nanohype.dev/v1alpha1
 kind: ModelGateway
 metadata:
   name: {{ .Values.platform.name }}-gateway
-  namespace: {{ .Release.Namespace }}
+  namespace: {{ .Values.controlPlaneNamespace | default .Release.Namespace }}
   labels:
     app.kubernetes.io/part-of: eks-agent-platform
     eks-agent-platform/tenant: {{ .Values.platform.tenant }}
diff --git a/charts/tenant/templates/platform.yaml b/charts/tenant/templates/platform.yaml
index f1b35b6..b4b73f3 100644
--- a/charts/tenant/templates/platform.yaml
+++ b/charts/tenant/templates/platform.yaml
@@ -5,7 +5,7 @@ apiVersion: platform.nanohype.dev/v1alpha1
 kind: Platform
 metadata:
   name: {{ .Values.platform.name }}
-  namespace: {{ .Release.Namespace }}
+  namespace: {{ .Values.controlPlaneNamespace | default .Release.Namespace }}
   labels:
     app.kubernetes.io/part-of: eks-agent-platform
     eks-agent-platform/persona: {{ .Values.platform.persona }}
diff --git a/charts/tenant/values.yaml b/charts/tenant/values.yaml
index 97211ca..80293dd 100644
--- a/charts/tenant/values.yaml
+++ b/charts/tenant/values.yaml
@@ -7,6 +7,22 @@
 #   --set platform.tenant=acme \
 #   --set budget.monthlyUsd=2500
 
+# Where this tenant's control-plane CRs (Platform, BudgetPolicy, ModelGateway,
+# AgentFleet, EvalSuite) live. This is the first dial of the tenant isolation
+# tier — the template grows with you instead of being outgrown:
+#   shared (default)  controlPlaneNamespace: eks-agent-platform — every tenant's
+#                     control-plane CRs in one platform-team-owned namespace.
+#                     Simplest; right for a startup / few tenants.
+#   dedicated         controlPlaneNamespace: eap-tenant- — per-tenant
+#                     control-plane namespace for isolation / per-tenant GitOps
+#                     granularity at scale or under compliance.
+#   harder            platform.isolation: vcluster, then a dedicated cluster —
+#                     the workload-side dials, orthogonal to this one.
+# The Tenant CR is cluster-scoped and the operator watches Platforms
+# cluster-wide, so moving a tenant up a tier is a re-render, never a migration.
+# See docs/architecture/tenant-isolation-tiers.md.
+controlPlaneNamespace: eks-agent-platform
+
 platform:
   name: "" # required
   tenant: "" # required
diff --git a/docs/architecture/tenant-isolation-tiers.md b/docs/architecture/tenant-isolation-tiers.md
new file mode 100644
index 0000000..0b66334
--- /dev/null
+++ b/docs/architecture/tenant-isolation-tiers.md
@@ -0,0 +1,59 @@
+# Tenant isolation tiers
+
+The platform is an opinionated starting point that has to serve a solo startup
+and a regulated enterprise from the same template. The way it does that without
+becoming a template you outgrow: **the simple default is a degenerate case of
+the scalable model.** Growing up a tier is turning a dial, never a migration or
+a rewrite.
+
+This works because three things are already true:
+
+- **`Tenant` is cluster-scoped.** It's the stable identity anchor — the owning
+  team. Where a tenant's `Platform` CRs physically live can change; the `Tenant`
+  doesn't move, and per-tenant budget/spend roll-up follows it.
+- **The operator watches `Platform` cluster-wide.** Where a control-plane CR
+  sits is a _placement policy_, not a functional constraint — so namespacing can
+  be reshaped with zero operator changes.
+- **Isolation is already a spectrum, not a boolean.** `Platform.isolation`
+  (`namespace` → `vcluster`) dials workload isolation; `controlPlaneNamespace`
+  dials control-plane isolation. They're orthogonal — turn them independently.
+
+## The tiers
+
+A tenant climbs these as count, compliance, or blast-radius needs grow. Earlier
+tiers are not "wrong" — they're the right default until a need forces the next.
+
+| Tier | Control plane (`controlPlaneNamespace`) | Workload (`Platform.isolation`) | When |
+| --------------------- | ---------------------------------------------------------------- | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
+| **Shared** (default) | `eks-agent-platform` — all tenants' CRs in one platform-owned ns | `namespace` → `tenants-` | Startup / few tenants. Lowest ceremony. |
+| **Dedicated CP ns** | `eap-tenant-` — per-tenant control-plane ns | `namespace` | Many tenants; per-tenant GitOps Application granularity; per-tenant control-plane RBAC/quota. |
+| **vcluster** | `eap-tenant-` | `vcluster` | Hard workload isolation (noisy-neighbor, untrusted code) without a new cluster. |
+| **Dedicated cluster** | (that cluster's mgmt ns) | n/a | Regulated / air-gapped / sovereignty. The cluster-scoped `Tenant` + the portal's multi-cluster watcher already anticipate this. |
+
+## Why control-plane CRs default to the _shared management_ namespace
+
+The `Platform` / `BudgetPolicy` / `ModelGateway` / `AgentFleet` / `EvalSuite`
+CRs _define_ the tenant boundary — budget, allowed models, kill-switch. They are
+platform-team-owned control-plane objects, so the default keeps them in
+`eks-agent-platform`, **out of the tenant's workload namespace and out of the
+tenant's reach.** The operator derives the `tenants-` workload namespace
+separately; that's where the tenant's pods (and their RBAC) live.
+
+Deliberately _not_ the default: rendering control-plane CRs into the tenant's
+own workload namespace. It co-locates the boundary definition with the workloads
+it governs — a privilege-escalation footgun unless the CRD RBAC is airtight.
+When a tenant needs control-plane isolation, the answer is a dedicated
+_control-plane_ namespace (`eap-tenant-`), still platform-owned — not the
+workload namespace.
+
+## Promoting a tenant (no migration)
+
+1. Set `controlPlaneNamespace: eap-tenant-` (and/or `platform.isolation:
+vcluster`) for that tenant — a value change in the portal form / template.
+2. Re-render + re-apply. ArgoCD moves the CRs to the new namespace; the
+   cluster-scoped `Tenant` is untouched, so identity, budget roll-up, and access
+   grants carry over.
+3. The operator reconciles the Platform from its new home exactly as before
+   (cluster-wide watch).
+
+No CRD change, no operator change, no data migration. The dial is the product.
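
Reviewer note on the namespace expression: every template in this diff now renders `{{ .Values.controlPlaneNamespace | default .Release.Namespace }}`. The Python sketch below models how that pipeline resolves, assuming Helm's `default` falls back whenever the piped value is empty (unset, null, or an empty string). The function name and the `team-a` release namespace are hypothetical and exist only for illustration; this is not code from the chart.

```python
from typing import Any, Mapping


def resolve_control_plane_namespace(values: Mapping[str, Any], release_namespace: str) -> str:
    """Model `{{ .Values.controlPlaneNamespace | default .Release.Namespace }}`.

    Assumption: Helm's `default` treats None and "" as empty, so Python
    falsiness is a close enough stand-in for a string-valued setting.
    """
    configured = values.get("controlPlaneNamespace")
    # Fall back to the release namespace only when nothing usable is configured.
    return configured if configured else release_namespace


# The chart's values.yaml now ships a non-empty default, so the shared tier wins.
chart_defaults = {"controlPlaneNamespace": "eks-agent-platform"}
print(resolve_control_plane_namespace(chart_defaults, "team-a"))

# The pre-change behavior returns only if the value is explicitly emptied.
print(resolve_control_plane_namespace({"controlPlaneNamespace": ""}, "team-a"))
```

One consequence worth checking during review: because `values.yaml` sets `controlPlaneNamespace: eks-agent-platform`, an existing release that never set this value will render its CRs into `eks-agent-platform` instead of the release namespace on the next upgrade. The `.Release.Namespace` fallback applies only when the value is overridden to empty or null.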