2 changes: 1 addition & 1 deletion charts/tenant/templates/agentfleet.yaml
@@ -2,7 +2,7 @@ apiVersion: agents.nanohype.dev/v1alpha1
 kind: AgentFleet
 metadata:
   name: {{ .Values.platform.name }}-fleet
-  namespace: {{ .Release.Namespace }}
+  namespace: {{ .Values.controlPlaneNamespace | default .Release.Namespace }}
   labels:
     app.kubernetes.io/part-of: eks-agent-platform
     eks-agent-platform/tenant: {{ .Values.platform.tenant }}
2 changes: 1 addition & 1 deletion charts/tenant/templates/budgetpolicy.yaml
@@ -5,7 +5,7 @@ apiVersion: governance.nanohype.dev/v1alpha1
 kind: BudgetPolicy
 metadata:
   name: {{ .Values.platform.name }}-budget
-  namespace: {{ .Release.Namespace }}
+  namespace: {{ .Values.controlPlaneNamespace | default .Release.Namespace }}
   labels:
     app.kubernetes.io/part-of: eks-agent-platform
     eks-agent-platform/tenant: {{ .Values.platform.tenant }}
2 changes: 1 addition & 1 deletion charts/tenant/templates/evalsuite.yaml
@@ -3,7 +3,7 @@ apiVersion: governance.nanohype.dev/v1alpha1
 kind: EvalSuite
 metadata:
   name: {{ .Values.platform.name }}-eval
-  namespace: {{ .Release.Namespace }}
+  namespace: {{ .Values.controlPlaneNamespace | default .Release.Namespace }}
   labels:
     app.kubernetes.io/part-of: eks-agent-platform
     eks-agent-platform/tenant: {{ .Values.platform.tenant }}
2 changes: 1 addition & 1 deletion charts/tenant/templates/modelgateway.yaml
@@ -2,7 +2,7 @@ apiVersion: agents.nanohype.dev/v1alpha1
 kind: ModelGateway
 metadata:
   name: {{ .Values.platform.name }}-gateway
-  namespace: {{ .Release.Namespace }}
+  namespace: {{ .Values.controlPlaneNamespace | default .Release.Namespace }}
   labels:
     app.kubernetes.io/part-of: eks-agent-platform
     eks-agent-platform/tenant: {{ .Values.platform.tenant }}
2 changes: 1 addition & 1 deletion charts/tenant/templates/platform.yaml
@@ -5,7 +5,7 @@ apiVersion: platform.nanohype.dev/v1alpha1
 kind: Platform
 metadata:
   name: {{ .Values.platform.name }}
-  namespace: {{ .Release.Namespace }}
+  namespace: {{ .Values.controlPlaneNamespace | default .Release.Namespace }}
   labels:
     app.kubernetes.io/part-of: eks-agent-platform
     eks-agent-platform/persona: {{ .Values.platform.persona }}
16 changes: 16 additions & 0 deletions charts/tenant/values.yaml
@@ -7,6 +7,22 @@
# --set platform.tenant=acme \
# --set budget.monthlyUsd=2500

# Where this tenant's control-plane CRs (Platform, BudgetPolicy, ModelGateway,
# AgentFleet, EvalSuite) live. This is the first dial of the tenant isolation
# tier — the template grows with you instead of being outgrown:
# shared (default) controlPlaneNamespace: eks-agent-platform — every tenant's
# control-plane CRs in one platform-team-owned namespace.
# Simplest; right for a startup / few tenants.
# dedicated controlPlaneNamespace: eap-tenant-<name> — per-tenant
# control-plane namespace for isolation / per-tenant GitOps
# granularity at scale or under compliance.
# harder platform.isolation: vcluster, then a dedicated cluster —
# the workload-side dials, orthogonal to this one.
# The Tenant CR is cluster-scoped and the operator watches Platforms
# cluster-wide, so moving a tenant up a tier is a re-render, never a migration.
# See docs/architecture/tenant-isolation-tiers.md.
controlPlaneNamespace: eks-agent-platform

platform:
name: "" # required
tenant: "" # required
59 changes: 59 additions & 0 deletions docs/architecture/tenant-isolation-tiers.md
@@ -0,0 +1,59 @@
# Tenant isolation tiers

The platform is an opinionated starting point that has to serve a solo startup
and a regulated enterprise from the same template. The way it does that without
becoming a template you outgrow: **the simple default is a degenerate case of
the scalable model.** Growing up a tier is turning a dial, never a migration or
a rewrite.

This works because three things are already true:

- **`Tenant` is cluster-scoped.** It's the stable identity anchor — the owning
team. Where a tenant's `Platform` CRs physically live can change; the `Tenant`
doesn't move, and per-tenant budget/spend roll-up follows it.
- **The operator watches `Platform` cluster-wide.** Where a control-plane CR
sits is a _placement policy_, not a functional constraint — so namespacing can
be reshaped with zero operator changes.
- **Isolation is already a spectrum, not a boolean.** `Platform.isolation`
(`namespace` → `vcluster`) dials workload isolation; `controlPlaneNamespace`
dials control-plane isolation. They're orthogonal — turn them independently.

## The tiers

A tenant climbs these as count, compliance, or blast-radius needs grow. Earlier
tiers are not "wrong" — they're the right default until a need forces the next.

| Tier | Control plane (`controlPlaneNamespace`) | Workload (`Platform.isolation`) | When |
| --------------------- | ---------------------------------------------------------------- | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| **Shared** (default) | `eks-agent-platform` — all tenants' CRs in one platform-owned ns | `namespace` → `tenants-<name>` | Startup / few tenants. Lowest ceremony. |
| **Dedicated CP ns** | `eap-tenant-<name>` — per-tenant control-plane ns | `namespace` | Many tenants; per-tenant GitOps Application granularity; per-tenant control-plane RBAC/quota. |
| **vcluster** | `eap-tenant-<name>` | `vcluster` | Hard workload isolation (noisy-neighbor, untrusted code) without a new cluster. |
| **Dedicated cluster** | (that cluster's mgmt ns) | n/a | Regulated / air-gapped / sovereignty. The cluster-scoped `Tenant` + the portal's multi-cluster watcher already anticipate this. |
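
As a rough illustration of the table, the first three tiers might look like this as per-tenant values overrides. This is a sketch, not chart documentation: the tenant name `acme` is a placeholder, and `platform.isolation` is assumed to be the values key that feeds `Platform.isolation`, as the comment in `values.yaml` suggests. Each YAML document below stands for a separate, hypothetical values file.

```yaml
# Shared (default): control-plane CRs in the platform-owned namespace.
controlPlaneNamespace: eks-agent-platform
platform:
  name: acme            # placeholder tenant
  tenant: acme
  isolation: namespace  # assumed key for Platform.isolation
---
# Dedicated CP ns: same workload isolation, per-tenant control-plane namespace.
controlPlaneNamespace: eap-tenant-acme
platform:
  name: acme
  tenant: acme
  isolation: namespace
---
# vcluster: dedicated control-plane namespace plus hard workload isolation.
controlPlaneNamespace: eap-tenant-acme
platform:
  name: acme
  tenant: acme
  isolation: vcluster
```

Only two keys differ across the three documents, which is the sense in which the dials are orthogonal: each can be changed without touching the other.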

## Why control-plane CRs default to the _shared management_ namespace

The `Platform` / `BudgetPolicy` / `ModelGateway` / `AgentFleet` / `EvalSuite`
CRs _define_ the tenant boundary — budget, allowed models, kill-switch. They are
platform-team-owned control-plane objects, so the default keeps them in
`eks-agent-platform`, **out of the tenant's workload namespace and out of the
tenant's reach.** The operator derives the `tenants-<name>` workload namespace
separately; that's where the tenant's pods (and their RBAC) live.

Deliberately _not_ the default: rendering control-plane CRs into the tenant's
own workload namespace. It co-locates the boundary definition with the workloads
it governs — a privilege-escalation footgun unless the CRD RBAC is airtight.
When a tenant needs control-plane isolation, the answer is a dedicated
_control-plane_ namespace (`eap-tenant-<name>`), still platform-owned — not the
workload namespace.

## Promoting a tenant (no migration)

1. Set `controlPlaneNamespace: eap-tenant-<name>` (and/or `platform.isolation:
vcluster`) for that tenant — a value change in the portal form / template.
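2. Re-render + re-apply. ArgoCD recreates the CRs in the new namespace (a
Kubernetes object's namespace is immutable, so the old copies are pruned
rather than moved, provided pruning is enabled); the cluster-scoped `Tenant`
is untouched, so identity, budget roll-up, and access grants carry over.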
3. The operator reconciles the Platform from its new home exactly as before
(cluster-wide watch).

No CRD change, no operator change, no data migration. The dial is the product.
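
To make the steps above concrete, here is a sketch of the `metadata` that the `platform.yaml` template would plausibly render before and after the value change. The tenant name `acme` is a placeholder, and labels and `spec` are omitted because their values are not shown here.

```yaml
# Before: controlPlaneNamespace: eks-agent-platform (shared tier)
apiVersion: platform.nanohype.dev/v1alpha1
kind: Platform
metadata:
  name: acme                     # placeholder tenant
  namespace: eks-agent-platform
---
# After: controlPlaneNamespace: eap-tenant-acme (dedicated control-plane ns)
apiVersion: platform.nanohype.dev/v1alpha1
kind: Platform
metadata:
  name: acme
  namespace: eap-tenant-acme
```

Because the templates use `| default .Release.Namespace`, an empty or unset `controlPlaneNamespace` falls back to the Helm release namespace, so a chart installed without the new value should render as it did before this change.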