The platform supports two deployment types with different infrastructure requirements:
Kind (Development):
- Local Kubernetes cluster via Kind
- Requires: Docker, kubectl, kind CLI
- Simulated GPU support (RuntimeClass only)
- nginx-ingress-controller
- Suitable for development and testing
- Bootstrap:
./bootstrap.sh --deployment kind
K3s (Production):
- Lightweight Kubernetes for production
- Requires: Ubuntu 22.04+, NVIDIA drivers, nvidia-container-toolkit
- Real GPU support via NVIDIA GPU Operator
- Gateway API with external DNS
- Optimized for bare-metal and edge deployments
- Bootstrap:
./bootstrap.sh --deployment k3s
Common Prerequisites (both deployments):
- kubectl: Configured with cluster access and admin permissions
- DNS Configuration: For *.home.local (or your chosen domain)
- Network Access: Ability to reach the ingress controller from your devices
- Cloudflare API Token: For organization webhook tunnels (set as the CLOUDFLARE_API_TOKEN environment variable)
- Basic tools: curl, openssl, base64, jq
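The basic-tools requirement can be checked up front. A minimal preflight sketch (tool list taken from the prerequisites above):

```shell
# Check that the basic tools required by both deployment types are on PATH.
missing=""
for tool in curl openssl base64 jq; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -z "$missing" ]; then
  echo "all prerequisite tools present"
else
  echo "missing tools:$missing"
fi
```

Run this before bootstrap and install anything it reports missing with your package manager.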
Kind-Specific Prerequisites:
- Docker: Docker Desktop or Docker Engine
- Kind CLI: brew install kind or download from releases
- Kubernetes: Version 1.24+ (created by Kind)
K3s-Specific Prerequisites:
- Ubuntu 22.04+: With root/sudo access
- NVIDIA Drivers: Installed and working (nvidia-smi succeeds)
- NVIDIA Container Toolkit: nvidia-container-runtime installed
- K3s: Will be installed by bootstrap if not present
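A quick host preflight sketch for the K3s-specific items; the binary names are the ones listed above, though the exact checks the bootstrap script performs may differ:

```shell
# Confirm the NVIDIA driver and container-toolkit binaries are visible on the host.
status="ok"
for bin in nvidia-smi nvidia-container-runtime; do
  if command -v "$bin" >/dev/null 2>&1; then
    echo "$bin: found"
  else
    echo "$bin: MISSING"
    status="missing"
  fi
done
echo "gpu preflight: $status"
```

On a correctly prepared host, nvidia-smi should additionally run without error, per the prerequisite above.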
The platform requires DNS configuration for browser-based authentication. Choose one option:
Option A: Router/Pi-hole DNS (Recommended)
# Find your ingress controller IP
kubectl get svc -n ingress-nginx ingress-nginx-controller
# Add A records in your router or Pi-hole:
auth.home.local → <INGRESS_IP>
dex.home.local → <INGRESS_IP>
argocd.home.local → <INGRESS_IP>
tekton.home.local → <INGRESS_IP>
Option B: Hosts File (Per-device)
# Add to /etc/hosts on each device:
<INGRESS_IP> auth.home.local
<INGRESS_IP> dex.home.local
<INGRESS_IP> argocd.home.local
<INGRESS_IP> tekton.home.local
The bootstrap script achieves complete platform convergence automatically, with no manual steps required.
Run Bootstrap
git clone https://github.com/bdchatham/AphexPlatformInfrastructure.git
cd AphexPlatformInfrastructure
# Set Cloudflare API token for organization webhooks
export CLOUDFLARE_API_TOKEN="your-cloudflare-api-token"
# Choose deployment type
./bootstrap.sh --deployment kind # For development
# OR
./bootstrap.sh --deployment k3s   # For production with GPU support
Kind Bootstrap Options:
./bootstrap.sh --deployment kind [OPTIONS]
Options:
--cluster-name NAME Name for Kind cluster (default: platform-cluster)
--use-existing Use existing kubecontext instead of creating cluster
--show-secrets Display generated secrets (WARNING: not for production)
K3s Bootstrap Options:
sudo ./bootstrap.sh --deployment k3s [OPTIONS]
Options:
--show-secrets Display generated secrets (WARNING: not for production)
Note: Must run as root/sudo for K3s installation
Bootstrap Actions (Automatic):
- Routes to deployment-specific bootstrap script
- Creates or configures Kubernetes cluster (Kind or K3s)
- Generates ALL secrets automatically (PostgreSQL, Authentik, Dex, API tokens)
- Creates platform namespaces (argocd, auth-system, tekton-pipelines, platform-system, external-secrets)
- Stores all secrets in Kubernetes (never prints to stdout)
- Installs ArgoCD with proper OIDC configuration
- Creates deployment-specific platform-root ArgoCD Application
- Waits for ArgoCD to deploy all platform components (GitOps-managed)
- Waits for Authentik and creates API token via Authentik API
- Achieves complete platform convergence automatically
- Displays access instructions and secret retrieval commands
Deployment-Specific Paths:
- Kind: Uses platform/base/argocd/apps → references platform/deployments/kind/*
- K3s: Uses platform/deployments/k3s/argocd/apps → references platform/deployments/k3s/*
For detailed architecture, see architecture.md.
The platform implements a layered cert-manager architecture that eliminates timing issues and manual intervention.
Deployment Waves (Automatic via ArgoCD):
Wave 10: cert-manager Installation
# ArgoCD deploys cert-manager with PostSync validation
kubectl get pods -n cert-manager
kubectl get job cert-manager-webhook-readiness -n cert-manager
Wave 20: Certificate Foundation
# Only proceeds after webhook validation passes
kubectl get clusterissuer selfsigned-issuer
kubectl get certificates -A
Wave 30: Ingress Resources
# Only proceeds after certificates are ready
kubectl get ingress -A
Source: platform/cert-manager/, platform/cert-manager/webhook-readiness-hook.yaml, platform/argocd/apps/platform-cert-manager.yaml
For detailed cert-manager architecture, see architecture.md.
Step 1: Check Bootstrap Completion
# All ArgoCD applications should be Synced/Healthy
kubectl get applications -n argocd
# cert-manager components should be running
kubectl get pods -n cert-manager
# External Secrets Operator should be running
kubectl get pods -n external-secrets
# Certificates should be Ready
kubectl get certificates -A
Step 2: Verify Authentication System
# Check auth-system components
kubectl get pods -n auth-system
# Verify Authentik is accessible
curl -k https://auth.home.local/if/flow/initial-setup/
# Verify Dex OIDC discovery
curl -k https://dex.home.local/.well-known/openid-configuration
Source: platform/auth/authentik/, platform/auth/dex/
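A healthy discovery endpoint returns a JSON document whose key fields can be pulled out with jq. A sketch against an illustrative response (the real document carries more fields):

```shell
# Illustrative OIDC discovery document (field names per the OIDC Discovery spec).
discovery='{
  "issuer": "https://dex.home.local",
  "authorization_endpoint": "https://dex.home.local/auth",
  "token_endpoint": "https://dex.home.local/token",
  "jwks_uri": "https://dex.home.local/keys"
}'
issuer=$(printf '%s' "$discovery" | jq -r '.issuer')
echo "issuer: $issuer"
# Against the live endpoint:
#   curl -ks https://dex.home.local/.well-known/openid-configuration | jq -r '.issuer'
```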
Step 3: Verify External Secrets Operator
# Check External Secrets Operator pods
kubectl get pods -n external-secrets
# Verify ClusterSecretStore CRD is installed
kubectl get crd clustersecretstores.external-secrets.io
# List any ClusterSecretStores (created when organizations are provisioned)
kubectl get clustersecretstores
Source: platform/base/external-secrets/
Step 4: Access Platform Services
Authentik UI (User Management):
# Get admin password
kubectl get secret authentik-secrets -n auth-system \
-o jsonpath='{.data.admin-password}' | base64 -d
# Access: https://auth.home.local
# Username: admin
# Password: (from above command)
ArgoCD UI (GitOps Management):
# Access: https://argocd.home.local
# Click "Login via Dex"
# Authenticate with Authentik credentials
Tekton Dashboard (Pipeline Monitoring):
# Access: https://tekton.home.local
# Authenticate via Dex/Authentik
Source: platform/auth/secrets/README.md, platform/integrations/argocd-oidc-config.yaml
After successful bootstrap and convergence:
ArgoCD Applications (Kind deployment):
NAME SYNC STATUS HEALTH STATUS
platform-auth Synced Healthy
platform-catalog Synced Healthy
platform-cert-foundation Synced Healthy
platform-cert-manager Synced Healthy
platform-controllers Synced Healthy
platform-crds Synced Healthy
platform-external-secrets Synced Healthy
platform-gpu Synced Healthy
platform-ingress Synced Healthy
platform-ingress-controller Synced Healthy
platform-rbac Synced Healthy
platform-root Synced Healthy
platform-tekton Synced Healthy
ArgoCD Applications (K3s deployment - additional apps):
platform-external-dns Synced Healthy
platform-gateway Synced Healthy
platform-gpu-operator Synced Healthy
Certificates:
NAMESPACE NAME READY SECRET AGE
auth-system argocd-tls True argocd-tls 5m
auth-system authentik-tls True authentik-tls 5m
auth-system dex-tls True dex-tls 5m
auth-system tekton-tls True tekton-tls 5m
Platform Services:
- All pods Running in cert-manager, external-secrets, auth-system, argocd, tekton-pipelines namespaces
- All Ingress resources configured with TLS certificates
- Authentication flow working end-to-end
- External Secrets Operator ready for ClusterSecretStore provisioning
Source
- platform/bootstrap/bootstrap.sh - Bootstrap implementation
- platform/cert-manager/ - Layered cert-manager architecture
- platform/cert-manager/webhook-readiness-hook.yaml - PostSync validation
- platform/auth/ - Authentication system components
# Run RBAC validation script
platform/scripts/validate-rbac.sh
# Manual RBAC checks
kubectl auth can-i create pipelines.platform.dev --as=admin@platform.local --as-group=platform-admins
kubectl auth can-i create pipelines.platform.dev --as=alice@platform.local --as-group=platform-engineering -n user-alice
kubectl auth can-i create pipelines.platform.dev --as=alice@platform.local --as-group=platform-engineering -n auth-system
# Verify certificate-based admin access works
kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes
# Test when OIDC is unavailable
kubectl scale deployment/dex --replicas=0 -n auth-system
kubectl --kubeconfig /etc/kubernetes/admin.conf get pods -n auth-system
kubectl scale deployment/dex --replicas=1 -n auth-system
Authentik UI (User Management):
# Get admin password
kubectl get secret authentik-secrets -n auth-system -o jsonpath='{.data.admin-password}' | base64 -d
# Access at https://auth.home.local
# Username: admin
# Password: (from above command)
ArgoCD with OIDC:
# Access at https://argocd.home.local
# Click "Login via Dex"
# Authenticate with Authentik credentials
Tekton Dashboard with OIDC:
# Access at https://tekton.home.local
# Authenticate with Authentik credentials via Dex
Source
- platform/scripts/validate-oidc-discovery.sh
- platform/scripts/validate-rbac.sh
- platform/auth/ingress/
Important: Dual-Domain Strategy
The platform uses two domain strategies:
- arbiter-dev.com (public): Organization webhook endpoints accessible from the internet via Cloudflare tunnels
- home.local (local): Authentication and platform services accessible only within the home network
This separation ensures webhook endpoints are publicly accessible while keeping platform administration services private. See architecture.md for detailed networking architecture.
Before creating organizations, ensure:
- Cloudflare Account Setup:
  - Domain arbiter-dev.com added to Cloudflare account
  - Nameservers updated at the domain registrar to point to Cloudflare
  - API token created with permissions: Zone.DNS (Edit), Account.Cloudflare Tunnel (Edit)
  - API token stored in the cloudflare-api-token cluster secret in the platform-system namespace
- Platform Bootstrap Complete:
  - ArgoCD and all platform components deployed
  - Onboarding controller running in the platform-system namespace
Organizations provide multi-tenant isolation with dedicated namespaces (org-{name}), EventListeners, and public webhook endpoints.
Using AphexCLI (Recommended):
aphex organization bootstrap --admin-email admin@acme-corp.com acme-corp
Manual YAML Application:
kubectl apply -f - <<EOF
apiVersion: aphex.io/v1alpha1
kind: Organization
metadata:
name: acme-corp
namespace: platform-system
spec:
displayName: "ACME Corporation"
adminUsers:
- admin@acme-corp.com
webhookSecret: "" # Auto-generated if empty
EOF
What Gets Created:
- Organization namespace: org-acme-corp
- Cloudflare tunnel via API
- DNS CNAME record: acme-corp.arbiter-dev.com → {tunnel-id}.cfargotunnel.com
- Cloudflared tunnel deployment
- EventListener with dedicated ServiceAccount and ClusterRoleBinding
- Organization admin RBAC
- Webhook secret for GitHub integration
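If you prefer to set webhookSecret yourself rather than leave it empty for auto-generation, a value of equivalent strength can be produced locally. A sketch (not necessarily the controller's exact generation method):

```shell
# Generate a random webhook secret: 32 bytes, hex-encoded (64 characters).
WEBHOOK_SECRET=$(openssl rand -hex 32)
echo "length: ${#WEBHOOK_SECRET}"   # → length: 64
# Paste the value into the Organization spec's webhookSecret field.
```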
Source: platform/crds/organization-crd.yaml, platform/platform-controller/controller/controllers/organization_controller.go
# Check Organization status
kubectl get organization acme-corp -n platform-system
kubectl describe organization acme-corp -n platform-system
# Verify organization namespace
kubectl get namespace org-acme-corp
# Verify webhook secret
kubectl get secret github-webhook-secret -n org-acme-corp
# Verify Cloudflared tunnel
kubectl get deployment -n org-acme-corp | grep cloudflared
kubectl get configmap cloudflared-config -n org-acme-corp
# Verify EventListener
kubectl get eventlisteners -n org-acme-corp
kubectl get clusterrolebinding eventlistener-acme-corp
# Verify admin RBAC
kubectl get role,rolebinding -n org-acme-corp
# Verify External Secrets infrastructure
kubectl get serviceaccount eso-secrets-reader -n org-acme-corp
kubectl get role eso-secrets-reader -n org-acme-corp
kubectl get rolebinding eso-secrets-reader -n org-acme-corp
kubectl get clustersecretstore org-acme-corp-store
Verify the public webhook endpoint is accessible:
# Test DNS resolution
nslookup acme-corp.arbiter-dev.com
# Test HTTPS connectivity (expect 404 for GET request)
curl -k https://acme-corp.arbiter-dev.com
# Get webhook secret for GitHub configuration
kubectl get secret github-webhook-secret -n org-acme-corp -o jsonpath='{.data.secret}' | base64 -d
Configure GitHub Webhook:
- Go to repository Settings → Webhooks → Add webhook
- Payload URL: https://acme-corp.arbiter-dev.com
- Content type: application/json
- Secret: (use the secret from the command above)
- Events: Select "Push events"
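GitHub signs each delivery with an HMAC-SHA256 of the raw payload using this secret and sends it in the X-Hub-Signature-256 header; the check the github interceptor performs can be reproduced locally. A sketch with illustrative values:

```shell
# Compute the expected GitHub signature: "sha256=" + hex HMAC-SHA256 of the body.
secret='my-webhook-secret'              # illustrative; use the secret retrieved above
payload='{"ref":"refs/heads/main"}'     # illustrative raw request body
expected="sha256=$(printf '%s' "$payload" | openssl dgst -sha256 -hmac "$secret" | awk '{print $2}')"
echo "$expected"
# Compare this against the delivery's X-Hub-Signature-256 header.
```

This is handy when debugging "signature mismatch" rejections in the EventListener logs.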
Verify Webhook Delivery:
# Watch EventListener logs
kubectl logs -n org-acme-corp -l eventlistener=github-listener -f
# Check for PipelineRuns after push
kubectl get pipelineruns -n org-acme-corp
Source: platform/platform-controller/controller/controllers/organization_controller.go
After organization bootstrap, repositories can be onboarded to create webhook integration with existing pipelines.
# Create RepoBinding
kubectl apply -f - <<EOF
apiVersion: aphex.io/v1alpha1
kind: RepoBinding
metadata:
name: my-repo-binding
namespace: platform-system
spec:
aphexOrg: "acme-corp"
repoOrg: "acme"
repoName: "my-application"
pipelineName: "my-application-pipeline"
templateRef: "run-pipeline-v1"
ingressHost: "" # Optional: defaults to cluster ingress
EOF
Field Descriptions:
- aphexOrg: Organization name (maps to the org-{aphexOrg} namespace)
- repoOrg: GitHub organization name
- repoName: Repository name
- pipelineName: Name of the Tekton Pipeline to trigger
- templateRef: Dispatcher template name (e.g., run-pipeline-v1)
- ingressHost: Optional webhook hostname (defaults to cluster ingress)
Source: platform/crds/repobinding-crd.yaml, platform/platform-controller/controller/controllers/repobinding_controller.go
# Check RepoBinding status
kubectl get repobinding my-repo-binding -n platform-system
kubectl describe repobinding my-repo-binding -n platform-system
# Verify organization namespace
kubectl get namespace org-my-org
# Verify service account in pipeline namespace
kubectl get serviceaccount pipeline-runner -n my-pipeline
# Verify RBAC in pipeline namespace
kubectl get role,rolebinding,clusterrole,clusterrolebinding -n my-pipeline | grep my-pipeline
# Verify ArgoCD AppProject
kubectl get appproject my-pipeline -n argocd
kubectl describe appproject my-pipeline -n argocd
# Verify resource limits in pipeline namespace
kubectl get resourcequota,limitrange -n my-pipeline
# Verify network policy in pipeline namespace
kubectl get networkpolicy -n my-pipeline
# Verify Tekton resources in organization namespace
kubectl get trigger,triggertemplate -n org-my-org | grep my-pipeline
After onboarding, configure the webhook in GitHub:
# Get webhook URL and secret from RepoBinding status
kubectl get repobinding my-repo-binding -n platform-system -o yaml
# Look for status.webhookURL and status.webhookSecret
Configure in GitHub:
- Go to repository Settings → Webhooks → Add webhook
- Payload URL: (from RepoBinding status.webhookURL)
- Content type: application/json
- Secret: (from RepoBinding status.webhookSecret)
- Events: Push events
- Active: ✓
- Click "Add webhook"
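The two status fields can also be extracted directly with jq; a sketch against an illustrative status object (the live command is shown in a comment):

```shell
# Illustrative RepoBinding JSON as returned by kubectl -o json.
binding='{"status":{"webhookURL":"https://acme-corp.arbiter-dev.com","webhookSecret":"example-secret"}}'
url=$(printf '%s' "$binding" | jq -r '.status.webhookURL')
echo "payload URL: $url"
# Live equivalent:
#   kubectl get repobinding my-repo-binding -n platform-system -o json | jq -r '.status.webhookURL'
```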
Source
- platform/crds/repobinding-crd.yaml
- platform/platform-controller/controller/
The authentication system provides centralized identity management through Authentik with Dex as an OIDC connector layer. All authentication components are managed by ArgoCD via GitOps.
After bootstrap completes and ArgoCD syncs the auth system:
Step 1: Ensure DNS is configured
Configure DNS so that *.home.local resolves to your Ingress controller's IP address.
# Find Ingress controller IP
kubectl get svc -n ingress-nginx ingress-nginx-controller
# Add DNS records (router/Pi-hole) or /etc/hosts entries:
# 192.168.1.100 auth.home.local
# 192.168.1.100 dex.home.local
# 192.168.1.100 argocd.home.local
# 192.168.1.100 tekton.home.local
Step 2: Retrieve admin password
kubectl get secret authentik-secrets -n auth-system \
-o jsonpath='{.data.admin-password}' | base64 -d
Step 3: Access Authentik UI
- Open https://auth.home.local in a browser
- Accept the certificate warning (if using self-signed certificates)
- Login with username admin and the password from Step 2
- Navigate to Directory → Users
- Click Create
- Fill in user details:
- Username (required, unique)
- Email (required, unique)
- Name (display name)
- Password (or send password reset email)
- Assign the user to groups:
  - admins: Full access to all platform services
  - engineering: Read-only access to platform services
- Click Create
Users can authenticate immediately - no pod restarts or configuration changes required.
- Navigate to Directory → Groups
- View existing groups (admins, engineering)
- Create new groups as needed
- Assign users to groups
- Groups are automatically included in OIDC tokens
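Because groups ride along in the OIDC tokens, membership can be confirmed by decoding a token's payload segment (base64url-encoded JSON). A sketch that builds and then decodes an illustrative payload:

```shell
# A JWT is header.payload.signature; the payload is base64url-encoded JSON.
# Build an illustrative payload, then decode it the way you would a real token's.
claims='{"email":"alice@example.com","groups":["engineering"]}'
segment=$(printf '%s' "$claims" | base64 | tr '+/' '-_' | tr -d '=\n')
# Decode: restore the standard base64 alphabet and padding, then read the claim.
padded=$(printf '%s' "$segment" | tr -- '-_' '+/')
while [ $(( ${#padded} % 4 )) -ne 0 ]; do padded="${padded}="; done
groups=$(printf '%s' "$padded" | base64 -d | jq -r '.groups[]')
echo "$groups"   # → engineering
```

Apply the decode half to the middle segment of a real ID token to see exactly which groups a service will receive.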
- Navigate to Directory → Users
- Click on the admin user
- Click Set password
- Enter new password
- Click Update
DNS configuration is required for user browsers to reach services via hostnames. OIDC authentication requires redirect URIs that browsers can reach.
Option 1: Router/Pi-hole DNS (Recommended)
Add A records in your home router or Pi-hole:
auth.home.local → 192.168.1.100
dex.home.local → 192.168.1.100
argocd.home.local → 192.168.1.100
tekton.home.local → 192.168.1.100
Replace 192.168.1.100 with your Ingress controller's LoadBalancer IP or NodePort IP.
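All records point at the same ingress IP, so the entries can be generated in one loop; a sketch (hostnames from the list above, IP is a placeholder):

```shell
# Generate one "IP hostname" line per platform service.
INGRESS_IP="192.168.1.100"   # replace with your ingress controller's IP
entries=""
for host in auth dex argocd tekton; do
  entries="${entries}${INGRESS_IP} ${host}.home.local
"
done
printf '%s' "$entries"
# For the hosts-file option: printf '%s' "$entries" | sudo tee -a /etc/hosts
```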
Option 2: Hosts File
Add entries to /etc/hosts on each device:
# Linux/macOS
sudo nano /etc/hosts
# Add these lines:
192.168.1.100 auth.home.local
192.168.1.100 dex.home.local
192.168.1.100 argocd.home.local
192.168.1.100 tekton.home.local
Verify DNS configuration:
nslookup auth.home.local
nslookup dex.home.local
curl -k https://auth.home.local
All services use HTTPS with TLS certificates. Choose between self-signed (simplest) or Let's Encrypt (trusted certificates).
- Install cert-manager:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.16.4/cert-manager.yaml
kubectl wait --for=condition=ready pod \
  -l app.kubernetes.io/instance=cert-manager \
  -n cert-manager \
  --timeout=90s
- Create self-signed ClusterIssuer:
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
EOF
- Accept certificate warnings in the browser:
- Chrome: Click "Advanced" → "Proceed to auth.home.local (unsafe)"
- Firefox: Click "Advanced" → "Accept the Risk and Continue"
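To see why the warning appears, you can reproduce a certificate like the ones the self-signed issuer produces; a local sketch with openssl (throwaway key, illustrative CN):

```shell
# Create a throwaway self-signed certificate for auth.home.local and inspect it.
tmpdir=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=auth.home.local" \
  -keyout "$tmpdir/tls.key" -out "$tmpdir/tls.crt" 2>/dev/null
subject=$(openssl x509 -in "$tmpdir/tls.crt" -noout -subject)
issuer=$(openssl x509 -in "$tmpdir/tls.crt" -noout -issuer)
# Subject and issuer are identical: no external CA signed it, hence the warning.
echo "$subject"
echo "$issuer"
rm -rf "$tmpdir"
```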
Provides trusted certificates without browser warnings. Requires DNS provider API access.
- Install cert-manager (same as above)
- Create DNS provider API token secret (Cloudflare example):
kubectl create secret generic cloudflare-api-token \
  -n cert-manager \
  --from-literal=api-token=YOUR_TOKEN_HERE
- Create Let's Encrypt ClusterIssuer:
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your-email@example.com
    privateKeySecretRef:
      name: letsencrypt-dns-key
    solvers:
    - dns01:
        cloudflare:
          apiTokenSecretRef:
            name: cloudflare-api-token
            key: api-token
EOF
- Update Ingress resources to use the Let's Encrypt issuer and your domain
- Commit and push to Git - ArgoCD syncs changes automatically
Verify certificates:
kubectl get certificate -n auth-system
kubectl describe certificate authentik-tls -n auth-system
Secrets should be rotated periodically for security. The authentication system supports secret rotation without downtime.
Via Authentik UI (Recommended):
- Login to Authentik UI
- Navigate to Directory → Users → admin
- Click Set password
- Enter new password
- Click Update
- Update Kubernetes Secret:
kubectl create secret generic authentik-secrets \
  -n auth-system \
  --from-literal=secret-key="$(kubectl get secret authentik-secrets -n auth-system -o jsonpath='{.data.secret-key}' | base64 -d)" \
  --from-literal=admin-password="NEW_PASSWORD" \
  --dry-run=client -o yaml | kubectl apply -f -
- Generate new secret:
NEW_SECRET=$(openssl rand -base64 32)
- Update Kubernetes Secret:
kubectl create secret generic dex-secrets \
  -n auth-system \
  --from-literal=client-secret="$NEW_SECRET" \
  --dry-run=client -o yaml | kubectl apply -f -
- Update Authentik OIDC provider:
- Login to Authentik UI
- Navigate to Applications → Providers → Dex OIDC Provider
- Update Client Secret field
- Click Update
- Restart Dex:
kubectl rollout restart deployment/dex -n auth-system
- Create new token via Authentik UI:
- Navigate to Directory → Tokens → Create
- Set identifier: "config-sync-job-new"
- Set intent: "API"
- Copy token value
- Update Kubernetes Secret:
kubectl create secret generic authentik-api-token \
  -n auth-system \
  --from-literal=token="NEW_TOKEN_HERE" \
  --dry-run=client -o yaml | kubectl apply -f -
- Revoke the old token:
- Navigate to Directory → Tokens
- Find old token → Delete
The Config Sync Job orchestrates Authentik-Dex integration. It runs automatically during bootstrap but can be manually triggered.
# Check Job status
kubectl get job auth-config-sync -n auth-system
# Check Job pod status
kubectl get pods -n auth-system -l app=auth-config-sync
# View Job logs
kubectl logs -n auth-system -l app=auth-config-sync
To re-run the Job (e.g., after fixing a configuration issue):
# Delete existing Job
kubectl delete job auth-config-sync -n auth-system
# ArgoCD will recreate the Job automatically
# Or manually apply:
kubectl apply -f platform/auth/config-sync/job.yaml
# Watch Job progress
kubectl logs -n auth-system -l app=auth-config-sync -f
# Check if Job completed successfully
kubectl get job auth-config-sync -n auth-system -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}'
# Should output: True
# Get detailed Job status
kubectl describe job auth-config-sync -n auth-system
All authentication system components are managed by ArgoCD. To make changes:
Step 1: Update manifests in Git
# Clone repository
git clone https://github.com/bdchatham/AphexPlatformInfrastructure.git
cd AphexPlatformInfrastructure
# Update authentication manifests
# Example: Update Authentik image version
vi platform/auth/authentik/server-deployment.yaml
# Commit changes
git add .
git commit -m "Update Authentik to v2024.2.2"
git push
Step 2: ArgoCD detects and syncs changes
ArgoCD polls Git every 3 minutes and detects changes automatically.
# Watch ArgoCD sync status
kubectl get application platform-auth -n argocd -w
# Or view in ArgoCD UI
# https://argocd.home.local
Step 3: Verify changes
# Check pod status
kubectl get pods -n auth-system
# Check ArgoCD sync status
kubectl get application platform-auth -n argocd
# View sync details
kubectl describe application platform-auth -n argocd
No manual kubectl apply required - ArgoCD handles all deployments and updates.
Symptom: User cannot log in to ArgoCD or Tekton Dashboard
Diagnosis:
# Check DNS resolution
nslookup auth.home.local
nslookup dex.home.local
# Check Authentik OIDC discovery
curl https://auth.home.local/application/o/dex/.well-known/openid-configuration
# Check Dex OIDC discovery
curl https://dex.home.local/.well-known/openid-configuration
# Check user groups in Authentik UI
# Login → Directory → Users → Select user → Groups tab
Resolution:
- Configure DNS if resolution fails
- Verify redirect URI configuration in Dex and Authentik
- Verify the user is in the correct group (admins or engineering)
Symptom: Dex pod fails to start repeatedly
Diagnosis:
# Check Dex logs
kubectl logs -n auth-system deployment/dex
# Verify Dex starts with replicas=0
kubectl get deployment dex -n auth-system -o jsonpath='{.spec.replicas}'
# Check Config Sync Job status
kubectl get job auth-config-sync -n auth-system
kubectl logs -n auth-system job/auth-config-sync
Resolution:
- Verify Authentik is running and healthy
- Check Dex ConfigMap for syntax errors
- Verify Config Sync Job completed successfully
- Re-run Config Sync Job if needed
Symptom: Changes to Git are not applied to cluster
Diagnosis:
# Check Application status
kubectl get application platform-auth -n argocd
# Check Application details
kubectl describe application platform-auth -n argocd
# Check for sync errors
kubectl get application platform-auth -n argocd -o json | \
  jq '.status.conditions[] | select(.type=="SyncError")'
Resolution:
- Verify ArgoCD auto-sync is enabled
- Check for invalid YAML syntax in manifests
- Manually trigger sync:
kubectl patch application platform-auth -n argocd \
  --type merge -p '{"operation":{"initiatedBy":{"username":"admin"},"sync":{}}}'
Symptom: Pods fail to start with "secret not found" errors
Diagnosis:
# Check if secrets exist
kubectl get secrets -n auth-system
# Expected secrets:
# - authentik-postgresql
# - authentik-secrets
# - dex-secrets
# - authentik-api-token
Resolution:
- Re-run bootstrap to regenerate secrets:
./platform/bootstrap/bootstrap.sh
- Or manually create missing secrets (see platform/auth/secrets/README.md)
Source
- platform/auth/README.md
- platform/auth/secrets/README.md
- platform/auth/ingress/README.md
- platform/auth/config-sync/README.md
- platform/bootstrap/bootstrap.sh
The platform upgrades itself via ArgoCD when manifests change in Git. No manual kubectl apply or custom upgrade scripts are needed.
Step 1: Update Manifests in Git
# Clone platform repository
git clone https://github.com/bdchatham/AphexPlatformInfrastructure.git
cd AphexPlatformInfrastructure
# Update component manifests
# Example: Update controller image
vi platform/platform-controller/controller-deployment.yaml # Update image tag
# Commit changes
git add .
git commit -m "Update onboarding controller to v1.1.0"
git push
Step 2: ArgoCD Detects Changes
ArgoCD polls Git every 3 minutes (default) and detects changes automatically.
# Watch ArgoCD sync status
kubectl get application -n argocd -w
# Or view in ArgoCD UI
# http://localhost:8080
Step 3: ArgoCD Syncs Changes
ArgoCD automatically syncs changes based on sync policy:
- Automated sync: Changes applied automatically
- Self-heal: Drift corrected automatically
- Prune: Removed resources deleted automatically
# Check sync status
kubectl get application platform-root -n argocd
# View sync details
kubectl describe application platform-root -n argocd
# Check child Applications
kubectl get application -n argocd
Step 4: Verify Upgrade
# Check component versions
kubectl get deployment -n platform-system -o wide
# Check pod status
kubectl get pods -n platform-system
# Check ArgoCD sync status
kubectl get application -n argocd
If automatic sync is disabled or you want to sync immediately:
# Sync via kubectl
kubectl patch application platform-root -n argocd --type merge -p '{"operation":{"initiatedBy":{"username":"admin"},"sync":{"revision":"HEAD"}}}'
# Or sync via ArgoCD CLI
argocd app sync platform-root
# Or sync via ArgoCD UI
# Click "Sync" button in UI
Source
- platform/argocd/apps/platform-root.yaml
# Check ArgoCD
kubectl get pods -n argocd
# Check Tekton controllers
kubectl get pods -n tekton-pipelines
# Check platform controller
kubectl get pods -n platform-system -l app=platform-controller
# Check ArgoCD Applications
kubectl get application -n argocd
# List all Applications
kubectl get application -n argocd
# Get Application sync status
kubectl get application platform-root -n argocd -o jsonpath='{.status.sync.status}'
# View sync details
kubectl describe application platform-root -n argocd
# Check for sync errors
kubectl get application -n argocd -o json | jq '.items[] | select(.status.sync.status != "Synced") | {name: .metadata.name, status: .status.sync.status, message: .status.conditions[0].message}'
# List all PipelineRuns
kubectl get pipelineruns --all-namespaces
# Get PipelineRun details
kubectl describe pipelinerun <name> -n <pipeline-namespace>
# View PipelineRun logs
kubectl logs -n <pipeline-namespace> -l tekton.dev/pipelineRun=<name>
# Watch PipelineRun status
kubectl get pipelinerun <name> -n <pipeline-namespace> -w
# View EventListener logs for an organization
kubectl logs -n org-<organization-name> -l eventlistener=github-listener --tail=100
# Stream EventListener logs
kubectl logs -n org-<organization-name> -l eventlistener=github-listener -f
# Search for specific webhook events
kubectl logs -n org-<organization-name> -l eventlistener=github-listener | grep "webhook"
# View controller logs
kubectl logs -n platform-system -l app=platform-controller --tail=100
# Stream controller logs
kubectl logs -n platform-system -l app=platform-controller -f
# Search for specific RepoBinding
kubectl logs -n platform-system -l app=platform-controller | grep "repobinding-name"
Source
- platform/argocd/apps/
- platform/platform-controller/controller-deployment.yaml
Symptoms: EventListener pod crashes with "empty caBundle in clusterInterceptor spec" error.
Diagnosis:
# Check EventListener pod status in organization namespace
kubectl get pods -n org-<organization-name> -l eventlistener=github-listener
# Check EventListener logs
kubectl logs -n org-<organization-name> -l eventlistener=github-listener --tail=50
# Check if ClusterInterceptors exist
kubectl get clusterinterceptors
# Check if Core Interceptors deployment exists
kubectl get deployment tekton-triggers-core-interceptors -n tekton-pipelines
Resolution:
This error occurs when Tekton Triggers Core Interceptors are not installed. The Core Interceptors provide ClusterInterceptor resources (github, gitlab, cel, etc.) that EventListeners need.
# Install Core Interceptors
kubectl apply -f https://infra.tekton.dev/tekton-releases/triggers/previous/v0.34.0/interceptors.yaml
# Verify ClusterInterceptors are created
kubectl get clusterinterceptors
# Delete EventListener pod to restart
kubectl delete pod -n org-<organization-name> -l eventlistener=github-listener
# Verify EventListener is running
kubectl get pods -n org-<organization-name> -l eventlistener=github-listener
Prevention: Ensure the bootstrap script installs Core Interceptors, or ensure the platform-tekton ArgoCD Application includes interceptors.yaml.
Symptoms: EventListener pod logs show "cannot list resource clusterinterceptors" or "cannot list resource clustertriggerbindings" errors.
Diagnosis:
# Check EventListener logs
kubectl logs -n org-<organization-name> -l eventlistener=github-listener --tail=50
# Check if ClusterRole exists for pipeline
kubectl get clusterrole pipeline-runner-<pipeline-name>
# Check if ClusterRoleBinding exists for pipeline
kubectl get clusterrolebinding pipeline-runner-<pipeline-name>
Resolution:
EventListener pods need cluster-scoped read permissions for ClusterInterceptor and ClusterTriggerBinding resources. The onboarding controller should provision these automatically.
# Check if controller provisioned cluster-scoped RBAC
kubectl describe clusterrole pipeline-runner-<pipeline-name>
kubectl describe clusterrolebinding pipeline-runner-<pipeline-name>
# If missing, delete and recreate RepoBinding to trigger reprovisioning
kubectl delete repobinding <name> -n platform-system
kubectl apply -f repobinding.yaml
# Verify cluster-scoped RBAC was created
kubectl get clusterrole pipeline-runner-<pipeline-name>
kubectl get clusterrolebinding pipeline-runner-<pipeline-name>
# Delete EventListener pod to restart with new permissions
kubectl delete pod -n org-<organization-name> -l eventlistener=github-listener
Prevention: Ensure the platform controller has permissions to create ClusterRoles and ClusterRoleBindings (check platform/platform-controller/controller-rbac.yaml).
Diagnosis:
# Check if EventListener exists in organization namespace
kubectl get eventlistener -n org-<organization-name>
# Check EventListener logs for webhook events
kubectl logs -n org-<organization-name> -l eventlistener=github-listener | grep "webhook"
# Check if Ingress exists
kubectl get ingress -n org-<organization-name>
# Check if webhook secret exists in organization namespace
kubectl get secret github-webhook-secret -n org-<organization-name>
Resolution:
- Verify EventListener is running
- Verify Ingress is configured correctly
- Verify GitHub webhook is configured with correct URL and secret
- Check EventListener logs for error messages
Diagnosis:
# Get PipelineRun status
kubectl get pipelinerun <name> -n <pipeline-namespace>
# Get detailed status
kubectl describe pipelinerun <name> -n <pipeline-namespace>
# Get pod logs
kubectl logs -n <pipeline-namespace> -l tekton.dev/pipelineRun=<name>
# Check pod events
kubectl get events -n <pipeline-namespace> --sort-by='.lastTimestamp'
Common Issues:
- Git clone failure: Check repository access
- CDKTF synth failure: Check Node.js dependencies and syntax errors
- CDKTF deploy failure: Check Terraform state and permissions
- RBAC denial: Check service account permissions
Diagnosis:
# Check RepoBinding status
kubectl get repobinding <name> -n platform-system
kubectl describe repobinding <name> -n platform-system
# Check controller logs
kubectl logs -n platform-system -l app=platform-controller | grep "<name>"
Common Issues:
- Invalid namespace pattern: Namespace name doesn't match pattern
- RBAC failure: Controller lacks permissions to create resources
- EventListener creation failed: Check Tekton Triggers installation
- Ingress creation failed: Check Ingress controller installation
Diagnosis:
# Check Application sync status
kubectl get application -n argocd
# Get sync error details
kubectl describe application <name> -n argocd
# View Application events
kubectl get events -n argocd --field-selector involvedObject.name=<name>
# Check ArgoCD controller logs
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-application-controller
Common Issues:
- Invalid manifest: YAML syntax errors in Git
- Resource conflicts: Resource already exists with different configuration
- RBAC denial: ArgoCD lacks permissions to create resources
- Git connection failure: ArgoCD cannot access Git repository
Source
- platform/argocd/apps/
- platform/platform-controller/controller/
- platform/catalog/
# Backup RepoBindings
kubectl get repobindings -n platform-system -o yaml > repobindings-backup.yaml
# Backup ArgoCD Applications
kubectl get applications -n argocd -o yaml > applications-backup.yaml
# Backup platform manifests (already in Git)
# No backup needed - Git is the source of truth
Step 1: Re-run Bootstrap
# Run bootstrap on new cluster
cd platform/bootstrap
./bootstrap.sh --cluster-name aphex-platform --repo-url https://github.com/bdchatham/AphexPlatformInfrastructure
Step 2: Wait for ArgoCD to Sync
ArgoCD will automatically sync all platform components from Git.
# Watch ArgoCD sync
kubectl get application -n argocd -w
# Verify all Applications are synced
kubectl get application -n argocd
Step 3: Restore RepoBindings
# Restore RepoBindings
kubectl apply -f repobindings-backup.yaml
# Verify onboarding
kubectl get repobindings -n platform-system
kubectl get namespaces -l aphex/managed-by=platform-controller
Step 4: Verify Platform
# Check all components
kubectl get pods -n argocd
kubectl get pods -n tekton-pipelines
kubectl get pods -n platform-system
# Check organization namespaces
kubectl get namespaces -l aphex/managed-by=platform-controller
# Check ArgoCD sync status
kubectl get application -n argocd
Source
- platform/bootstrap/bootstrap.sh
- .kiro/specs/argocd-tekton-platform/requirements.md (Requirement 12.1, 12.4)
Symptoms: No PipelineRun created after merge to main.
Diagnosis:
# Check EventListener logs in organization namespace
kubectl logs -n org-<organization-name> -l eventlistener=github-listener --tail=100
# Check Ingress configuration
kubectl get ingress -n org-<organization-name> -o yaml
# Check webhook secret in organization namespace
kubectl get secret github-webhook-secret -n org-<organization-name>
# Check GitHub webhook delivery logs
# Go to GitHub repository Settings → Webhooks → Recent Deliveries
Resolution:
- Verify Ingress is accessible from GitHub
- Verify webhook secret matches GitHub configuration
- Verify EventListener is running
- Check GitHub webhook delivery logs for errors
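One way to confirm the shared secret matches GitHub's configuration is to recompute the `X-Hub-Signature-256` header locally: GitHub signs the raw request body with HMAC-SHA256 using the webhook secret and sends the result as `sha256=<hex>`. This sketch uses placeholder values; substitute your actual secret and a raw delivery body copied from the Recent Deliveries tab.

```shell
secret='example-webhook-secret'          # placeholder: your actual webhook secret
payload='{"ref":"refs/heads/main"}'      # placeholder: raw delivery body from GitHub

# Recompute the signature and compare with the X-Hub-Signature-256 header.
sig=$(printf '%s' "$payload" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}')
echo "sha256=$sig"
```

If the recomputed value differs from the header in the delivery log, the secret stored in the cluster and the one configured in GitHub have drifted apart.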
Symptoms: Changes committed to Git but ArgoCD not syncing.
Diagnosis:
# Check Application sync status
kubectl get application platform-root -n argocd
# Check ArgoCD controller logs
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-application-controller --tail=100
# Check Git repository connectivity
kubectl exec -n argocd -it <argocd-repo-server-pod> -- git ls-remote https://github.com/bdchatham/AphexPlatformInfrastructure
Resolution:
- Verify ArgoCD can access Git repository
- Verify sync policy is configured (automated sync enabled)
- Manually trigger sync if needed
- Check for manifest errors in Git
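When checking the sync policy, an automated policy on an Application looks like the sketch below. The `syncPolicy` fields are standard Argo CD; the Application name and repo path here are illustrative, not necessarily what this platform's manifests use.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-root              # illustrative name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/bdchatham/AphexPlatformInfrastructure
    targetRevision: main
    path: platform/argocd/apps     # illustrative path
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift in the cluster
```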
Symptoms: RepoBinding created but status remains "Pending".
Diagnosis:
# Check controller logs
kubectl logs -n platform-system -l app=platform-controller --tail=100
# Check controller pod status
kubectl get pods -n platform-system -l app=platform-controller
# Check RepoBinding status
kubectl describe repobinding <name> -n platform-system
Resolution:
- Verify controller is running
- Check controller logs for errors
- Verify controller has RBAC permissions
- Restart controller if needed:
kubectl rollout restart deployment platform-controller -n platform-system
Source
- platform/platform-controller/controller/
- platform/argocd/apps/
- platform/tenancy/templates/
Tekton is managed by ArgoCD after bootstrap. To update Tekton versions:
# Update Tekton versions in kustomization
vi platform/tekton/kustomization.yaml
# Update resource URLs to new versions
# Example: Update to Tekton Pipelines v0.66.0
resources:
- https://github.com/tektoncd/pipeline/releases/download/v0.66.0/release.yaml
- https://github.com/tektoncd/triggers/releases/download/v0.30.0/release.yaml
- https://github.com/tektoncd/triggers/releases/download/v0.30.0/interceptors.yaml
# Commit changes
git add platform/tekton/kustomization.yaml
git commit -m "Update Tekton to v0.66.0"
git push
# ArgoCD will automatically sync and update Tekton
# Watch sync status
kubectl get application platform-tekton -n argocd -w
# Verify updates
kubectl get pods -n tekton-pipelines
Note: The bootstrap script installs Tekton initially, but ArgoCD manages updates from Git. This enables GitOps-based Tekton upgrades.
# Update ArgoCD version in bootstrap script
vi platform/bootstrap/bootstrap.sh
# Update version URL
# ArgoCD: https://raw.githubusercontent.com/argoproj/argo-cd/v2.9.3/manifests/install.yaml
# Commit changes
git add platform/bootstrap/bootstrap.sh
git commit -m "Update ArgoCD to v2.9.3"
git push
# ArgoCD will NOT automatically update itself (installed by bootstrap)
# To update ArgoCD, manually apply new manifests:
kubectl apply -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.9.3/manifests/install.yaml
# Build new version
cd platform/platform-controller/controller
docker build -t your-registry/platform-controller:v1.1.0 .
docker push your-registry/platform-controller:v1.1.0
# Update deployment manifest
vi platform/platform-controller/controller-deployment.yaml
# Update image tag to v1.1.0
# Commit changes
git add platform/platform-controller/controller-deployment.yaml
git commit -m "Update platform controller to v1.1.0"
git push
# ArgoCD will automatically sync and update the controller
# Watch sync status
kubectl get application platform-controllers -n argocd -w
# Update catalog resources
vi platform/catalog/tasks/cdktf-deploy.yaml
# Make changes
# Commit changes
git add platform/catalog/
git commit -m "Update CDKTF deploy task"
git push
# ArgoCD will automatically sync and update the catalog
# Watch sync status
kubectl get application platform-catalog -n argocd -w
# Verify updates
kubectl get tasks -n platform-system
# List old PipelineRuns
kubectl get pipelineruns --all-namespaces --sort-by=.metadata.creationTimestamp
# Delete PipelineRuns older than 30 days
kubectl get pipelineruns --all-namespaces -o json | \
jq -r '.items[] | select(.metadata.creationTimestamp < "'$(date -d '30 days ago' -Iseconds)'") | "\(.metadata.namespace) \(.metadata.name)"' | \
xargs -n2 kubectl delete pipelinerun -n
Source
- platform/bootstrap/bootstrap.sh
- platform/platform-controller/controller-deployment.yaml
- platform/catalog/
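The age filter in the cleanup command above can be sanity-checked offline before pointing it at a live cluster. This sketch runs the same jq expression against a small inline JSON document (note that `date -d '30 days ago'` is GNU date syntax; BSD/macOS date uses `-v-30d` instead):

```shell
# Compute the cutoff timestamp (GNU date; see note above for macOS).
cutoff=$(date -u -d '30 days ago' -Iseconds)

# Stand-in for `kubectl get pipelineruns --all-namespaces -o json`,
# with one run safely in the past and one in the far future.
cat > /tmp/pipelineruns.json <<'EOF'
{"items":[
  {"metadata":{"namespace":"my-pipeline","name":"old-run","creationTimestamp":"2020-01-01T00:00:00Z"}},
  {"metadata":{"namespace":"my-pipeline","name":"new-run","creationTimestamp":"2999-01-01T00:00:00Z"}}
]}
EOF

# Same select/format logic as the cleanup command; only old-run should print.
jq -r --arg cutoff "$cutoff" \
  '.items[] | select(.metadata.creationTimestamp < $cutoff) | "\(.metadata.namespace) \(.metadata.name)"' \
  /tmp/pipelineruns.json
```

ISO-8601 timestamps compare correctly as strings, which is what makes the `<` in the jq filter safe.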
Webhook secrets are generated by the Onboarding Controller and stored in organization namespaces. To rotate:
# Delete existing secret in organization namespace
kubectl delete secret github-webhook-secret -n org-<organization-name>
# Delete and recreate Organization to regenerate secret
kubectl delete organization <organization-name> -n platform-system
kubectl apply -f organization.yaml
# Get new webhook secret from Organization status
kubectl get organization <organization-name> -n platform-system -o yaml
# Update GitHub webhook with new secret# List all Roles in pipeline namespaces
kubectl get roles --all-namespaces | grep -v "kube-"
# Review specific Role in pipeline namespace
kubectl get role pipeline-runner -n <pipeline-namespace> -o yaml
# Check what a service account can do
kubectl auth can-i --list --as=system:serviceaccount:<pipeline-namespace>:pipeline-runner -n <pipeline-namespace>
# List all NetworkPolicies
kubectl get networkpolicies --all-namespaces
# Review specific NetworkPolicy in pipeline namespace
kubectl get networkpolicy pipeline-isolation -n <pipeline-namespace> -o yaml
# Test network connectivity
kubectl run -it --rm debug --image=busybox --restart=Never -n <pipeline-namespace> -- wget -O- http://<service>.<other-namespace>.svc.cluster.local
Source
- platform/platform-controller/controller/
- platform/tenancy/templates/
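For context when reviewing the `pipeline-isolation` policy above, a namespace-isolation NetworkPolicy of that kind typically looks like the sketch below. This is an illustrative example, not the platform's actual template (which lives under platform/tenancy/templates/):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pipeline-isolation
  namespace: my-pipeline           # illustrative pipeline namespace
spec:
  podSelector: {}                  # applies to all pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    # Only allow traffic from pods within the same namespace;
    # cross-namespace traffic (as in the wget test above) is denied.
    - from:
        - podSelector: {}
```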
Prerequisites:
- Kubernetes cluster (1.24+) with RBAC enabled
- kubectl configured with cluster admin access
- GitHub organization with admin access
Bootstrap Steps:
1. Clone Repository:
git clone https://github.com/bdchatham/AphexPlatformInfrastructure.git
cd AphexPlatformInfrastructure
2. Run Bootstrap:
cd platform/bootstrap
./bootstrap.sh --cluster-name aphex-platform --repo-url https://github.com/bdchatham/AphexPlatformInfrastructure
3. Verify Bootstrap:
# Check all components
kubectl get pods -n argocd
kubectl get pods -n tekton-pipelines
kubectl get pods -n platform-system
# Check ArgoCD Applications
kubectl get application -n argocd
4. Access ArgoCD UI:
# Get admin password
kubectl get secret argocd-initial-admin-secret -n argocd -o jsonpath='{.data.password}' | base64 -d
# Port-forward
kubectl port-forward svc/argocd-server -n argocd 8080:443
# Access at https://localhost:8080
Source
platform/bootstrap/bootstrap.sh
Prerequisites:
- Platform bootstrapped and running
- kubectl configured with cluster access
- GitHub repository ready for onboarding
Onboarding Steps:
1. Create RepoBinding:
kubectl apply -f - <<EOF
apiVersion: aphex.io/v1alpha1
kind: RepoBinding
metadata:
  name: my-repo-binding
  namespace: platform-system
spec:
  aphexOrg: "my-org"
  repoOrg: "your-github-org"
  repoName: "your-repo"
  pipelineName: "my-pipeline"
  templateRef: "run-pipeline-v1"
EOF
2. Verify Onboarding:
# Check RepoBinding status
kubectl get repobinding my-repo-binding -n platform-system
# Verify organization namespace
kubectl get namespace org-my-org
# Verify pipeline namespace
kubectl get namespace my-pipeline
# Verify all resources in organization namespace
kubectl get all -n org-my-org
3. Configure GitHub Webhook:
# Get webhook URL from Organization status
kubectl get organization my-org -n platform-system -o jsonpath='{.status.webhookURL}'
# Get webhook secret from organization namespace
kubectl get secret github-webhook-secret -n org-my-org -o jsonpath='{.data.secret}' | base64 -d
# Configure in GitHub repository Settings → Webhooks
4. Test Pipeline:
# Merge a commit to main branch
# Watch for PipelineRun creation in pipeline namespace
kubectl get pipelineruns -n my-pipeline -w
Source
- platform/crds/repobinding-crd.yaml
- platform/platform-controller/controller/
Source
- .kiro/specs/argocd-tekton-platform/design.md
- .kiro/specs/argocd-tekton-platform/requirements.md
- platform/bootstrap/bootstrap.sh
- platform/argocd/apps/
- platform/platform-controller/controller/
- platform/catalog/
- platform/tenancy/templates/