A complete setup for testing Apache YuniKorn scheduler with Karpenter autoscaling using KWOK (Kubernetes WithOut Kubelet) for simulated nodes.
This environment allows you to:
- Test YuniKorn scheduling behavior
- Validate Karpenter autoscaling with YuniKorn
- Simulate large-scale clusters locally without resource overhead
- Experiment with queue configurations and resource limits
- Kind cluster: Control-plane node + 1 worker node for running real workloads
- YuniKorn: Apache YuniKorn scheduler (replaces default kube-scheduler)
- KWOK: Simulates Kubernetes nodes without running actual containers
- Karpenter: Cluster autoscaler that provisions nodes based on pending pods with 147 realistic AWS instance types (m, c, r families across 6th/7th gen, Intel/AMD/Graviton)
- Prometheus + Grafana: Monitoring stack with auto-imported cluster overview dashboard
- Gateway API (optional): Kubernetes Gateway API v1.4.0 for unified UI access
- kgateway (optional): Solo.io's lightweight gateway implementation (v2.1.1)
- Example workloads: Various test deployments to trigger autoscaling
Before starting, ensure you have the following tools installed:
Optional but useful:
Run the automated setup script to create the entire environment:
```bash
./setup.sh
```

This script will:
- Create a Kind cluster named `kind-yunikarp` with a control-plane and worker node
- Label the worker node with `node-role=real`
- Install YuniKorn scheduler
- Install KWOK for node simulation
- Clone, build, and install Karpenter with KWOK provider (includes Prometheus & Grafana)
- Configure Karpenter with 147 realistic AWS instance types
- Configure Karpenter NodePool and NodeClass
- Taint the control-plane node
- Import Grafana "Cluster Resource Overview" dashboard
Note: The setup takes 5-10 minutes, mainly for cloning Karpenter and building the controller.
By default, the setup uses direct kubectl port-forward to access UIs. To install Gateway API and kgateway for unified UI access:
```bash
INSTALL_GATEWAY=true ./setup.sh
```

When Gateway is enabled:
- Access UIs via a single port-forward: `kubectl port-forward -n kgateway-system svc/ui-gateway 8080:80`
- YuniKorn UI: http://yunikorn.localhost:8080
- Grafana UI: http://grafana.localhost:8080

When Gateway is disabled (default):
- YuniKorn UI: `kubectl port-forward -n yunikorn svc/yunikorn-service 9889:9889` → http://localhost:9889
- Grafana UI: `kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80` → http://localhost:3000

- `CLUSTER_NAME`: Set a custom cluster name (default: `kind-yunikarp`)
- `KARPENTER_VERSION`: Set the Karpenter version (default: `v1.8.0`)

Example:

```bash
CLUSTER_NAME=my-test-cluster KARPENTER_VERSION=v1.9.0 ./setup.sh
```

If you prefer manual control or want to understand each step, see the Manual Setup section below.
Once setup is complete, try the example workloads:
```bash
# Deploy a small test workload
kubectl apply -f examples/test-deployment.yaml

# Watch Karpenter create nodes
kubectl get nodes -w
```

Default method (port-forward directly to YuniKorn):

```bash
kubectl port-forward -n yunikorn svc/yunikorn-service 9889:9889
# Then open http://localhost:9889
```

With Gateway API (if you installed with INSTALL_GATEWAY=true):

```bash
kubectl port-forward -n kgateway-system svc/ui-gateway 8080:80
# Then open http://yunikorn.localhost:8080
```

```bash
# Deploy workloads that need more resources
kubectl apply -f examples/autoscaling-test.yaml

# Watch multiple nodes being created
kubectl get nodes -w
kubectl get pods -n autoscale-demo -w
```

The setup automatically imports a "Cluster Resource Overview" dashboard that shows:
- Node count per instance type
- Unschedulable pods count
- CPU requests vs capacity
- Memory requests vs capacity
- Resource requests breakdown by namespace
Access Grafana:
Default method (port-forward directly to Grafana):

```bash
# Get password
kubectl get secret prometheus-grafana -n monitoring -o jsonpath='{.data.admin-password}' | base64 -d

# Port-forward
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

# Open http://localhost:3000
# Username: admin, Password: (from above)
# Look for "Cluster Resource Overview" dashboard
```

With Gateway API (if you installed with INSTALL_GATEWAY=true):

```bash
# Get password
kubectl get secret prometheus-grafana -n monitoring -o jsonpath='{.data.admin-password}' | base64 -d

# Port-forward to Gateway
kubectl port-forward -n kgateway-system svc/ui-gateway 8080:80

# Open http://grafana.localhost:8080
# Username: admin, Password: (from above)
# Look for "Cluster Resource Overview" dashboard
```

```bash
# View Karpenter logs
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -f

# View Karpenter provisioner decisions
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep provisioner
```

```
┌───────────────────────────────────────────────────────────────┐
│ Kind Cluster (kind-yunikarp)                                  │
│                                                               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Control Plane Node (tainted)                              │ │
│ │ - Kubernetes control plane components                     │ │
│ └───────────────────────────────────────────────────────────┘ │
│                                                               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Worker Node (labeled: node-role=real)                     │ │
│ │ - YuniKorn Scheduler                                      │ │
│ │ - YuniKorn Admission Controller                           │ │
│ │ - Karpenter Controller (147 AWS instance types)           │ │
│ │ - KWOK Controller                                         │ │
│ │ - kgateway (optional, if INSTALL_GATEWAY=true)            │ │
│ │ - Prometheus + Grafana (for metrics)                      │ │
│ │                                                           │ │
│ │ UIs (accessible via port-forward):                        │ │
│ │ - YuniKorn: localhost:9889 or yunikorn.localhost:8080     │ │
│ │ - Grafana: localhost:3000 or grafana.localhost:8080       │ │
│ └───────────────────────────────────────────────────────────┘ │
│                                                               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ KWOK Nodes (simulated, created by Karpenter)              │ │
│ │ - No kubelet, container runtime, or kernel                │ │
│ │ - Test workloads scheduled here via YuniKorn              │ │
│ │ - Instant scaling with minimal resources                  │ │
│ │ - Test pods explicitly exclude the real worker node       │ │
│ └───────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘
```
- YuniKorn: Handles pod scheduling with queue-based resource management
- Karpenter: Monitors pending pods and provisions nodes (KWOK nodes in this case) from 147 realistic AWS instance types
- KWOK: Simulates node behavior without running actual kubelet
- Prometheus + Grafana: Monitoring stack with auto-imported cluster overview dashboard
- Gateway API (optional): Kubernetes-native API for ingress and routing, enabling unified UI access
- kgateway (optional): Lightweight implementation of Gateway API for exposing services
- Kind: Provides the Kubernetes cluster infrastructure
- Worker Node: Real node (labeled `node-role=real`) where system components run, excluded from test workloads
- You deploy a workload with `schedulerName: yunikorn`
- YuniKorn tries to schedule pods but finds insufficient capacity
- Pods remain pending with reason "Insufficient cpu/memory"
- Karpenter detects pending pods and provisions KWOK nodes
- KWOK nodes register with the cluster
- YuniKorn schedules pods on the new nodes
- When pods are deleted, Karpenter consolidates and removes idle nodes
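The flow above can be exercised with a minimal manifest along these lines (a sketch — the Deployment name, labels, replica count, and resource requests are illustrative; the `nodeAffinity` term mirrors how the bundled examples keep test pods off the real worker node):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yunikorn-scale-test          # illustrative name
spec:
  replicas: 10
  selector:
    matchLabels:
      app: scale-test
  template:
    metadata:
      labels:
        app: scale-test
        applicationId: scale-test-001  # YuniKorn groups pods by applicationId
    spec:
      schedulerName: yunikorn          # hand scheduling to YuniKorn
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role     # keep test pods off the real worker node
                    operator: NotIn
                    values: ["real"]
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
```

With requests sized larger than the real worker's free capacity, the pods stay Pending until Karpenter provisions KWOK nodes for them.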
If you prefer to run each step manually:
Note: Steps 8, 9, and 10 (Gateway API setup) are optional. If skipped, access UIs directly via kubectl port-forward.
```bash
export CLUSTER_NAME=kind-yunikarp

cat <<EOF | kind create cluster --name "$CLUSTER_NAME" --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
EOF

# Wait for worker node to be ready
kubectl wait --for=condition=Ready node/"${CLUSTER_NAME}-worker" --timeout=120s

# Label the worker node
kubectl label node "${CLUSTER_NAME}-worker" node-role=real --overwrite
```
```bash
# Add Helm repo
helm repo add yunikorn https://apache.github.io/yunikorn-release
helm repo update

# Install YuniKorn
helm install yunikorn yunikorn/yunikorn \
  --namespace yunikorn \
  --create-namespace
```

YuniKorn components should avoid KWOK nodes (they'll naturally run on the real worker node):
```bash
# Wait for deployments to be ready
kubectl wait --for=condition=available --timeout=120s deployment/yunikorn-scheduler -n yunikorn
kubectl wait --for=condition=available --timeout=120s deployment/yunikorn-admission-controller -n yunikorn

# Patch scheduler
kubectl patch deployment yunikorn-scheduler -n yunikorn --type='json' -p='[
  {"op": "add", "path": "/spec/template/spec/affinity", "value": {
    "nodeAffinity": {
      "requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [{
          "matchExpressions": [{
            "key": "kwok.x-k8s.io/node",
            "operator": "DoesNotExist"
          }]
        }]
      }
    }
  }}
]'

# Patch admission controller
kubectl patch deployment yunikorn-admission-controller -n yunikorn --type='json' -p='[
  {"op": "add", "path": "/spec/template/spec/affinity", "value": {
    "nodeAffinity": {
      "requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [{
          "matchExpressions": [{
            "key": "kwok.x-k8s.io/node",
            "operator": "DoesNotExist"
          }]
        }]
      }
    }
  }}
]'
```

```bash
helm repo add kwok https://kwok.sigs.k8s.io/charts/
helm repo update

helm upgrade --install kwok kwok/kwok \
  --namespace kube-system \
  -f kwok-helm-karpenter-config.yaml

helm upgrade --install kwok-stage kwok/stage-fast \
  --namespace kube-system
```

```bash
# Clone Karpenter repo
git clone https://github.com/kubernetes-sigs/karpenter.git
cd karpenter
git switch --detach v1.8.0

# Install Prometheus (required)
./hack/install-prometheus.sh

# Build and install Karpenter
export KWOK_REPO=kind.local
export KIND_CLUSTER_NAME=kind-yunikarp
make apply-with-kind
cd ..
```

```bash
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
      nodeClassRef:
        name: default
        kind: KWOKNodeClass
        group: karpenter.kwok.sh
      expireAfter: 720h
  limits:
    cpu: 10000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 10s
---
apiVersion: karpenter.kwok.sh/v1alpha1
kind: KWOKNodeClass
metadata:
  name: default
EOF
```

Skip this step if you want to use direct port-forward instead of Gateway.
```bash
# Install Gateway API v1.4.0 CRDs
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/standard-install.yaml
```

Skip this step if you skipped step 8.

```bash
# Install kgateway CRDs
helm upgrade -i --create-namespace --namespace kgateway-system \
  --version v2.1.1 \
  kgateway-crds \
  oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds

# Install kgateway
helm upgrade -i --create-namespace --namespace kgateway-system \
  --version v2.1.1 \
  kgateway \
  oci://cr.kgateway.dev/kgateway-dev/charts/kgateway
```

Skip this step if you skipped step 8.
```bash
# First, create GatewayParameters with tmp volume for Envoy
cat <<EOF | kubectl apply -f -
apiVersion: gateway.kgateway.dev/v1alpha1
kind: GatewayParameters
metadata:
  name: kgateway-params
  namespace: kgateway-system
spec:
  kube:
    podTemplate:
      extraVolumes:
        - name: tmp
          emptyDir: {}
    envoyContainer:
      extraVolumeMounts:
        - name: tmp
          mountPath: /tmp
EOF

# Then create Gateway and HTTPRoutes
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ui-gateway
  namespace: kgateway-system
spec:
  gatewayClassName: kgateway
  infrastructure:
    parametersRef:
      group: gateway.kgateway.dev
      kind: GatewayParameters
      name: kgateway-params
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: yunikorn-route
  namespace: yunikorn
spec:
  parentRefs:
    - name: ui-gateway
      namespace: kgateway-system
  hostnames:
    - "yunikorn.localhost"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: yunikorn-service
          port: 9889
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: grafana-route
  namespace: monitoring
spec:
  parentRefs:
    - name: ui-gateway
      namespace: kgateway-system
  hostnames:
    - "grafana.localhost"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: prometheus-grafana
          port: 80
EOF
```

```bash
kubectl taint nodes kind-yunikarp-control-plane \
  CriticalAddonsOnly:NoSchedule --overwrite
```

The dashboard is automatically imported by the setup script. To manually import:
```bash
# Get Grafana admin password
GRAFANA_PASSWORD=$(kubectl get secret prometheus-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 -d)

# Port-forward to Grafana
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &
PF_PID=$!

# Import dashboard
curl -X POST \
  -H "Content-Type: application/json" \
  -d @dashboards/cluster-overview.json \
  http://admin:${GRAFANA_PASSWORD}@localhost:3000/api/dashboards/db

# Kill port-forward
kill $PF_PID
```

YuniKorn uses queues to organize and limit resource allocation. By default, all pods go to the root.default queue.
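Queue hierarchies live in the `yunikorn-configs` ConfigMap. A minimal sketch adding a `root.production` queue (the partition/queue layout follows YuniKorn's `queues.yaml` schema; the queue name and limits here are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: yunikorn-configs
  namespace: yunikorn
data:
  queues.yaml: |
    partitions:
      - name: default
        queues:
          - name: root
            submitacl: "*"        # anyone may submit
            queues:
              - name: default     # fallback queue for unlabeled pods
              - name: production  # illustrative queue with a resource cap
                resources:
                  max:
                    memory: 50G
                    vcore: 50
```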
```bash
kubectl get configmap yunikorn-configs -n yunikorn -o yaml
```

```bash
# Apply the example queue configuration
kubectl apply -f examples/yunikorn-queue-config.yaml
```

Assign pods to queues using labels:

```yaml
metadata:
  labels:
    queue: "root.production"     # Custom queue
    applicationId: "my-app-001"  # YuniKorn application ID
```

```bash
# Check YuniKorn scheduler logs
kubectl logs -n yunikorn -l app=yunikorn

# Check pod events
kubectl describe pod <pod-name>

# Verify YuniKorn is running
kubectl get pods -n yunikorn
```

```bash
# Check Karpenter logs
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter

# Verify NodePool exists
kubectl get nodepool

# Check for pending pods
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
```

```bash
# Check KWOK controller
kubectl get pods -n kube-system | grep kwok

# View KWOK controller logs
kubectl logs -n kube-system -l app=kwok-controller

# Verify stage configuration
kubectl get stages -n kube-system
```

```bash
# Check Prometheus pods
kubectl get pods -n monitoring

# Port-forward Prometheus UI
kubectl port-forward -n monitoring svc/prometheus-server 9090:80

# Access Grafana (get password first)
kubectl get secrets prometheus-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 -d
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
```

```bash
kubectl delete namespace test-autoscaling
kubectl delete namespace autoscale-demo
```

```bash
kind delete cluster --name kind-yunikarp
```

```bash
rm -rf karpenter/
```

The setup includes 147 realistic AWS instance types covering:
- M family (general purpose): m6i, m6a, m7i, m7a, m6g, m7g
- C family (compute optimized): c6i, c6a, c7i, c7a, c6g, c7g
- R family (memory optimized): r6i, r6a, r7i, r7a, r6g, r7g
- Generations: 6th and 7th gen Intel/AMD, Graviton2/3 (ARM)
- Sizes: xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge, 24xlarge (where available on AWS)
- Pricing: Realistic spot pricing for us-east-1a and us-east-1b
Instance types are defined in instance-types.json and automatically loaded into Karpenter.
To see what instance types Karpenter selected:
```bash
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep "instance-type"
```

You can filter instance types in the NodePool requirements:
```yaml
spec:
  template:
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m6i.2xlarge", "c6i.4xlarge", "r6i.8xlarge"]
        # Or use NotIn to exclude certain types; values must be exact
        # instance-type names (wildcards are not supported in requirements)
        - key: node.kubernetes.io/instance-type
          operator: NotIn
          values: ["m6i.24xlarge", "c6i.24xlarge", "r6i.24xlarge"]
```

Configure queue limits in the YuniKorn ConfigMap:
```yaml
queues:
  - name: production
    resources:
      max:
        memory: 50G
        vcore: 50
```

Adjust Karpenter's consolidation behavior:

```yaml
disruption:
  consolidationPolicy: WhenEmpty  # or WhenEmptyOrUnderutilized
  consolidateAfter: 30s           # Wait time before consolidating
```

Issues and pull requests welcome! This project is meant to help test YuniKorn and Karpenter integration.
This project is provided as-is for testing and educational purposes.