YuniKorn + Karpenter + KWOK Test Environment

A complete setup for testing the Apache YuniKorn scheduler with Karpenter autoscaling, using KWOK (Kubernetes WithOut Kubelet) to simulate nodes.

This environment allows you to:

  • Test YuniKorn scheduling behavior
  • Validate Karpenter autoscaling with YuniKorn
  • Simulate large-scale clusters locally without resource overhead
  • Experiment with queue configurations and resource limits

What's Inside

  • Kind cluster: Control-plane node + 1 worker node for running real workloads
  • YuniKorn: Apache YuniKorn scheduler (replaces default kube-scheduler)
  • KWOK: Simulates Kubernetes nodes without running actual containers
  • Karpenter: Cluster autoscaler that provisions nodes based on pending pods with 147 realistic AWS instance types (m, c, r families across 6th/7th gen, Intel/AMD/Graviton)
  • Prometheus + Grafana: Monitoring stack with auto-imported cluster overview dashboard
  • Gateway API (optional): Kubernetes Gateway API v1.4.0 for unified UI access
  • kgateway (optional): Solo.io's lightweight gateway implementation (v2.1.1)
  • Example workloads: Various test deployments to trigger autoscaling

Prerequisites

Before starting, ensure you have the following tools installed:

  • kind (v0.20.0+)
  • kubectl (v1.28+)
  • helm (v3.12+)
  • go (v1.21+) - Required for building Karpenter

Optional but useful:

  • k9s - Terminal-based Kubernetes UI
  • jq - JSON processor for debugging

Quick Start

Automated Setup (Recommended)

Run the automated setup script to create the entire environment:

./setup.sh

This script will:

  1. Create a Kind cluster named kind-yunikarp with a control-plane and worker node
  2. Label the worker node with node-role=real
  3. Install YuniKorn scheduler
  4. Install KWOK for node simulation
  5. Clone, build, and install Karpenter with KWOK provider (includes Prometheus & Grafana)
  6. Configure Karpenter with 147 realistic AWS instance types
  7. Configure Karpenter NodePool and NodeClass
  8. Taint the control-plane node
  9. Import Grafana "Cluster Resource Overview" dashboard

Note: The setup takes 5-10 minutes, mainly for cloning Karpenter and building the controller.

Configuration Options

Gateway API (Optional)

By default, the setup uses direct kubectl port-forward to access UIs. To install Gateway API and kgateway for unified UI access:

INSTALL_GATEWAY=true ./setup.sh

When Gateway is enabled:

  • All UIs are reached through a single port-forward to the gateway, using hostnames such as http://yunikorn.localhost:8080 and http://grafana.localhost:8080

When Gateway is disabled (default):

  • Each UI gets its own kubectl port-forward (YuniKorn on localhost:9889, Grafana on localhost:3000)

Other Options

  • CLUSTER_NAME: Set custom cluster name (default: kind-yunikarp)
  • KARPENTER_VERSION: Set Karpenter version (default: v1.8.0)

Example:

CLUSTER_NAME=my-test-cluster KARPENTER_VERSION=v1.9.0 ./setup.sh

Manual Setup

If you prefer manual control or want to understand each step, see the Manual Setup section below.

Testing Autoscaling

Once setup is complete, try the example workloads:

1. Simple Test

# Deploy a small test workload
kubectl apply -f examples/test-deployment.yaml

# Watch Karpenter create nodes
kubectl get nodes -w

2. View in YuniKorn UI

Default method (port-forward directly to YuniKorn):

kubectl port-forward -n yunikorn svc/yunikorn-service 9889:9889
# Then open http://localhost:9889

With Gateway API (if you installed with INSTALL_GATEWAY=true):

kubectl port-forward -n kgateway-system svc/ui-gateway 8080:80
# Then open http://yunikorn.localhost:8080

3. Larger Scale Test

# Deploy workloads that need more resources
kubectl apply -f examples/autoscaling-test.yaml

# Watch multiple nodes being created
kubectl get nodes -w
kubectl get pods -n autoscale-demo -w

4. Monitor with Grafana Dashboard

The setup automatically imports a "Cluster Resource Overview" dashboard that shows:

  • Node count per instance type
  • Unschedulable pods count
  • CPU requests vs capacity
  • Memory requests vs capacity
  • Resource requests breakdown by namespace

Access Grafana:

Default method (port-forward directly to Grafana):

# Get password
kubectl get secret prometheus-grafana -n monitoring -o jsonpath='{.data.admin-password}' | base64 -d

# Port-forward
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

# Open http://localhost:3000
# Username: admin, Password: (from above)
# Look for "Cluster Resource Overview" dashboard

With Gateway API (if you installed with INSTALL_GATEWAY=true):

# Get password
kubectl get secret prometheus-grafana -n monitoring -o jsonpath='{.data.admin-password}' | base64 -d

# Port-forward to Gateway
kubectl port-forward -n kgateway-system svc/ui-gateway 8080:80

# Open http://grafana.localhost:8080
# Username: admin, Password: (from above)
# Look for "Cluster Resource Overview" dashboard

5. Monitor Karpenter

# View Karpenter logs
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -f

# View Karpenter provisioner decisions
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep provisioner

Understanding the Setup

Architecture

┌───────────────────────────────────────────────────────────────┐
│  Kind Cluster (kind-yunikarp)                                 │
│                                                               │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │  Control Plane Node (tainted)                            │ │
│  │  - Kubernetes control plane components                   │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                               │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │  Worker Node (labeled: node-role=real)                   │ │
│  │  - YuniKorn Scheduler                                    │ │
│  │  - YuniKorn Admission Controller                         │ │
│  │  - Karpenter Controller (147 AWS instance types)         │ │
│  │  - KWOK Controller                                       │ │
│  │  - kgateway (optional, if INSTALL_GATEWAY=true)          │ │
│  │  - Prometheus + Grafana (for metrics)                    │ │
│  │                                                          │ │
│  │  UIs (accessible via port-forward):                      │ │
│  │  - YuniKorn: localhost:9889 or yunikorn.localhost:8080   │ │
│  │  - Grafana: localhost:3000 or grafana.localhost:8080     │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                               │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │  KWOK Nodes (simulated, created by Karpenter)            │ │
│  │  - No kubelet, container runtime, or kernel              │ │
│  │  - Test workloads scheduled here via YuniKorn            │ │
│  │  - Instant scaling with minimal resources                │ │
│  │  - Test pods explicitly exclude the real worker node     │ │
│  └──────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘

Component Roles

  • YuniKorn: Handles pod scheduling with queue-based resource management
  • Karpenter: Monitors pending pods and provisions nodes (KWOK nodes in this case) from 147 realistic AWS instance types
  • KWOK: Simulates node behavior without running actual kubelet
  • Prometheus + Grafana: Monitoring stack with auto-imported cluster overview dashboard
  • Gateway API (optional): Kubernetes-native API for ingress and routing, enabling unified UI access
  • kgateway (optional): Lightweight implementation of Gateway API for exposing services
  • Kind: Provides the Kubernetes cluster infrastructure
  • Worker Node: Real node (labeled node-role=real) where system components run, excluded from test workloads

How It Works

  1. You deploy a workload with schedulerName: yunikorn
  2. YuniKorn tries to schedule pods but finds insufficient capacity
  3. Pods remain pending with reason "Insufficient cpu/memory"
  4. Karpenter detects pending pods and provisions KWOK nodes
  5. KWOK nodes register with the cluster
  6. YuniKorn schedules pods on the new nodes
  7. When pods are deleted, Karpenter consolidates and removes idle nodes
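The flow above can be exercised with a minimal manifest; the pod name, image, and resource sizes here are illustrative, and the affinity rule keeps the pod off the real worker node as the architecture section describes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scale-test
  labels:
    applicationId: "scale-test-001"   # YuniKorn application ID
spec:
  schedulerName: yunikorn             # step 1: hand the pod to YuniKorn
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role            # steer off the real worker node
            operator: NotIn
            values: ["real"]
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        cpu: "2"       # large enough to stay pending until Karpenter adds a KWOK node
        memory: 4Gi
```

If the request exceeds free capacity, the pod sits in Pending until Karpenter provisions a KWOK node and YuniKorn binds it there.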

Manual Setup

If you prefer to run each step manually:

Note: Steps 8, 9, and 10 (Gateway API setup) are optional. If skipped, access UIs directly via kubectl port-forward.

1. Create Kind Cluster

export CLUSTER_NAME=kind-yunikarp

cat <<EOF | kind create cluster --name "$CLUSTER_NAME" --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
EOF

2. Label Worker Node

# Wait for worker node to be ready
kubectl wait --for=condition=Ready node/"${CLUSTER_NAME}-worker" --timeout=120s

# Label the worker node
kubectl label node "${CLUSTER_NAME}-worker" node-role=real --overwrite

3. Install YuniKorn

# Add Helm repo
helm repo add yunikorn https://apache.github.io/yunikorn-release
helm repo update

# Install YuniKorn
helm install yunikorn yunikorn/yunikorn \
  --namespace yunikorn \
  --create-namespace

4. Configure YuniKorn to Avoid KWOK Nodes

YuniKorn components should avoid KWOK nodes (they'll naturally run on the real worker node):

# Wait for deployments to be ready
kubectl wait --for=condition=available --timeout=120s deployment/yunikorn-scheduler -n yunikorn
kubectl wait --for=condition=available --timeout=120s deployment/yunikorn-admission-controller -n yunikorn

# Patch scheduler
kubectl patch deployment yunikorn-scheduler -n yunikorn --type='json' -p='[
  {"op": "add", "path": "/spec/template/spec/affinity", "value": {
    "nodeAffinity": {
      "requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [{
          "matchExpressions": [{
            "key": "kwok.x-k8s.io/node",
            "operator": "DoesNotExist"
          }]
        }]
      }
    }
  }}
]'

# Patch admission controller
kubectl patch deployment yunikorn-admission-controller -n yunikorn --type='json' -p='[
  {"op": "add", "path": "/spec/template/spec/affinity", "value": {
    "nodeAffinity": {
      "requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [{
          "matchExpressions": [{
            "key": "kwok.x-k8s.io/node",
            "operator": "DoesNotExist"
          }]
        }]
      }
    }
  }}
]'

5. Install KWOK

helm repo add kwok https://kwok.sigs.k8s.io/charts/
helm repo update

helm upgrade --install kwok kwok/kwok \
  --namespace kube-system \
  -f kwok-helm-karpenter-config.yaml

helm upgrade --install kwok-stage kwok/stage-fast \
  --namespace kube-system

6. Install Karpenter

# Clone Karpenter repo
git clone https://github.com/kubernetes-sigs/karpenter.git
cd karpenter
git switch --detach v1.8.0

# Install Prometheus (required)
./hack/install-prometheus.sh

# Build and install Karpenter
export KWOK_REPO=kind.local
export KIND_CLUSTER_NAME=kind-yunikarp
make apply-with-kind

cd ..

7. Configure Karpenter

cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
      nodeClassRef:
        name: default
        kind: KWOKNodeClass
        group: karpenter.kwok.sh
      expireAfter: 720h
  limits:
    cpu: 10000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 10s
---
apiVersion: karpenter.kwok.sh/v1alpha1
kind: KWOKNodeClass
metadata:
  name: default
EOF

8. Install Gateway API (Optional)

Skip this step if you want to use direct port-forward instead of Gateway.

# Install Gateway API v1.4.0 CRDs
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/standard-install.yaml

9. Install kgateway (Optional)

Skip this step if you skipped step 8.

# Install kgateway CRDs
helm upgrade -i --create-namespace --namespace kgateway-system \
  --version v2.1.1 \
  kgateway-crds \
  oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds

# Install kgateway
helm upgrade -i --create-namespace --namespace kgateway-system \
  --version v2.1.1 \
  kgateway \
  oci://cr.kgateway.dev/kgateway-dev/charts/kgateway

10. Configure Gateway and HTTPRoutes (Optional)

Skip this step if you skipped step 8.

# First, create GatewayParameters with tmp volume for Envoy
cat <<EOF | kubectl apply -f -
apiVersion: gateway.kgateway.dev/v1alpha1
kind: GatewayParameters
metadata:
  name: kgateway-params
  namespace: kgateway-system
spec:
  kube:
    podTemplate:
      extraVolumes:
      - name: tmp
        emptyDir: {}
    envoyContainer:
      extraVolumeMounts:
      - name: tmp
        mountPath: /tmp
EOF

# Then create Gateway and HTTPRoutes
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ui-gateway
  namespace: kgateway-system
spec:
  gatewayClassName: kgateway
  infrastructure:
    parametersRef:
      group: gateway.kgateway.dev
      kind: GatewayParameters
      name: kgateway-params
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: All
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: yunikorn-route
  namespace: yunikorn
spec:
  parentRefs:
  - name: ui-gateway
    namespace: kgateway-system
  hostnames:
  - "yunikorn.localhost"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: yunikorn-service
      port: 9889
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: grafana-route
  namespace: monitoring
spec:
  parentRefs:
  - name: ui-gateway
    namespace: kgateway-system
  hostnames:
  - "grafana.localhost"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: prometheus-grafana
      port: 80
EOF

11. Taint Control Plane

kubectl taint nodes "${CLUSTER_NAME}-control-plane" \
  CriticalAddonsOnly:NoSchedule --overwrite

12. Import Grafana Dashboard

The dashboard is automatically imported by the setup script. To manually import:

# Get Grafana admin password
GRAFANA_PASSWORD=$(kubectl get secret prometheus-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 -d)

# Port-forward to Grafana
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &
PF_PID=$!

# Import dashboard
curl -X POST \
  -H "Content-Type: application/json" \
  -d @dashboards/cluster-overview.json \
  http://admin:${GRAFANA_PASSWORD}@localhost:3000/api/dashboards/db

# Kill port-forward
kill $PF_PID

YuniKorn Queue Configuration

YuniKorn uses queues to organize and limit resource allocation. By default, all pods go to the root.default queue.

Viewing Current Config

kubectl get configmap yunikorn-configs -n yunikorn -o yaml

Applying Custom Queues

# Apply the example queue configuration
kubectl apply -f examples/yunikorn-queue-config.yaml
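Such a configuration follows YuniKorn's queues.yaml format; the sketch below shows the general shape (queue names and limits are illustrative, not the contents of the actual example file):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: yunikorn-configs
  namespace: yunikorn
data:
  queues.yaml: |
    partitions:
      - name: default
        queues:
          - name: root
            submitacl: "*"
            queues:
              - name: default          # fallback queue for unlabeled pods
              - name: production       # capped queue for labeled workloads
                resources:
                  max:
                    memory: 50G
                    vcore: 50
```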

Using Queues in Pods

Assign pods to queues using labels:

metadata:
  labels:
    queue: "root.production"  # Custom queue
    applicationId: "my-app-001"  # YuniKorn application ID
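In context, a complete (illustrative) pod spec combining the queue labels with the YuniKorn scheduler looks like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-job
  labels:
    queue: "root.production"      # target a custom queue
    applicationId: "my-app-001"   # YuniKorn application ID
spec:
  schedulerName: yunikorn         # without this, the default scheduler handles the pod
  containers:
  - name: worker
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        cpu: "1"
        memory: 1Gi
```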

Troubleshooting

Pods Not Scheduling

# Check YuniKorn scheduler logs
kubectl logs -n yunikorn -l app=yunikorn

# Check pod events
kubectl describe pod <pod-name>

# Verify YuniKorn is running
kubectl get pods -n yunikorn

Karpenter Not Creating Nodes

# Check Karpenter logs
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter

# Verify NodePool exists
kubectl get nodepool

# Check for pending pods
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

KWOK Nodes Not Working

# Check KWOK controller
kubectl get pods -n kube-system | grep kwok

# View KWOK controller logs
kubectl logs -n kube-system -l app=kwok-controller

# Verify stage configuration (Stages are cluster-scoped)
kubectl get stages

Prometheus Not Accessible

# Check Prometheus pods
kubectl get pods -n monitoring

# Port-forward Prometheus UI
kubectl port-forward -n monitoring svc/prometheus-server 9090:80

# Access Grafana (get password first)
kubectl get secrets prometheus-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 -d
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

Cleanup

Delete Specific Namespaces

kubectl delete namespace test-autoscaling
kubectl delete namespace autoscale-demo

Delete Entire Cluster

kind delete cluster --name kind-yunikarp

Clean Up Karpenter Clone

rm -rf karpenter/

Advanced Usage

Instance Types

The setup includes 147 realistic AWS instance types covering:

  • M family (general purpose): m6i, m6a, m7i, m7a, m6g, m7g
  • C family (compute optimized): c6i, c6a, c7i, c7a, c6g, c7g
  • R family (memory optimized): r6i, r6a, r7i, r7a, r6g, r7g
  • Generations: 6th and 7th gen Intel/AMD, Graviton2/3 (ARM)
  • Sizes: xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge, 24xlarge (where available on AWS)
  • Pricing: Realistic spot pricing for us-east-1a and us-east-1b

Instance types are defined in instance-types.json and automatically loaded into Karpenter.

To see what instance types Karpenter selected:

kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep "instance-type"

Custom Node Sizes

You can filter instance types in the NodePool requirements:

spec:
  template:
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m6i.2xlarge", "c6i.4xlarge", "r6i.8xlarge"]
        # Or use NotIn to exclude certain types (values are exact instance
        # names; wildcard patterns like "*.24xlarge" are not supported)
        - key: node.kubernetes.io/instance-type
          operator: NotIn
          values: ["m6i.24xlarge", "c6i.24xlarge", "r6i.24xlarge"]

Queue Resource Limits

Configure queue limits in the YuniKorn ConfigMap:

queues:
  - name: production
    resources:
      max:
        memory: 50G
        vcore: 50

Disruption Controls

Adjust Karpenter's consolidation behavior:

disruption:
  consolidationPolicy: WhenEmpty  # or WhenEmptyOrUnderutilized
  consolidateAfter: 30s  # Wait time before consolidating

Contributing

Issues and pull requests welcome! This project is meant to help test YuniKorn and Karpenter integration.

License

This project is provided as-is for testing and educational purposes.
