# Scheduling Configuration

This document describes the configuration options supported by the Karpenter IBM Cloud Provider for pod scheduling, including labels, affinity, anti-affinity, and Pod Disruption Budgets.

## Labels

### Well-Known Labels (Supported)

The following labels are recognized and validated by Karpenter during scheduling:

#### Kubernetes Standard Labels

- `kubernetes.io/arch` - Node architecture (e.g., `amd64`, `arm64`)
- `kubernetes.io/os` - Operating system (e.g., `linux`, `windows`)
- `node.kubernetes.io/instance-type` - IBM Cloud instance type (e.g., `bx2-2x8`, `cx2-4x8`)

#### Topology Labels

- `topology.kubernetes.io/zone` - Availability zone
- `topology.kubernetes.io/region` - Region

#### Karpenter Labels

- `karpenter.sh/nodepool` - NodePool name
- `karpenter.sh/nodeclaim` - NodeClaim name
- `karpenter.sh/capacity-type` - Capacity type (e.g., `spot`, `on-demand`)

#### IBM Cloud Provider Labels

- `karpenter.ibm.sh/instance-family` - Instance family (e.g., `bx2`, `cx2`, `mx2`)
- `karpenter.ibm.sh/instance-cpu` - Number of vCPUs
- `karpenter.ibm.sh/instance-memory` - Memory in MB
- `karpenter.ibm.sh/instance-network-bandwidth` - Network bandwidth
- `karpenter.ibm.sh/instance-storage-policy` - Storage policy

### Custom Labels (Applied but Not Validated)

Custom labels defined in NodePool templates are applied to nodes after provisioning but cannot be used for scheduling validation:

```yaml
# ❌ This will cause scheduling failures
apiVersion: v1
kind: Pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "workload"  # Custom label - not validated
                operator: In
                values: ["specialized"]
```

```yaml
# ✅ This works correctly
apiVersion: v1
kind: Pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "node.kubernetes.io/instance-type"
                operator: In
                values: ["bx2-2x8"]
```

## Node Affinity

### Required Node Affinity

Use `requiredDuringSchedulingIgnoredDuringExecution` for hard requirements:

```yaml
apiVersion: v1
kind: Pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "node.kubernetes.io/instance-type"
                operator: In
                values: ["bx2-2x8", "bx2-4x16"]
              - key: "kubernetes.io/arch"
                operator: In
                values: ["amd64"]
              - key: "topology.kubernetes.io/zone"
                operator: In
                values: ["us-south-1", "us-south-2"]
```

### Preferred Node Affinity

Use `preferredDuringSchedulingIgnoredDuringExecution` for soft preferences:

```yaml
apiVersion: v1
kind: Pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: "node.kubernetes.io/instance-type"
                operator: In
                values: ["bx2-2x8"]
        - weight: 50
          preference:
            matchExpressions:
              - key: "karpenter.ibm.sh/instance-family"
                operator: In
                values: ["bx2"]
```

### Supported Operators

- `In` - Label value must be in the list
- `NotIn` - Label value must not be in the list
- `Exists` - Label key must exist
- `DoesNotExist` - Label key must not exist
- `Gt` - Label value must be greater than (numeric comparison)
- `Lt` - Label value must be less than (numeric comparison; a `Gt`/`Lt` sketch follows this list)
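For example, here is a minimal sketch combining `Gt` and `Lt` against the provider's `karpenter.ibm.sh/instance-cpu` label to bound the vCPU count; the specific bounds are illustrative:

```yaml
# Illustrative sketch: select nodes with more than 4 but fewer than 32 vCPUs.
# Gt/Lt take a single string value that is compared numerically.
apiVersion: v1
kind: Pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "karpenter.ibm.sh/instance-cpu"
                operator: Gt
                values: ["4"]
              - key: "karpenter.ibm.sh/instance-cpu"
                operator: Lt
                values: ["32"]
```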
["my-app"] topologyKey: "kubernetes.io/hostname" # Different nodes ``` ### Preferred Anti-Affinity Attempts to spread pods but allows co-location if necessary: ```yaml apiVersion: apps/v1 kind: Deployment spec: template: spec: affinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchExpressions: - key: "app" operator: In values: ["my-app"] topologyKey: "topology.kubernetes.io/zone" # Different zones ``` ### Supported Topology Keys - `kubernetes.io/hostname` - Spread across different nodes - `topology.kubernetes.io/zone` - Spread across different zones - `topology.kubernetes.io/region` - Spread across different regions - `karpenter.sh/nodepool` - Spread across different NodePools ## Pod Disruption Budgets (PDB) Pod Disruption Budgets are fully supported and help maintain application availability during voluntary disruptions: ### MinAvailable Configuration ```yaml apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: my-app-pdb spec: minAvailable: 2 # Keep at least 2 pods running selector: matchLabels: app: my-app ``` ### MaxUnavailable Configuration ```yaml apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: my-app-pdb spec: maxUnavailable: 25% # Allow up to 25% of pods to be unavailable selector: matchLabels: app: my-app ``` ### PDB Best Practices 1. **Set appropriate limits**: Balance availability needs with operational flexibility 2. **Use percentage values**: More flexible for scaling applications 3. **Monitor PDB violations**: Check for pods that cannot be evicted 4. **Consider multiple PDBs**: Different rules for different components ```yaml # Example: Web tier with high availability apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: web-tier-pdb spec: maxUnavailable: 1 selector: matchLabels: tier: web --- # Example: Background workers with more flexibility apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: worker-pdb spec: maxUnavailable: 50% selector: matchLabels: tier: worker ``` ## Node Selectors Simple key-value matching for node selection: ```yaml apiVersion: v1 kind: Pod spec: nodeSelector: node.kubernetes.io/instance-type: "bx2-4x16" kubernetes.io/arch: "amd64" ``` ## Taints and Tolerations ### Node Taints Taints are applied through NodePool configuration: ```yaml apiVersion: karpenter.sh/v1 kind: NodePool spec: template: spec: taints: - key: "dedicated" value: "gpu-workload" effect: "NoSchedule" - key: "special-hardware" effect: "NoExecute" ``` ### Pod Tolerations ```yaml apiVersion: v1 kind: Pod spec: tolerations: - key: "dedicated" operator: "Equal" value: "gpu-workload" effect: "NoSchedule" - key: "special-hardware" operator: "Exists" effect: "NoExecute" tolerationSeconds: 300 # Tolerate for 5 minutes ``` ## Configuration Examples ### High-Performance Computing Workload ```yaml apiVersion: v1 kind: Pod spec: nodeSelector: karpenter.ibm.sh/instance-family: "cx2" affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: "node.kubernetes.io/instance-type" operator: In values: ["cx2-8x16", "cx2-16x32"] podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchLabels: app: hpc-worker topologyKey: kubernetes.io/hostname tolerations: - key: "dedicated" value: "hpc" effect: "NoSchedule" ``` ### Multi-Zone Deployment with PDB ```yaml apiVersion: apps/v1 kind: Deployment metadata: name: web-app spec: replicas: 6 template: spec: affinity: 
## Pod Disruption Budgets (PDB)

Pod Disruption Budgets are fully supported and help maintain application availability during voluntary disruptions:

### MinAvailable Configuration

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2  # Keep at least 2 pods running
  selector:
    matchLabels:
      app: my-app
```

### MaxUnavailable Configuration

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 25%  # Allow up to 25% of pods to be unavailable
  selector:
    matchLabels:
      app: my-app
```

### PDB Best Practices

1. **Set appropriate limits**: Balance availability needs with operational flexibility
2. **Use percentage values**: More flexible for scaling applications
3. **Monitor PDB violations**: Check for pods that cannot be evicted
4. **Consider multiple PDBs**: Different rules for different components

```yaml
# Example: Web tier with high availability
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-tier-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      tier: web
---
# Example: Background workers with more flexibility
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: worker-pdb
spec:
  maxUnavailable: 50%
  selector:
    matchLabels:
      tier: worker
```

## Node Selectors

Simple key-value matching for node selection:

```yaml
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    node.kubernetes.io/instance-type: "bx2-4x16"
    kubernetes.io/arch: "amd64"
```

## Taints and Tolerations

### Node Taints

Taints are applied through NodePool configuration:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
spec:
  template:
    spec:
      taints:
        - key: "dedicated"
          value: "gpu-workload"
          effect: "NoSchedule"
        - key: "special-hardware"
          effect: "NoExecute"
```

### Pod Tolerations

```yaml
apiVersion: v1
kind: Pod
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "gpu-workload"
      effect: "NoSchedule"
    - key: "special-hardware"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 300  # Tolerate for 5 minutes
```

## Configuration Examples

### High-Performance Computing Workload

```yaml
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    karpenter.ibm.sh/instance-family: "cx2"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "node.kubernetes.io/instance-type"
                operator: In
                values: ["cx2-8x16", "cx2-16x32"]
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: hpc-worker
            topologyKey: kubernetes.io/hostname
  tolerations:
    - key: "dedicated"
      value: "hpc"
      effect: "NoSchedule"
```

### Multi-Zone Deployment with PDB

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 6
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: web-app
                topologyKey: topology.kubernetes.io/zone
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: web-app
```

## Troubleshooting

### Common Issues

1. **"incompatible requirements, label does not have known values"**
   - Cause: Using custom labels in node affinity
   - Solution: Use the well-known labels listed in this document

2. **Pods stuck in Pending state**
   - Check that node affinity requirements match available instance types
   - Verify that the NodePool has sufficient capacity limits
   - Ensure tolerations match node taints

3. **PDB blocking disruptions**
   - Review PDB constraints
   - Check whether enough replicas are running
   - Consider adjusting `maxUnavailable` or `minAvailable`

### Debugging Commands

```bash
# Check node labels
kubectl get nodes --show-labels

# Describe a node for detailed information
kubectl describe node <node-name>

# Check pod scheduling events
kubectl describe pod <pod-name>

# List PDBs and their status
kubectl get pdb
kubectl describe pdb <pdb-name>

# Check NodePool configuration
kubectl describe nodepool <nodepool-name>
```

## Migration from Custom Labels

If you're currently using custom labels for scheduling, migrate to well-known labels:

```yaml
# Before (will fail)
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "workload"  # Custom label
              operator: In
              values: ["database"]

# After (works correctly)
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "karpenter.ibm.sh/instance-family"
              operator: In
              values: ["mx2"]  # Memory-optimized for databases
            - key: "node.kubernetes.io/instance-type"
              operator: In
              values: ["mx2-4x32", "mx2-8x64"]
```

Custom labels can still be applied to nodes and used for non-scheduling purposes such as monitoring, billing, or operational grouping.
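For instance, a minimal sketch of attaching such labels through a NodePool template, assuming the `spec.template.metadata.labels` field; the `team` and `cost-center` keys are hypothetical examples, and only the relevant fragment of the NodePool is shown:

```yaml
# Illustrative sketch: custom labels applied to provisioned nodes.
# Useful for monitoring, billing, or grouping, but not for scheduling validation.
apiVersion: karpenter.sh/v1
kind: NodePool
spec:
  template:
    metadata:
      labels:
        team: platform        # hypothetical key for operational grouping
        cost-center: "12345"  # hypothetical key for billing reports
```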