# Scheduling Configuration

This document describes the configuration options supported by the Karpenter IBM Cloud Provider for pod scheduling, including labels, affinity, anti-affinity, and Pod Disruption Budgets.

## Labels

### Well-Known Labels (Supported)

The following labels are recognized and validated by Karpenter during scheduling:

#### Kubernetes Standard Labels

- `kubernetes.io/arch` - Node architecture (e.g., `amd64`, `arm64`)
- `kubernetes.io/os` - Operating system (e.g., `linux`, `windows`)
- `node.kubernetes.io/instance-type` - IBM Cloud instance type (e.g., `bx2-2x8`, `cx2-4x8`)

#### Topology Labels

- `topology.kubernetes.io/zone` - Availability zone
- `topology.kubernetes.io/region` - Region

#### Karpenter Labels

- `karpenter.sh/nodepool` - NodePool name
- `karpenter.sh/nodeclaim` - NodeClaim name
- `karpenter.sh/capacity-type` - Capacity type (e.g., `spot`, `on-demand`)

#### IBM Cloud Provider Labels

- `karpenter.ibm.sh/instance-family` - Instance family (e.g., `bx2`, `cx2`, `mx2`)
- `karpenter.ibm.sh/instance-cpu` - Number of vCPUs
- `karpenter.ibm.sh/instance-memory` - Memory in MB
- `karpenter.ibm.sh/instance-network-bandwidth` - Network bandwidth
- `karpenter.ibm.sh/instance-storage-policy` - Storage policy

### Custom Labels (Applied but Not Validated)

Custom labels defined in NodePool templates are applied to nodes after provisioning but cannot be used for scheduling validation:

```yaml
# ❌ This will cause scheduling failures
apiVersion: v1
kind: Pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "workload"  # Custom label - not validated
                operator: In
                values: ["specialized"]
```

```yaml
# ✅ This works correctly
apiVersion: v1
kind: Pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "node.kubernetes.io/instance-type"
                operator: In
                values: ["bx2-2x8"]
```

## Node Affinity

### Required Node Affinity

Use `requiredDuringSchedulingIgnoredDuringExecution` for hard requirements:

```yaml
apiVersion: v1
kind: Pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "node.kubernetes.io/instance-type"
                operator: In
                values: ["bx2-2x8", "bx2-4x16"]
              - key: "kubernetes.io/arch"
                operator: In
                values: ["amd64"]
              - key: "topology.kubernetes.io/zone"
                operator: In
                values: ["us-south-1", "us-south-2"]
```

### Preferred Node Affinity

Use `preferredDuringSchedulingIgnoredDuringExecution` for soft preferences:

```yaml
apiVersion: v1
kind: Pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: "node.kubernetes.io/instance-type"
                operator: In
                values: ["bx2-2x8"]
        - weight: 50
          preference:
            matchExpressions:
              - key: "karpenter.ibm.sh/instance-family"
                operator: In
                values: ["bx2"]
```

### Supported Operators

- `In` - Label value must be in the list
- `NotIn` - Label value must not be in the list
- `Exists` - Label key must exist
- `DoesNotExist` - Label key must not exist
- `Gt` - Label value must be greater than (numeric comparison)
- `Lt` - Label value must be less than (numeric comparison; a `Gt`/`Lt` sketch follows this list)
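For example, here is a minimal sketch combining `Gt` and `Lt` against the provider's `karpenter.ibm.sh/instance-cpu` label to bound the vCPU count; the specific bounds are illustrative:

```yaml
# Illustrative sketch: select nodes with more than 4 but fewer than 32 vCPUs.
# Gt/Lt take a single string value that is compared numerically.
apiVersion: v1
kind: Pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "karpenter.ibm.sh/instance-cpu"
                operator: Gt
                values: ["4"]
              - key: "karpenter.ibm.sh/instance-cpu"
                operator: Lt
                values: ["32"]
```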
["my-app"] topologyKey: "kubernetes.io/hostname" # Different nodes ``` ### Preferred Anti-Affinity Attempts to spread pods but allows co-location if necessary: ```yaml apiVersion: apps/v1 kind: Deployment spec: template: spec: affinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchExpressions: - key: "app" operator: In values: ["my-app"] topologyKey: "topology.kubernetes.io/zone" # Different zones ``` ### Supported Topology Keys - `kubernetes.io/hostname` - Spread across different nodes - `topology.kubernetes.io/zone` - Spread across different zones - `topology.kubernetes.io/region` - Spread across different regions - `karpenter.sh/nodepool` - Spread across different NodePools ## Pod Disruption Budgets (PDB) Pod Disruption Budgets are fully supported and help maintain application availability during voluntary disruptions: ### MinAvailable Configuration ```yaml apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: my-app-pdb spec: minAvailable: 2 # Keep at least 2 pods running selector: matchLabels: app: my-app ``` ### MaxUnavailable Configuration ```yaml apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: my-app-pdb spec: maxUnavailable: 25% # Allow up to 25% of pods to be unavailable selector: matchLabels: app: my-app ``` ### PDB Best Practices 1. **Set appropriate limits**: Balance availability needs with operational flexibility 2. **Use percentage values**: More flexible for scaling applications 3. **Monitor PDB violations**: Check for pods that cannot be evicted 4. **Consider multiple PDBs**: Different rules for different components ```yaml # Example: Web tier with high availability apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: web-tier-pdb spec: maxUnavailable: 1 selector: matchLabels: tier: web --- # Example: Background workers with more flexibility apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: worker-pdb spec: maxUnavailable: 50% selector: matchLabels: tier: worker ``` ## Node Selectors Simple key-value matching for node selection: ```yaml apiVersion: v1 kind: Pod spec: nodeSelector: node.kubernetes.io/instance-type: "bx2-4x16" kubernetes.io/arch: "amd64" ``` ## Taints and Tolerations ### Node Taints Taints are applied through NodePool configuration: ```yaml apiVersion: karpenter.sh/v1 kind: NodePool spec: template: spec: taints: - key: "dedicated" value: "gpu-workload" effect: "NoSchedule" - key: "special-hardware" effect: "NoExecute" ``` ### Pod Tolerations ```yaml apiVersion: v1 kind: Pod spec: tolerations: - key: "dedicated" operator: "Equal" value: "gpu-workload" effect: "NoSchedule" - key: "special-hardware" operator: "Exists" effect: "NoExecute" tolerationSeconds: 300 # Tolerate for 5 minutes ``` ## Configuration Examples ### High-Performance Computing Workload ```yaml apiVersion: v1 kind: Pod spec: nodeSelector: karpenter.ibm.sh/instance-family: "cx2" affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: "node.kubernetes.io/instance-type" operator: In values: ["cx2-8x16", "cx2-16x32"] podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchLabels: app: hpc-worker topologyKey: kubernetes.io/hostname tolerations: - key: "dedicated" value: "hpc" effect: "NoSchedule" ``` ### Multi-Zone Deployment with PDB ```yaml apiVersion: apps/v1 kind: Deployment metadata: name: web-app spec: replicas: 6 template: spec: affinity: 
## Pod Disruption Budgets (PDB)

Pod Disruption Budgets are fully supported and help maintain application availability during voluntary disruptions:

### MinAvailable Configuration

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2  # Keep at least 2 pods running
  selector:
    matchLabels:
      app: my-app
```

### MaxUnavailable Configuration

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 25%  # Allow up to 25% of pods to be unavailable
  selector:
    matchLabels:
      app: my-app
```

### PDB Best Practices

1. **Set appropriate limits**: Balance availability needs with operational flexibility
2. **Use percentage values**: More flexible for scaling applications
3. **Monitor PDB violations**: Check for pods that cannot be evicted
4. **Consider multiple PDBs**: Different rules for different components

```yaml
# Example: Web tier with high availability
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-tier-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      tier: web
---
# Example: Background workers with more flexibility
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: worker-pdb
spec:
  maxUnavailable: 50%
  selector:
    matchLabels:
      tier: worker
```

## Node Selectors

Simple key-value matching for node selection:

```yaml
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    node.kubernetes.io/instance-type: "bx2-4x16"
    kubernetes.io/arch: "amd64"
```

## Taints and Tolerations

### Node Taints

Taints are applied through NodePool configuration:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
spec:
  template:
    spec:
      taints:
        - key: "dedicated"
          value: "gpu-workload"
          effect: "NoSchedule"
        - key: "special-hardware"
          effect: "NoExecute"
```

### Pod Tolerations

```yaml
apiVersion: v1
kind: Pod
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "gpu-workload"
      effect: "NoSchedule"
    - key: "special-hardware"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 300  # Tolerate for 5 minutes
```

## Configuration Examples

### High-Performance Computing Workload

```yaml
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    karpenter.ibm.sh/instance-family: "cx2"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "node.kubernetes.io/instance-type"
                operator: In
                values: ["cx2-8x16", "cx2-16x32"]
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: hpc-worker
            topologyKey: kubernetes.io/hostname
  tolerations:
    - key: "dedicated"
      value: "hpc"
      effect: "NoSchedule"
```

### Multi-Zone Deployment with PDB

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 6
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: web-app
                topologyKey: topology.kubernetes.io/zone
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: web-app
```

## Troubleshooting

### Common Issues

1. **"incompatible requirements, label does not have known values"**
   - Cause: Using custom labels in node affinity
   - Solution: Use the well-known labels listed in this document

2. **Pods stuck in Pending state**
   - Check that node affinity requirements match available instance types
   - Verify that the NodePool has sufficient capacity limits
   - Ensure tolerations match node taints

3. **PDB blocking disruptions**
   - Review PDB constraints
   - Check whether enough replicas are running
   - Consider adjusting `maxUnavailable` or `minAvailable`

### Debugging Commands

```bash
# Check node labels
kubectl get nodes --show-labels

# Describe a node for detailed information
kubectl describe node <node-name>

# Check pod scheduling events
kubectl describe pod <pod-name>

# List PDBs and their status
kubectl get pdb
kubectl describe pdb <pdb-name>

# Check NodePool configuration
kubectl describe nodepool <nodepool-name>
```

## Migration from Custom Labels

If you're currently using custom labels for scheduling, migrate to well-known labels:

```yaml
# Before (will fail)
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "workload"  # Custom label
              operator: In
              values: ["database"]

# After (works correctly)
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "karpenter.ibm.sh/instance-family"
              operator: In
              values: ["mx2"]  # Memory-optimized for databases
            - key: "node.kubernetes.io/instance-type"
              operator: In
              values: ["mx2-4x32", "mx2-8x64"]
```

Custom labels can still be applied to nodes and used for non-scheduling purposes such as monitoring, billing, or operational grouping.
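For instance, a minimal sketch of attaching such labels through a NodePool template, assuming the `spec.template.metadata.labels` field; the `team` and `cost-center` keys are hypothetical examples, and only the relevant fragment of the NodePool is shown:

```yaml
# Illustrative sketch: custom labels applied to provisioned nodes.
# Useful for monitoring, billing, or grouping, but not for scheduling validation.
apiVersion: karpenter.sh/v1
kind: NodePool
spec:
  template:
    metadata:
      labels:
        team: platform        # hypothetical key for operational grouping
        cost-center: "12345"  # hypothetical key for billing reports
```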