IKS + Karpenter Node Removal Issue - FIXED
==========================================

Date: 2025-12-08
Cluster: karpenter-iks-test-v2 (d4qoskhd0cckervfb8ng)
Kubernetes: 1.32.9

ISSUE SUMMARY
-------------
When Karpenter was deployed on an IKS cluster, the existing IKS-managed worker
nodes were removed within ~2 minutes.

ROOT CAUSE (IDENTIFIED)
-----------------------
Our IBM-specific garbage collection controller
(pkg/controllers/nodeclaim/garbagecollection/controller.go) was:

1. Listing ALL nodes with an ibm:// providerID via cloudProvider.List()
2. Comparing them against the NodeClaim objects in etcd
3. Treating the IKS nodes, which have no corresponding NodeClaim objects, as
   "orphaned cloud instances"
4. Calling garbageCollect(), which deleted those nodes

Additionally, isKarpenterManagedNode() incorrectly returned true for ANY node
with an ibm:// providerID prefix instead of checking for Karpenter labels.

FIX APPLIED
-----------
Modified pkg/controllers/nodeclaim/garbagecollection/controller.go:

1. Added a check to the orphaned cloud instance handling (lines 130-146):
   - Before calling garbageCollect(), check whether the corresponding node has
     a Karpenter label (karpenter.sh/nodepool or karpenter-ibm.sh/ibmnodeclass)
   - Skip garbage collection for nodes that have neither label
2. Fixed isKarpenterManagedNode() (lines 306-315):
   - Removed the incorrect ibm:// prefix check, which matched ALL IBM nodes
   - Now returns true only for nodes with Karpenter labels

VERIFICATION
------------
After the fix:
- The IKS worker node remained Ready for 20+ minutes with Karpenter running
- Before the fix, nodes were removed within ~2 minutes of Karpenter starting
- The Karpenter pod ran without restarts

KEY INSIGHT
-----------
The user was correct: "the way karpenter is set up, it should ignore non managed nodes anyways".
Karpenter's CORE controllers already use IsManaged() checks properly. The bug
was in our IBM-SPECIFIC garbage collection controller, which did not have
these checks.
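
The label-based ownership check described under FIX APPLIED can be sketched in
Go as follows. This is a minimal, hypothetical sketch, not the controller's
actual code: Node is a simplified stand-in for corev1.Node, and
orphanedInstancesToCollect, nodeClaimIDs, and nodesByID are illustrative names.
Only the two label keys come from the fix above. The sketch also assumes that an
instance with no matching Node object is skipped; this note does not say how
the real controller handles that case.

```go
package main

// Label keys named in the fix; a node carrying either one is treated as
// Karpenter-managed.
const (
	nodePoolLabel     = "karpenter.sh/nodepool"
	ibmNodeClassLabel = "karpenter-ibm.sh/ibmnodeclass"
)

// Node is a hypothetical, simplified stand-in for corev1.Node.
type Node struct {
	Name       string
	ProviderID string
	Labels     map[string]string
}

// isKarpenterManagedNode reflects the fixed behavior: only labels decide
// ownership. An ibm:// providerID prefix alone is no longer sufficient,
// because IKS-managed workers carry that prefix too.
func isKarpenterManagedNode(n Node) bool {
	_, hasPool := n.Labels[nodePoolLabel]
	_, hasClass := n.Labels[ibmNodeClassLabel]
	return hasPool || hasClass
}

// orphanedInstancesToCollect (hypothetical helper) returns the providerIDs of
// cloud instances that have no NodeClaim AND whose Node carries a Karpenter
// label. Instances without a NodeClaim whose Node lacks the labels (such as
// IKS workers) are skipped instead of being garbage collected.
func orphanedInstancesToCollect(cloudIDs []string, nodeClaimIDs map[string]bool, nodesByID map[string]Node) []string {
	var out []string
	for _, id := range cloudIDs {
		if nodeClaimIDs[id] {
			continue // backed by a NodeClaim, so not orphaned
		}
		node, ok := nodesByID[id]
		if !ok || !isKarpenterManagedNode(node) {
			continue // ownership not proven by labels: leave the instance alone
		}
		out = append(out, id)
	}
	return out
}
```

For example, given an IKS worker labeled only with kubernetes.io/hostname and a
Karpenter node labeled karpenter.sh/nodepool, neither backed by a NodeClaim,
only the Karpenter node's providerID is returned. Failing closed (skip unless
the labels prove ownership) is what protects non-Karpenter nodes, at the cost
of possibly leaving behind a Karpenter instance whose labels are missing.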

FILES MODIFIED
--------------
- pkg/controllers/nodeclaim/garbagecollection/controller.go
  - Lines 130-146: Added Karpenter label check before garbage collection
  - Lines 306-315: Fixed isKarpenterManagedNode() to only check labels