# s390x Worker Node Troubleshooting Progress

## Timeline of Issues and Solutions

### Initial State: Kernel Panic

**Error**: `Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)`

**Root Cause**: Missing `coreos.inst=yes` in boot parameters
- System tried to boot directly instead of running CoreOS installer
- Without installer, no root filesystem was available

### Attempt 1: Fix Boot Parameters in Netboot Role

**Changes Made**:
- Added `coreos.inst=yes` and `coreos.inst.wipe_table=true` to `generic.prm.j2`
- Enhanced boot parameters with missing CoreOS flags

**Problem**: Files were going to wrong directory (`/srv/ocp/` vs `/srv/test01/`)

**Cause**: `ocp_name` variable not loaded, fallback to `default('ocp')`

### Attempt 2: Fix Variable Loading

**Changes Made**:
- Added explicit loading of `group_vars/all` in playbook
- Removed all `default('ocp')` fallbacks throughout role
- Added validation to fail if `ocp_name` undefined

**Result**: Fixed directory issue, files now go to `/srv/test01/`

**Problem**: Still getting kernel panic - HMC was using old files

### Attempt 3: Regenerate Files with Correct Parameters

**Action**: Re-ran playbook to regenerate files in `/srv/test01/`

**Result**: Generated correct PRM file with `coreos.inst=yes` in right location

**Status**: ✅ **SOLVED KERNEL PANIC** - System boots and reaches CoreOS installer

### Attempt 4: Optimize with Direct Downloads

**Goal**: Avoid downloading 900MB ISO, use direct file downloads from release URLs

**Changes Made**:
- Created `prepare_boot_files_direct.yaml`
- Downloaded kernel/initrd directly from `pxe.kernel.location` and `pxe.initramfs.location`
- Used standard s390x memory addresses

**Result**: ❌ **NEW PROBLEM** - System starts to boot but fails immediately

**Finding**: Direct downloads don't work, but ISO extraction does

### Current Status: ISO Extraction Works, Direct Downloads Don't

#### ✅ **WORKING**: ISO Extraction Method

- Downloads full RHCOS ISO
- Extracts files from `images/pxeboot/` directory
- Parses memory addresses from ISO's generic.ins
- **Result**: Boots successfully, reaches CoreOS installer

#### ❌ **FAILING**: Direct Download Method

- Downloads files directly from release configmap URLs
- Uses standard s390x memory addresses
- Same boot parameters, same directory structure
- **Result**: Starts to boot then fails immediately

### Next Steps: Try ISO Building Approach

**Plan**: Use `playbooks/build-s390x-iso.yaml` with `ocp_s390x_iso` role

**Rationale**:
- This approach worked best in previous testing
- Creates self-contained bootable ISO with embedded config
- No network dependencies during early boot
- Most reliable for s390x LPAR environments

**Command**:
```bash
ansible-navigator run playbooks/build-s390x-iso.yaml \
  -m stdout \
  -i inventory/test01/ \
  -u jpfeiffe \
  --extra-vars "add_ssh_keys=true ssh_public_key_files=['/home/jpfeiffe/.ssh/id_ed25519.pub']"
```

**Expected Outcome**:
- Generate `/srv/test01/iso/worker-lpar01-rhcos.iso`
- ISO contains all improvements (correct boot parameters, CA trust, SSH keys)
- Bootable directly from HMC without netboot complexities

### Key Learnings

1. **Boot Parameters Matter**: `coreos.inst=yes` is absolutely critical
2. **Variable Loading**: Playbooks against localhost need explicit group_vars loading
3. **No Fallbacks**: Silent fallbacks cause confusion, explicit validation is better
4. **File Sources Matter**: PXE files ≠ ISO files for s390x boot compatibility
5. **ISO Approach Most Reliable**: Self-contained, no network timing issues

### Files Modified for All Approaches

- `playbooks/07_add_s390x_node.yaml` - Variable loading and validation
- `playbooks/build-s390x-iso.yaml` - Variable loading for ISO approach
- `roles/ocp_mainframe/templates/generic.prm.j2` - Enhanced boot parameters
- `roles/ocp_mainframe/tasks/prepare_boot_files.yaml` - Fixed fallbacks
- All ocp_mainframe role files - Removed `default('ocp')` fallbacks

### Success Metrics

- [ ] LPAR boots from ISO
- [ ] CoreOS installer starts
- [ ] Network configuration applied
- [ ] Node joins OpenShift cluster
- [ ] Worker node becomes Ready
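
### Sketch: Pre-Boot PRM Check

The kernel panic in the timeline traced back to a PRM file that lacked `coreos.inst=yes`, and later to the HMC picking up stale files. A check along the following lines could catch both before an LPAR boot is attempted. This is a minimal sketch, not part of the existing playbooks: the function name `check_prm` is hypothetical, and the list of required parameters is taken only from the two flags named in Attempt 1.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the ocp_mainframe role): verify that a
# generated PRM file contains the installer flags added in Attempt 1.

# check_prm FILE
#   returns 0 if every required parameter is present,
#   1 if at least one is missing, 2 if the file cannot be read.
check_prm() {
    local prm_file="$1"
    local missing=0
    local param

    if [[ ! -r "$prm_file" ]]; then
        echo "ERROR: cannot read $prm_file" >&2
        return 2
    fi

    for param in "coreos.inst=yes" "coreos.inst.wipe_table=true"; do
        # Split the file into one token per line, then require an exact
        # whole-line match (-x) on the fixed string (-F).
        if ! tr -s '[:space:]' '\n' < "$prm_file" | grep -qFx -- "$param"; then
            echo "MISSING: $param" >&2
            missing=1
        fi
    done

    return "$missing"
}
```

After sourcing the script, a call such as `check_prm /srv/test01/generic.prm` would exit non-zero when a flag is absent. The file name `generic.prm` is an assumption based on the template name `generic.prm.j2`; the actual generated file name may differ. Passing the path explicitly, rather than deriving it from a default, follows the "No Fallbacks" learning above.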