On 18.10.25 00:41, David Hildenbrand wrote:
> On 18.10.25 00:15, David Hildenbrand wrote:
>> On 17.10.25 23:56, Balbir Singh wrote:
>>> On 10/18/25 04:07, David Hildenbrand wrote:
>>>> On 17.10.25 17:20, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 17.10.25 17:07, David Hildenbrand wrote:
>>>>>> On 17.10.25 17:01, Christian Borntraeger wrote:
>>>>>>> On 17.10.25 16:54, David Hildenbrand wrote:
>>>>>>>> On 17.10.25 16:49, Christian Borntraeger wrote:
>>>>>>>>> This patch triggers a regression for s390x kvm as qemu guests can no longer start
>>>>>>>>>
>>>>>>>>> error: kvm run failed Cannot allocate memory
>>>>>>>>> PSW=mask 0000000180000000 addr 000000007fd00600
>>>>>>>>> R00=0000000000000000 R01=0000000000000000 R02=0000000000000000 R03=0000000000000000
>>>>>>>>> R04=0000000000000000 R05=0000000000000000 R06=0000000000000000 R07=0000000000000000
>>>>>>>>> R08=0000000000000000 R09=0000000000000000 R10=0000000000000000 R11=0000000000000000
>>>>>>>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>>>>>>>>> C00=00000000000000e0 C01=0000000000000000 C02=0000000000000000 C03=0000000000000000
>>>>>>>>> C04=0000000000000000 C05=0000000000000000 C06=0000000000000000 C07=0000000000000000
>>>>>>>>> C08=0000000000000000 C09=0000000000000000 C10=0000000000000000 C11=0000000000000000
>>>>>>>>> C12=0000000000000000 C13=0000000000000000 C14=00000000c2000000 C15=0000000000000000
>>>>>>>>>
>>>>>>>>> KVM on s390x does not use THP so far, will investigate. Does anyone have a quick idea?
>>>>>>>>
>>>>>>>> Only when running KVM guests and apart from that everything else seems to be fine?
>>>>>>>
>>>>>>> We have other weirdness in linux-next but in different areas. Could that somehow be
>>>>>>> related to us disabling THP for the kvm address space?
>>>>>>
>>>>>> Not sure ... it's a bit weird. I mean, when KVM disables THPs we essentially just remap everything to be mapped by PTEs. So there shouldn't be any PMDs in that whole process.
>>>>>>
>>>>>> Remapping a file THP (shmem) implies zapping the THP completely.
>>>>>>
>>>>>>
>>>>>> I assume your kernel config has CONFIG_ZONE_DEVICE and CONFIG_ARCH_ENABLE_THP_MIGRATION set, right?
>>>>>
>>>>> yes.
>>>>>
>>>>>>
>>>>>> I'd rule out copy_huge_pmd(), zap_huge_pmd() as well.
>>>>>>
>>>>>>
>>>>>> What happens if you revert the change in mm/pgtable-generic.c?
>>>>>
>>>>> That partial revert seems to fix the issue
>>>>> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
>>>>> index 0c847cdf4fd3..567e2d084071 100644
>>>>> --- a/mm/pgtable-generic.c
>>>>> +++ b/mm/pgtable-generic.c
>>>>> @@ -290,7 +290,7 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
>>>>>          if (pmdvalp)
>>>>>                  *pmdvalp = pmdval;
>>>>> -        if (unlikely(pmd_none(pmdval) || !pmd_present(pmdval)))
>>>>> +        if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))
>>>>
>>>> Okay, but that means that effectively we stumble over a PMD entry that is not a migration entry but still non-present.
>>>>
>>>> And I would expect that it's a page table, because otherwise the change
>>>> wouldn't make a difference.
>>>>
>>>> And the weird thing is that this only triggers sometimes, because if
>>>> it would always trigger nothing would ever work.
>>>>
>>>> Is there some weird scenario where s390x might set a left page table mapped in a PMD to non-present?
>>>>
>>>
>>> Good point
>>>
>>>> Staring at the definition of pmd_present() on s390x it's really just
>>>>
>>>>     return (pmd_val(pmd) & _SEGMENT_ENTRY_PRESENT) != 0;
>>>>
>>>>
>>>> Maybe this is happening in the gmap code only and not actually in the core-mm code?
>>>>
>>>
>>>
>>> I am not an s390 expert, but just looking at the code
>>>
>>> So the check on s390 effectively
>>>
>>> segment_entry/present = false or segment_entry_empty/invalid = true
>>
>> pmd_present() == true iff _SEGMENT_ENTRY_PRESENT is set
>>
>> because
>>
>>     return (pmd_val(pmd) & _SEGMENT_ENTRY_PRESENT) != 0;
>>
>> is the same as
>>
>>     return pmd_val(pmd) & _SEGMENT_ENTRY_PRESENT;
>>
>> But that means we have something where _SEGMENT_ENTRY_PRESENT is not set.
>>
>> I suspect that can only be the gmap tables.
>>
>> Likely __gmap_link() does not set _SEGMENT_ENTRY_PRESENT, which is fine
>> because it's a software managed bit for "ordinary" page tables, not gmap
>> tables.
>>
>> Which raises the question why someone would wrongly use
>> pte_offset_map()/__pte_offset_map() on the gmap tables.
>>
>> I cannot immediately spot any such usage in kvm/gmap code, though.
>>
>
> Ah, it's all that pte_alloc_map_lock() stuff in gmap.c.
>
> Oh my.
>
> So we're mapping a user PTE table that is linked into the gmap tables through a PMD table that does not have the right sw bits set we would expect in a user PMD table.
>
> What's also scary is that pte_alloc_map_lock() would try to pte_alloc() a user page table in the gmap, which sounds completely wrong?
>
> Yeah, when walking the gmap and wanting to lock the linked user PTE table, we should probably never use the pte_*map variants but obtain
> the lock through pte_lockptr().
>
> All magic we end up doing with RCU etc. in __pte_offset_map_lock()
> does not apply to the gmap PMD table.
>

CC Claudio.
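The suspected failure path can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not kernel code: the PMD encoding (one software "present" bit, one migration marker, a table origin in the upper bits) and every toy_*/bails_* name are hypothetical and do not match the real s390 _SEGMENT_ENTRY_* definitions. It contrasts the two bail-out conditions from the partial revert of ___pte_offset_map() and mimics how pte_alloc_map_lock() only allocates for an empty PMD. An entry that links a page table but lacks the software present bit (as __gmap_link() is suspected to produce) therefore ends in the NULL case under the `!pmd_present()` check. That this NULL is what surfaces as "Cannot allocate memory" in the gmap callers is an assumption, not something the thread confirms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy user-space model, NOT kernel code. All bit values are hypothetical
 * and do not match the real s390 _SEGMENT_ENTRY_* masks.
 */
typedef uint64_t toy_pmd_t;

#define TOY_PMD_SW_PRESENT   0x0001ULL  /* hypothetical software "present" bit */
#define TOY_PMD_SW_MIGRATION 0x0002ULL  /* hypothetical migration-entry marker */
#define TOY_NEW_TABLE        0x10000ULL /* hypothetical origin of a fresh table */

static inline bool toy_pmd_none(toy_pmd_t v)
{
	return v == 0;
}

/* Same shape as the quoted s390 check: present iff the SW bit is set. */
static inline bool toy_pmd_present(toy_pmd_t v)
{
	return (v & TOY_PMD_SW_PRESENT) != 0;
}

/* Hypothetical encoding: a migration entry is non-present and carries a marker. */
static inline bool toy_is_migration_entry(toy_pmd_t v)
{
	return !toy_pmd_present(v) && (v & TOY_PMD_SW_MIGRATION) != 0;
}

/* Bail-out condition on the '-' line of the partial revert (linux-next). */
static inline bool bails_with_present_check(toy_pmd_t v)
{
	return toy_pmd_none(v) || !toy_pmd_present(v);
}

/* Bail-out condition restored by the '+' line of the partial revert. */
static inline bool bails_with_migration_check(toy_pmd_t v)
{
	return toy_pmd_none(v) || toy_is_migration_entry(v);
}

/*
 * Rough shape of pte_alloc_map_lock(): a new table is only allocated when
 * the PMD is none; otherwise the existing entry goes straight to the map
 * step. Returns false where the real helper would return NULL. The
 * pmd_trans_huge() and pmd_bad() checks of ___pte_offset_map() are omitted.
 */
static inline bool toy_alloc_map(toy_pmd_t *pmd, bool (*bails)(toy_pmd_t))
{
	if (toy_pmd_none(*pmd))
		*pmd = TOY_NEW_TABLE | TOY_PMD_SW_PRESENT; /* allocation always succeeds here */
	return !bails(*pmd);
}
```

With a gmap-style entry such as 0x20000 (a table origin with no software bits), toy_alloc_map() returns false under bails_with_present_check and leaves the entry untouched, because the entry is not none and nothing gets allocated; under bails_with_migration_check it returns true. A user-style entry with the present bit set maps under both conditions, which is consistent with ordinary user page tables being unaffected.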