hard-LOCKUP panic is triggered the moment the Mellanox Ethernet device vanishes. However, we can analyze what happens when we access a Mellanox Ethernet device whose link is disabled. (If we check whether the PCIe endpoint device (Mellanox Ethernet) is present before issuing a device-IOTLB invalidation to the Intel IOMMU, no other issues appear.)

According to the PCIe spec, Rev. 5.0 v1.0, Sec. 2.4.1, there are two kinds of request TLPs: posted and non-posted. A non-posted request requires a Completion TLP; a posted request does not.
- A Posted Request is a Memory Write Request or a Message Request.
- A Read Request is a Configuration Read Request, an I/O Read Request, or a Memory Read Request.
- An NPR (Non-Posted Request) with Data is a Configuration Write Request, an I/O Write Request, or an AtomicOp Request.
- A Non-Posted Request is a Read Request or an NPR with Data.

When the CPU issues a PCIe memory-write TLP (posted) via a MOV instruction, the instruction retires once the write has been posted to the Root Complex. The CPU waits for neither a Completion nor the Data Link Layer ACK/NAK. A memory-read TLP (non-posted), however, stalls the core until the corresponding Completion TLP is received. If that Completion never arrives, the CPU hangs. (The CPU hangs if the LTSSM does not enter the Disabled state.) If the LTSSM does enter the Disabled state, the Downstream Port above that link (here the switch port 0000:3c:08.0) completes every non-posted request targeting the link with Unsupported Request (UR) status (PCIe spec Sec. 2.9.1). The request therefore finishes without stalling, and the CPU reads 0xFFFFFFFF.

I ran some tests on the machine after setting the Link Disable bit (bit 4) in the Link Control register (offset 10h in the PCI Express Capability) of the switch downstream port 0000:3c:08.0:
- setpci -s 0000:3c:08.0 CAP_EXP+10.w=0x0010

 +-[0000:3a]-+-00.0-[3b-3f]----00.0-[3c-3f]--+-00.0-[3d]----
 |           |                               +-04.0-[3e]----
 |           |                               \-08.0-[3f]----00.0 Mellanox Technologies MT27800 Family [ConnectX-5]

# lspci -vvv -s 0000:3f:00.0
3f:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
        ...
        Region 0: Memory at 3af804000000 (64-bit, prefetchable) [size=32M]
        ...
1) Issuing a PCI config-space read request returns 0xFFFFFFFF.
# lspci -vvv -s 0000:3f:00.0
3f:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5] (rev ff) (prog-if ff)
        !!! Unknown header type 7f
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core

2) Issuing a PCI memory read request through /dev/mem also returns 0xFFFFFFFF.
# ./devmem
Usage: ./devmem <phys_addr> <size> <offset> [value]
  phys_addr : physical base address of the BAR (hex or decimal)
  size      : mapping length in bytes (hex or decimal)
  offset    : register offset from BAR base (hex or decimal)
  value     : optional 32-bit value to write (hex or decimal)
Example: ./devmem 0x600000000 0x1000 0x0 0xDEADBEEF

# ./devmem 0x3af804000000 0x2000000 0x0
0x3af804000000 = 0xffffffff

Before the link was disabled, the same devmem read of 0x3af804000000 returned a valid value:
# ./devmem 0x3af804000000 0x2000000 0x0
0x3af804000000 = 0x10002300

Besides, after searching the kernel code, I found that many EP drivers already check whether their endpoint is still present. There may still be exception cases in some PCIe endpoint drivers, e.g. commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").

Best Regards,
Jinhui

[PATCH v2 2/2] iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode
Jinhui Guo