y testing a pair of tasks was reported to be hung: [10144.859042] mlx5_core 0000:00:00.1: mlx5_health_try_recover:338:(pid 5553): health recovery flow aborted, PCI reads still not working [10320.359160] INFO: task kmcheck:72 blocked for more than 122 seconds. [10320.359169] Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1 [10320.359171] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [10320.359172] task:kmcheck state:D stack:0 pid:72 tgid:72 ppid:2 flags:0x00000000 [10320.359176] Call Trace: [10320.359178] [<000000065256f030>] __schedule+0x2a0/0x590 [10320.359187] [<000000065256f356>] schedule+0x36/0xe0 [10320.359189] [<000000065256f572>] schedule_preempt_disabled+0x22/0x30 [10320.359192] [<0000000652570a94>] __mutex_lock.constprop.0+0x484/0x8a8 [10320.359194] [<000003ff800673a4>] mlx5_unload_one+0x34/0x58 [mlx5_core] [10320.359360] [<000003ff8006745c>] mlx5_pci_err_detected+0x94/0x140 [mlx5_core] [10320.359400] [<0000000652556c5a>] zpci_event_attempt_error_recovery+0xf2/0x398 [10320.359406] [<0000000651b9184a>] __zpci_event_error+0x23a/0x2c0 [10320.359411] [<00000006522b3958>] chsc_process_event_information.constprop.0+0x1c8/0x1e8 [10320.359416] [<00000006522baf1a>] crw_collect_info+0x272/0x338 [10320.359418] [<0000000651bc9de0>] kthread+0x108/0x110 [10320.359422] [<0000000651b42ea4>] __ret_from_fork+0x3c/0x58 [10320.359425] [<0000000652576642>] ret_from_fork+0xa/0x30 [10320.359440] INFO: task kworker/u1664:6:1514 blocked for more than 122 seconds. [10320.359441] Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1 [10320.359442] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [10320.359443] task:kworker/u1664:6 state:D stack:0 pid:1514 tgid:1514 ppid:2 flags:0x00000000 [10320.359447] Workqueue: mlx5_health0000:00:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core] [10320.359492] Call Trace: [10320.359521] [<000000065256f030>] __schedule+0x2a0/0x590 [10320.359524] [<000000065256f356>] schedule+0x36/0xe0 [10320.359526] [<0000000652172e28>] pci_wait_cfg+0x80/0xe8 [10320.359532] [<0000000652172f94>] pci_cfg_access_lock+0x74/0x88 [10320.359534] [<000003ff800916b6>] mlx5_vsc_gw_lock+0x36/0x178 [mlx5_core] [10320.359585] [<000003ff80098824>] mlx5_crdump_collect+0x34/0x1c8 [mlx5_core] [10320.359637] [<000003ff80074b62>] mlx5_fw_fatal_reporter_dump+0x6a/0xe8 [mlx5_core] [10320.359680] [<0000000652512242>] devlink_health_do_dump.part.0+0x82/0x168 [10320.359683] [<0000000652513212>] devlink_health_report+0x19a/0x230 [10320.359685] [<000003ff80075a12>] mlx5_fw_fatal_reporter_err_work+0xba/0x1b0 [mlx5_core] [10320.359728] [<0000000651bbf852>] process_one_work+0x1c2/0x458 [10320.359733] [<0000000651bc073e>] worker_thread+0x3ce/0x528 [10320.359735] [<0000000651bc9de0>] kthread+0x108/0x110 [10320.359737] [<0000000651b42ea4>] __ret_from_fork+0x3c/0x58 [10320.359739] [<0000000652576642>] ret_from_fork+0xa/0x30 No kernel log of the exact same error with an upstream kernel is available - but the very same deadlock situation can be constructed there, too: - task: kmcheck mlx5_unload_one() tries to acquire devlink lock while the PCI error recovery code has set pdev->block_cfg_access by way of pci_cfg_access_lock() - task: kworker mlx5_crdump_collect() tries to set block_cfg_access through pci_cfg_access_lock() while devlink_health_report() had acquired the devlink lock. A similar deadlock situation can be reproduced by requesting a crdump with > devlink health dump show pci/ reporter fw_fatal while PCI error recovery is executed on the same physical function by mlx5_core's pci_error_handlers. On s390 this can be injected with > zpcictl --reset-fw Extensive tests with the same setup that showed the original dead-lock didn't reproduce, nor did the second deadlock situation hit with this patch applied. Fixes: b25bbc2f24dc ("net/mlx5: Add Vendor Specific Capability access gateway") Signed-off-by: Gerd Bayer --- v1: Attempted to fix this differently, but still had potential for deadlocks and the second inject reproduced it quite regularly, still https://lore.kernel.org/netdev/20250807131130.4056349-1-gbayer@linux.ibm.com/ --- drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c index 432c98f2626d..f668237b6bb0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c @@ -73,7 +73,9 @@ int mlx5_vsc_gw_lock(struct mlx5_core_dev *dev) u32 lock_val; int ret; - pci_cfg_access_lock(dev->pdev); + if (!pci_cfg_access_trylock(dev->pdev)) + return -EBUSY; + do { if (retries > VSC_MAX_RETRIES) { ret = -EBUSY; -- 2.48.1[PATCH net v2] net/mlx5: Avoid deadlock between PCI error recovery and health reporterGerd Bayer undefinedTariq Toukan , Saeed Mahameed , Leon Romanovsky , Shay Drori , Mark Bloch undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefinedƒLƒ…c