ta to a DMA-coherent buffer via `memcpy(tmp_xqc, xqc, size)` (line 715)
2. It prepares a mailbox structure containing the DMA address of that buffer
3. `qm_mb_nolock()` -> `qm_mb_write()` writes the 128-bit mailbox to the MMIO register
4. The hardware reads the mailbox, extracts the DMA address, and DMA-reads from that buffer

With the barrier **after** the `stp`:
- The ARM64 weak memory model allows the CPU to reorder the `stp` (the MMIO write that triggers the hardware) **before** the `memcpy` stores to the DMA buffer are globally visible
- The hardware is triggered and attempts to DMA-read the buffer before the data has arrived
- Result: **hardware reads stale/incorrect data** from the DMA buffer

With the barrier **before** the `stp`:
- All preceding stores (including the DMA buffer writes) are guaranteed to be visible to the device before the MMIO write
- The hardware is triggered only after the DMA data is committed
- Result: hardware correctly reads the intended data

### 4. Impact and Severity

This is a **data correctness bug** affecting the HiSilicon crypto accelerator (used in HiSilicon Kunpeng ARM64 servers). The `hisi_qm_mb()` function is called from 9 different callers across the QM driver and VFIO driver, including:
- `__hisi_qm_start` (queue startup)
- `qm_stop_qp` (queue stop)
- `qm_drain_qm` (queue draining)
- `qm_set_and_get_xqc` (configuring SQC, CQC, EQC, AEQC)
- VFIO live migration paths

If the hardware reads stale DMA data, the consequences could include:
- **Incorrect crypto operations** (data corruption in encryption/decryption)
- **Hardware timeouts** (mailbox operation failures)
- **Undefined hardware behavior**

### 5. Scope and Risk Assessment

- **Lines changed**: 5 insertions and 1 deletion (the `stp` line moved below the barrier, plus a 4-line comment added)
- **Files changed**: 1 file (`drivers/crypto/hisilicon/qm.c`)
- **Risk**: Extremely low. The fix simply moves an existing barrier instruction to the correct position in the assembly.
No new logic is added; the semantic intent is preserved but the ordering is corrected.
- **Regression risk**: Essentially zero. The barrier provides the same protection, just at the right time.

### 6. History of this Code

The original code (commit `263c9959c9376e`, v5.4) used `dsb sy` (full synchronization barrier, all domains) after the `stp`, which was also the wrong position. Commit `4cda2f4a0ee68a` (merged in v5.19) "optimized" the barrier from `dsb sy` to `dmb oshst` (a weaker but sufficient barrier) but kept it in the wrong position. The current fix addresses the ordering issue that has been present since the driver was originally written.

### 7. Applicability to Stable

The `dmb oshst` instruction was introduced by commit `4cda2f4a0ee68a` (v5.19). Stable trees from v5.19 onward have this exact code and can apply this fix directly. For older stable trees (v5.4-v5.18), the barrier was `dsb sy` with the same ordering problem, so a similar fix would need to be adapted.

### 8. Non-ARM64 Path Comparison

The non-ARM64 path does:

```c
memcpy_toio(fun_base, src, 16);
dma_wmb();
```

This also has the barrier after the write, but on x86 (the primary non-ARM64 platform for this driver), stores to uncacheable MMIO mappings are not reordered ahead of earlier stores, so the barrier placement is not expected to matter there. The issue is ARM64-specific due to its weak memory model.

### 9. Conclusion

This commit fixes a real memory ordering bug on ARM64 that has existed since the driver was introduced in v5.4. The bug can cause the HiSilicon hardware accelerator to read stale data from DMA buffers, potentially leading to data corruption in crypto operations or hardware errors. The fix is tiny (reordering 2 lines of inline assembly), obviously correct per ARM64 memory ordering semantics, carries essentially zero regression risk, and affects actively used hardware (HiSilicon Kunpeng servers). It meets all stable kernel criteria.
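
The ordering requirement from section 3 (fill the buffer, then a barrier, then the trigger write) can be illustrated outside the kernel. What follows is only a userspace analogy built on C11 atomics, not driver code: `fake_dev`, `fake_mb_send`, `fake_dev_poll`, and `CTX_SIZE` are hypothetical names, the release fence loosely plays the role of `dmb oshst`, and the relaxed atomic store loosely plays the role of the `stp` to the mailbox register.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names exist in the hisilicon driver. */
#define CTX_SIZE 32

struct fake_dev {
	uint8_t dma_buf[CTX_SIZE];	/* stands in for the DMA-coherent buffer */
	_Atomic uint64_t doorbell;	/* stands in for the mailbox MMIO register */
};

/* Producer: publish the payload first, then trigger the "device". */
static void fake_mb_send(struct fake_dev *dev, const void *ctx, uint64_t cmd)
{
	/* 1. Fill the buffer the device will read. */
	memcpy(dev->dma_buf, ctx, CTX_SIZE);

	/* 2. Barrier BEFORE the trigger: earlier stores may not sink below it. */
	atomic_thread_fence(memory_order_release);

	/* 3. Trigger write. Placing the fence after this store instead would
	 *    let the trigger become visible before the buffer contents. */
	atomic_store_explicit(&dev->doorbell, cmd, memory_order_relaxed);
}

/* Consumer: returns 1 and copies the payload once the doorbell is set. */
static int fake_dev_poll(struct fake_dev *dev, void *out)
{
	if (atomic_load_explicit(&dev->doorbell, memory_order_acquire) == 0)
		return 0;

	memcpy(out, dev->dma_buf, CTX_SIZE);
	return 1;
}
```

In this sketch a consumer thread that observes a non-zero `doorbell` is guaranteed by the C11 memory model to see the completed `memcpy`, which mirrors the guarantee the patch restores for the hardware's DMA read. Real device ordering depends on the architecture's barrier semantics rather than on C11 fences, so the analogy should not be read as a description of how the driver is implemented.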
**YES**

 drivers/crypto/hisilicon/qm.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index d47bf06a90f7d..af9dd4d275f9f 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -609,9 +609,13 @@ static void qm_mb_write(struct hisi_qm *qm, const void *src)
 	}
 
 #if IS_ENABLED(CONFIG_ARM64)
+	/*
+	 * The dmb oshst instruction ensures that the data in the
+	 * mailbox is written before it is sent to the hardware.
+	 */
 	asm volatile("ldp %0, %1, %3\n"
-		     "stp %0, %1, %2\n"
 		     "dmb oshst\n"
+		     "stp %0, %1, %2\n"
 		     : "=&r" (tmp0),
 		       "=&r" (tmp1),
 		       "+Q" (*((char __iomem *)fun_base))
-- 
2.51.0

[PATCH AUTOSEL 6.19-6.6] crypto: hisilicon/qm - move the barrier before writing to the mailbox register
Sasha Levin
patches@lists.linux.dev, stable@vger.kernel.org