Committed by Jean-Philippe Brucker
maillist inclusion
category: feature
bugzilla: 51855
CVE: NA

Reference: https://jpbrucker.net/git/linux/commit/?h=sva/2021-03-01&id=d32d8baaf293aaefef8a1c9b8a4508ab2ec46c61

---------------------------------------------

The ptep_set_access_flags path in handle_pte_fault can change the pte's
permissions on some architectures: a Read-Only and writeable-clean entry
becomes Read-Write and dirty. This requires us to call the MMU notifier
to invalidate the entry in remote TLBs, for instance in a PCIe Address
Translation Cache (ATC).

Here is a scenario where the lack of a notifier call ends up locking a
device:

1) A shared anonymous buffer is mapped with READ|WRITE prot, at VA.

2) A PCIe device with ATS/PRI/PASID capabilities wants to read the
   buffer, using its virtual address.

   a) Device asks for translation of VA for reading (NW=1).

   b) The IOMMU cannot fulfill the request, so the device does a Page
      Request for VA. The fault is handled with do_read_fault, after
      which the PTE has flags young, write and rdonly.

   c) Device retries the translation; IOMMU sends a Translation
      Completion with the PA and Read-Only permission.

   d) The VA->PA translation is stored in the ATC, with Read-Only
      permission. From the device's point of view, the page may or may
      not be writeable. It didn't ask for writeability, so it doesn't
      get a definite answer on that point.

3) The same device now wants to write the buffer. It needs to restart
   the AT-PR-AT dance, for writing this time.

   a) Device could ask for translation of VA for reading and writing
      (NW=0). The IOMMU would reply with the same Read-Only mapping, so
      this time the device is certain that the page isn't writeable.
      Some implementations might update their ATC entry to store that
      information. The ATS specification is pretty fuzzy on the
      behaviour to adopt.

   b) The entry is Read-Only, so we fault again. The PTE exists and is
      valid; all we need to do is mark it dirty. TLBs are invalidated,
      but not the ATC, since there is no notifier.

   c) Now the behaviour depends on the device implementation. If 3a)
      didn't update the ATC entry, the device is still uncertain about
      the writeability of the page, so it goes back to 3a): it repeats
      the Translation Request and gets Read-Write permissions.

      But if 3a) updated the ATC entry, the device is certain of the
      PTE's permissions, and goes to 3b) instead: it repeats the page
      fault, again and again. This time we take the "spurious fault"
      path in the same function, which invalidates the TLB but doesn't
      call an MMU notifier either.

To avoid this page request loop, call mmu_notifier_change_pte after
dirtying the PTE.

Note: if the IOMMU supports hardware update of the access/dirty bits,
3a) dirties the PTE, and the IOMMU returns RW permission to the device,
so there is no need to do a Page Request.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Lijun Fang <fanglijun3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
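
The page request loop in step 3 of the scenario above can be reproduced
with a small userspace model. This is only a sketch under stated
assumptions: every type and function below (toy_pte, toy_atc, toy_config,
device_write, ...) is hypothetical and none of it is kernel or IOMMU
driver code. It assumes an IOMMU without hardware dirty-bit update, and
models the notifier call as nothing more than an ATC invalidation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: every name here is hypothetical, not a kernel API. */

struct toy_pte {
	bool write;	/* the mapping allows writes */
	bool dirty;	/* writeable-clean looks Read-Only to the IOMMU */
};

struct toy_atc {
	bool valid;
	bool writable;		/* permission cached by the device */
	bool write_known;	/* device asked with NW=0 and cached the reply */
};

struct toy_config {
	bool atc_caches_nw0_reply;	/* device behaviour in step 3a */
	bool call_mmu_notifier;		/* the fix: notify after dirtying the PTE */
};

/* Assumed: no hardware dirty update, so RW only once the PTE is dirty. */
static bool iommu_translate(const struct toy_pte *pte)
{
	return pte->write && pte->dirty;
}

/* Step 3a: Translation Request with NW=0. Returns true if RW is granted. */
static bool translation_request(const struct toy_config *cfg,
				const struct toy_pte *pte,
				struct toy_atc *atc)
{
	bool writable = iommu_translate(pte);

	if (writable || cfg->atc_caches_nw0_reply) {
		atc->valid = true;
		atc->writable = writable;
		atc->write_known = true;
	}
	return writable;
}

/* Step 3b: the fault handler only has to mark the PTE dirty. */
static void handle_write_fault(const struct toy_config *cfg,
			       struct toy_pte *pte, struct toy_atc *atc)
{
	pte->dirty = true;		/* CPU TLBs would be invalidated here */
	if (cfg->call_mmu_notifier)
		atc->valid = false;	/* modelled effect of the notifier */
}

/*
 * Returns the number of Page Requests issued before the write succeeds,
 * or -1 if the device is still faulting after max_page_requests.
 */
int device_write(const struct toy_config *cfg, int max_page_requests)
{
	/* State after step 2: PTE write + rdonly, ATC entry Read-Only. */
	struct toy_pte pte = { .write = true, .dirty = false };
	struct toy_atc atc = { .valid = true, .writable = false,
			       .write_known = false };
	int page_requests = 0;

	assert(max_page_requests > 0);

	while (page_requests < max_page_requests) {
		/* Step 3c: re-translate only if the ATC is not certain. */
		bool certain_ro = atc.valid && atc.write_known &&
				  !atc.writable;

		if (!certain_ro && translation_request(cfg, &pte, &atc))
			return page_requests;

		handle_write_fault(cfg, &pte, &atc);
		page_requests++;
	}
	return -1;
}
```

In this model, a device that caches the NW=0 reply never gets out of the
loop when call_mmu_notifier is false (device_write returns -1), while
with call_mmu_notifier set it succeeds after a single Page Request. A
device that does not cache the reply succeeds after one Page Request
either way, which matches the two branches of step 3c.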
d30a2a28