    mm: notify remote TLBs when dirtying a PTE · d30a2a28
    Jean-Philippe Brucker committed
    maillist inclusion
    category: feature
    bugzilla: 51855
    CVE: NA
    
    Reference: https://jpbrucker.net/git/linux/commit/?h=sva/2021-03-01&id=d32d8baaf293aaefef8a1c9b8a4508ab2ec46c61
    
    ---------------------------------------------
    
    The ptep_set_access_flags path in handle_pte_fault can cause a change of
    the pte's permissions on some architectures. A Read-Only and
    writeable-clean entry becomes Read-Write and dirty. This requires us to
    call the MMU notifier to invalidate the entry in remote TLBs, for instance
    in a PCIe Address Translation Cache (ATC).
    
    Here is a scenario where the lack of notifier call ends up locking a
    device:
    
    1) A shared anonymous buffer is mapped with READ|WRITE prot, at VA.
    
    2) A PCIe device with ATS/PRI/PASID capabilities wants to read the buffer,
       using its virtual address.
    
       a) Device asks for translation of VA for reading (NW=1)
    
       b) The IOMMU cannot fulfill the request, so the device does a Page
          Request for VA. The fault is handled with do_read_fault, after which
          the PTE has flags young, write and rdonly.
    
       c) Device retries the translation; IOMMU sends a Translation Completion
          with the PA and Read-Only permission.
    
       d) The VA->PA translation is stored in the ATC, with Read-Only
          permission. From the device's point of view, the page may or may not
          be writeable. It didn't ask for writeability, so it doesn't get a
          definite answer on that point.
    
    3) The same device now wants to write the buffer. It needs to restart
       the AT-PR-AT dance for writing this time.
    
       a) Device could ask for translation of VA for reading and writing
          (NW=0). The IOMMU would reply with the same Read-Only mapping, so
          this time the device is certain that the page isn't writeable. Some
          implementations might update their ATC entry to store that
          information. The ATS specification is pretty fuzzy on the behaviour
          to adopt.
    
       b) The entry is Read-Only, so we fault again. The PTE exists and is
          valid, all we need to do is mark it dirty. TLBs are invalidated, but
          not the ATC since there is no notifier.
    
       c) Now the behaviour depends on the device implementation. If 3a)
          didn't update the ATC entry, the device is still uncertain on the
          writeability of the page, goto 3a) - repeat the Translation Request
          and get Read-Write permissions.
    
          But if 3a) updated the ATC entry, the device is certain of the
          PTE's permissions, and will goto 3b) instead - repeat the page
          fault, again and again. This time we take the "spurious fault" path
          in the same function, which invalidates the TLB but doesn't call an
          MMU notifier either.
    
    To avoid this page request loop, call mmu_notifier_change_pte after
    dirtying the PTE.
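
    The scenario above can be replayed with a small user-space model. This
    is only a sketch under stated assumptions, not kernel code: struct sim,
    translate(), page_request() and run_scenario() are hypothetical names,
    the PTE is reduced to valid/dirty bits, and the MMU notifier is modelled
    as simply dropping the ATC entry. run_scenario() returns the number of
    Page Requests the write in step 3 needed, or -1 if the device is still
    faulting after max_faults attempts.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of steps 1-3; every name here is illustrative, not a kernel API. */
enum perm { PERM_NONE, PERM_RO, PERM_RW };

struct sim {
    bool pte_valid, pte_dirty;   /* the PTE is always writeable in this model */
    bool atc_valid, atc_certain; /* cached translation and "permission is final" */
    enum perm atc_perm;
    bool notify_on_dirty;        /* the fix: invalidate the ATC when dirtying */
    bool atc_caches_denial;      /* device behaviour chosen in step 3a */
};

/* Translation Request: a writeable-clean PTE is reported as Read-Only. */
static void translate(struct sim *s, bool want_write)
{
    enum perm p = PERM_NONE;

    if (s->pte_valid)
        p = s->pte_dirty ? PERM_RW : PERM_RO;
    s->atc_valid = (p != PERM_NONE);
    s->atc_perm = p;
    /* Only an NW=0 request gives a definite answer on writeability. */
    s->atc_certain = want_write && s->atc_caches_denial;
}

/* Page Request handled by the CPU fault path. */
static void page_request(struct sim *s, bool write)
{
    bool dirtied = write && !s->pte_dirty;

    s->pte_valid = true;
    if (write)
        s->pte_dirty = true;
    if (dirtied && s->notify_on_dirty)
        s->atc_valid = false;    /* stands in for the MMU notifier call */
}

int run_scenario(bool notify_on_dirty, bool atc_caches_denial, int max_faults)
{
    struct sim s = { .notify_on_dirty = notify_on_dirty,
                     .atc_caches_denial = atc_caches_denial };
    int faults = 0;

    assert(max_faults > 0);

    /* Step 2: read access, AT -> PR -> AT. ATC ends up Read-Only. */
    translate(&s, false);
    page_request(&s, false);
    translate(&s, false);

    /* Step 3: write access. */
    for (;;) {
        if (!s.atc_valid || !s.atc_certain)
            translate(&s, true);         /* 3a */
        if (s.atc_valid && s.atc_perm == PERM_RW)
            return faults;               /* write can proceed */
        if (faults == max_faults)
            return -1;                   /* page request loop */
        page_request(&s, true);          /* 3b */
        faults++;
    }
}
```

    In this model, run_scenario(false, true, 8) never gets write permission
    and returns -1, while run_scenario(true, true, 8) completes after a
    single Page Request, because the stale Read-Only entry is dropped as
    soon as the PTE is dirtied.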
    
    Note: if the IOMMU supports hardware update of the access/dirty bits, 3a)
    dirties the PTE, and the IOMMU returns RW permission to the device, so
    there is no need to do a Page Request.

    Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
    Signed-off-by: Lijun Fang <fanglijun3@huawei.com>
    Reviewed-by: Weilong Chen <chenweilong@huawei.com>
    Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
memory.c 143.0 KB