1. 09 Jul 2020 (1 commit)
  2. 29 May 2020 (1 commit)
    • iommu: Remove iommu_sva_ops::mm_exit() · edcc40d2
      Jean-Philippe Brucker committed
      After binding a device to an mm, device drivers currently need to
      register a mm_exit handler. This function is called when the mm exits,
      to gracefully stop DMA targeting the address space and flush page faults
      to the IOMMU.
      
      This is deemed too complex for the MMU release() notifier, which may be
      triggered by any mmput() invocation, from about 120 callsites [1]. The
      upcoming SVA module has an example of such complexity: the I/O Page
      Fault handler would need to call mmput_async() instead of mmput() after
      handling an IOPF, to avoid triggering the release() notifier which would
      in turn drain the IOPF queue and lock up.
      
      Another concern is the DMA stop function taking too long, up to several
      minutes [2]. For some mmput() callers this may disturb other users. For
      example, if the OOM killer picks the mm bound to a device as the victim
      and that mm's memory is locked, then a release() that takes too long
      might lead it to choose additional innocent victims to kill.
      
      To simplify the MMU release notifier, don't forward the notification to
      device drivers. Since they don't stop DMA on mm exit anymore, the PASID
      lifetime is extended:
      
      (1) The device driver calls bind(). A PASID is allocated.
      
        Here any DMA fault is handled by mm, and on error we don't print
        anything to dmesg. Userspace can easily trigger errors by issuing DMA
        on unmapped buffers.
      
      (2) exit_mmap(), for example the process took a SIGKILL. This step
          doesn't happen during normal operations. Remove the pgd from the
          PASID table, since the page tables are about to be freed. Invalidate
          the IOTLBs.
      
        Here the device may still perform DMA on the address space. Incoming
        transactions are aborted but faults aren't printed out. ATS
        Translation Requests return Successful Translation Completions with
        R=W=0. PRI Page Requests return with Invalid Request.
      
      (3) The device driver stops DMA, possibly following release of a fd, and
          calls unbind(). PASID table is cleared, IOTLB invalidated if
          necessary. The page fault queues are drained, and the PASID is
          freed.
      
        If DMA for that PASID is still running here, something went seriously
        wrong and errors should be reported.
      
      For now remove iommu_sva_ops entirely. We might need to re-introduce
      them at some point, for example to notify device drivers of unhandled
      IOPF.
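      
      A rough sketch of the simplified driver flow after this change (the
      context struct and function names are illustrative, not from the patch):
      
          #include <linux/iommu.h>
      
          /* Hypothetical driver context; only the SVA handle matters here. */
          struct my_sva_ctx {
                  struct iommu_sva *handle;
                  int pasid;
          };
      
          static int my_sva_bind(struct device *dev, struct mm_struct *mm,
                                 struct my_sva_ctx *ctx)
          {
                  /* Step (1): bind the device to the mm; a PASID is allocated. */
                  ctx->handle = iommu_sva_bind_device(dev, mm, NULL);
                  if (IS_ERR(ctx->handle))
                          return PTR_ERR(ctx->handle);
      
                  ctx->pasid = iommu_sva_get_pasid(ctx->handle);
                  return 0;
          }
      
          static void my_sva_release(struct my_sva_ctx *ctx)
          {
                  /*
                   * Step (3): stop DMA for this PASID first, then unbind. No
                   * mm_exit() handler is registered anymore; if the mm already
                   * exited (step (2)), the IOMMU has been aborting incoming
                   * transactions in the meantime.
                   */
                  iommu_sva_unbind_device(ctx->handle);
          }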
      
      [1] https://lore.kernel.org/linux-iommu/20200306174239.GM31668@ziepe.ca/
      [2] https://lore.kernel.org/linux-iommu/4d68da96-0ad5-b412-5987-2f7a6aa796c3@amd.com/
      Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20200423125329.782066-3-jean-philippe@linaro.org
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  3. 15 May 2020 (1 commit)
  4. 13 May 2020 (1 commit)
  5. 05 May 2020 (5 commits)
  6. 27 Mar 2020 (5 commits)
  7. 28 Feb 2020 (1 commit)
    • iommu: Use C99 flexible array in fwspec · 098accf2
      Robin Murphy committed
      Although the 1-element array was a typical pre-C99 way to implement
      variable-length structures, and indeed is a fundamental construct in the
      APIs of certain other popular platforms, there's no good reason for it
      here (and in particular the sizeof() trick is far too "clever" for its
      own good). We can just as easily implement iommu_fwspec's preallocation
      behaviour using a standard flexible array member, so let's make it look
      the way most readers would expect.
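      
      For illustration, a condensed sketch of the two patterns (not the exact
      iommu_fwspec definition):
      
          /* Old: 1-element array, with a "clever" sizeof() trick on allocation. */
          struct fwspec_old {
                  unsigned int num_ids;
                  u32 ids[1];     /* pre-C99 variable-length tail */
          };
          /* kzalloc(sizeof(*fws) + (n - 1) * sizeof(fws->ids[0]), GFP_KERNEL); */
      
          /* New: standard C99 flexible array member. */
          struct fwspec_new {
                  unsigned int num_ids;
                  u32 ids[];      /* flexible array member */
          };
          /* kzalloc(struct_size(fws, ids, n), GFP_KERNEL); */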
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  8. 16 Jan 2020 (1 commit)
  9. 10 Jan 2020 (1 commit)
  10. 23 Dec 2019 (2 commits)
  11. 07 Nov 2019 (1 commit)
    • iommu/io-pgtable-arm: Rename IOMMU_QCOM_SYS_CACHE and improve doc · dd5ddd3c
      Will Deacon committed
      The 'IOMMU_QCOM_SYS_CACHE' IOMMU protection flag is exposed to all
      users of the IOMMU API. Despite its name, the idea behind it isn't
      especially tied to Qualcomm implementations and could conceivably be
      used by other systems.
      
      Rename it to 'IOMMU_SYS_CACHE_ONLY' and update the comment to better
      describe the idea behind it.
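      
      A hedged usage sketch: the flag travels in the prot argument of
      iommu_map(), alongside the usual permissions (iova/paddr/size are
      placeholders):
      
          /* Map a buffer as non-coherent but allocatable in the outer cache. */
          ret = iommu_map(domain, iova, paddr, size,
                          IOMMU_READ | IOMMU_WRITE | IOMMU_SYS_CACHE_ONLY);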
      
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: "Isaac J. Manjarres" <isaacm@codeaurora.org>
      Signed-off-by: Will Deacon <will@kernel.org>
  12. 15 Oct 2019 (3 commits)
  13. 23 Aug 2019 (1 commit)
  14. 30 Jul 2019 (1 commit)
    • iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync() · 56f8af5e
      Will Deacon committed
      To allow IOMMU drivers to batch up TLB flushing operations and postpone
      them until ->iotlb_sync() is called, extend the prototypes for the
      ->unmap() and ->iotlb_sync() IOMMU ops callbacks to take a pointer to
      the current iommu_iotlb_gather structure.
      
      All affected IOMMU drivers are updated, but there should be no
      functional change since the extra parameter is ignored for now.
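      
      The extended callbacks look like this in struct iommu_ops (condensed
      from the patch):
      
          size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
                          size_t size, struct iommu_iotlb_gather *iotlb_gather);
          void (*iotlb_sync)(struct iommu_domain *domain,
                             struct iommu_iotlb_gather *iotlb_gather);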
      Signed-off-by: Will Deacon <will@kernel.org>
  15. 24 Jul 2019 (3 commits)
    • iommu: Introduce iommu_iotlb_gather_add_page() · 4fcf8544
      Will Deacon committed
      Introduce a helper function for drivers to use when updating an
      iommu_iotlb_gather structure in response to an ->unmap() call, rather
      than having to open-code the logic in every page-table implementation.
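      
      A minimal sketch of a driver's ->unmap() using the helper (the
      my_pgtable_unmap() call is hypothetical):
      
          static size_t my_unmap(struct iommu_domain *domain, unsigned long iova,
                                 size_t size, struct iommu_iotlb_gather *gather)
          {
                  size_t unmapped = my_pgtable_unmap(domain, iova, size);
      
                  /*
                   * Record the freed range; the helper merges it with the
                   * pending range or, on a page-size mismatch, syncs the
                   * pending flush before starting a new batch.
                   */
                  iommu_iotlb_gather_add_page(domain, gather, iova, size);
                  return unmapped;
          }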
      Signed-off-by: Will Deacon <will@kernel.org>
    • iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes · a7d20dc1
      Will Deacon committed
      To permit batching of TLB flushes across multiple calls to the IOMMU
      driver's ->unmap() implementation, introduce a new structure for
      tracking the address range to be flushed and the granularity at which
      the flushing is required.
      
      This is hooked into the IOMMU API and its callers are updated to make
      use of the new structure. Subsequent patches will plumb this into the
      IOMMU drivers as well, but for now the gathered information is ignored.
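      
      The new structure, as added to include/linux/iommu.h, tracks the range
      and the page size to be flushed:
      
          struct iommu_iotlb_gather {
                  unsigned long   start;
                  unsigned long   end;    /* inclusive */
                  size_t          pgsize;
          };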
      Signed-off-by: Will Deacon <will@kernel.org>
    • iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops · 6d1bcb95
      Will Deacon committed
      Commit add02cfd ("iommu: Introduce Interface for IOMMU TLB Flushing")
      added three new TLB flushing operations to the IOMMU API so that the
      underlying driver operations can be batched when unmapping large regions
      of IO virtual address space.
      
      However, the ->iotlb_range_add() callback has not been implemented by
      any IOMMU drivers (amd_iommu.c implements it as an empty function, which
      incurs the overhead of an indirect branch). Instead, drivers either flush
      the entire IOTLB in the ->iotlb_sync() callback or perform the necessary
      invalidation during ->unmap().
      
      Attempting to implement ->iotlb_range_add() for arm-smmu-v3.c revealed
      two major issues:
      
        1. The page size used to map the region in the page-table is not known,
           and so it is not generally possible to issue TLB flushes in the most
           efficient manner.
      
        2. The only mutable state passed to the callback is a pointer to the
           iommu_domain, which can be accessed concurrently and therefore
           requires expensive synchronisation to keep track of the outstanding
           flushes.
      
      Remove the callback entirely in preparation for extending ->unmap() and
      ->iotlb_sync() to update a token on the caller's stack.
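      
      The pattern the series works toward, sketched with the API names of the
      time (the gather token lives on the unmapper's stack, and a single sync
      flushes the whole batch):
      
          struct iommu_iotlb_gather gather;
          size_t unmapped = 0, len;
      
          iommu_iotlb_gather_init(&gather);
      
          while (unmapped < size) {
                  /* Each unmap records its range in @gather instead of flushing. */
                  len = ops->unmap(domain, iova + unmapped, pgsize, &gather);
                  if (!len)
                          break;
                  unmapped += len;
          }
      
          /* One sync covers the whole accumulated range. */
          iommu_tlb_sync(domain, &gather);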
      Signed-off-by: Will Deacon <will@kernel.org>
  16. 19 Jun 2019 (1 commit)
    • iommu/io-pgtable-arm: Add support to use system cache · 90ec7a76
      Vivek Gautam committed
      A few Qualcomm platforms, such as sdm845, have an additional outer
      cache called the system cache, aka the last-level cache (LLC), which
      allows non-coherent devices to upgrade to using caching. This cache
      sits right before the DDR and is tightly coupled with the memory
      controller. Clients using this cache request their slices from the
      system cache, activate them, and can then start using them.
      
      There is a fundamental assumption that non-coherent devices can't
      access caches. This change adds an exception where they *can* use
      some level of cache despite still being non-coherent overall.
      Coherent devices that use cacheable memory, as well as the CPU, make
      use of this system cache by default.
      
      Looking at memory types, we have the following:
      a) Normal uncached:   MAIR 0x44, inner non-cacheable,
                            outer non-cacheable;
      b) Normal cached:     MAIR 0xff, inner read write-back non-transient,
                            outer read write-back non-transient;
                            the attribute setting for coherent I/O devices.
      For non-coherent I/O devices that can allocate in the system cache,
      another type is added:
      c) Normal sys-cached: MAIR 0xf4, inner non-cacheable,
                            outer read write-back non-transient
      
      Coherent I/O devices use the system cache by marking the memory as
      normal cached.
      Non-coherent I/O devices should mark the memory as normal sys-cached
      in the page tables in order to use the system cache.
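      
      In io-pgtable-arm this amounts to a third MAIR index plus a prot-flag
      check when building the PTE; a condensed sketch based on the patch
      (the flag was later renamed IOMMU_SYS_CACHE_ONLY, see above):
      
          #define ARM_LPAE_MAIR_ATTR_INC_OWBRWA           0xf4
          #define ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE       3
      
          /* In arm_lpae_prot_to_pte(), for stage-1 tables: */
          else if (prot & IOMMU_QCOM_SYS_CACHE)
                  /* Normal sys-cached: inner NC, outer write-back (MAIR 0xf4) */
                  pte |= ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
                          << ARM_LPAE_PTE_ATTRINDX_SHIFT;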
      Acked-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  17. 12 Jun 2019 (4 commits)
    • iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions · adfd3738
      Eric Auger committed
      Introduce a new type of reserved region. It corresponds to directly
      mapped regions which are known to be relaxable under specific
      conditions, such as the device assignment use case. Well-known
      examples are the regions used by USB controllers providing PS/2
      keyboard emulation for pre-boot BIOS and early boot, or the RMRRs
      associated with an IGD working in legacy mode.
      
      Since commit c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs
      from IOMMU API domains") and commit 18436afd ("iommu/vt-d: Allow
      RMRR on graphics devices too"), those regions are considered "safe"
      with respect to the device assignment use case, which requires a
      non-direct mapping at the IOMMU physical level (RAM GPA -> HPA
      mapping).
      
      Those RMRRs currently exist, and sometimes a device attempts to
      access them, but this has not been considered an issue until now.
      
      However, at the moment, iommu_get_group_resv_regions() is not able
      to distinguish between directly mapped regions that must absolutely
      be enforced and those, like the ones above, which are known to be
      relaxable.
      
      This is a blocker for reporting severe conflicts between
      non-relaxable RMRRs (such as MSI doorbells) and guest GPA space.
      
      With this new reserved region type we will be able to use
      iommu_get_group_resv_regions() to enumerate the IOVA space
      that is usable through the IOMMU API without introducing
      regressions with respect to existing device assignment
      use cases (USB and IGD).
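      
      A hedged consumer-side sketch of what a VFIO-like user could do once
      the two types are distinguishable (the function is illustrative):
      
          static bool group_has_strict_direct_regions(struct iommu_group *group)
          {
                  struct iommu_resv_region *region, *next;
                  LIST_HEAD(resv_regions);
                  bool strict = false;
      
                  iommu_get_group_resv_regions(group, &resv_regions);
      
                  list_for_each_entry(region, &resv_regions, list) {
                          /*
                           * IOMMU_RESV_DIRECT must be enforced;
                           * IOMMU_RESV_DIRECT_RELAXABLE (USB, IGD) may be
                           * relaxed for device assignment.
                           */
                          if (region->type == IOMMU_RESV_DIRECT)
                                  strict = true;
                  }
      
                  list_for_each_entry_safe(region, next, &resv_regions, list)
                          kfree(region);
      
                  return strict;
          }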
      Signed-off-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu: Add recoverable fault reporting · bf3255b3
      Jean-Philippe Brucker committed
      Some IOMMU hardware features, for example PCI PRI and Arm SMMU Stall,
      enable recoverable I/O page faults. Allow IOMMU drivers to report PRI Page
      Requests and Stall events through the new fault reporting API. The
      consumer of the fault can be either an I/O page fault handler in the host,
      or a guest OS.
      
      Once handled, the fault must be completed by sending a page response back
      to the IOMMU. Add an iommu_page_response() function to complete a page
      fault.
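      
      Completing a recoverable fault then looks roughly like this (field
      values are illustrative; @fault is the reported struct iommu_fault):
      
          struct iommu_page_response resp = {
                  .version = IOMMU_PAGE_RESP_VERSION_1,
                  .pasid   = fault->prm.pasid,
                  .grpid   = fault->prm.grpid,
                  .code    = IOMMU_PAGE_RESP_SUCCESS, /* or INVALID/FAILURE */
          };
      
          if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID)
                  resp.flags = IOMMU_PAGE_RESP_PASID_VALID;
      
          iommu_page_response(dev, &resp);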
      
      There are two ways to extend the userspace API:
      * Add a field to iommu_page_response and a flag to
        iommu_page_response::flags describing the validity of this field.
      * Introduce a new iommu_page_response_X structure with a different version
        number. The kernel must then support both versions.
      Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu: Introduce device fault report API · 0c830e6b
      Jacob Pan committed
      Traditionally, device-specific faults are detected and handled within
      their own device drivers. When an IOMMU is enabled, faults in DMA
      transactions are detected by the IOMMU, but there is no generic
      mechanism to report them back to the in-kernel device driver or, in
      the case of assigned devices, the guest OS.
      
      This patch introduces a registration API for device-specific fault
      handlers. This differs from the existing iommu_set_fault_handler/
      report_iommu_fault infrastructure in several ways:
      - it allows reporting more sophisticated fault events (both
        unrecoverable faults and page request faults), thanks to the
        nature of the iommu_fault struct;
      - it is device specific rather than domain specific.
      
      The current iommu_report_device_fault() implementation only handles
      the "shoot and forget" unrecoverable fault case. Handling of page
      request faults or stalled faults will come later.
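      
      A minimal registration sketch (the handler body and the my_dev
      context are hypothetical):
      
          static int my_fault_handler(struct iommu_fault *fault, void *data)
          {
                  struct my_dev *mydev = data;
      
                  if (fault->type == IOMMU_FAULT_DMA_UNRECOV)
                          dev_err(mydev->dev, "unrecoverable fault, reason %u\n",
                                  fault->event.reason);
                  return 0;
          }
      
          /* In probe: */
          ret = iommu_register_device_fault_handler(dev, my_fault_handler, mydev);
          /* In remove: */
          iommu_unregister_device_fault_handler(dev);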
      Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Signed-off-by: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Signed-off-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu: Introduce device fault data · 4e32348b
      Jacob Pan committed
      Device faults detected by the IOMMU can be reported outside the IOMMU
      subsystem for further processing. This patch introduces a generic
      device fault data structure.
      
      The fault can be either an unrecoverable fault or a page request,
      also referred to as a recoverable fault.
      
      We only care about non-internal faults that are likely to be reported
      to an external subsystem.
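      
      Condensed from the UAPI header added by this patch, the fault data is
      a tagged union covering both kinds:
      
          struct iommu_fault {
                  __u32   type;   /* IOMMU_FAULT_DMA_UNRECOV or IOMMU_FAULT_PAGE_REQ */
                  __u32   padding;
                  union {
                          struct iommu_fault_unrecoverable event;
                          struct iommu_fault_page_request prm;
                  };
          };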
      Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
      Signed-off-by: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  18. 05 Jun 2019 (1 commit)
  19. 27 May 2019 (1 commit)
    • iommu: Add API to request DMA domain for device · 7423e017
      Lu Baolu committed
      Normally, when the iommu subsystem probes a device, a default domain
      is allocated and attached to the device. The type of the default
      domain is statically defined, which can result in a situation where
      the allocated default domain isn't suitable for the device due to
      some limitation. We already have the API iommu_request_dm_for_dev()
      to replace a DMA domain with an identity one. This adds
      iommu_request_dma_domain_for_dev() to request a DMA domain when an
      allocated identity domain isn't suitable for the device in question.
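      
      A hedged sketch of the intended call site (the predicate is a
      hypothetical driver check, not from the patch):
      
          /*
           * The statically allocated identity domain turned out not to fit
           * this device (e.g. it cannot address all of memory), so ask for
           * a DMA domain instead.
           */
          if (!my_device_fits_identity_domain(dev)) {
                  ret = iommu_request_dma_domain_for_dev(dev);
                  if (ret)
                          dev_warn(dev, "failed to switch to a DMA domain\n");
          }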
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  20. 23 Apr 2019 (1 commit)
  21. 11 Apr 2019 (2 commits)
    • iommu: Bind process address spaces to devices · 26b25a2b
      Jean-Philippe Brucker committed
      Add bind() and unbind() operations to the IOMMU API.
      iommu_sva_bind_device() binds a device to an mm, and returns a handle to
      the bond, which is released by calling iommu_sva_unbind_device().
      
      Each mm bound to devices gets a PASID (by convention, a 20-bit system-wide
      ID representing the address space), which can be retrieved with
      iommu_sva_get_pasid(). When programming DMA addresses, device drivers
      include this PASID in a device-specific manner, to let the device access
      the given address space. Since the process memory may be paged out, device
      and IOMMU must support I/O page faults (e.g. PCI PRI).
      
      Using iommu_sva_set_ops(), device drivers provide an mm_exit()
      callback that is called by the IOMMU driver if the process exits
      before the device driver called unbind(). In mm_exit(), the device
      driver should disable DMA from the given context, so that the core
      IOMMU can reallocate the PASID. Whether the process exited or not,
      the device driver should always release the handle with unbind().
      
      To use these functions, device driver must first enable the
      IOMMU_DEV_FEAT_SVA device feature with iommu_dev_enable_feature().
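      
      An end-to-end sketch of the API as introduced here (driver names are
      hypothetical; note that the mm_exit() mechanism is removed again by
      the 2020 commit above):
      
          static int my_mm_exit(struct device *dev, struct iommu_sva *handle,
                                void *drvdata)
          {
                  my_stop_dma(drvdata);   /* hypothetical: quiesce the context */
                  return 0;
          }
      
          static const struct iommu_sva_ops my_sva_ops = {
                  .mm_exit = my_mm_exit,
          };
      
          int my_bind(struct device *dev, struct mm_struct *mm, void *drvdata,
                      struct iommu_sva **out)
          {
                  struct iommu_sva *handle;
                  int ret;
      
                  ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA);
                  if (ret)
                          return ret;
      
                  handle = iommu_sva_bind_device(dev, mm, drvdata);
                  if (IS_ERR(handle))
                          return PTR_ERR(handle);
      
                  iommu_sva_set_ops(handle, &my_sva_ops);
      
                  /* Program the PASID into the device in a device-specific way. */
                  my_program_pasid(dev, iommu_sva_get_pasid(handle));
                  *out = handle;  /* released later with iommu_sva_unbind_device() */
                  return 0;
          }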
      Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu: Add APIs for multiple domains per device · a3a19592
      Lu Baolu committed
      Sharing a physical PCI device at a finer granularity is becoming an
      industry consensus, and IOMMU vendors are working to support such
      sharing as well as possible. Among these efforts, the capability to
      support finer-granularity DMA isolation is a common requirement for
      security reasons: with it, subsets of a PCI function can be isolated
      from each other by the IOMMU. As a result, software needs to be able
      to attach multiple domains to a physical PCI device. One example of
      such a usage model is Intel Scalable IOV [1][2]; the Intel VT-d 3.0
      spec [3] introduces a scalable mode which enables PASID-granularity
      DMA isolation.
      
      This adds the APIs to support multiple domains per device. To ease
      the discussion, we speak of 'a domain in auxiliary mode', or simply
      an 'auxiliary domain', when multiple domains are attached to a
      physical device.
      
      The APIs include the following (a usage sketch follows the list):
      
      * iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)
        - Detect both IOMMU and PCI endpoint devices supporting
          the feature (aux-domain here) without the host driver
          dependency.
      
      * iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)
        - Check the enabling status of the feature (aux-domain
          here). The aux-domain interfaces are available only
          if this returns true.
      
      * iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX)
        - Enable/disable device specific aux-domain feature.
      
      * iommu_aux_attach_device(domain, dev)
        - Attaches @domain to @dev in the auxiliary mode. Multiple
          domains could be attached to a single device in the
          auxiliary mode with each domain representing an isolated
          address space for an assignable subset of the device.
      
      * iommu_aux_detach_device(domain, dev)
        - Detach @domain which has been attached to @dev in the
          auxiliary mode.
      
      * iommu_aux_get_pasid(domain, dev)
        - Return the ID used for finer-granularity DMA translation. For
          the Intel Scalable IOV usage model, this is a PASID. A device
          which supports Scalable IOV needs to write this ID to its device
          register so that DMA requests can be tagged with the right PASID
          prefix.
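      
      A hedged usage sketch putting these together (error handling trimmed;
      my_set_pasid() is a hypothetical device operation):
      
          static int my_assign_subset(struct device *dev)
          {
                  struct iommu_domain *domain;
                  int ret;
      
                  if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX))
                          return -ENODEV;
      
                  ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
                  if (ret)
                          return ret;
      
                  domain = iommu_domain_alloc(dev->bus);
                  if (!domain)
                          return -ENOMEM;
      
                  /* One isolated address space per assignable subset. */
                  ret = iommu_aux_attach_device(domain, dev);
                  if (ret) {
                          iommu_domain_free(domain);
                          return ret;
                  }
      
                  /* Tag this subset's DMA with the returned PASID. */
                  my_set_pasid(dev, iommu_aux_get_pasid(domain, dev));
                  return 0;
          }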
      
      This has been updated with the latest proposal from Joerg, posted
      here [5].
      
      Many people were involved in the discussions of this design:
      
      Kevin Tian <kevin.tian@intel.com>
      Liu Yi L <yi.l.liu@intel.com>
      Ashok Raj <ashok.raj@intel.com>
      Sanjay Kumar <sanjay.k.kumar@intel.com>
      Jacob Pan <jacob.jun.pan@linux.intel.com>
      Alex Williamson <alex.williamson@redhat.com>
      Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Joerg Roedel <joro@8bytes.org>
      
      and some discussions can be found here [4] [5].
      
      [1] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
      [2] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
      [3] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
      [4] https://lkml.org/lkml/2018/7/26/4
      [5] https://www.spinics.net/lists/iommu/msg31874.html
      
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Kevin Tian <kevin.tian@intel.com>
      Cc: Liu Yi L <yi.l.liu@intel.com>
      Suggested-by: Kevin Tian <kevin.tian@intel.com>
      Suggested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Suggested-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  22. 26 Feb 2019 (2 commits)