1. 11 Apr 2019, 2 commits
    • iommu: Bind process address spaces to devices · 26b25a2b
      Jean-Philippe Brucker committed
      Add bind() and unbind() operations to the IOMMU API.
      iommu_sva_bind_device() binds a device to an mm, and returns a handle to
      the bond, which is released by calling iommu_sva_unbind_device().
      
      Each mm bound to devices gets a PASID (by convention, a 20-bit system-wide
      ID representing the address space), which can be retrieved with
      iommu_sva_get_pasid(). When programming DMA addresses, device drivers
      include this PASID in a device-specific manner, to let the device access
      the given address space. Since the process memory may be paged out, the
      device and the IOMMU must support I/O page faults (e.g. PCI PRI).
      
      Using iommu_sva_set_ops(), device drivers provide an mm_exit() callback
      that is called by the IOMMU driver if the process exits before the device
      driver has called unbind(). In mm_exit(), the device driver should disable
      DMA from the given context, so that the core IOMMU code can reallocate the
      PASID. Whether or not the process exited, the device driver should always
      release the handle with unbind().
      
      To use these functions, device drivers must first enable the
      IOMMU_DEV_FEAT_SVA device feature with iommu_dev_enable_feature().
      Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      26b25a2b
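      A minimal driver-side sketch of this flow, assuming a hypothetical
      quiesce_device_pasid() helper that stops the device from issuing DMA
      tagged with the given PASID (names prefixed my_ are illustrative):

      	#include <linux/err.h>
      	#include <linux/iommu.h>
      	#include <linux/sched.h>

      	/* Hypothetical helper: stop DMA tagged with @pasid on this device. */
      	static void quiesce_device_pasid(struct device *dev, int pasid);

      	static int my_mm_exit(struct device *dev, struct iommu_sva *handle,
      			      void *drvdata)
      	{
      		/* Process is exiting before unbind(): stop DMA for this context. */
      		quiesce_device_pasid(dev, iommu_sva_get_pasid(handle));
      		return 0;
      	}

      	static const struct iommu_sva_ops my_sva_ops = {
      		.mm_exit = my_mm_exit,
      	};

      	static int my_bind_current_mm(struct device *dev, struct iommu_sva **out)
      	{
      		struct iommu_sva *handle;
      		int ret;

      		ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA);
      		if (ret)
      			return ret;

      		handle = iommu_sva_bind_device(dev, current->mm, NULL);
      		if (IS_ERR(handle))
      			return PTR_ERR(handle);

      		ret = iommu_sva_set_ops(handle, &my_sva_ops);
      		if (ret) {
      			iommu_sva_unbind_device(handle);
      			return ret;
      		}

      		/* Program iommu_sva_get_pasid(handle) into DMA descriptors here. */
      		*out = handle;
      		return 0;
      	}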
    • iommu: Add APIs for multiple domains per device · a3a19592
      Lu Baolu committed
      Sharing a physical PCI device at a finer granularity is becoming an
      industry consensus, and IOMMU vendors are working to support such
      sharing as well as possible. A common requirement across these efforts
      is finer-granularity DMA isolation, for security reasons: with it,
      subsets of a PCI function can be isolated from each other by the IOMMU.
      As a result, software needs to attach multiple domains to a physical
      PCI device. One example of such a usage model is Intel Scalable IOV
      [1] [2]; the Intel VT-d 3.0 spec [3] introduces a scalable mode which
      enables PASID-granularity DMA isolation.
      
      This adds the APIs to support multiple domains per device. To ease the
      discussion, a domain attached this way is called 'a domain in auxiliary
      mode', or simply an 'auxiliary domain', when multiple domains are
      attached to a physical device.
      
      The APIs include:
      
      * iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)
        - Detect whether both the IOMMU and the PCI endpoint device support
          the feature (aux-domain here), without any host driver dependency.
      
      * iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)
        - Check the enabling status of the feature (aux-domain
          here). The aux-domain interfaces are available only
          if this returns true.
      
      * iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX)
        - Enable/disable the device-specific aux-domain feature.
      
      * iommu_aux_attach_device(domain, dev)
        - Attach @domain to @dev in auxiliary mode. Multiple domains can be
          attached to a single device in auxiliary mode, with each domain
          representing an isolated address space for an assignable subset
          of the device.
      
      * iommu_aux_detach_device(domain, dev)
        - Detach @domain which has been attached to @dev in the
          auxiliary mode.
      
      * iommu_aux_get_pasid(domain, dev)
        - Return the ID used for finer-granularity DMA translation. For the
          Intel Scalable IOV usage model this is a PASID; a device that
          supports Scalable IOV writes this ID to a device register so that
          its DMA requests are tagged with the right PASID prefix.
      
      This has been updated with the latest proposal from Joerg
      posted here [5].
      
      Many people were involved in the discussions of this design:
      
      Kevin Tian <kevin.tian@intel.com>
      Liu Yi L <yi.l.liu@intel.com>
      Ashok Raj <ashok.raj@intel.com>
      Sanjay Kumar <sanjay.k.kumar@intel.com>
      Jacob Pan <jacob.jun.pan@linux.intel.com>
      Alex Williamson <alex.williamson@redhat.com>
      Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Joerg Roedel <joro@8bytes.org>
      
      and some discussions can be found here [4] [5].
      
      [1] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
      [2] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
      [3] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
      [4] https://lkml.org/lkml/2018/7/26/4
      [5] https://www.spinics.net/lists/iommu/msg31874.html
      
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Kevin Tian <kevin.tian@intel.com>
      Cc: Liu Yi L <yi.l.liu@intel.com>
      Suggested-by: Kevin Tian <kevin.tian@intel.com>
      Suggested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Suggested-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      a3a19592
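      A hedged sketch of how a driver might exercise these APIs for one
      assignable subset (function names prefixed my_ are illustrative):

      	#include <linux/device.h>
      	#include <linux/iommu.h>

      	static int my_attach_aux_domain(struct device *dev,
      					struct iommu_domain **out)
      	{
      		struct iommu_domain *domain;
      		int ret;

      		if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX))
      			return -ENODEV;

      		ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
      		if (ret)
      			return ret;

      		domain = iommu_domain_alloc(dev->bus);
      		if (!domain)
      			return -ENOMEM;

      		ret = iommu_aux_attach_device(domain, dev);
      		if (ret) {
      			iommu_domain_free(domain);
      			return ret;
      		}

      		/* Write this PASID to the device register for the subset. */
      		dev_info(dev, "aux domain PASID %d\n",
      			 iommu_aux_get_pasid(domain, dev));

      		*out = domain;
      		return 0;
      	}

      	static void my_detach_aux_domain(struct iommu_domain *domain,
      					 struct device *dev)
      	{
      		iommu_aux_detach_device(domain, dev);
      		iommu_domain_free(domain);
      	}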
  2. 25 Mar 2019, 1 commit
  3. 11 Feb 2019, 1 commit
    • iommu: Use dev_printk() when possible · 780da9e4
      Bjorn Helgaas committed
      Use dev_printk() when possible so the IOMMU messages are more consistent
      with other messages related to the device.
      
      E.g., I think these messages related to surprise hotplug:
      
        pciehp 0000:80:10.0:pcie004: Slot(36): Link Down
        iommu: Removing device 0000:87:00.0 from group 12
        pciehp 0000:80:10.0:pcie004: Slot(36): Card present
        pcieport 0000:80:10.0: Data Link Layer Link Active not set in 1000 msec
      
      would be easier to read as these (also requires some PCI changes not
      included here):
      
        pci 0000:80:10.0: Slot(36): Link Down
        pci 0000:87:00.0: Removing from iommu group 12
        pci 0000:80:10.0: Slot(36): Card present
        pci 0000:80:10.0: Data Link Layer Link Active not set in 1000 msec
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      780da9e4
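      A small illustrative sketch of the kind of conversion (the group-removal
      message shown here is hypothetical):

      	#include <linux/device.h>
      	#include <linux/printk.h>

      	static void my_report_group_removal(struct device *dev, int group_id)
      	{
      		/* Before: no device prefix, harder to correlate. */
      		pr_info("Removing device %s from group %d\n",
      			dev_name(dev), group_id);

      		/* After: dev_info() prefixes the device name automatically. */
      		dev_info(dev, "Removing from iommu group %d\n", group_id);
      	}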
  4. 16 Jan 2019, 1 commit
  5. 20 Dec 2018, 1 commit
  6. 17 Dec 2018, 2 commits
  7. 03 Dec 2018, 1 commit
    • iommu: Audit and remove any unnecessary uses of module.h · c1af7b40
      Paul Gortmaker committed
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      This means we should be able to reduce the usage of module.h
      in code that is always built in (obj-y in a Makefile or a bool Kconfig).
      
      The advantage of removing such instances is that module.h itself
      sources about 15 other headers, adding significantly to what we feed
      cpp, and it can obscure which headers we are effectively using.
      
      Since module.h might have been the implicit source for init.h
      (for __init) and for export.h (for EXPORT_SYMBOL) we consider each
      instance for the presence of either and replace as needed.
      
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: iommu@lists.linux-foundation.org
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      c1af7b40
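      An illustrative before/after sketch of the substitution, assuming a
      builtin-only file that really needed only __init and EXPORT_SYMBOL_GPL:

      	/* Before: module.h dragged in ~15 headers for two macros. */
      	#include <linux/module.h>

      	/* After: pull in only what the builtin code actually uses. */
      	#include <linux/init.h>		/* for __init */
      	#include <linux/export.h>	/* for EXPORT_SYMBOL_GPL */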
  8. 06 Nov 2018, 1 commit
    • iommu: Do physical merging in iommu_map_sg() · 5d95f40e
      Robin Murphy committed
      The original motivation for iommu_map_sg() was to give IOMMU drivers the
      chance to map an IOVA-contiguous scatterlist as efficiently as they
      could. It turns out that there isn't really much driver-specific
      business involved there, so now that the default implementation is
      mandatory let's just improve that - the main thing we're after is to use
      larger pages wherever possible, and as long as domain->pgsize_bitmap
      reflects reality, iommu_map() can already do that in a generic way. All
      we need to do is detect physically-contiguous segments and batch them
      into a single map operation, since whatever we do here is transparent to
      our caller and not bound by any segment-length restrictions on the list
      itself.
      
      Speaking of efficiency, there's really very little point in duplicating
      the checks that iommu_map() is going to do anyway, so those get cleared
      up in the process.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      5d95f40e
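      A simplified sketch of the batching idea, ignoring the offset and
      alignment corner cases the real code must handle (my_map_sg_merged is
      an illustrative name):

      	#include <linux/iommu.h>
      	#include <linux/scatterlist.h>

      	static size_t my_map_sg_merged(struct iommu_domain *domain,
      				       unsigned long iova,
      				       struct scatterlist *sg,
      				       unsigned int nents, int prot)
      	{
      		phys_addr_t start = 0;
      		size_t len = 0, mapped = 0;
      		struct scatterlist *s;
      		unsigned int i;

      		for_each_sg(sg, s, nents, i) {
      			phys_addr_t phys = sg_phys(s);

      			/* Physically contiguous with the current run: extend it. */
      			if (len && phys == start + len) {
      				len += s->length;
      				continue;
      			}

      			/* Hand the previous run to iommu_map() in one go. */
      			if (len) {
      				if (iommu_map(domain, iova + mapped, start,
      					      len, prot))
      					return mapped;
      				mapped += len;
      			}

      			start = phys;
      			len = s->length;
      		}

      		/* Map the final run. */
      		if (len && !iommu_map(domain, iova + mapped, start, len, prot))
      			mapped += len;

      		return mapped;
      	}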
  9. 01 Oct 2018, 1 commit
  10. 25 Sep 2018, 4 commits
  11. 08 Aug 2018, 1 commit
  12. 27 Jul 2018, 2 commits
  13. 06 Jul 2018, 1 commit
    • iommu: Enable debugfs exposure of IOMMU driver internals · bad614b2
      Gary R Hook committed
      Provide base enablement for using debugfs to expose internal data of an
      IOMMU driver. When called, create the /sys/kernel/debug/iommu directory.
      
      Emit a strong warning at boot time to indicate that this feature is
      enabled.
      
      This function is called from iommu_init, and creates the initial DebugFS
      directory. Drivers may then call iommu_debugfs_new_driver_dir() to
      instantiate a device-specific directory to expose internal data.
      It will return a pointer to the new dentry structure created in
      /sys/kernel/debug/iommu, or NULL in the event of a failure.
      
      Since the IOMMU driver cannot be removed from the running system, there
      is no need for an "off" function.
      Signed-off-by: Gary R Hook <gary.hook@amd.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      bad614b2
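      A hedged sketch of driver-side usage; the directory name and the
      counter being exposed are illustrative:

      	#include <linux/debugfs.h>
      	#include <linux/iommu.h>

      	static u64 my_fault_count;

      	static int my_iommu_debugfs_init(void)
      	{
      		struct dentry *dir;

      		/* Creates /sys/kernel/debug/iommu/my-iommu/ */
      		dir = iommu_debugfs_new_driver_dir("my-iommu");
      		if (!dir)
      			return -ENOMEM;

      		debugfs_create_u64("fault_count", 0444, dir, &my_fault_count);
      		return 0;
      	}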
  14. 15 May 2018, 2 commits
  15. 14 Feb 2018, 1 commit
  16. 21 Dec 2017, 1 commit
  17. 31 Aug 2017, 1 commit
    • iommu: Introduce Interface for IOMMU TLB Flushing · add02cfd
      Joerg Roedel committed
      With the current IOMMU-API the hardware TLBs have to be
      flushed in every iommu_ops->unmap() call-back.
      
      For unmapping large amounts of address space, as happens when a
      KVM domain with assigned devices is destroyed, this causes
      thousands of unnecessary TLB flushes in the IOMMU hardware,
      because the unmap call-back runs for every unmapped physical
      page.
      
      With the TLB Flush Interface and the new iommu_unmap_fast()
      function introduced here the need to clean the hardware TLBs
      is removed from the unmapping code-path. Users of
      iommu_unmap_fast() have to explicitly call the TLB-Flush
      functions to sync the page-table changes to the hardware.
      
      Three functions for TLB-Flushes are introduced:
      
      	* iommu_flush_tlb_all() - Flushes all TLB entries associated
      	                          with that domain. TLB entries are
      	                          flushed when this function returns.
      
      	* iommu_tlb_range_add() - This will add a given
      				  range to the flush queue
      				  for this domain.
      
      	* iommu_tlb_sync() - Flushes all queued ranges from
      			     the hardware TLBs. Returns when
      			     the flush is finished.
      
      The semantics of this interface are intentionally similar to
      the iommu_gather_ops from the io-pgtable code.
      
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      add02cfd
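      A sketch of the intended usage pattern, assuming a fixed page size for
      simplicity (my_unmap_range_fast is an illustrative name):

      	#include <linux/iommu.h>

      	static size_t my_unmap_range_fast(struct iommu_domain *domain,
      					  unsigned long iova, size_t size,
      					  size_t pgsize)
      	{
      		size_t unmapped = 0;

      		while (unmapped < size) {
      			size_t ret = iommu_unmap_fast(domain, iova + unmapped,
      						      pgsize);

      			if (!ret)
      				break;

      			/* Queue the range instead of flushing immediately. */
      			iommu_tlb_range_add(domain, iova + unmapped, ret);
      			unmapped += ret;
      		}

      		/* One hardware flush for all queued ranges. */
      		iommu_tlb_sync(domain);
      		return unmapped;
      	}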
  18. 18 Aug 2017, 1 commit
    • iommu: Avoid NULL group dereference · 1464d0b1
      Robin Murphy committed
      The recently-removed FIXME in iommu_get_domain_for_dev() turns out to
      have been a little misleading, since that check is still worthwhile even
      when groups *are* universal. We have a few IOMMU-aware drivers which
      only care whether their device is already attached to an existing domain
      or not, for which the previous behaviour of iommu_get_domain_for_dev()
      was ideal, and which now crash if their device does not have an IOMMU.
      
      With IOMMU groups now serving as a reliable indicator of whether a
      device has an IOMMU or not (barring false-positives from VFIO no-IOMMU
      mode), drivers could arguably do this:
      
      	group = iommu_group_get(dev);
      	if (group) {
      		domain = iommu_get_domain_for_dev(dev);
      		iommu_group_put(group);
      	}
      
      However, rather than duplicate that code across multiple callsites,
      particularly when it's still only the domain they care about, let's skip
      straight to the next step and factor out the check into the common place
      it applies - in iommu_get_domain_for_dev() itself. Sure, it ends up
      looking rather familiar, but now it's backed by the reasoning of having
      a robust API able to do the expected thing for all devices regardless.
      
      Fixes: 05f80300 ("iommu: Finish making iommu_group support mandatory")
      Reported-by: Shawn Lin <shawn.lin@rock-chips.com>
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      1464d0b1
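      Roughly what the factored-out check looks like in the core helper
      (simplified sketch; struct iommu_group is only visible inside
      drivers/iommu/iommu.c):

      	struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
      	{
      		struct iommu_domain *domain;
      		struct iommu_group *group;

      		/* No group means no IOMMU: return NULL instead of crashing. */
      		group = iommu_group_get(dev);
      		if (!group)
      			return NULL;

      		domain = group->domain;
      		iommu_group_put(group);

      		return domain;
      	}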
  19. 16 Aug 2017, 1 commit
  20. 10 Aug 2017, 1 commit
    • iommu: Finish making iommu_group support mandatory · 05f80300
      Robin Murphy committed
      Now that all the drivers properly implementing the IOMMU API support
      groups (I'm ignoring the etnaviv GPU MMUs which seemingly only do just
      enough to convince the ARM DMA mapping ops), we can remove the FIXME
      workarounds from the core code. In the process, it also seems logical to
      make the .device_group callback non-optional for drivers calling
      iommu_group_get_for_dev() - the current callers all implement it anyway,
      and it doesn't make sense for any future callers not to either.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      05f80300
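      An illustrative fragment: with .device_group now required for
      iommu_group_get_for_dev() callers, a driver names its grouping policy
      explicitly, e.g. one of the generic helpers declared in linux/iommu.h:

      	#include <linux/iommu.h>

      	static const struct iommu_ops my_iommu_ops = {
      		/* ... map/unmap/attach_dev and friends elided ... */
      		.device_group	= pci_device_group,	/* or generic_device_group */
      	};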
  21. 28 Jun 2017, 2 commits
  22. 27 Apr 2017, 1 commit
  23. 20 Apr 2017, 1 commit
  24. 06 Apr 2017, 1 commit
    • iommu: Allow default domain type to be set on the kernel command line · fccb4e3b
      Will Deacon committed
      The IOMMU core currently initialises the default domain for each group
      to IOMMU_DOMAIN_DMA, under the assumption that devices will use
      IOMMU-backed DMA ops by default. However, in some cases it is desirable
      for the DMA ops to bypass the IOMMU for performance reasons, reserving
      use of translation for subsystems such as VFIO that require it for
      enforcing device isolation.
      
      Rather than modify each IOMMU driver to provide different semantics for
      DMA domains, instead we introduce a command line parameter that can be
      used to change the type of the default domain. Passthrough can then be
      specified using "iommu.passthrough=1" on the kernel command line.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      fccb4e3b
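      A sketch of how such an option can be wired up with early_param()
      (simplified; the variable holding the default type is illustrative):

      	#include <linux/init.h>
      	#include <linux/iommu.h>
      	#include <linux/string.h>
      	#include <linux/types.h>

      	static int my_def_domain_type = IOMMU_DOMAIN_DMA;

      	/* "iommu.passthrough=1" selects identity (bypass) default domains. */
      	static int __init my_set_def_domain_type(char *str)
      	{
      		bool pt;

      		if (!str || strtobool(str, &pt))
      			return -EINVAL;

      		my_def_domain_type = pt ? IOMMU_DOMAIN_IDENTITY
      					: IOMMU_DOMAIN_DMA;
      		return 0;
      	}
      	early_param("iommu.passthrough", my_set_def_domain_type);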
  25. 22 Mar 2017, 1 commit
    • iommu: Disambiguate MSI region types · 9d3a4de4
      Robin Murphy committed
      The introduction of reserved regions has left a couple of rough edges
      which we could do with sorting out sooner rather than later. Since we
      are not yet addressing the potential dynamic aspect of software-managed
      reservations and presenting them at arbitrary fixed addresses, it is
      incongruous that we end up displaying hardware vs. software-managed MSI
      regions to userspace differently, especially since ARM-based systems may
      actually require one or the other, or even potentially both at once
      (which iommu-dma currently has no hope of dealing with at all). Let's
      resolve the former user-visible inconsistency ASAP before the ABI has
      been baked into a kernel release, in a way that also lays the groundwork
      for the latter shortcoming to be addressed by follow-up patches.
      
      For clarity, rename the software-managed type to IOMMU_RESV_SW_MSI, use
      IOMMU_RESV_MSI to describe the hardware type, and document everything a
      little bit. Since the x86 MSI remapping hardware falls squarely under
      this meaning of IOMMU_RESV_MSI, apply that type to their regions as well,
      so that we tell the same story to userspace across all platforms.
      
      Secondly, as the various region types require quite different handling,
      and it really makes little sense to ever try combining them, convert the
      bitfield-esque #defines to a plain enum in the process before anyone
      gets the wrong impression.
      
      Fixes: d30ddcaa ("iommu: Add a new type field in iommu_resv_region")
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      CC: Alex Williamson <alex.williamson@redhat.com>
      CC: David Woodhouse <dwmw2@infradead.org>
      CC: kvm@vger.kernel.org
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      9d3a4de4
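      A small sketch of a consumer switching on the resulting enum (only the
      MSI cases are shown; the helper name is illustrative):

      	#include <linux/iommu.h>

      	static bool my_region_needs_sw_msi_reservation(
      			const struct iommu_resv_region *region)
      	{
      		switch (region->type) {
      		case IOMMU_RESV_SW_MSI:
      			/* Software-managed: the consumer must reserve IOVA. */
      			return true;
      		case IOMMU_RESV_MSI:
      			/* Hardware MSI doorbell, incl. x86 remapping ranges. */
      		default:
      			return false;
      		}
      	}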
  26. 10 Feb 2017, 4 commits
  27. 06 Feb 2017, 2 commits
  28. 23 Jan 2017, 1 commit