1. 24 June 2017 (9 commits)
    • iommu/io-pgtable: Introduce explicit coherency · 81b3c252
      Committed by Robin Murphy
      Once we remove the serialising spinlock, a potential race opens up for
      non-coherent IOMMUs whereby a caller of .map() can be sure that cache
      maintenance has been performed on their new PTE, but will have no
      guarantee that such maintenance for table entries above it has actually
      completed (e.g. if another CPU took an interrupt immediately after
      writing the table entry, but before initiating the DMA sync).
      
      Handling this race safely will add some potentially non-trivial overhead
      to installing a table entry, which we would much rather avoid on
      coherent systems where it will be unnecessary, and where we are striving
      to minimise latency by removing the locking in the first place.
      
      To that end, let's introduce an explicit notion of cache-coherency to
      io-pgtable, such that we will be able to avoid penalising IOMMUs which
      know enough to know when they are coherent.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
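
      A minimal sketch of the idea, assuming a hypothetical pte_sync()
      helper in the quirk style of io-pgtable.h (the flag name is an
      assumption, not the verified upstream interface):

      	#include <linux/dma-mapping.h>

      	/* Skip cache maintenance entirely when the walker is coherent. */
      	static void pte_sync(struct io_pgtable_cfg *cfg, struct device *dev,
      			     dma_addr_t pte_dma, size_t size)
      	{
      		/* Coherent table walkers snoop the CPU caches. */
      		if (cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA)	/* assumed flag */
      			return;

      		dma_sync_single_for_device(dev, pte_dma, size, DMA_TO_DEVICE);
      	}
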
    • iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap · b9f1ef30
      Committed by Robin Murphy
      Whilst the short-descriptor format's split_blk_unmap implementation has
      no need to be recursive, it followed the pattern of the LPAE version
      anyway for the sake of consistency. With the latter now reworked for
      both efficiency and future scalability improvements, tweak the former
      similarly, not least to make it less obtuse.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • iommu/io-pgtable-arm: Improve split_blk_unmap · fb3a9579
      Committed by Robin Murphy
      The current split_blk_unmap implementation suffers from some inscrutable
      pointer trickery for creating the tables to replace the block entry, but
      more than that it also suffers from hideous inefficiency. For example,
      the most pathological case of unmapping a level 3 page from a level 1
      block will allocate 513 lower-level tables to remap the entire block at
      page granularity, when only 2 are actually needed (the rest can be
      covered by level 2 block entries).
      
      Also, we would like to be able to relax the spinlock requirement in
      future, for which the roll-back-and-try-again logic for race resolution
      would be pretty hideous under the current paradigm.
      
      Both issues can be resolved most neatly by turning things sideways:
      instead of repeatedly recursing into __arm_lpae_map() to build up an
      entire new sub-table depth-first, we can directly replace the block
      entry with a next-level table of block/page entries, then repeat by
      unmapping at the next level if necessary. With a little refactoring of
      some helper functions, the code ends up not much bigger than before, but
      considerably easier to follow and to adapt in future.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
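
      A rough sketch of the reworked "sideways" shape described above; every
      helper name below is hypothetical, and the real code in
      io-pgtable-arm.c differs in detail:

      	/* Replace the block entry with ONE next-level table, leaving a
      	 * hole over the region being unmapped, then continue the unmap
      	 * one level down instead of building a whole sub-tree first. */
      	static size_t split_blk_unmap(struct io_pgtable *iop, unsigned long iova,
      				      size_t size, u64 blk_pte, int lvl, u64 *ptep)
      	{
      		u64 *tablep = alloc_table(iop, lvl + 1);	/* hypothetical */
      		int i;

      		for (i = 0; i < ptes_per_table(iop); i++) {
      			if (entry_covers(iop, iova, size, lvl + 1, i))
      				continue;			/* leave the hole */
      			tablep[i] = blk_to_next_level_pte(blk_pte, lvl + 1, i);
      		}

      		install_table(tablep, ptep);		/* one atomic PTE write */
      		return unmap_at(iop, iova, size, lvl + 1, tablep);
      	}
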
    • iommu/io-pgtable-arm-v7s: Check table PTEs more precisely · 9db829d2
      Committed by Robin Murphy
      Whilst we don't support the PXN bit at all, and so should never
      encounter a level 1 section or supersection PTE with it set, it would
      still be wise to check both table type bits to resolve any theoretical
      ambiguity.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
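
      In outline: in the short-descriptor format, level 1 bits[1:0] are 01
      for a table and 1x for a section, where bit 0 doubles as PXN. A hedged
      sketch of the stricter check (not the exact patch):

      	#define ARM_V7S_PTE_TYPE_MASK	0x3
      	#define ARM_V7S_PTE_TYPE_TABLE	0x1

      	/* A PXN section reads as 0b11; testing only bit 0 would wrongly
      	 * classify it as a table, so compare against the full mask. */
      	static bool arm_v7s_pte_is_table(u32 pte, int lvl)
      	{
      		return lvl == 1 &&
      		       (pte & ARM_V7S_PTE_TYPE_MASK) == ARM_V7S_PTE_TYPE_TABLE;
      	}
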
    • iommu: arm-smmu: Handle return of iommu_device_register. · 5c2d0218
      Committed by Arvind Yadav
      iommu_device_register returns an error code and, although it currently
      never fails, we should check its return value anyway.
      Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
      [will: adjusted to follow arm-smmu.c]
      Signed-off-by: Will Deacon <will.deacon@arm.com>
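
      The pattern being applied, sketched (iommu_device_register() really
      does return int; the surrounding variable names are assumed):

      	ret = iommu_device_register(&smmu->iommu);
      	if (ret) {
      		dev_err(smmu->dev, "Failed to register iommu\n");
      		return ret;
      	}
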
    • iommu: arm-smmu-v3: make of_device_ids const · ebdd13c9
      Committed by Arvind Yadav
      of_device_ids are not supposed to change at runtime. All functions
      working with of_device_ids provided by <linux/of.h> work with const
      of_device_ids. So mark the non-const structs as const.
      Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
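
      For illustration, the shape of the change ("arm,smmu-v3" is the
      driver's real compatible string; the table name is representative):

      	#include <linux/module.h>
      	#include <linux/of.h>

      	/* Immutable, so it can live in .rodata rather than .data. */
      	static const struct of_device_id arm_smmu_of_match[] = {
      		{ .compatible = "arm,smmu-v3" },
      		{ },
      	};
      	MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
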
    • iommu/arm-smmu: Plumb in new ACPI identifiers · 84c24379
      Committed by Robin Murphy
      Revision C of IORT now allows us to identify ARM MMU-401 and the Cavium
      ThunderX implementation. Wire them up so that we can probe these models
      once firmware starts using the new codes in place of generic ones, and
      so that the appropriate features and quirks get enabled when we do.
      
      For the sake of backports and mitigating synchronisation problems with
      the ACPICA headers, we'll carry a backup copy of the new definitions
      locally for the short term to make life simpler.
      
      Cc: stable@vger.kernel.org # 4.10
      Acked-by: Robert Richter <rrichter@cavium.com>
      Tested-by: Robert Richter <rrichter@cavium.com>
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
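
      A hedged outline of the wiring (the ACPI_IORT_SMMU_* names follow
      IORT revision C; the driver-side assignments are illustrative, not
      the exact patch):

      	switch (iort_smmu_model) {
      	case ACPI_IORT_SMMU_CORELINK_MMU401:	/* new in IORT rev. C */
      		smmu->version = ARM_SMMU_V1;	/* plus MMU-40x behaviour */
      		break;
      	case ACPI_IORT_SMMU_CAVIUM_THUNDERX:	/* new in IORT rev. C */
      		smmu->model = CAVIUM_SMMUV2;	/* ThunderX quirk handling */
      		break;
      	}
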
    • iommu/io-pgtable-arm-v7s: constify dummy_tlb_ops. · 60ab7a75
      Committed by Arvind Yadav
      File size before:
         text	   data	    bss	    dec	    hex	filename
         6146	     56	      9	   6211	   1843	drivers/iommu/io-pgtable-arm-v7s.o
      
      File size after adding 'const':
         text	   data	    bss	    dec	    hex	filename
         6170	     24	      9	   6203	   183b	drivers/iommu/io-pgtable-arm-v7s.o
      Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
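
      The change is the classic constification pattern; a sketch using the
      iommu_gather_ops callbacks of this era (the dummy_* selftest hooks
      are assumed from context):

      	static const struct iommu_gather_ops dummy_tlb_ops = {
      		.tlb_flush_all	= dummy_tlb_flush_all,
      		.tlb_add_flush	= dummy_tlb_add_flush,
      		.tlb_sync	= dummy_tlb_sync,
      	};
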
    • iommu/arm-smmu-v3: Increase CMDQ drain timeout value · b847de4e
      Committed by Sunil Goutham
      Waiting for a CMD_SYNC to be processed involves waiting for the command
      queue to drain, which can take an awful lot longer than waiting for a
      single entry to become available. Consequently, the common timeout value
      of 100us has been observed to be too short on some platforms when a
      CMD_SYNC is issued into a queue full of TLBI commands.
      
      This patch resolves the issue by using a different (1s) timeout when
      waiting for the CMDQ to drain and using a simple back-off mechanism
      when polling the cons pointer in the absence of WFE support.
      Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
      [will: rewrote commit message and cosmetic changes]
      Signed-off-by: Will Deacon <will.deacon@arm.com>
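
      An illustrative sketch of the two-timeout scheme with back-off
      (helper and constant names are assumptions in the style of
      arm-smmu-v3.c):

      	static int queue_poll_cons(struct arm_smmu_queue *q, bool drain,
      				   bool use_wfe)
      	{
      		unsigned int delay = 1;
      		ktime_t timeout = ktime_add_us(ktime_get(),
      					       drain ? 1000000 : 100);

      		while (queue_sync_cons(q),
      		       (drain ? !queue_empty(q) : queue_full(q))) {
      			if (ktime_compare(ktime_get(), timeout) > 0)
      				return -ETIMEDOUT;

      			if (use_wfe) {
      				wfe();		/* woken by SEV on an update */
      			} else {
      				udelay(delay);	/* simple back-off */
      				delay *= 2;
      			}
      		}

      		return 0;
      	}
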
  2. 29 April 2017 (2 commits)
    • iommu: Remove pci.h include from trace/events/iommu.h · 461a6946
      Committed by Joerg Roedel
      The include file does not need any PCI specifics, so remove
      that include. Also fix the places that relied on it.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/vt-d: Don't print the failure message when booting non-kdump kernel · 8e121884
      Committed by Qiuxu Zhuo
      When booting a new non-kdump kernel, we see the failure messages below:
      
      [    0.004000] DMAR-IR: IRQ remapping was enabled on dmar2 but we are not in kdump mode
      [    0.004000] DMAR-IR: Failed to copy IR table for dmar2 from previous kernel
      [    0.004000] DMAR-IR: IRQ remapping was enabled on dmar1 but we are not in kdump mode
      [    0.004000] DMAR-IR: Failed to copy IR table for dmar1 from previous kernel
      [    0.004000] DMAR-IR: IRQ remapping was enabled on dmar0 but we are not in kdump mode
      [    0.004000] DMAR-IR: Failed to copy IR table for dmar0 from previous kernel
      [    0.004000] DMAR-IR: IRQ remapping was enabled on dmar3 but we are not in kdump mode
      [    0.004000] DMAR-IR: Failed to copy IR table for dmar3 from previous kernel
      
      In the non-kdump case there is no need to copy the IR table from the
      previous kernel, so nothing has actually failed. To be less alarming
      and misleading, do not print the "DMAR-IR: Failed to copy IR table for
      dmar[0-9] from previous kernel" messages when booting a non-kdump
      kernel.
      Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
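
      The shape of the fix, assuming the check happens before the copy is
      attempted (is_kdump_kernel() is the real predicate from
      <linux/crash_dump.h>):

      	if (!is_kdump_kernel()) {
      		pr_warn("IRQ remapping was enabled on %s but we are not in kdump mode\n",
      			iommu->name);
      		return -EINVAL;	/* quiet skip: nothing actually failed */
      	}
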
  3. 27 April 2017 (2 commits)
    • iommu: Move report_iommu_fault() to iommu.c · 207c6e36
      Committed by Joerg Roedel
      The function is not on any fast path, so there is no need for it to
      be static inline in a header file. This also removes the need to
      include the iommu trace points in iommu.h.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • x86, iommu/vt-d: Add an option to disable Intel IOMMU force on · bfd20f1c
      Committed by Shaohua Li
      The IOMMU harms performance significantly when we run very fast
      networking workloads: in a 40Gb networking XDP test, software overhead
      was almost negligible, but IOTLB misses (based on our analysis) killed
      the performance. We observed the same problem even with software
      passthrough (identity mapping); only hardware passthrough was
      unaffected. The pps with the IOMMU enabled (in software passthrough
      mode) is only about 30% of that without it. Based on our observation
      this is a hardware limitation, so we would like to disable the IOMMU
      force-on. We do want to use TBOOT, and we can sacrifice the DMA
      security bought by the IOMMU. I must admit I know nothing about TBOOT,
      but the TBOOT folks (cc-ed) think that not enabling the IOMMU is
      totally fine.
      
      So introduce a new boot option to disable the force-on. It is kind of
      silly that we still need to run into intel_iommu_init even without the
      force-on, but we need it in order to disable the TBOOT PMR registers.
      For systems without the boot option, nothing changes.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
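
      A hedged sketch of the option plumbing, following the existing
      intel_iommu= parsing style (the variable name and message text are
      assumptions):

      	static int intel_iommu_tboot_noforce;

      	static int __init intel_iommu_setup(char *str)
      	{
      		while (*str) {
      			if (!strncmp(str, "tboot_noforce", 13)) {
      				pr_info("Intel-IOMMU: not forcing on after tboot\n");
      				intel_iommu_tboot_noforce = 1;
      			}
      			str += strcspn(str, ",");
      			while (*str == ',')
      				str++;
      		}
      		return 0;
      	}
      	__setup("intel_iommu=", intel_iommu_setup);
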
  4. 26 April 2017 (1 commit)
  5. 25 April 2017 (1 commit)
  6. 24 April 2017 (1 commit)
  7. 20 April 2017 (11 commits)
  8. 07 April 2017 (1 commit)
    • iommu/iova: Fix underflow bug in __alloc_and_insert_iova_range · 5016bdb7
      Committed by Nate Watterson
      Normally, calling alloc_iova() using an iova_domain with insufficient
      pfns remaining between start_pfn and dma_limit will fail and return a
      NULL pointer. Unexpectedly, if such a "full" iova_domain contains an
      iova with pfn_lo == 0, the alloc_iova() call will instead succeed and
      return an iova containing invalid pfns.
      
      This is caused by an underflow bug in __alloc_and_insert_iova_range()
      that occurs after walking the "full" iova tree when the search ends
      at the iova with pfn_lo == 0 and limit_pfn is then adjusted to be just
      below that (-1). This (now huge) limit_pfn gives the impression that a
      vast amount of space is available between it and start_pfn and thus
      a new iova is allocated with the invalid pfn_hi value, 0xFFF.... .
      
      To remedy this, a check is introduced to ensure that adjustments to
      limit_pfn will not underflow.
      
      This issue has been observed in the wild, and is easily reproduced with
      the following sample code.
      
      	struct iova_domain *iovad = kzalloc(sizeof(*iovad), GFP_KERNEL);
      	struct iova *rsvd_iova, *good_iova, *bad_iova;
      	unsigned long limit_pfn = 3;
      	unsigned long start_pfn = 1;
      	unsigned long va_size = 2;
      
      	init_iova_domain(iovad, SZ_4K, start_pfn, limit_pfn);
      	rsvd_iova = reserve_iova(iovad, 0, 0);
      	good_iova = alloc_iova(iovad, va_size, limit_pfn, true);
      	bad_iova = alloc_iova(iovad, va_size, limit_pfn, true);
      
      Prior to the patch, this yielded:
      	*rsvd_iova == {0, 0}   /* Expected */
      	*good_iova == {2, 3}   /* Expected */
      	*bad_iova  == {-2, -1} /* Oh no... */
      
      After the patch, bad_iova is NULL as expected since inadequate
      space remains between limit_pfn and start_pfn after allocating
      good_iova.
      Signed-off-by: Nate Watterson <nwatters@codeaurora.org>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
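
      A minimal illustration of the guard (the helper is hypothetical; the
      point is to test before subtracting from an unsigned value):

      	static bool can_fit_below(unsigned long pfn_lo, unsigned long size,
      				  unsigned long *limit_pfn)
      	{
      		if (pfn_lo < size)	/* adjusting would underflow */
      			return false;	/* treat the domain as full */

      		*limit_pfn = pfn_lo - 1; /* keep searching below this iova */
      		return true;
      	}
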
  9. 06 April 2017 (12 commits)
    • iommu/io-pgtable-arm: Avoid shift overflow in block size · 022f4e4f
      Committed by Robin Murphy
      The recursive nature of __arm_lpae_{map,unmap}() means that
      ARM_LPAE_BLOCK_SIZE() is evaluated for every level, including those
      where block mappings aren't possible. This in itself is harmless enough,
      as we will only ever be called with valid sizes from the pgsize_bitmap,
      and thus always recurse down past any imaginary block sizes. The only
      problem is that most of those imaginary sizes overflow the type used for
      the calculation, and thus trigger warnings under UBSan:
      
      [   63.020939] ================================================================================
      [   63.021284] UBSAN: Undefined behaviour in drivers/iommu/io-pgtable-arm.c:312:22
      [   63.021602] shift exponent 39 is too large for 32-bit type 'int'
      [   63.021909] CPU: 0 PID: 1119 Comm: lkvm Not tainted 4.7.0-rc3+ #819
      [   63.022163] Hardware name: FVP Base (DT)
      [   63.022345] Call trace:
      [   63.022629] [<ffffff900808f258>] dump_backtrace+0x0/0x3a8
      [   63.022975] [<ffffff900808f614>] show_stack+0x14/0x20
      [   63.023294] [<ffffff90086bc9dc>] dump_stack+0x104/0x148
      [   63.023609] [<ffffff9008713ce8>] ubsan_epilogue+0x18/0x68
      [   63.023956] [<ffffff9008714410>] __ubsan_handle_shift_out_of_bounds+0x18c/0x1bc
      [   63.024365] [<ffffff900890fcb0>] __arm_lpae_map+0x720/0xae0
      [   63.024732] [<ffffff9008910170>] arm_lpae_map+0x100/0x190
      [   63.025049] [<ffffff90089183d8>] arm_smmu_map+0x78/0xc8
      [   63.025390] [<ffffff9008906c18>] iommu_map+0x130/0x230
      [   63.025763] [<ffffff9008bf7564>] vfio_iommu_type1_attach_group+0x4bc/0xa00
      [   63.026156] [<ffffff9008bf3c78>] vfio_fops_unl_ioctl+0x320/0x580
      [   63.026515] [<ffffff9008377420>] do_vfs_ioctl+0x140/0xd28
      [   63.026858] [<ffffff9008378094>] SyS_ioctl+0x8c/0xa0
      [   63.027179] [<ffffff9008086e70>] el0_svc_naked+0x24/0x28
      [   63.027412] ================================================================================
      
      Perform the shift in a 64-bit type to prevent the theoretical overflow
      and keep the peace. As it turns out, this generates identical code for
      32-bit ARM, and marginally shorter AArch64 code, so it's good all round.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
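
      A standalone demonstration of the problem and the fix, in plain C:

      	#include <stdio.h>

      	int main(void)
      	{
      		int level_shift = 39;	/* an "imaginary" block-size exponent */

      		/* Broken: 1 << 39 overflows int (UBSan's complaint above). */
      		/* int bad = 1 << level_shift; */

      		/* Fixed: perform the shift in a 64-bit type. */
      		unsigned long long block_size = 1ULL << level_shift;

      		printf("block size: 0x%llx\n", block_size);
      		return 0;
      	}
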
    • iommu: Allow default domain type to be set on the kernel command line · fccb4e3b
      Committed by Will Deacon
      The IOMMU core currently initialises the default domain for each group
      to IOMMU_DOMAIN_DMA, under the assumption that devices will use
      IOMMU-backed DMA ops by default. However, in some cases it is desirable
      for the DMA ops to bypass the IOMMU for performance reasons, reserving
      use of translation for subsystems such as VFIO that require it for
      enforcing device isolation.
      
      Rather than modify each IOMMU driver to provide different semantics for
      DMA domains, instead we introduce a command line parameter that can be
      used to change the type of the default domain. Passthrough can then be
      specified using "iommu.passthrough=1" on the kernel command line.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
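
      A hedged sketch of the hook (early_param() and kstrtobool() are real
      kernel interfaces; the variable name is an assumption):

      	static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;

      	static int __init iommu_set_def_domain_type(char *str)
      	{
      		bool pt;

      		if (kstrtobool(str, &pt))
      			return -EINVAL;

      		iommu_def_domain_type = pt ? IOMMU_DOMAIN_IDENTITY
      					   : IOMMU_DOMAIN_DMA;
      		return 0;
      	}
      	early_param("iommu.passthrough", iommu_set_def_domain_type);
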
    • iommu/arm-smmu-v3: Install bypass STEs for IOMMU_DOMAIN_IDENTITY domains · beb3c6a0
      Committed by Will Deacon
      In preparation for allowing the default domain type to be overridden,
      this patch adds support for IOMMU_DOMAIN_IDENTITY domains to the
      ARM SMMUv3 driver.
      
      An identity domain is created by placing the corresponding stream table
      entries into "bypass" mode, which allows transactions to flow through
      the SMMU without any translation.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • iommu/arm-smmu-v3: Make arm_smmu_install_ste_for_dev return void · 67560edc
      Committed by Will Deacon
      arm_smmu_install_ste_for_dev cannot fail and always returns 0; however,
      because it returns int, callers end up implementing redundant
      error-handling code which complicates STE tracking and is never
      executed.
      
      This patch changes the return type of arm_smmu_install_ste_for_dev
      to void, to make it explicit that it cannot fail.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • iommu/arm-smmu: Install bypass S2CRs for IOMMU_DOMAIN_IDENTITY domains · 61bc6711
      Committed by Will Deacon
      In preparation for allowing the default domain type to be overridden,
      this patch adds support for IOMMU_DOMAIN_IDENTITY domains to the
      ARM SMMU driver.
      
      An identity domain is created by placing the corresponding S2CR
      registers into "bypass" mode, which allows transactions to flow through
      the SMMU without any translation.
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
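
      In outline (S2CR_TYPE_BYPASS is the driver's real S2CR type; the loop
      shape is assumed):

      	for_each_cfg_sme(fwspec, i, idx) {
      		smmu->s2crs[idx].type = S2CR_TYPE_BYPASS;
      		arm_smmu_write_s2cr(smmu, idx);	/* transactions now bypass */
      	}
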
    • iommu/arm-smmu: Restrict domain attributes to UNMANAGED domains · 0834cc28
      Committed by Will Deacon
      The ARM SMMU drivers provide a DOMAIN_ATTR_NESTING domain attribute,
      which allows callers of the IOMMU API to request that the page table
      for a domain is installed at stage-2, if supported by the hardware.
      
      Since setting this attribute only makes sense for UNMANAGED domains,
      this patch returns -ENODEV if the domain_{get,set}_attr operations are
      called on other domain types.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
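
      The guard, as the message describes it:

      	static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
      					    enum iommu_attr attr, void *data)
      	{
      		/* Nesting only makes sense for UNMANAGED domains. */
      		if (domain->type != IOMMU_DOMAIN_UNMANAGED)
      			return -ENODEV;

      		/* ... existing DOMAIN_ATTR_NESTING handling ... */
      		return 0;
      	}
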
    • iommu/arm-smmu: Add global SMR masking property · 56fbf600
      Committed by Robin Murphy
      The current SMR masking support using a 2-cell iommu-specifier is
      primarily intended to handle individual masters with large and/or
      complex Stream ID assignments; it quickly gets a bit clunky in other SMR
      use-cases where we just want to consistently mask out the same part of
      every Stream ID (e.g. for MMU-500 configurations where the appended TBU
      number gets in the way unnecessarily). Let's add a new property to allow
      a single global mask value to better fit the latter situation.
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
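
      A sketch of consuming the property at probe time
      (of_property_read_u32() is the standard accessor; the property name
      follows this patch's title, and the field name is hypothetical):

      	u32 mask;

      	if (!of_property_read_u32(dev->of_node, "stream-match-mask", &mask))
      		smmu->global_smr_mask = mask;	/* hypothetical field */
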
    • iommu/arm-smmu: Poll for TLB sync completion more effectively · 8513c893
      Committed by Robin Murphy
      On relatively slow development platforms and software models, the
      inefficiency of our TLB sync loop tends not to show up - for instance,
      on a Juno r1 board I typically see that the TLBI has completed of its
      own accord by the time we get to the sync, such that the latter
      finishes instantly.
      
      However, on larger systems doing real I/O, it's less realistic for the
      TLBs to go idle immediately, and at that point falling into the 1MHz
      polling loop turns out to throw away performance drastically. Let's
      strike a balance by polling more than once between pauses, such that we
      have much more chance of catching normal operations completing before
      committing to the fixed delay, but also backing off exponentially, since
      if a sync really hasn't completed within one or two "reasonable time"
      periods, it becomes increasingly unlikely that it ever will.
      Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
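
      The polling shape described above, sketched with illustrative
      constants (sTLBGSTATUS_GSACTIVE is the real status bit; the loop
      bounds are assumptions):

      	unsigned int spin_cnt, delay;

      	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
      		for (spin_cnt = 0; spin_cnt < TLB_SPIN_COUNT; spin_cnt++) {
      			if (!(readl_relaxed(status_reg) & sTLBGSTATUS_GSACTIVE))
      				return;		/* sync has completed */
      			cpu_relax();
      		}
      		udelay(delay);			/* exponential back-off */
      	}
      	dev_err_ratelimited(smmu->dev, "TLB sync timed out\n");
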
    • iommu/arm-smmu: Use per-context TLB sync as appropriate · 11febfca
      Committed by Robin Murphy
      TLB synchronisation typically involves the SMMU blocking all incoming
      transactions until the TLBs report completion of all outstanding
      operations. In the common SMMUv2 configuration of a single distributed
      SMMU serving multiple peripherals, that means that a single unmap
      request has the potential to bring the hammer down on the entire system
      if synchronised globally. Since stage 1 contexts, and stage 2 contexts
      under SMMUv2, offer local sync operations, let's make use of those
      wherever we can in the hope of minimising global disruption.
      
      To that end, rather than add any more branches to the already unwieldy
      monolithic TLB maintenance ops, break them up into smaller, neater,
      functions which we can then mix and match as appropriate.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
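
      An outline of the context-local sync (register offset names as in
      arm-smmu.c; the helper signature is an assumption):

      	static void arm_smmu_tlb_sync_context(void *cookie)
      	{
      		struct arm_smmu_domain *smmu_domain = cookie;
      		struct arm_smmu_device *smmu = smmu_domain->smmu;
      		void __iomem *base = ARM_SMMU_CB(smmu, smmu_domain->cfg.cbndx);

      		/* Stalls only this context bank, not the whole SMMU. */
      		writel_relaxed(0, base + ARM_SMMU_CB_TLBSYNC);
      		__arm_smmu_tlb_sync(smmu, base + ARM_SMMU_CB_TLBSYNC,
      				    base + ARM_SMMU_CB_TLBSTATUS);
      	}
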
    • iommu/arm-smmu: Tidy up context bank indexing · 452107c7
      Committed by Robin Murphy
      ARM_SMMU_CB() is calculated relative to ARM_SMMU_CB_BASE(), but the
      latter is never of use on its own, and what we end up with is the same
      ARM_SMMU_CB_BASE() + ARM_SMMU_CB() expression being duplicated at every
      callsite. Folding the two together gives us a self-contained context
      bank accessor which is much more pleasant to work with.
      
      Secondly, we might as well simplify CB_BASE itself at the same time.
      We use the address space size for its own sake precisely once, at probe
      time, and every other usage is to dynamically calculate CB_BASE over
      and over and over again. Let's flip things around so that we just
      maintain the CB_BASE address directly.
      Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
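
      The folded accessor, approximately (the pgshift-based stride is an
      assumption):

      	/* cb_base is now stored directly; one step to any context bank. */
      	#define ARM_SMMU_CB(smmu, n)	((smmu)->cb_base + ((n) << (smmu)->pgshift))
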
    • iommu/arm-smmu: Simplify ASID/VMID handling · 280b683c
      Committed by Robin Murphy
      Calculating ASIDs/VMIDs dynamically from arm_smmu_cfg was a neat trick,
      but the global uniqueness workaround makes it somewhat more awkward, and
      means we end up having to pass extra state around in certain cases just
      to keep a handle on the offset.
      
      We already have 16 bits going spare in arm_smmu_cfg; let's just
      precalculate an ASID/VMID, plop it in there, and tidy up the users
      accordingly. We'd also need something like this anyway if we ever get
      near to thinking about SVM, so it's no bad thing.
      Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
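
      A sketch of the precalculation at init time (the cavium_id_base
      offset is the existing global-uniqueness workaround; the exact
      arithmetic is assumed):

      	if (stage1)
      		cfg->asid = cfg->cbndx + smmu->cavium_id_base;
      	else
      		cfg->vmid = cfg->cbndx + 1 + smmu->cavium_id_base;
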
    • iommu/arm-smmu: Fix 16-bit ASID configuration · 125458ab
      Committed by Sunil Goutham
      The 16-bit ASID must be enabled before TTBR0/1 are initialised,
      otherwise only the lower 8 bits of the ASID will be honoured. Hence,
      move the configuration of the TTBCR register ahead of TTBR0/1 when
      initialising a context bank.
      Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
      [will: rewrote comment]
      Signed-off-by: Will Deacon <will.deacon@arm.com>
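
      The reordering in outline (register offset names as in arm-smmu.c;
      the value variables are placeholders):

      	/* TTBCR first, so the full 16-bit ASID width is enabled... */
      	writel_relaxed(ttbcr, cb_base + ARM_SMMU_CB_TTBCR);
      	/* ...then the TTBRs, whose upper bits carry the ASID. */
      	writeq_relaxed(ttbr0, cb_base + ARM_SMMU_CB_TTBR0);
      	writeq_relaxed(ttbr1, cb_base + ARM_SMMU_CB_TTBR1);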