1. 05 Nov 2019, 7 commits
    • iommu/io-pgtable-arm: Simplify level indexing · 5fb190b0
      Robin Murphy committed
      The nature of the LPAE format means that data->pg_shift is always
      redundant with data->bits_per_level, since they represent the size of a
      page and the number of PTEs per page respectively, and the size of a PTE
      is constant. Thus it is more efficient to store only the latter, and
      derive the former via a trivial addition where necessary.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      [will: Reworked granule check in iopte_to_paddr()]
      Signed-off-by: Will Deacon <will@kernel.org>
    • iommu/io-pgtable-arm: Simplify PGD size handling · c79278c1
      Robin Murphy committed
      We use data->pgd_size directly for the one-off allocation and freeing of
      the top-level table, but otherwise it serves for ARM_LPAE_PGD_IDX() to
      repeatedly re-calculate the effective number of top-level address bits
      it represents. Flip this around so we store the form we most commonly
      need, and derive the lesser-used one instead. This cuts a whole bunch of
      code out of the map/unmap/iova_to_phys fast-paths.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • iommu/io-pgtable-arm: Simplify start level lookup · 594ab90f
      Robin Murphy committed
      Beyond a couple of allocation-time calculations, data->levels is only
      ever used to derive the start level. Storing the start level directly
      leads to a small reduction in object code, which should help eke out a
      little more efficiency, and slightly more readable source to boot.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • iommu/io-pgtable-arm: Simplify bounds checks · 67f3e53d
      Robin Murphy committed
      We're merely checking that the relevant upper bits of each address
      are all zero, so there are cheaper ways to achieve that.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • iommu/io-pgtable-arm: Rationalise size check · f7b90d2c
      Robin Murphy committed
      It makes little sense to only validate the requested size after we think
      we've found a matching block size - making the check up-front is simple,
      and far more logical than waiting to walk off the bottom of the table to
      infer that we must have been passed a bogus size to start with.
      
      We're missing an equivalent check on the unmap path, so add that as well
      for consistency.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • iommu/io-pgtable: Make selftest gubbins consistently __init · b5813c16
      Robin Murphy committed
      The selftests run as an initcall, but the annotation of the various
      callbacks and data seems to be somewhat arbitrary. Add it consistently
      for everything related to the selftests.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • iommu: arm-smmu-impl: Add sdm845 implementation hook · 759aaa10
      Vivek Gautam committed
      Add a reset hook for sdm845-based platforms to turn off
      the wait-for-safe sequence.
      
      Understanding how wait-for-safe logic affects USB and UFS performance
      on MTP845 and DB845 boards:
      
      Qcom's implementation of arm,mmu-500 adds a WAIT-FOR-SAFE logic
      to address under-performance issues in real-time clients, such as
      Display and Camera.
      On receiving an invalidation request, the SMMU forwards a SAFE request
      to these clients and waits for a SAFE ack signal from the real-time
      clients. The SAFE signal from such clients is used to qualify the
      start of invalidation.
      This logic is controlled by chicken bits, one each for MDP (display),
      IFE0, and IFE1 (camera), that can be accessed only from secure software
      on sdm845.
      
      This configuration, however, degrades the performance of non-real-time
      clients such as USB and UFS. This happens because, with the
      wait-for-safe logic enabled, the hardware throttles non-real-time
      clients while waiting for SAFE ack signals from real-time clients.
      
      On mtp845 and db845 devices, with the wait-for-safe logic enabled by
      the bootloaders, we see degraded performance of USB and UFS when the
      kernel enables smmu stage-1 translations for these clients.
      Turning off this wait-for-safe logic from the kernel gets back the
      performance of USB and UFS devices, until we revisit this when we
      start seeing performance issues on display/camera on upstream-supported
      SDM845 platforms.
      The bootloaders on these boards implement secure monitor callbacks to
      handle a specific command - QCOM_SCM_SVC_SMMU_PROGRAM - with which the
      logic can be toggled.
      
      There are other boards such as cheza whose bootloaders don't enable this
      logic. Such boards don't implement callbacks to handle the specific SCM
      call so disabling this logic for such boards will be a no-op.
      
      This change is inspired by the downstream change from Patrick Daly
      to address performance issues with display and camera by handling
      this wait-for-safe within separate io-pagetable ops to do TLB
      maintenance. So a big thanks to him for the change and for all the
      offline discussions.
      
      Without this change the UFS reads are pretty slow:
      $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync
      10+0 records in
      10+0 records out
      10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s
      real    0m 22.39s
      user    0m 0.00s
      sys     0m 0.01s
      
      With this change they are back to rock!
      $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync
      300+0 records in
      300+0 records out
      314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s
      real    0m 1.03s
      user    0m 0.00s
      sys     0m 0.54s
      Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Reviewed-by: Stephen Boyd <swboyd@chromium.org>
      Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
      Signed-off-by: Will Deacon <will@kernel.org>
  2. 02 Nov 2019, 1 commit
  3. 01 Oct 2019, 10 commits
  4. 28 Sep 2019, 6 commits
  5. 24 Sep 2019, 5 commits
  6. 14 Sep 2019, 1 commit
  7. 11 Sep 2019, 6 commits
  8. 06 Sep 2019, 3 commits
    • iommu/omap: Mark pm functions __maybe_unused · 96088a20
      Arnd Bergmann committed
      The runtime_pm functions are unused when CONFIG_PM is disabled:
      
      drivers/iommu/omap-iommu.c:1022:12: error: unused function 'omap_iommu_runtime_suspend' [-Werror,-Wunused-function]
      static int omap_iommu_runtime_suspend(struct device *dev)
      drivers/iommu/omap-iommu.c:1064:12: error: unused function 'omap_iommu_runtime_resume' [-Werror,-Wunused-function]
      static int omap_iommu_runtime_resume(struct device *dev)
      
      Mark them as __maybe_unused to let gcc silently drop them
      instead of warning.
      
      Fixes: db8918f6 ("iommu/omap: streamline enable/disable through runtime pm callbacks")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Suman Anna <s-anna@ti.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/amd: Fix race in increase_address_space() · 754265bc
      Joerg Roedel committed
      After the conversion to lock-less dma-api calls, the
      increase_address_space() function can be called without any
      locking. Multiple CPUs could potentially race to increase
      the address space, leading to invalid domain->mode settings
      and invalid page-tables. This has been happening in the wild
      under high IO load and memory pressure.
      
      Fix the race by locking this operation. The function is
      called infrequently so that this does not introduce
      a performance regression in the dma-api path again.
      Reported-by: Qian Cai <cai@lca.pw>
      Fixes: 256e4621 ('iommu/amd: Make use of the generic IOVA allocator')
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/amd: Flush old domains in kdump kernel · 36b7200f
      Stuart Hayes committed
      When devices are attached to the amd_iommu in a kdump kernel, the old
      device table entries (DTEs), which were copied from the crashed kernel,
      will be overwritten with a new domain number. When the new DTE is
      written, the IOMMU is told to flush the DTE from its internal cache,
      but it is not told to flush the translation cache entries for the old
      domain number.
      
      Without this patch, AMD systems using the tg3 network driver fail when kdump
      tries to save the vmcore to a network system, showing network timeouts and
      (sometimes) IOMMU errors in the kernel log.
      
      This patch will flush IOMMU translation cache entries for the old domain when
      a DTE gets overwritten with a new domain number.
      Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
      Fixes: 3ac3e5ee ('iommu/amd: Copy old trans table from old kernel')
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  9. 05 Sep 2019, 1 commit