1. 15 Oct 2019 (3 commits)
    • iommu/amd: Convert AMD iommu driver to the dma-iommu api · be62dbf5
      Committed by Tom Murphy
      Convert the AMD iommu driver to the dma-iommu api. Remove the iova
      handling and reserve region code from the AMD iommu driver.
      Signed-off-by: Tom Murphy <murphyt7@tcd.ie>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      be62dbf5
    • iommu: Add gfp parameter to iommu_ops::map · 781ca2de
      Committed by Tom Murphy
      Add a gfp_t parameter to the iommu_ops::map function.
      Remove the needless locking in the AMD iommu driver.
      
      The iommu_ops::map function (or the iommu_map function which calls it)
      was always supposed to be sleepable (according to Joerg's comment in
      this thread: https://lore.kernel.org/patchwork/patch/977520/ ), and so
      should probably have had a might_sleep() since it was written. However,
      the dma-iommu api can currently call iommu_map in an atomic context,
      which it shouldn't do. This doesn't cause any problems, because every
      iommu driver which uses the dma-iommu api uses GFP_ATOMIC in its
      iommu_ops::map function; but doing so wastes the memory allocator's
      atomic pools.
      Signed-off-by: Tom Murphy <murphyt7@tcd.ie>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      781ca2de
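      A minimal sketch of the changed callback and a driver-side map
      function that uses the caller-supplied flags; the example_* names are
      illustrative, not the upstream diff:

          /* The callback now carries the caller's allocation context. */
          struct iommu_ops {
                  /* ... */
                  int (*map)(struct iommu_domain *domain, unsigned long iova,
                             phys_addr_t paddr, size_t size, int prot,
                             gfp_t gfp);
                  /* ... */
          };

          /* Driver side: allocate page-table memory with the passed-in gfp
           * instead of hard-coding GFP_ATOMIC, so sleepable callers no
           * longer drain the allocator's atomic pools. */
          static int example_map(struct iommu_domain *domain,
                                 unsigned long iova, phys_addr_t paddr,
                                 size_t size, int prot, gfp_t gfp)
          {
                  u64 *pte = (u64 *)get_zeroed_page(gfp);

                  if (!pte)
                          return -ENOMEM;
                  /* ... install the translation ... */
                  return 0;
          }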
    • iommu/amd: Remove unnecessary locking from AMD iommu driver · 37ec8eb8
      Committed by Tom Murphy
      With or without locking, it doesn't make sense for two writers to
      write to the same IOVA range at the same time. Even with locking we
      still have a race over who takes the lock first, so we can't be sure
      what the result will be. Locking makes the result saner (it will be
      correct for whichever writer takes the lock last), but it is still
      useless, because we can't be sure which writer that will be. Having
      two writers write to the same IOVA range at the same time is a
      fundamentally broken design.

      So we can remove the locking and work on the assumption that no two
      writers write to the same IOVA range at the same time.
      
      The only exception is when we have to allocate a middle page in the
      page tables: a middle page can cover more than just the IOVA range a
      writer has been allocated. However, this isn't an issue in the AMD
      driver, because it can allocate middle pages atomically using
      cmpxchg64() (see the sketch below).
      Signed-off-by: Tom Murphy <murphyt7@tcd.ie>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      37ec8eb8
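      To illustrate the cmpxchg64() point: each racing writer can allocate
      its own candidate middle page and try to publish it; exactly one
      publish succeeds and the loser frees its copy. A sketch with
      illustrative names (alloc_middle_page, PDE_PRESENT), not the driver's
      exact code:

          /* Install a middle-level page-table page without a lock.
           * 'pde' points at the (possibly empty) page-directory entry. */
          static u64 *alloc_middle_page(u64 *pde, gfp_t gfp)
          {
                  u64 *page = (u64 *)get_zeroed_page(gfp);
                  u64 oldval;

                  if (!page)
                          return NULL;

                  /* Publish the new page only if the PDE is still empty. */
                  oldval = cmpxchg64(pde, 0ULL,
                                     virt_to_phys(page) | PDE_PRESENT);
                  if (oldval != 0) {
                          /* Another writer won the race; use its page. */
                          free_page((unsigned long)page);
                          return phys_to_virt(oldval & PAGE_MASK);
                  }
                  return page;
          }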
  2. 28 Sep 2019 (6 commits)
  3. 24 Sep 2019 (5 commits)
  4. 06 Sep 2019 (2 commits)
    • iommu/amd: Fix race in increase_address_space() · 754265bc
      Committed by Joerg Roedel
      After the conversion to the lock-less dma-api path, the
      increase_address_space() function can be called without any locking.
      Multiple CPUs could potentially race to increase the address space,
      leading to invalid domain->mode settings and invalid page-tables.
      This has been happening in the wild under high IO load and memory
      pressure.

      Fix the race by locking this operation. The function is called
      infrequently, so this does not re-introduce a performance regression
      in the dma-api path.
      Reported-by: Qian Cai <cai@lca.pw>
      Fixes: 256e4621 ('iommu/amd: Make use of the generic IOVA allocator')
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      754265bc
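      Roughly, the fixed function takes the domain lock and re-checks
      domain->mode before growing the table. A sketch, assuming the
      driver's protection_domain carries a spinlock named 'lock'
      (simplified from the real function):

          static void increase_address_space(struct protection_domain *domain,
                                             gfp_t gfp)
          {
                  unsigned long flags;
                  u64 *pte;

                  pte = (u64 *)get_zeroed_page(gfp);
                  if (!pte)
                          return;

                  spin_lock_irqsave(&domain->lock, flags);

                  /* Re-check under the lock: another CPU may already have
                   * grown the address space, or it is already maximal. */
                  if (domain->mode == PAGE_MODE_6_LEVEL) {
                          spin_unlock_irqrestore(&domain->lock, flags);
                          free_page((unsigned long)pte);
                          return;
                  }

                  /* ... chain the old root below the new top-level page
                   * and bump domain->mode, still holding the lock ... */

                  spin_unlock_irqrestore(&domain->lock, flags);
          }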
    • iommu/amd: Flush old domains in kdump kernel · 36b7200f
      Committed by Stuart Hayes
      When devices are attached to the amd_iommu in a kdump kernel, the old device
      table entries (DTEs), which were copied from the crashed kernel, will be
      overwritten with a new domain number.  When the new DTE is written, the IOMMU
      is told to flush the DTE from its internal cache--but it is not told to flush
      the translation cache entries for the old domain number.
      
      Without this patch, AMD systems using the tg3 network driver fail when kdump
      tries to save the vmcore to a network system, showing network timeouts and
      (sometimes) IOMMU errors in the kernel log.
      
      This patch will flush IOMMU translation cache entries for the old domain when
      a DTE gets overwritten with a new domain number.
      Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
      Fixes: 3ac3e5ee ('iommu/amd: Copy old trans table from old kernel')
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      36b7200f
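      The shape of the fix, as a hedged sketch; every helper shown here is
      an illustrative stand-in for the driver's DTE update path, not a real
      function name:

          /* When overwriting a DTE inherited from the crashed kernel,
           * remember the domain id it carried and flush that domain's
           * cached translations, not just the DTE cache entry. */
          static void update_dte(struct amd_iommu *iommu, u16 devid,
                                 u16 new_domid)
          {
                  u16 old_domid = dte_read_domid(devid);   /* illustrative */

                  dte_write(devid, new_domid);             /* illustrative */
                  device_flush_dte_entry(iommu, devid);    /* already done */

                  /* New in this fix: stale IOTLB entries are keyed by the
                   * old domain id and would otherwise survive the DTE
                   * rewrite. */
                  if (old_domid)
                          domain_id_flush_pages(iommu, old_domid);
          }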
  5. 04 Sep 2019 (1 commit)
    • dma-mapping: explicitly wire up ->mmap and ->get_sgtable · f9f3232a
      Committed by Christoph Hellwig
      While the default ->mmap and ->get_sgtable implementations work for
      the majority of our dma_map_ops implementations, they are inherently
      unsafe for others that don't use the page allocator or CMA and/or use
      their own way of remapping not covered by the common code. So remove
      the defaults if these methods are not wired up, and instead wire up
      the default implementations for all safe instances.
      
      Fixes: e1c7e324 ("dma-mapping: always provide the dma_map_ops based implementation")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      f9f3232a
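      Concretely, "wiring up" means a safe instance now names the common
      helpers in its ops table rather than relying on a NULL-means-default
      fallback. A sketch, assuming the generic dma_common_mmap() and
      dma_common_get_sgtable() helpers; the example_* callbacks are
      placeholders:

          static const struct dma_map_ops example_dma_ops = {
                  .alloc        = example_alloc,
                  .free         = example_free,
                  .map_page     = example_map_page,
                  .unmap_page   = example_unmap_page,
                  /* Explicit opt-in to the generic implementations;
                   * instances with their own remapping simply leave these
                   * NULL and get an error instead of a silently wrong
                   * mapping. */
                  .mmap         = dma_common_mmap,
                  .get_sgtable  = dma_common_get_sgtable,
          };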
  6. 30 Aug 2019 (1 commit)
    • iommu/amd: Silence warnings under memory pressure · 3d708895
      Committed by Qian Cai
      When running heavy memory pressure workloads, the system is throwing
      endless warnings,
      
      smartpqi 0000:23:00.0: AMD-Vi: IOMMU mapping error in map_sg (io-pages:
      5 reason: -12)
      Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40
      07/10/2019
      swapper/10: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC),
      nodemask=(null),cpuset=/,mems_allowed=0,4
      Call Trace:
       <IRQ>
       dump_stack+0x62/0x9a
       warn_alloc.cold.43+0x8a/0x148
       __alloc_pages_nodemask+0x1a5c/0x1bb0
       get_zeroed_page+0x16/0x20
       iommu_map_page+0x477/0x540
       map_sg+0x1ce/0x2f0
       scsi_dma_map+0xc6/0x160
       pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470 [smartpqi]
       do_IRQ+0x81/0x170
       common_interrupt+0xf/0xf
       </IRQ>
      
      because the allocation in iommu_map_page() can fail, and the volume
      of these calls can be huge, generating a flood of serial console
      output and consuming all CPUs.

      Fix it by silencing the warning at this call site; a dev_err() later
      in the path still reports the failure.
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      3d708895
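      The pattern of the fix is simply masking __GFP_NOWARN into the
      failing allocation (a sketch; the function name is illustrative):

          /* Page-table allocation under memory pressure: let it fail
           * quietly. The caller's error path still emits one dev_err(). */
          static u64 *alloc_pte_page(gfp_t gfp)
          {
                  return (u64 *)get_zeroed_page(gfp | __GFP_NOWARN);
          }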
  7. 23 Aug 2019 (1 commit)
  8. 09 Aug 2019 (1 commit)
  9. 30 Jul 2019 (1 commit)
    • iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync() · 56f8af5e
      Committed by Will Deacon
      To allow IOMMU drivers to batch up TLB flushing operations and postpone
      them until ->iotlb_sync() is called, extend the prototypes for the
      ->unmap() and ->iotlb_sync() IOMMU ops callbacks to take a pointer to
      the current iommu_iotlb_gather structure.
      
      All affected IOMMU drivers are updated, but there should be no
      functional change since the extra parameter is ignored for now.
      Signed-off-by: Will Deacon <will@kernel.org>
      56f8af5e
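      The resulting interface, trimmed to the relevant pieces (a sketch of
      the prototypes this series introduces in include/linux/iommu.h):

          /* Token on the caller's stack: drivers accumulate the range and
           * page size of pending invalidations into it during ->unmap(). */
          struct iommu_iotlb_gather {
                  unsigned long   start;
                  unsigned long   end;
                  size_t          pgsize;
          };

          struct iommu_ops {
                  /* ... */
                  size_t (*unmap)(struct iommu_domain *domain,
                                  unsigned long iova, size_t size,
                                  struct iommu_iotlb_gather *iotlb_gather);
                  void (*iotlb_sync)(struct iommu_domain *domain,
                                     struct iommu_iotlb_gather *iotlb_gather);
                  /* ... */
          };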
  10. 24 Jul 2019 (1 commit)
    • iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops · 6d1bcb95
      Committed by Will Deacon
      Commit add02cfd ("iommu: Introduce Interface for IOMMU TLB Flushing")
      added three new TLB flushing operations to the IOMMU API so that the
      underlying driver operations can be batched when unmapping large regions
      of IO virtual address space.
      
      However, the ->iotlb_range_add() callback has not been implemented by
      any IOMMU drivers (amd_iommu.c implements it as an empty function, which
      incurs the overhead of an indirect branch). Instead, drivers either flush
      the entire IOTLB in the ->iotlb_sync() callback or perform the necessary
      invalidation during ->unmap().
      
      Attempting to implement ->iotlb_range_add() for arm-smmu-v3.c revealed
      two major issues:
      
        1. The page size used to map the region in the page-table is not known,
           and so it is not generally possible to issue TLB flushes in the most
           efficient manner.
      
        2. The only mutable state passed to the callback is a pointer to the
           iommu_domain, which can be accessed concurrently and therefore
           requires expensive synchronisation to keep track of the outstanding
           flushes.
      
      Remove the callback entirely in preparation for extending ->unmap() and
      ->iotlb_sync() to update a token on the caller's stack.
      Signed-off-by: Will Deacon <will@kernel.org>
      6d1bcb95
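      For reference, the flushing callbacks that add02cfd added looked
      roughly like this (a trimmed sketch); the middle one is what this
      change deletes:

          struct iommu_ops {
                  /* ... */
                  void (*flush_iotlb_all)(struct iommu_domain *domain);
                  void (*iotlb_range_add)(struct iommu_domain *domain,
                                          unsigned long iova,
                                          size_t size);   /* removed here */
                  void (*iotlb_sync)(struct iommu_domain *domain);
                  /* ... */
          };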
  11. 22 Jul 2019 (1 commit)
    • iommu/amd: fix a crash in iova_magazine_free_pfns · 8cf66504
      Committed by Qian Cai
      Commit b3aa14f0 ("iommu: remove the mapping_error dma_map_ops
      method") incorrectly changed the error check on the return value of
      dma_ops_alloc_iova() in map_sg(), which causes a crash under memory
      pressure: dma_ops_alloc_iova() never returns DMA_MAPPING_ERROR on
      failure, but 0, so the error handling is all wrong.
      
         kernel BUG at drivers/iommu/iova.c:801!
          Workqueue: kblockd blk_mq_run_work_fn
          RIP: 0010:iova_magazine_free_pfns+0x7d/0xc0
          Call Trace:
           free_cpu_cached_iovas+0xbd/0x150
           alloc_iova_fast+0x8c/0xba
           dma_ops_alloc_iova.isra.6+0x65/0xa0
           map_sg+0x8c/0x2a0
           scsi_dma_map+0xc6/0x160
           pqi_aio_submit_io+0x1f6/0x440 [smartpqi]
           pqi_scsi_queue_command+0x90c/0xdd0 [smartpqi]
           scsi_queue_rq+0x79c/0x1200
           blk_mq_dispatch_rq_list+0x4dc/0xb70
           blk_mq_sched_dispatch_requests+0x249/0x310
           __blk_mq_run_hw_queue+0x128/0x200
           blk_mq_run_work_fn+0x27/0x30
           process_one_work+0x522/0xa10
           worker_thread+0x63/0x5b0
           kthread+0x1d2/0x1f0
           ret_from_fork+0x22/0x40
      
      Fixes: b3aa14f0 ("iommu: remove the mapping_error dma_map_ops method")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8cf66504
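      The essence of the fix in map_sg(), shown as a before/after sketch
      (dma_ops_alloc_iova() is the AMD driver's internal IOVA allocator):

          address = dma_ops_alloc_iova(dev, dma_dom, npages, dma_mask);

          /* Before: never true, because the allocator returns 0 on
           * failure rather than DMA_MAPPING_ERROR. */
          if (address == DMA_MAPPING_ERROR)
                  goto out_err;

          /* After: catch the real failure value. */
          if (!address)
                  goto out_err;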
  12. 01 Jul 2019 (1 commit)
  13. 05 Jun 2019 (1 commit)
  14. 28 May 2019 (1 commit)
  15. 27 May 2019 (1 commit)
  16. 07 May 2019 (1 commit)
  17. 06 May 2019 (1 commit)
  18. 03 May 2019 (1 commit)
  19. 30 Apr 2019 (1 commit)
  20. 26 Apr 2019 (1 commit)
  21. 11 Apr 2019 (2 commits)
  22. 30 Mar 2019 (1 commit)
  23. 18 Mar 2019 (1 commit)
    • iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE · 4e50ce03
      Committed by Stanislaw Gruszka
      Take into account that sg->offset can be bigger than PAGE_SIZE when
      setting the segment's sg->dma_address. Otherwise sg->dma_address will
      point at a different page, which makes DMA impossible, with errors
      like this:
      
      xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa70c0 flags=0x0020]
      xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7040 flags=0x0020]
      xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7080 flags=0x0020]
      xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7100 flags=0x0020]
      xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7000 flags=0x0020]
      
      Additionally, with a wrong sg->dma_address, unmap_sg will free the
      wrong pages, which can cause crashes like this:
      
      Feb 28 19:27:45 kernel: BUG: Bad page state in process cinnamon  pfn:39e8b1
      Feb 28 19:27:45 kernel: Disabling lock debugging due to kernel taint
      Feb 28 19:27:45 kernel: flags: 0x2ffff0000000000()
      Feb 28 19:27:45 kernel: raw: 02ffff0000000000 0000000000000000 ffffffff00000301 0000000000000000
      Feb 28 19:27:45 kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
      Feb 28 19:27:45 kernel: page dumped because: nonzero _refcount
      Feb 28 19:27:45 kernel: Modules linked in: ccm fuse arc4 nct6775 hwmon_vid amdgpu nls_iso8859_1 nls_cp437 edac_mce_amd vfat fat kvm_amd ccp rng_core kvm mt76x0u mt76x0_common mt76x02_usb irqbypass mt76_usb mt76x02_lib mt76 crct10dif_pclmul crc32_pclmul chash mac80211 amd_iommu_v2 ghash_clmulni_intel gpu_sched i2c_algo_bit ttm wmi_bmof snd_hda_codec_realtek snd_hda_codec_generic drm_kms_helper snd_hda_codec_hdmi snd_hda_intel drm snd_hda_codec aesni_intel snd_hda_core snd_hwdep aes_x86_64 crypto_simd snd_pcm cfg80211 cryptd mousedev snd_timer glue_helper pcspkr r8169 input_leds realtek agpgart libphy rfkill snd syscopyarea sysfillrect sysimgblt fb_sys_fops soundcore sp5100_tco k10temp i2c_piix4 wmi evdev gpio_amdpt pinctrl_amd mac_hid pcc_cpufreq acpi_cpufreq sg ip_tables x_tables ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) sd_mod(E) hid_generic(E) usbhid(E) hid(E) dm_mod(E) serio_raw(E) atkbd(E) libps2(E) crc32c_intel(E) ahci(E) libahci(E) libata(E) xhci_pci(E) xhci_hcd(E)
      Feb 28 19:27:45 kernel:  scsi_mod(E) i8042(E) serio(E) bcache(E) crc64(E)
      Feb 28 19:27:45 kernel: CPU: 2 PID: 896 Comm: cinnamon Tainted: G    B   W   E     4.20.12-arch1-1-custom #1
      Feb 28 19:27:45 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P1.20 06/26/2018
      Feb 28 19:27:45 kernel: Call Trace:
      Feb 28 19:27:45 kernel:  dump_stack+0x5c/0x80
      Feb 28 19:27:45 kernel:  bad_page.cold.29+0x7f/0xb2
      Feb 28 19:27:45 kernel:  __free_pages_ok+0x2c0/0x2d0
      Feb 28 19:27:45 kernel:  skb_release_data+0x96/0x180
      Feb 28 19:27:45 kernel:  __kfree_skb+0xe/0x20
      Feb 28 19:27:45 kernel:  tcp_recvmsg+0x894/0xc60
      Feb 28 19:27:45 kernel:  ? reuse_swap_page+0x120/0x340
      Feb 28 19:27:45 kernel:  ? ptep_set_access_flags+0x23/0x30
      Feb 28 19:27:45 kernel:  inet_recvmsg+0x5b/0x100
      Feb 28 19:27:45 kernel:  __sys_recvfrom+0xc3/0x180
      Feb 28 19:27:45 kernel:  ? handle_mm_fault+0x10a/0x250
      Feb 28 19:27:45 kernel:  ? syscall_trace_enter+0x1d3/0x2d0
      Feb 28 19:27:45 kernel:  ? __audit_syscall_exit+0x22a/0x290
      Feb 28 19:27:45 kernel:  __x64_sys_recvfrom+0x24/0x30
      Feb 28 19:27:45 kernel:  do_syscall_64+0x5b/0x170
      Feb 28 19:27:45 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Cc: stable@vger.kernel.org
      Reported-and-tested-by: Jan Viktorin <jan.viktorin@gmail.com>
      Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Fixes: 80187fd3 ('iommu/amd: Optimize map_sg and unmap_sg')
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      4e50ce03
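      The gist of the change, as a before/after sketch close to the
      upstream one-liner: the physical address used for mapping is derived
      from sg_phys(s) & PAGE_MASK, which already accounts for any whole
      pages of offset, so only the within-page remainder of s->offset may
      be added back.

          /* Before: adds the full offset, double-counting the whole pages
           * of it that are already part of the mapped address. */
          s->dma_address += address + s->offset;

          /* After: add back only the sub-page part of the offset that was
           * masked out via sg_phys(s) & PAGE_MASK. */
          s->dma_address += address + (s->offset & ~PAGE_MASK);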
  24. 15 Mar 2019 (1 commit)
  25. 11 Feb 2019 (1 commit)
  26. 31 Jan 2019 (1 commit)
  27. 24 Jan 2019 (1 commit)
    • iommu/amd: Fix IOMMU page flush when detach device from a domain · 9825bd94
      Committed by Suravee Suthikulpanit
      When a VM is terminated, the VFIO driver detaches all pass-through
      devices from the VFIO domain by clearing the domain id and page table
      root pointer from each device table entry (DTE), and then invalidates
      the DTE. Then the VFIO driver unmaps pages and invalidates IOMMU
      pages.

      Currently, the IOMMU driver keeps track of which IOMMUs and how many
      devices are attached to the domain. When invalidating IOMMU pages,
      the driver checks if the IOMMU is still attached to the domain before
      issuing the invalidate page command.

      However, since VFIO has already detached all devices from the domain,
      the subsequent INVALIDATE_IOMMU_PAGES commands are skipped, as there
      is no IOMMU attached to the domain. This results in data corruption
      and could leave the PCI device in an indeterminate state.

      Fix this by invalidating IOMMU pages when a device is detached, and
      before decrementing the per-domain device reference counts.
      
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Suggested-by: Joerg Roedel <joro@8bytes.org>
      Co-developed-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Fixes: 6de8ad9b ('x86/amd-iommu: Make iommu_flush_pages aware of multiple IOMMUs')
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      9825bd94
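      A sketch of the reordering in the detach path (names follow the
      driver of that era, simplified): the IOTLB flush is issued while the
      reference counts still show the IOMMU as attached, so the invalidate
      command is no longer skipped.

          static void do_detach(struct iommu_dev_data *dev_data)
          {
                  struct protection_domain *domain = dev_data->domain;
                  struct amd_iommu *iommu =
                          amd_iommu_rlookup_table[dev_data->devid];

                  /* Update data structures and clear/flush the DTE,
                   * as before. */
                  dev_data->domain = NULL;
                  clear_dte_entry(dev_data->devid);
                  device_flush_dte(dev_data);

                  /* New: flush the domain's IOTLB entries and wait for
                   * completion while this IOMMU is still counted as
                   * attached; otherwise the flush logic would skip the
                   * invalidate command. */
                  domain_flush_tlb_pde(domain);
                  domain_flush_complete(domain);

                  /* Decrement reference counters only after the flushes. */
                  domain->dev_iommu[iommu->index] -= 1;
                  domain->dev_cnt                 -= 1;
          }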