1. 11 11月, 2019 2 次提交
    • D
      iommu/vt-d: Turn off translations at shutdown · 6c3a44ed
      Deepa Dinamani 提交于
      The intel-iommu driver assumes that the iommu state is
      cleaned up at the start of the new kernel.
      But, when we try to kexec boot something other than the
      Linux kernel, the cleanup cannot be relied upon.
      Hence, cleanup before we go down for reboot.
      
      Keeping the cleanup at initialization also, in case BIOS
      leaves the IOMMU enabled.
      
      I considered turning off iommu only during kexec reboot, but a clean
      shutdown seems always a good idea. But if someone wants to make it
      conditional, such as VMM live update, we can do that.  There doesn't
      seem to be such a condition at this time.
      
      Tested that before, the info message
      'DMAR: Translation was enabled for <iommu> but we are not in kdump mode'
      would be reported for each iommu. The message will not appear when the
      DMA-remapping is not enabled on entry to the kernel.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      6c3a44ed
    • Y
      iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved · f036c7fa
      Yian Chen 提交于
      VT-d RMRR (Reserved Memory Region Reporting) regions are reserved
      for device use only and should not be part of allocable memory pool of OS.
      
      BIOS e820_table reports complete memory map to OS, including OS usable
      memory ranges and BIOS reserved memory ranges etc.
      
      x86 BIOS may not be trusted to include RMRR regions as reserved type
      of memory in its e820 memory map, hence validate every RMRR entry
      with the e820 memory map to make sure the RMRR regions will not be
      used by OS for any other purposes.
      
      ia64 EFI is working fine so implement RMRR validation as a dummy function
      Reviewed-by: NLu Baolu <baolu.lu@linux.intel.com>
      Reviewed-by: NSohil Mehta <sohil.mehta@intel.com>
      Signed-off-by: NYian Chen <yian.chen@intel.com>
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      f036c7fa
  2. 30 10月, 2019 1 次提交
    • J
      iommu/vt-d: Fix panic after kexec -p for kdump · 160c63f9
      John Donnelly 提交于
      This cures a panic on restart after a kexec operation on 5.3 and 5.4
      kernels.
      
      The underlying state of the iommu registers (iommu->flags &
      VTD_FLAG_TRANS_PRE_ENABLED) on a restart results in a domain being marked as
      "DEFER_DEVICE_DOMAIN_INFO" that produces an Oops in identity_mapping().
      
      [   43.654737] BUG: kernel NULL pointer dereference, address:
      0000000000000056
      [   43.655720] #PF: supervisor read access in kernel mode
      [   43.655720] #PF: error_code(0x0000) - not-present page
      [   43.655720] PGD 0 P4D 0
      [   43.655720] Oops: 0000 [#1] SMP PTI
      [   43.655720] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
      5.3.2-1940.el8uek.x86_64 #1
      [   43.655720] Hardware name: Oracle Corporation ORACLE SERVER
      X5-2/ASM,MOTHERBOARD,1U, BIOS 30140300 09/20/2018
      [   43.655720] RIP: 0010:iommu_need_mapping+0x29/0xd0
      [   43.655720] Code: 00 0f 1f 44 00 00 48 8b 97 70 02 00 00 48 83 fa ff
      74 53 48 8d 4a ff b8 01 00 00 00 48 83 f9 fd 76 01 c3 48 8b 35 7f 58 e0
      01 <48> 39 72 58 75 f2 55 48 89 e5 41 54 53 48 8b 87 28 02 00 00 4c 8b
      [   43.655720] RSP: 0018:ffffc9000001b9b0 EFLAGS: 00010246
      [   43.655720] RAX: 0000000000000001 RBX: 0000000000001000 RCX:
      fffffffffffffffd
      [   43.655720] RDX: fffffffffffffffe RSI: ffff8880719b8000 RDI:
      ffff8880477460b0
      [   43.655720] RBP: ffffc9000001b9e8 R08: 0000000000000000 R09:
      ffff888047c01700
      [   43.655720] R10: 00002194036fc692 R11: 0000000000000000 R12:
      0000000000000000
      [   43.655720] R13: ffff8880477460b0 R14: 0000000000000cc0 R15:
      ffff888072d2b558
      [   43.655720] FS:  0000000000000000(0000) GS:ffff888071c00000(0000)
      knlGS:0000000000000000
      [   43.655720] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   43.655720] CR2: 0000000000000056 CR3: 000000007440a002 CR4:
      00000000001606b0
      [   43.655720] Call Trace:
      [   43.655720]  ? intel_alloc_coherent+0x2a/0x180
      [   43.655720]  ? __schedule+0x2c2/0x650
      [   43.655720]  dma_alloc_attrs+0x8c/0xd0
      [   43.655720]  dma_pool_alloc+0xdf/0x200
      [   43.655720]  ehci_qh_alloc+0x58/0x130
      [   43.655720]  ehci_setup+0x287/0x7ba
      [   43.655720]  ? _dev_info+0x6c/0x83
      [   43.655720]  ehci_pci_setup+0x91/0x436
      [   43.655720]  usb_add_hcd.cold.48+0x1d4/0x754
      [   43.655720]  usb_hcd_pci_probe+0x2bc/0x3f0
      [   43.655720]  ehci_pci_probe+0x39/0x40
      [   43.655720]  local_pci_probe+0x47/0x80
      [   43.655720]  pci_device_probe+0xff/0x1b0
      [   43.655720]  really_probe+0xf5/0x3a0
      [   43.655720]  driver_probe_device+0xbb/0x100
      [   43.655720]  device_driver_attach+0x58/0x60
      [   43.655720]  __driver_attach+0x8f/0x150
      [   43.655720]  ? device_driver_attach+0x60/0x60
      [   43.655720]  bus_for_each_dev+0x74/0xb0
      [   43.655720]  driver_attach+0x1e/0x20
      [   43.655720]  bus_add_driver+0x151/0x1f0
      [   43.655720]  ? ehci_hcd_init+0xb2/0xb2
      [   43.655720]  ? do_early_param+0x95/0x95
      [   43.655720]  driver_register+0x70/0xc0
      [   43.655720]  ? ehci_hcd_init+0xb2/0xb2
      [   43.655720]  __pci_register_driver+0x57/0x60
      [   43.655720]  ehci_pci_init+0x6a/0x6c
      [   43.655720]  do_one_initcall+0x4a/0x1fa
      [   43.655720]  ? do_early_param+0x95/0x95
      [   43.655720]  kernel_init_freeable+0x1bd/0x262
      [   43.655720]  ? rest_init+0xb0/0xb0
      [   43.655720]  kernel_init+0xe/0x110
      [   43.655720]  ret_from_fork+0x24/0x50
      
      Fixes: 8af46c78 ("iommu/vt-d: Implement is_attach_deferred iommu ops entry")
      Cc: stable@vger.kernel.org # v5.3+
      Signed-off-by: NJohn Donnelly <john.p.donnelly@oracle.com>
      Reviewed-by: NLu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      160c63f9
  3. 18 10月, 2019 1 次提交
  4. 15 10月, 2019 2 次提交
    • L
      iommu/vt-d: Refactor find_domain() helper · 1ee0186b
      Lu Baolu 提交于
      Current find_domain() helper checks and does the deferred domain
      attachment and return the domain in use. This isn't always the
      use case for the callers. Some callers only want to retrieve the
      current domain in use.
      
      This refactors find_domain() into two helpers: 1) find_domain()
      only returns the domain in use; 2) deferred_attach_domain() does
      the deferred domain attachment if required and return the domain
      in use.
      
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Kevin Tian <kevin.tian@intel.com>
      Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      1ee0186b
    • T
      iommu: Add gfp parameter to iommu_ops::map · 781ca2de
      Tom Murphy 提交于
      Add a gfp_t parameter to the iommu_ops::map function.
      Remove the needless locking in the AMD iommu driver.
      
      The iommu_ops::map function (or the iommu_map function which calls it)
      was always supposed to be sleepable (according to Joerg's comment in
      this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so
      should probably have had a "might_sleep()" since it was written. However
      currently the dma-iommu api can call iommu_map in an atomic context,
      which it shouldn't do. This doesn't cause any problems because any iommu
      driver which uses the dma-iommu api uses gfp_atomic in it's
      iommu_ops::map function. But doing this wastes the memory allocators
      atomic pools.
      Signed-off-by: NTom Murphy <murphyt7@tcd.ie>
      Reviewed-by: NRobin Murphy <robin.murphy@arm.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      781ca2de
  5. 11 9月, 2019 5 次提交
  6. 04 9月, 2019 1 次提交
    • C
      dma-mapping: explicitly wire up ->mmap and ->get_sgtable · f9f3232a
      Christoph Hellwig 提交于
      While the default ->mmap and ->get_sgtable implementations work for the
      majority of our dma_map_ops impementations they are inherently safe
      for others that don't use the page allocator or CMA and/or use their
      own way of remapping not covered by the common code.  So remove the
      defaults if these methods are not wired up, but instead wire up the
      default implementations for all safe instances.
      
      Fixes: e1c7e324 ("dma-mapping: always provide the dma_map_ops based implementation")
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      f9f3232a
  7. 30 8月, 2019 1 次提交
  8. 23 8月, 2019 1 次提交
  9. 09 8月, 2019 2 次提交
  10. 06 8月, 2019 1 次提交
  11. 30 7月, 2019 1 次提交
    • W
      iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync() · 56f8af5e
      Will Deacon 提交于
      To allow IOMMU drivers to batch up TLB flushing operations and postpone
      them until ->iotlb_sync() is called, extend the prototypes for the
      ->unmap() and ->iotlb_sync() IOMMU ops callbacks to take a pointer to
      the current iommu_iotlb_gather structure.
      
      All affected IOMMU drivers are updated, but there should be no
      functional change since the extra parameter is ignored for now.
      Signed-off-by: NWill Deacon <will@kernel.org>
      56f8af5e
  12. 22 7月, 2019 4 次提交
    • D
      iommu/vt-d: Check if domain->pgd was allocated · 3ee9eca7
      Dmitry Safonov 提交于
      There is a couple of places where on domain_init() failure domain_exit()
      is called. While currently domain_init() can fail only if
      alloc_pgtable_page() has failed.
      
      Make domain_exit() check if domain->pgd present, before calling
      domain_unmap(), as it theoretically should crash on clearing pte entries
      in dma_pte_clear_level().
      
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Lu Baolu <baolu.lu@linux.intel.com>
      Cc: iommu@lists.linux-foundation.org
      Signed-off-by: NDmitry Safonov <dima@arista.com>
      Reviewed-by: NLu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      3ee9eca7
    • D
      iommu/vt-d: Don't queue_iova() if there is no flush queue · effa4678
      Dmitry Safonov 提交于
      Intel VT-d driver was reworked to use common deferred flushing
      implementation. Previously there was one global per-cpu flush queue,
      afterwards - one per domain.
      
      Before deferring a flush, the queue should be allocated and initialized.
      
      Currently only domains with IOMMU_DOMAIN_DMA type initialize their flush
      queue. It's probably worth to init it for static or unmanaged domains
      too, but it may be arguable - I'm leaving it to iommu folks.
      
      Prevent queuing an iova flush if the domain doesn't have a queue.
      The defensive check seems to be worth to keep even if queue would be
      initialized for all kinds of domains. And is easy backportable.
      
      On 4.19.43 stable kernel it has a user-visible effect: previously for
      devices in si domain there were crashes, on sata devices:
      
       BUG: spinlock bad magic on CPU#6, swapper/0/1
        lock: 0xffff88844f582008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
       CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1
       Call Trace:
        <IRQ>
        dump_stack+0x61/0x7e
        spin_bug+0x9d/0xa3
        do_raw_spin_lock+0x22/0x8e
        _raw_spin_lock_irqsave+0x32/0x3a
        queue_iova+0x45/0x115
        intel_unmap+0x107/0x113
        intel_unmap_sg+0x6b/0x76
        __ata_qc_complete+0x7f/0x103
        ata_qc_complete+0x9b/0x26a
        ata_qc_complete_multiple+0xd0/0xe3
        ahci_handle_port_interrupt+0x3ee/0x48a
        ahci_handle_port_intr+0x73/0xa9
        ahci_single_level_irq_intr+0x40/0x60
        __handle_irq_event_percpu+0x7f/0x19a
        handle_irq_event_percpu+0x32/0x72
        handle_irq_event+0x38/0x56
        handle_edge_irq+0x102/0x121
        handle_irq+0x147/0x15c
        do_IRQ+0x66/0xf2
        common_interrupt+0xf/0xf
       RIP: 0010:__do_softirq+0x8c/0x2df
      
      The same for usb devices that use ehci-pci:
       BUG: spinlock bad magic on CPU#0, swapper/0/1
        lock: 0xffff88844f402008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4
       Call Trace:
        <IRQ>
        dump_stack+0x61/0x7e
        spin_bug+0x9d/0xa3
        do_raw_spin_lock+0x22/0x8e
        _raw_spin_lock_irqsave+0x32/0x3a
        queue_iova+0x77/0x145
        intel_unmap+0x107/0x113
        intel_unmap_page+0xe/0x10
        usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d
        usb_hcd_unmap_urb_for_dma+0x17/0x100
        unmap_urb_for_dma+0x22/0x24
        __usb_hcd_giveback_urb+0x51/0xc3
        usb_giveback_urb_bh+0x97/0xde
        tasklet_action_common.isra.4+0x5f/0xa1
        tasklet_action+0x2d/0x30
        __do_softirq+0x138/0x2df
        irq_exit+0x7d/0x8b
        smp_apic_timer_interrupt+0x10f/0x151
        apic_timer_interrupt+0xf/0x20
        </IRQ>
       RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39
      
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Lu Baolu <baolu.lu@linux.intel.com>
      Cc: iommu@lists.linux-foundation.org
      Cc: <stable@vger.kernel.org> # 4.14+
      Fixes: 13cf0174 ("iommu/vt-d: Make use of iova deferred flushing")
      Signed-off-by: NDmitry Safonov <dima@arista.com>
      Reviewed-by: NLu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      effa4678
    • L
      iommu/vt-d: Avoid duplicated pci dma alias consideration · 55752949
      Lu Baolu 提交于
      As we have abandoned the home-made lazy domain allocation
      and delegated the DMA domain life cycle up to the default
      domain mechanism defined in the generic iommu layer, we
      needn't consider pci alias anymore when mapping/unmapping
      the context entries. Without this fix, we see kernel NULL
      pointer dereference during pci device hot-plug test.
      
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Kevin Tian <kevin.tian@intel.com>
      Fixes: fa954e68 ("iommu/vt-d: Delegate the dma domain to upper layer")
      Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
      Reported-and-tested-by: NXu Pengfei <pengfei.xu@intel.com>
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      55752949
    • J
      Revert "iommu/vt-d: Consolidate domain_init() to avoid duplication" · 301e7ee1
      Joerg Roedel 提交于
      This reverts commit 123b2ffc.
      
      This commit reportedly caused boot failures on some systems
      and needs to be reverted for now.
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      301e7ee1
  13. 23 6月, 2019 1 次提交
    • P
      Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock" · 0aafc8ae
      Peter Xu 提交于
      This reverts commit 7560cc3c.
      
      With 5.2.0-rc5 I can easily trigger this with lockdep and iommu=pt:
      
          ======================================================
          WARNING: possible circular locking dependency detected
          5.2.0-rc5 #78 Not tainted
          ------------------------------------------------------
          swapper/0/1 is trying to acquire lock:
          00000000ea2b3beb (&(&iommu->lock)->rlock){+.+.}, at: domain_context_mapping_one+0xa5/0x4e0
          but task is already holding lock:
          00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
          which lock already depends on the new lock.
          the existing dependency chain (in reverse order) is:
          -> #1 (device_domain_lock){....}:
                 _raw_spin_lock_irqsave+0x3c/0x50
                 dmar_insert_one_dev_info+0xbb/0x510
                 domain_add_dev_info+0x50/0x90
                 dev_prepare_static_identity_mapping+0x30/0x68
                 intel_iommu_init+0xddd/0x1422
                 pci_iommu_init+0x16/0x3f
                 do_one_initcall+0x5d/0x2b4
                 kernel_init_freeable+0x218/0x2c1
                 kernel_init+0xa/0x100
                 ret_from_fork+0x3a/0x50
          -> #0 (&(&iommu->lock)->rlock){+.+.}:
                 lock_acquire+0x9e/0x170
                 _raw_spin_lock+0x25/0x30
                 domain_context_mapping_one+0xa5/0x4e0
                 pci_for_each_dma_alias+0x30/0x140
                 dmar_insert_one_dev_info+0x3b2/0x510
                 domain_add_dev_info+0x50/0x90
                 dev_prepare_static_identity_mapping+0x30/0x68
                 intel_iommu_init+0xddd/0x1422
                 pci_iommu_init+0x16/0x3f
                 do_one_initcall+0x5d/0x2b4
                 kernel_init_freeable+0x218/0x2c1
                 kernel_init+0xa/0x100
                 ret_from_fork+0x3a/0x50
      
          other info that might help us debug this:
           Possible unsafe locking scenario:
                 CPU0                    CPU1
                 ----                    ----
            lock(device_domain_lock);
                                         lock(&(&iommu->lock)->rlock);
                                         lock(device_domain_lock);
            lock(&(&iommu->lock)->rlock);
      
           *** DEADLOCK ***
          2 locks held by swapper/0/1:
           #0: 00000000033eb13d (dmar_global_lock){++++}, at: intel_iommu_init+0x1e0/0x1422
           #1: 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
      
          stack backtrace:
          CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5 #78
          Hardware name: LENOVO 20KGS35G01/20KGS35G01, BIOS N23ET50W (1.25 ) 06/25/2018
          Call Trace:
           dump_stack+0x85/0xc0
           print_circular_bug.cold.57+0x15c/0x195
           __lock_acquire+0x152a/0x1710
           lock_acquire+0x9e/0x170
           ? domain_context_mapping_one+0xa5/0x4e0
           _raw_spin_lock+0x25/0x30
           ? domain_context_mapping_one+0xa5/0x4e0
           domain_context_mapping_one+0xa5/0x4e0
           ? domain_context_mapping_one+0x4e0/0x4e0
           pci_for_each_dma_alias+0x30/0x140
           dmar_insert_one_dev_info+0x3b2/0x510
           domain_add_dev_info+0x50/0x90
           dev_prepare_static_identity_mapping+0x30/0x68
           intel_iommu_init+0xddd/0x1422
           ? printk+0x58/0x6f
           ? lockdep_hardirqs_on+0xf0/0x180
           ? do_early_param+0x8e/0x8e
           ? e820__memblock_setup+0x63/0x63
           pci_iommu_init+0x16/0x3f
           do_one_initcall+0x5d/0x2b4
           ? do_early_param+0x8e/0x8e
           ? rcu_read_lock_sched_held+0x55/0x60
           ? do_early_param+0x8e/0x8e
           kernel_init_freeable+0x218/0x2c1
           ? rest_init+0x230/0x230
           kernel_init+0xa/0x100
           ret_from_fork+0x3a/0x50
      
      domain_context_mapping_one() is taking device_domain_lock first then
      iommu lock, while dmar_insert_one_dev_info() is doing the reverse.
      
      That should be introduced by commit:
      
      7560cc3c ("iommu/vt-d: Fix lock inversion between iommu->lock and
                    device_domain_lock", 2019-05-27)
      
      So far I still cannot figure out how the previous deadlock was
      triggered (I cannot find iommu lock taken before calling of
      iommu_flush_dev_iotlb()), however I'm pretty sure that that change
      should be incomplete at least because it does not fix all the places
      so we're still taking the locks in different orders, while reverting
      that commit is very clean to me so far that we should always take
      device_domain_lock first then the iommu lock.
      
      We can continue to try to find the real culprit mentioned in
      7560cc3c, but for now I think we should revert it to fix current
      breakage.
      
      CC: Joerg Roedel <joro@8bytes.org>
      CC: Lu Baolu <baolu.lu@linux.intel.com>
      CC: dave.jiang@intel.com
      Signed-off-by: NPeter Xu <peterx@redhat.com>
      Tested-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      0aafc8ae
  14. 18 6月, 2019 2 次提交
  15. 12 6月, 2019 12 次提交
  16. 05 6月, 2019 1 次提交
  17. 03 6月, 2019 1 次提交
  18. 02 6月, 2019 1 次提交