1. 21 Dec, 2012 1 commit
    • intel-iommu: Free old page tables before creating superpage · 6491d4d0
      Woodhouse, David committed
      The dma_pte_free_pagetable() function will only free a page table page
      if it is asked to free the *entire* 2MiB range that it covers. So if a
      page table page was used for one or more small mappings, it's likely to
      end up still present in the page tables... but with no valid PTEs.
      
      This was fine when we'd only be repopulating it with 4KiB PTEs anyway
      but the same virtual address range can end up being reused for a
      *large-page* mapping. And in that case we were trying to insert the
      large page into the second-level page table, and getting a complaint
      from the sanity check in __domain_mapping() because there was already a
      corresponding entry. This was *relatively* harmless; it led to a memory
      leak of the old page table page, but no other ill-effects.
      
      Fix it by calling dma_pte_clear_range() (hopefully redundant) and
      dma_pte_free_pagetable() before setting up the new large page.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Tested-by: Ravi Murty <Ravi.Murty@intel.com>
      Tested-by: Sudeep Dutt <sudeep.dutt@intel.com>
      Cc: stable@kernel.org [3.0+]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
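The idea behind the fix above can be illustrated with a toy user-space model. This is only a sketch: `struct l2_entry`, `attach_small_table()` and `install_superpage()` are hypothetical names invented for illustration, not the driver's data structures or functions. It shows a second-level entry that may still point at a leftover table of small-page PTEs, and a superpage install that releases that table first instead of leaking it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define PTES_PER_TABLE 512

/* Toy second-level entry: either a table of 4 KiB PTEs or one 2 MiB superpage. */
struct l2_entry {
    bool present;
    bool superpage;
    uint64_t *table;   /* valid only when present && !superpage */
    uint64_t phys;     /* valid only when present && superpage */
};

/* Attach an empty table of small-page PTEs; returns 0 on success, -1 on allocation failure. */
int attach_small_table(struct l2_entry *e)
{
    assert(e != NULL && !e->present);
    e->table = calloc(PTES_PER_TABLE, sizeof(*e->table));
    if (!e->table)
        return -1;
    e->present = true;
    e->superpage = false;
    return 0;
}

/* Install a superpage, first releasing any leftover table so it is not leaked. */
void install_superpage(struct l2_entry *e, uint64_t phys)
{
    assert(e != NULL);
    if (e->present && !e->superpage) {
        free(e->table);
        e->table = NULL;
    }
    e->present = true;
    e->superpage = true;
    e->phys = phys;
}
```

In this model, skipping the `free()` step would overwrite the only pointer to the old table, which corresponds to the memory leak the commit message describes.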
  2. 17 Nov, 2012 1 commit
  3. 18 Sep, 2012 1 commit
    • intel-iommu: Default to non-coherent for domains unattached to iommus · 2e12bc29
      Alex Williamson committed
      domain_update_iommu_coherency() currently defaults to setting domains
      as coherent when the domain is not attached to any iommus.  This
      allows for a window in domain_context_mapping_one() where such a
      domain can update context entries non-coherently, and only afterwards
      update the domain capability to clear iommu_coherency.
      
      This can be seen using KVM device assignment on VT-d systems that
      do not support coherency in the ecap register.  When a device is
      added to a guest, a domain is created (iommu_coherency = 0), the
      device is attached, and ranges are mapped.  If we then hot unplug
      the device, the coherency is updated and set to the default (1)
      since no iommus are attached to the domain.  A subsequent attach
      of a device makes use of the same dmar domain (now marked coherent),
      updates context entries with coherency enabled, and only disables
      coherency as the last step in the process.
      
      To fix this, switch domain_update_iommu_coherency() to use the
      safer, non-coherent default for domains not attached to iommus.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Tested-by: Donald Dutile <ddutile@redhat.com>
      Acked-by: Donald Dutile <ddutile@redhat.com>
      Acked-by: Chris Wright <chrisw@sous-sol.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
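The change of default described above can be sketched as a small stand-alone function. This is a simplified model under stated assumptions, not the driver code: `struct iommu_model` and `domain_coherency()` are hypothetical names, and only the coherency capability bit is represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one IOMMU unit; only the ecap coherency bit is modelled. */
struct iommu_model {
    bool ecap_coherent;
};

/*
 * Returns the coherency a domain should assume.
 * With no attached IOMMUs the result is "non-coherent", the safer default;
 * otherwise every attached unit must support coherency.
 */
bool domain_coherency(const struct iommu_model *const *attached, size_t n)
{
    assert(attached != NULL || n == 0);
    if (n == 0)
        return false;
    for (size_t i = 0; i < n; i++)
        if (!attached[i]->ecap_coherent)
            return false;
    return true;
}
```

With this default, a domain that has just lost its last device is treated as non-coherent, so a later attach performs the conservative (flushing) updates from the start.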
  4. 23 Aug, 2012 1 commit
  5. 07 Aug, 2012 1 commit
  6. 03 Aug, 2012 1 commit
  7. 11 Jul, 2012 1 commit
  8. 25 Jun, 2012 3 commits
    • intel-iommu: Make use of DMA quirks and ACS checks in IOMMU groups · 783f157b
      Alex Williamson committed
      Work around broken devices and adhere to ACS support when determining
      IOMMU grouping.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
    • intel-iommu: Support IOMMU groups · abdfdde2
      Alex Williamson committed
      Add IOMMU group support to Intel VT-d code.  This driver sets up
      devices on demand, so make use of the add_device/remove_device
      callbacks in IOMMU API to manage setting up the groups.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
    • iommu: IOMMU Groups · d72e31c9
      Alex Williamson committed
      IOMMU device groups are currently a rather vague associative notion
      with assembly required by the user or user level driver provider to
      do anything useful.  This patch intends to grow the IOMMU group concept
      into something a bit more consumable.
      
      To do this, we first create an object representing the group, struct
      iommu_group.  This structure is allocated (iommu_group_alloc) and
      filled (iommu_group_add_device) by the iommu driver.  The iommu driver
      is free to add devices to the group using its own set of policies.
      This allows inclusion of devices based on physical hardware or topology
      limitations of the platform, as well as soft requirements, such as
      multi-function trust levels or peer-to-peer protection of the
      interconnects.  Each device may only belong to a single iommu group,
      which is linked from struct device.iommu_group.  IOMMU groups are
      maintained using kobject reference counting, allowing for automatic
      removal of empty, unreferenced groups.  It is the responsibility of
      the iommu driver to remove devices from the group
      (iommu_group_remove_device).
      
      IOMMU groups also include a userspace representation in sysfs under
      /sys/kernel/iommu_groups.  When allocated, each group is given a
      dynamically assigned ID (int).  The ID is managed by the core IOMMU group
      code to support multiple heterogeneous iommu drivers, which could
      potentially collide in group naming/numbering.  This also keeps group
      IDs to small, easily managed values.  A directory is created under
      /sys/kernel/iommu_groups for each group.  A further subdirectory named
      "devices" contains links to each device within the group.  The iommu_group
      file in the device's sysfs directory, which formerly contained a group
      number when read, is now a link to the iommu group.  Example:
      
      $ ls -l /sys/kernel/iommu_groups/26/devices/
      total 0
      lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 ->
      		../../../../devices/pci0000:00/0000:00:1e.0
      lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 ->
      		../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0
      lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 ->
      		../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1
      
      $ ls -l  /sys/kernel/iommu_groups/26/devices/*/iommu_group
      [truncating perms/owner/timestamp]
      /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group ->
      					../../../kernel/iommu_groups/26
      /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group ->
      					../../../../kernel/iommu_groups/26
      /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group ->
      					../../../../kernel/iommu_groups/26
      
      Groups also include several exported functions for use by user level
      driver providers, for example VFIO.  These include:
      
      iommu_group_get(): Acquires a reference to a group from a device
      iommu_group_put(): Releases reference
      iommu_group_for_each_dev(): Iterates over group devices using callback
      iommu_group_[un]register_notifier(): Allows notification of device add
              and remove operations relevant to the group
      iommu_group_id(): Return the group number
      
      This patch also extends the IOMMU API to allow attaching groups to
      domains.  This is currently a simple wrapper for iterating through
      devices within a group, but it's expected that the IOMMU API may
      eventually make groups a more integral part of domains.
      
      Groups intentionally do not try to manage group ownership.  A user
      level driver provider must independently acquire ownership for each
      device within a group before making use of the group as a whole.
      This may change in the future if group usage becomes more pervasive
      across both DMA and IOMMU ops.
      
      Groups intentionally do not provide a mechanism for driver locking
      or otherwise manipulating driver matching/probing of devices within
      the group.  Such interfaces are generic to devices and beyond the
      scope of IOMMU groups.  If implemented, user level providers have
      ready access via iommu_group_for_each_dev and group notifiers.
      
      iommu_device_group() is removed here as it has no users.  The
      replacement is:
      
      	group = iommu_group_get(dev);
      	id = iommu_group_id(group);
      	iommu_group_put(group);
      
      AMD-Vi & Intel VT-d support re-added in following patches.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
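The sysfs layout described in this commit (one directory per group under /sys/kernel/iommu_groups, each with a "devices" subdirectory of links) can be walked from user space. The following is a minimal sketch using only POSIX directory calls; `group_devices_path()` and `count_group_devices()` are hypothetical helper names, and the root directory is passed in as a parameter rather than hard-coded.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Builds "<root>/<id>/devices"; returns 0 on success, -1 if the buffer is too small. */
int group_devices_path(char *buf, size_t len, const char *root, int id)
{
    assert(buf != NULL && root != NULL);
    int n = snprintf(buf, len, "%s/%d/devices", root, id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Counts the entries (one link per device) in a group's devices directory; -1 on error. */
int count_group_devices(const char *root, int id)
{
    char path[512];
    struct dirent *de;
    int count = 0;
    DIR *dir;

    if (group_devices_path(path, sizeof(path), root, id) != 0)
        return -1;
    dir = opendir(path);
    if (!dir)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        /* Skip the directory's own "." and ".." entries. */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(dir);
    return count;
}
```

On a system laid out like the example listing in the commit message, calling `count_group_devices("/sys/kernel/iommu_groups", 26)` would count the links shown for group 26; on a system without that group it returns -1.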
  9. 14 Jun, 2012 1 commit
  10. 26 May, 2012 2 commits
  11. 07 May, 2012 3 commits
  12. 28 Mar, 2012 1 commit
  13. 06 Mar, 2012 2 commits
  14. 06 Feb, 2012 1 commit
  15. 17 Dec, 2011 1 commit
    • iommu: Export intel_iommu_enabled to signal when iommu is in use · 8bc1f85c
      Eugeni Dodonov committed
      In i915 driver, we do not enable either rc6 or semaphores on SNB when dmar
      is enabled. The new 'intel_iommu_enabled' variable signals when the
      iommu code is in operation.
      
      Cc: Ted Phelps <phelps@gnusto.com>
      Cc: Peter <pab1612@gmail.com>
      Cc: Lukas Hejtmanek <xhejtman@fi.muni.cz>
      Cc: Andrew Lutomirski <luto@mit.edu>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Eugeni Dodonov <eugeni.dodonov@intel.com>
      Signed-off-by: Keith Packard <keithp@keithp.com>
  16. 09 Dec, 2011 1 commit
    • memblock: Kill early_node_map[] · 0ee332c1
      Tejun Heo committed
      Now all ARCH_POPULATES_NODE_MAP archs select HAVE_MEMBLOCK_NODE_MAP -
      there's no user of early_node_map[] left.  Kill early_node_map[] and
      replace ARCH_POPULATES_NODE_MAP with HAVE_MEMBLOCK_NODE_MAP.  Also,
      relocate for_each_mem_pfn_range() and helper from mm.h to memblock.h
      as page_alloc.c would no longer host an alternative implementation.
      
      This change is ultimately a one-to-one mapping and shouldn't cause any
      observable difference; however, after the recent changes, there are
      some functions which now would fit memblock.c better than page_alloc.c
      and dependency on HAVE_MEMBLOCK_NODE_MAP instead of HAVE_MEMBLOCK
      doesn't make much sense on some of them.  Further cleanups for
      functions inside HAVE_MEMBLOCK_NODE_MAP in mm.h would be nice.
      
      -v2: Fix compile bug introduced by mis-spelling
       CONFIG_HAVE_MEMBLOCK_NODE_MAP to CONFIG_MEMBLOCK_HAVE_NODE_MAP in
       mmzone.h.  Reported by Stephen Rothwell.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
  17. 06 Dec, 2011 1 commit
  18. 15 Nov, 2011 2 commits
  19. 10 Nov, 2011 2 commits
    • iommu/intel: announce supported page sizes · 6d1c56a9
      Ohad Ben-Cohen committed
      Let the IOMMU core know we support arbitrary page sizes (as long as
      they're a power-of-two multiple of 4KiB).
      
      This way the IOMMU core will retain the existing behavior we're used to;
      it will let us map regions that:
      - have a size that is a power-of-two multiple of 4KiB
      - are naturally aligned
      
      Note: Intel IOMMU hardware doesn't support arbitrary page sizes,
      but the driver does (it splits arbitrary-sized mappings into
      the pages supported by the hardware).
      
      To make everything simpler for now, though, this patch effectively tells
      the IOMMU core to keep giving this driver the same memory regions it did
      before, so nothing is changed as far as it's concerned.
      
      At this point, the page sizes announced remain static within the IOMMU
      core. To correctly utilize the pgsize-splitting of the IOMMU core by
      this driver, it seems that some core changes should still be done,
      because Intel's IOMMU page size capabilities seem to have the potential
      to be different between different DMA remapping devices.
      Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
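The two conditions listed in this commit message (power-of-two size of at least 4KiB, natural alignment) can be expressed against a page-size bitmap in which bit n stands for a supported size of 2^n bytes. The sketch below is an illustration only: the `PGSIZE_BITMAP` value (all bits from bit 12 upward) is an assumption chosen to match "every power of two from 4KiB", and `region_is_mappable()` is a hypothetical helper, not an IOMMU core function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bitmap: every power-of-two size from 4 KiB upward (bits 12 and above set). */
#define PGSIZE_BITMAP (~(uint64_t)0xFFF)

static bool is_power_of_two(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* True when the single power-of-two size is present in the bitmap. */
static bool pgsize_supported(uint64_t size)
{
    assert(is_power_of_two(size));
    return (size & PGSIZE_BITMAP) != 0;
}

/* True when `size` is one supported page size and `addr` is naturally aligned to it. */
bool region_is_mappable(uint64_t addr, uint64_t size)
{
    if (!is_power_of_two(size))
        return false;
    if (!pgsize_supported(size))
        return false;
    return (addr & (size - 1)) == 0;
}
```

A 2MiB region at address 0x200000 passes both checks, while an 8KiB region at 0x1000 fails the alignment check and a 12KiB region fails the power-of-two check.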
    • iommu/core: stop converting bytes to page order back and forth · 5009065d
      Ohad Ben-Cohen committed
      Express sizes in bytes rather than in page order, to eliminate the
      size->order->size conversions we have whenever the IOMMU API is calling
      the low level drivers' map/unmap methods.
      
      Adapt all existing drivers.
      Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
      Cc: David Brown <davidb@codeaurora.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Joerg Roedel <Joerg.Roedel@amd.com>
      Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
      Cc: KyongHo Cho <pullip.cho@samsung.com>
      Cc: Hiroshi DOYU <hdoyu@nvidia.com>
      Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
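To make the size->order->size round trip mentioned in this commit concrete, here is a small user-space sketch. `size_to_order()` and `order_to_size()` are hypothetical helpers written for illustration (the first follows the same idea as the kernel's get_order(), but is not that function), and a 4KiB base page is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT_4K 12

/* Smallest order such that (4 KiB << order) >= size. */
unsigned int size_to_order(size_t size)
{
    assert(size > 0);
    size_t pages = (size + ((size_t)1 << PAGE_SHIFT_4K) - 1) >> PAGE_SHIFT_4K;
    unsigned int order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}

/* Size in bytes covered by a region of the given page order. */
size_t order_to_size(unsigned int order)
{
    return (size_t)1 << (PAGE_SHIFT_4K + order);
}
```

The round trip is exact only for power-of-two sizes: 8KiB maps to order 1 and back to 8KiB, whereas 12KiB maps to order 2 and comes back as 16KiB. Passing sizes in bytes avoids doing these conversions at every map/unmap call.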
  20. 01 Nov, 2011 1 commit
  21. 21 Oct, 2011 3 commits
  22. 19 Oct, 2011 3 commits
  23. 15 Oct, 2011 2 commits
  24. 11 Oct, 2011 1 commit
    • intel-iommu: Fix AB-BA lockdep report · 3e7abe25
      Roland Dreier committed
      When unbinding a device so that I could pass it through to a KVM VM, I
      got the lockdep report below.  It looks like a legitimate lock
      ordering problem:
      
       - domain_context_mapping_one() takes iommu->lock and calls
         iommu_support_dev_iotlb(), which takes device_domain_lock (inside
         iommu->lock).
      
       - domain_remove_one_dev_info() starts by taking device_domain_lock
         then takes iommu->lock inside it (near the end of the function).
      
      So this is the classic AB-BA deadlock.  It looks like a safe fix is to
      simply release device_domain_lock a bit earlier, since as far as I can
      tell, it doesn't protect any of the stuff accessed at the end of
      domain_remove_one_dev_info() anyway.
      
      BTW, the use of device_domain_lock looks a bit unsafe to me... it's
      at least not obvious to me why we aren't vulnerable to the race below:
      
        iommu_support_dev_iotlb()
                                                domain_remove_dev_info()
      
        lock device_domain_lock
          find info
        unlock device_domain_lock
      
                                                lock device_domain_lock
                                                  find same info
                                                unlock device_domain_lock
      
                                                free_devinfo_mem(info)
      
        do stuff with info after it's free
      
      However I don't understand the locking here well enough to know if
      this is a real problem, let alone what the best fix is.
      
      Anyway here's the full lockdep output that prompted all of this:
      
           =======================================================
           [ INFO: possible circular locking dependency detected ]
           2.6.39.1+ #1
           -------------------------------------------------------
           bash/13954 is trying to acquire lock:
            (&(&iommu->lock)->rlock){......}, at: [<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
      
           but task is already holding lock:
            (device_domain_lock){-.-...}, at: [<ffffffff812f6508>] domain_remove_one_dev_info+0x208/0x230
      
           which lock already depends on the new lock.
      
           the existing dependency chain (in reverse order) is:
      
           -> #1 (device_domain_lock){-.-...}:
                  [<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
                  [<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
                  [<ffffffff812f8350>] domain_context_mapping_one+0x600/0x750
                  [<ffffffff812f84df>] domain_context_mapping+0x3f/0x120
                  [<ffffffff812f9175>] iommu_prepare_identity_map+0x1c5/0x1e0
                  [<ffffffff81ccf1ca>] intel_iommu_init+0x88e/0xb5e
                  [<ffffffff81cab204>] pci_iommu_init+0x16/0x41
                  [<ffffffff81002165>] do_one_initcall+0x45/0x190
                  [<ffffffff81ca3d3f>] kernel_init+0xe3/0x168
                  [<ffffffff8157ac24>] kernel_thread_helper+0x4/0x10
      
           -> #0 (&(&iommu->lock)->rlock){......}:
                  [<ffffffff8109bf3e>] __lock_acquire+0x195e/0x1e10
                  [<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
                  [<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
                  [<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
                  [<ffffffff812f8b42>] device_notifier+0x72/0x90
                  [<ffffffff8157555c>] notifier_call_chain+0x8c/0xc0
                  [<ffffffff81089768>] __blocking_notifier_call_chain+0x78/0xb0
                  [<ffffffff810897b6>] blocking_notifier_call_chain+0x16/0x20
                  [<ffffffff81373a5c>] __device_release_driver+0xbc/0xe0
                  [<ffffffff81373ccf>] device_release_driver+0x2f/0x50
                  [<ffffffff81372ee3>] driver_unbind+0xa3/0xc0
                  [<ffffffff813724ac>] drv_attr_store+0x2c/0x30
                  [<ffffffff811e4506>] sysfs_write_file+0xe6/0x170
                  [<ffffffff8117569e>] vfs_write+0xce/0x190
                  [<ffffffff811759e4>] sys_write+0x54/0xa0
                  [<ffffffff81579a82>] system_call_fastpath+0x16/0x1b
      
           other info that might help us debug this:
      
           6 locks held by bash/13954:
            #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff811e4464>] sysfs_write_file+0x44/0x170
            #1:  (s_active#3){++++.+}, at: [<ffffffff811e44ed>] sysfs_write_file+0xcd/0x170
            #2:  (&__lockdep_no_validate__){+.+.+.}, at: [<ffffffff81372edb>] driver_unbind+0x9b/0xc0
            #3:  (&__lockdep_no_validate__){+.+.+.}, at: [<ffffffff81373cc7>] device_release_driver+0x27/0x50
            #4:  (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: [<ffffffff8108974f>] __blocking_notifier_call_chain+0x5f/0xb0
            #5:  (device_domain_lock){-.-...}, at: [<ffffffff812f6508>] domain_remove_one_dev_info+0x208/0x230
      
           stack backtrace:
           Pid: 13954, comm: bash Not tainted 2.6.39.1+ #1
           Call Trace:
            [<ffffffff810993a7>] print_circular_bug+0xf7/0x100
            [<ffffffff8109bf3e>] __lock_acquire+0x195e/0x1e10
            [<ffffffff810972bd>] ? trace_hardirqs_off+0xd/0x10
            [<ffffffff8109d57d>] ? trace_hardirqs_on_caller+0x13d/0x180
            [<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
            [<ffffffff812f6421>] ? domain_remove_one_dev_info+0x121/0x230
            [<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
            [<ffffffff812f6421>] ? domain_remove_one_dev_info+0x121/0x230
            [<ffffffff810972bd>] ? trace_hardirqs_off+0xd/0x10
            [<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
            [<ffffffff812f8b42>] device_notifier+0x72/0x90
            [<ffffffff8157555c>] notifier_call_chain+0x8c/0xc0
            [<ffffffff81089768>] __blocking_notifier_call_chain+0x78/0xb0
            [<ffffffff810897b6>] blocking_notifier_call_chain+0x16/0x20
            [<ffffffff81373a5c>] __device_release_driver+0xbc/0xe0
            [<ffffffff81373ccf>] device_release_driver+0x2f/0x50
            [<ffffffff81372ee3>] driver_unbind+0xa3/0xc0
            [<ffffffff813724ac>] drv_attr_store+0x2c/0x30
            [<ffffffff811e4506>] sysfs_write_file+0xe6/0x170
            [<ffffffff8117569e>] vfs_write+0xce/0x190
            [<ffffffff811759e4>] sys_write+0x54/0xa0
            [<ffffffff81579a82>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
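The shape of the fix described in this commit (drop the first lock before taking the second, so the two paths no longer nest the locks in opposite orders) can be modelled in user space with pthread mutexes. This is a sketch under stated assumptions: the kernel code uses spinlocks, and every name below (`device_domain_lock_model`, `iommu_lock_model`, `context_mapping_path()`, `remove_dev_info_path()`) is a hypothetical stand-in rather than the driver's code.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t device_domain_lock_model = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t iommu_lock_model = PTHREAD_MUTEX_INITIALIZER;
static int mappings;
static int removals;

static void lock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);
    assert(rc == 0);
    (void)rc;
}

static void unlock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_unlock(m);
    assert(rc == 0);
    (void)rc;
}

/* Mapping path: takes the iommu lock, then the device-domain lock nested inside it. */
int context_mapping_path(void)
{
    int seen;

    lock(&iommu_lock_model);
    lock(&device_domain_lock_model);
    seen = ++mappings;
    unlock(&device_domain_lock_model);
    unlock(&iommu_lock_model);
    return seen;
}

/* Removal path after the fix: the device-domain lock is released before the iommu lock is taken. */
int remove_dev_info_path(void)
{
    int seen;

    lock(&device_domain_lock_model);
    /* The per-device info would be unlinked here. */
    unlock(&device_domain_lock_model);

    lock(&iommu_lock_model);
    seen = ++removals;
    unlock(&iommu_lock_model);
    return seen;
}
```

Because the removal path never holds both locks at once, the only nesting left in this model is iommu lock then device-domain lock, so no cycle in the lock order remains.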
  25. 21 Sep, 2011 3 commits