1. 09 Jan, 2014 7 commits
    • J
      iommu/vt-d: fix invalid memory access when freeing DMAR irq · b5f36d9e
      Jiang Liu committed
      free_dmar_iommu() sets the IRQ handler data to NULL before calling
      free_irq(). This causes an invalid memory access, because free_irq()
      reads the IRQ handler data when it calls dmar_msi_mask(). Set the
      IRQ handler data to NULL only after calling free_irq().
      
      Sample stack dump:
      [   13.094010] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
      [   13.103215] IP: [<ffffffff810a97cd>] __lock_acquire+0x4d/0x12a0
      [   13.110104] PGD 0
      [   13.112614] Oops: 0000 [#1] SMP
      [   13.116585] Modules linked in:
      [   13.120260] CPU: 60 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0-rc1-gerry+ #9
      [   13.129367] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012
      [   13.142555] task: ffff88042dd38010 ti: ffff88042dd32000 task.ti: ffff88042dd32000
      [   13.151179] RIP: 0010:[<ffffffff810a97cd>]  [<ffffffff810a97cd>] __lock_acquire+0x4d/0x12a0
      [   13.160867] RSP: 0000:ffff88042dd33b78  EFLAGS: 00010046
      [   13.166969] RAX: 0000000000000046 RBX: 0000000000000002 RCX: 0000000000000000
      [   13.175122] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000048
      [   13.183274] RBP: ffff88042dd33bd8 R08: 0000000000000002 R09: 0000000000000001
      [   13.191417] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88042dd38010
      [   13.199571] R13: 0000000000000000 R14: 0000000000000048 R15: 0000000000000000
      [   13.207725] FS:  0000000000000000(0000) GS:ffff88103f200000(0000) knlGS:0000000000000000
      [   13.217014] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   13.223596] CR2: 0000000000000048 CR3: 0000000001a0b000 CR4: 00000000000407e0
      [   13.231747] Stack:
      [   13.234160]  0000000000000004 0000000000000046 ffff88042dd33b98 ffffffff810a567d
      [   13.243059]  ffff88042dd33c08 ffffffff810bb14c ffffffff828995a0 0000000000000046
      [   13.251969]  0000000000000000 0000000000000000 0000000000000002 0000000000000000
      [   13.260862] Call Trace:
      [   13.263775]  [<ffffffff810a567d>] ? trace_hardirqs_off+0xd/0x10
      [   13.270571]  [<ffffffff810bb14c>] ? vprintk_emit+0x23c/0x570
      [   13.277058]  [<ffffffff810ab1e3>] lock_acquire+0x93/0x120
      [   13.283269]  [<ffffffff814623f7>] ? dmar_msi_mask+0x47/0x70
      [   13.289677]  [<ffffffff8156b449>] _raw_spin_lock_irqsave+0x49/0x90
      [   13.296748]  [<ffffffff814623f7>] ? dmar_msi_mask+0x47/0x70
      [   13.303153]  [<ffffffff814623f7>] dmar_msi_mask+0x47/0x70
      [   13.309354]  [<ffffffff810c0d93>] irq_shutdown+0x53/0x60
      [   13.315467]  [<ffffffff810bdd9d>] __free_irq+0x26d/0x280
      [   13.321580]  [<ffffffff810be920>] free_irq+0xf0/0x180
      [   13.327395]  [<ffffffff81466591>] free_dmar_iommu+0x271/0x2b0
      [   13.333996]  [<ffffffff810a947d>] ? trace_hardirqs_on+0xd/0x10
      [   13.340696]  [<ffffffff81461a17>] free_iommu+0x17/0x50
      [   13.346597]  [<ffffffff81dc75a5>] init_dmars+0x691/0x77a
      [   13.352711]  [<ffffffff81dc7afd>] intel_iommu_init+0x351/0x438
      [   13.359400]  [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d
      [   13.365806]  [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52
      [   13.372114]  [<ffffffff81000342>] do_one_initcall+0x122/0x180
      [   13.378707]  [<ffffffff81077738>] ? parse_args+0x1e8/0x320
      [   13.385016]  [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c
      [   13.392100]  [<ffffffff81d84833>] ? do_early_param+0x88/0x88
      [   13.398596]  [<ffffffff8154f8b0>] ? rest_init+0xd0/0xd0
      [   13.404614]  [<ffffffff8154f8be>] kernel_init+0xe/0x130
      [   13.410626]  [<ffffffff81574d6c>] ret_from_fork+0x7c/0xb0
      [   13.416829]  [<ffffffff8154f8b0>] ? rest_init+0xd0/0xd0
      [   13.422842] Code: ec 99 00 85 c0 8b 05 53 05 a5 00 41 0f 45 d8 85 c0 0f 84 ff 00 00 00 8b 05 99 f9 7e 01 49 89 fe 41 89 f7 85 c0 0f 84 03 01 00 00 <49> 8b 06 be 01 00 00 00 48 3d c0 0e 01 82 0f 44 de 41 83 ff 01
      [   13.450191] RIP  [<ffffffff810a97cd>] __lock_acquire+0x4d/0x12a0
      [   13.458598]  RSP <ffff88042dd33b78>
      [   13.462671] CR2: 0000000000000048
      [   13.466551] ---[ end trace c5bd26a37c81d760 ]---
      Reviewed-by: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Signed-off-by: Joerg Roedel <joro@8bytes.org>
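The ordering problem fixed by commit b5f36d9e can be modeled in a few lines of userspace C. This is only a sketch: `mock_iommu`, `mock_irq`, `mock_msi_mask()` and `mock_free_irq()` are hypothetical stand-ins rather than kernel APIs, and the NULL check returns -1 at the point where the real kernel would oops.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the real API. */
struct mock_iommu {
    int masked;
};

struct mock_irq {
    struct mock_iommu *handler_data; /* what the mask callback dereferences */
    int freed;
};

/* Models dmar_msi_mask(): it needs valid handler data.
 * Returns -1 where the real kernel would hit a NULL dereference. */
static int mock_msi_mask(struct mock_irq *irq)
{
    if (irq->handler_data == NULL)
        return -1;
    irq->handler_data->masked = 1;
    return 0;
}

/* Models free_irq(): shutting the IRQ down masks it first. */
static int mock_free_irq(struct mock_irq *irq)
{
    int ret = mock_msi_mask(irq);
    irq->freed = 1;
    return ret;
}

/* Buggy order: clear the handler data, then free the IRQ. */
static int teardown_buggy(struct mock_irq *irq)
{
    irq->handler_data = NULL;
    return mock_free_irq(irq);
}

/* Fixed order: free the IRQ first, then clear the handler data. */
static int teardown_fixed(struct mock_irq *irq)
{
    int ret = mock_free_irq(irq);
    irq->handler_data = NULL;
    return ret;
}
```

In this model `teardown_buggy()` reports the failure that shows up as the NULL pointer dereference in the stack dump, while `teardown_fixed()` masks the interrupt successfully before dropping the pointer.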
    • J
      iommu/vt-d, trivial: simplify code with existing macros · 7c919779
      Jiang Liu committed
      Simplify VT-d related code with existing macros and introduce a new
      macro, for_each_active_drhd_unit(), to enumerate all active DRHD units.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Signed-off-by: Joerg Roedel <joro@8bytes.org>
    • J
      iommu/vt-d, trivial: clean up unused code · b8a2d288
      Jiang Liu committed
      Remove dead code from VT-d related files.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Signed-off-by: Joerg Roedel <joro@8bytes.org>
      
      Conflicts:
      
      	drivers/iommu/dmar.c
    • J
      iommu/vt-d, trivial: print correct domain id of static identity domain · 9544c003
      Jiang Liu committed
      The field si_domain->id is set by iommu_attach_domain(), so the domain
      id of the static identity domain should only be printed after calling
      iommu_attach_domain(si_domain, iommu); otherwise it is always zero.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Signed-off-by: Joerg Roedel <joro@8bytes.org>
    • J
      iommu/vt-d, trivial: refine support of 64bit guest address · 5c645b35
      Jiang Liu committed
      The Intel IOMMU driver calculates the page table level from the adjusted
      guest address width as 'level = (agaw - 30) / 9', which assumes that
      (agaw - 30) is divisible by 9. However, 64bit is a valid agaw and
      (64 - 30) is not divisible by 9, so it needs special handling.
      
      This patch enhances the Intel IOMMU driver to correctly handle 64bit agaw.
      It's mainly for code readability because there's no hardware supporting
      64bit agaw yet.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Signed-off-by: Joerg Roedel <joro@8bytes.org>
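The arithmetic behind commit 5c645b35 can be checked with a small standalone sketch. The function names are hypothetical, and rounding up is an assumption about one way to handle the 64-bit case, not necessarily what the patch itself does. `DIV_ROUND_UP` is redefined locally to mirror the kernel macro of the same name.

```c
/* Round-up integer division, mirroring the kernel's DIV_ROUND_UP macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Formula quoted in the commit message: exact only when
 * (agaw_bits - 30) is a multiple of 9. */
static int level_naive(int agaw_bits)
{
    return (agaw_bits - 30) / 9;
}

/* Assumed special handling: round up so a 64-bit width gets its own level
 * instead of being truncated to the 57-bit level. */
static int level_rounded(int agaw_bits)
{
    return DIV_ROUND_UP(agaw_bits - 30, 9);
}

/* Inverse mapping, capped at 64 bits because 30 + 4 * 9 would be 66. */
static int level_to_width(int level)
{
    int width = 30 + level * 9;
    return width > 64 ? 64 : width;
}
```

For widths of 39, 48 and 57 bits both functions agree. For 64 bits, (64 - 30) / 9 truncates to 3 while the rounded form gives 4, and `level_to_width(4)` is capped at 64 rather than 66.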
    • J
      iommu/vt-d: fix resource leakage on error recovery path in iommu_init_domains() · 852bdb04
      Jiang Liu committed
      Release allocated resources on the error recovery path in function
      iommu_init_domains().
      
      Also improve printk messages in iommu_init_domains().
      Acked-by: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Signed-off-by: Joerg Roedel <joro@8bytes.org>
    • J
      iommu/vt-d: fix a race window in allocating domain ID for virtual machines · 18d99165
      Jiang Liu committed
      Function intel_iommu_domain_init() may be called concurrently by upper
      layers without serialization, so use atomic_t to protect domain id
      allocation.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Joerg Roedel <joro@8bytes.org>
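The race addressed by commit 18d99165 comes from a read-modify-write on a shared counter. A minimal userspace sketch of the atomic alternative follows. It uses C11 `<stdatomic.h>` in place of the kernel's atomic_t, and `vm_domain_counter` and `alloc_vm_domain_id()` are hypothetical names.

```c
#include <stdatomic.h>

/* Shared counter; static storage duration zero-initializes it. */
static atomic_int vm_domain_counter;

/* Each caller gets a distinct id even when calls race, because the
 * increment and the read happen as one atomic operation. */
static int alloc_vm_domain_id(void)
{
    return atomic_fetch_add(&vm_domain_counter, 1) + 1;
}
```

With a plain `int` and `++counter`, two concurrent callers could both read the same old value and return the same id. The atomic fetch-and-add rules that out without needing a lock.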
  2. 07 Jan, 2014 1 commit
  3. 01 Nov, 2013 2 commits
  4. 15 Aug, 2013 1 commit
  5. 20 Jun, 2013 1 commit
    • A
      iommu/{vt-d,amd}: Remove multifunction assumption around grouping · c14d2690
      Alex Williamson committed
      If a device is multifunction and does not have ACS enabled then we
      assume that the entire package lacks ACS and use function 0 as the
      base of the group.  The PCIe spec however states that components are
      permitted to implement ACS on some, none, or all of their applicable
      functions.  It's therefore conceivable that function 0 may be fully
      independent and support ACS while other functions do not.  Instead
      use the lowest function of the slot that does not have ACS enabled
      as the base of the group.  This may be the current device, which is
      intentional.  So long as we use a consistent algorithm, all the
      non-ACS functions will be grouped together and ACS functions will
      get separate groups.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Joerg Roedel <joro@8bytes.org>
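The grouping rule described in commit c14d2690 can be sketched as a pure function over the functions of one slot. `slot_func` and `group_base_function()` are hypothetical names. The real drivers walk PCI devices rather than an array, so this only illustrates the selection rule.

```c
#include <stdbool.h>

#define FUNCS_PER_SLOT 8

/* Hypothetical per-function state for one PCI slot. */
struct slot_func {
    bool present;
    bool acs_enabled;
};

/* Returns the function number used as the base of the group for
 * function fn, which is assumed to be present. */
static int group_base_function(const struct slot_func slot[FUNCS_PER_SLOT],
                               int fn)
{
    /* A function with ACS enabled is isolated and gets its own group. */
    if (slot[fn].acs_enabled)
        return fn;

    /* Otherwise use the lowest function in the slot without ACS;
     * this may be fn itself, which is intentional. */
    for (int i = 0; i < FUNCS_PER_SLOT; i++) {
        if (slot[i].present && !slot[i].acs_enabled)
            return i;
    }
    return fn;
}
```

Because every non-ACS function resolves to the same lowest non-ACS function, all of them land in one group, while each ACS function maps to itself.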
  6. 23 Apr, 2013 2 commits
    • V
      iommu: Move swap_pci_ref function to drivers/iommu/pci.h. · 61e015ac
      Varun Sethi committed
      The swap_pci_ref function is used by the IOMMU API code for
      swapping pci device pointers while determining the iommu
      group for the device.
      Currently this function is implemented separately in different
      IOMMU drivers.  This patch moves the function to a new file,
      drivers/iommu/pci.h, so that the implementation can be
      shared across various IOMMU drivers.
      Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
      Signed-off-by: Joerg Roedel <joro@8bytes.org>
    • T
      iommu/vt-d: Disable translation if already enabled · 3a93c841
      Takao Indoh committed
      This patch disables translation (dma-remapping) before its initialization
      if it is already enabled.
      
      This is needed for kexec/kdump boot. If dma-remapping is enabled in the
      first kernel, it needs to be disabled before initializing its page table
      during second kernel boot. Wei Hu also reported that this is needed
      when the second kernel boots with intel_iommu=off.
      
      Basically iommu->gcmd is used to know whether translation is enabled or
      disabled, but it is always zero at boot time even when translation is
      enabled, since iommu->gcmd is initialized without considering such a
      case. Therefore this patch synchronizes the iommu->gcmd value with the
      global command register when the iommu structure is allocated.
      Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
      Signed-off-by: Joerg Roedel <joro@8bytes.org>
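The synchronization idea in commit 3a93c841 can be sketched as follows: seed the software copy of the command state from what the hardware reports, so later code can tell that translation is already on. The constants and function names below are assumptions for illustration. Bit 31 is used for both the translation-enable command bit and its status bit, and the real patch may mirror more bits than this.

```c
#include <stdint.h>

/* Assumed bit positions for illustration. */
#define GSTS_TES (UINT32_C(1) << 31) /* status: translation enabled */
#define GCMD_TE  (UINT32_C(1) << 31) /* command: translation enable */

/* Seed the cached command value from the hardware status value. */
static uint32_t sync_gcmd_from_status(uint32_t gcmd, uint32_t gsts)
{
    if (gsts & GSTS_TES)
        gcmd |= GCMD_TE;
    return gcmd;
}

/* With a synchronized cache, the "already enabled" check is reliable. */
static int translation_needs_disable(uint32_t gcmd)
{
    return (gcmd & GCMD_TE) != 0;
}
```

Without the synchronization step the cached value starts at zero, so a kdump kernel would conclude that translation is off even though the first kernel left it enabled.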
  7. 03 Apr, 2013 1 commit
  8. 20 Feb, 2013 1 commit
  9. 28 Jan, 2013 1 commit
  10. 23 Jan, 2013 1 commit
  11. 04 Jan, 2013 1 commit
    • G
      Drivers: iommu: remove __dev* attributes. · d34d6517
      Greg Kroah-Hartman committed
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, __devexit_p, __devinitdata,
      and __devexit from these drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Ohad Ben-Cohen <ohad@wizery.com>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Omar Ramirez Luna <omar.luna@linaro.org>
      Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
      Cc: Hiroshi Doyu <hdoyu@nvidia.com>
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Cc: Bharat Nihalani <bnihalani@nvidia.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  12. 21 Dec, 2012 1 commit
    • W
      intel-iommu: Free old page tables before creating superpage · 6491d4d0
      Woodhouse, David committed
      The dma_pte_free_pagetable() function will only free a page table page
      if it is asked to free the *entire* 2MiB range that it covers. So if a
      page table page was used for one or more small mappings, it's likely to
      end up still present in the page tables... but with no valid PTEs.
      
      This was fine when we'd only be repopulating it with 4KiB PTEs anyway
      but the same virtual address range can end up being reused for a
      *large-page* mapping. And in that case we were trying to insert the
      large page into the second-level page table, and getting a complaint
      from the sanity check in __domain_mapping() because there was already a
      corresponding entry. This was *relatively* harmless; it led to a memory
      leak of the old page table page, but no other ill-effects.
      
      Fix it by calling dma_pte_clear_range (hopefully redundant) and
      dma_pte_free_pagetable() before setting up the new large page.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Tested-by: Ravi Murty <Ravi.Murty@intel.com>
      Tested-by: Sudeep Dutt <sudeep.dutt@intel.com>
      Cc: stable@kernel.org [3.0+]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 21 Nov, 2012 1 commit
  14. 17 Nov, 2012 1 commit
  15. 18 Sep, 2012 1 commit
    • A
      intel-iommu: Default to non-coherent for domains unattached to iommus · 2e12bc29
      Alex Williamson committed
      domain_update_iommu_coherency() currently defaults to setting domains
      as coherent when the domain is not attached to any iommus.  This
      allows for a window in domain_context_mapping_one() where such a
      domain can update context entries non-coherently, and only after
      update the domain capability to clear iommu_coherency.
      
      This can be seen using KVM device assignment on VT-d systems that
      do not support coherency in the ecap register.  When a device is
      added to a guest, a domain is created (iommu_coherency = 0), the
      device is attached, and ranges are mapped.  If we then hot unplug
      the device, the coherency is updated and set to the default (1)
      since no iommus are attached to the domain.  A subsequent attach
      of a device makes use of the same dmar domain (now marked coherent),
      updates context entries with coherency enabled, and only disables
      coherency as the last step in the process.
      
      To fix this, switch domain_update_iommu_coherency() to use the
      safer, non-coherent default for domains not attached to iommus.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Tested-by: Donald Dutile <ddutile@redhat.com>
      Acked-by: Donald Dutile <ddutile@redhat.com>
      Acked-by: Chris Wright <chrisw@sous-sol.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
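The default chosen in commit 2e12bc29 can be illustrated with a small sketch. `domain_coherency()` is a hypothetical helper that takes the coherency capability of each attached iommu as an array. The kernel's domain_update_iommu_coherency() works on driver structures instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* A domain is coherent only if every attached iommu is coherent.
 * With no iommus attached, fall back to the safer non-coherent default. */
static bool domain_coherency(const bool *iommu_coherent, size_t n)
{
    if (n == 0)
        return false;

    for (size_t i = 0; i < n; i++) {
        if (!iommu_coherent[i])
            return false;
    }
    return true;
}
```

Defaulting to non-coherent when the list is empty closes the window described above: a domain whose last device was unplugged is no longer treated as coherent when the next device is attached.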
  16. 23 Aug, 2012 1 commit
  17. 07 Aug, 2012 1 commit
  18. 03 Aug, 2012 1 commit
  19. 11 Jul, 2012 1 commit
  20. 25 Jun, 2012 3 commits
    • A
      intel-iommu: Make use of DMA quirks and ACS checks in IOMMU groups · 783f157b
      Alex Williamson committed
      Work around broken devices and adhere to ACS support when determining
      IOMMU grouping.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
    • A
      intel-iommu: Support IOMMU groups · abdfdde2
      Alex Williamson committed
      Add IOMMU group support to Intel VT-d code.  This driver sets up
      devices on demand, so make use of the add_device/remove_device
      callbacks in IOMMU API to manage setting up the groups.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
    • A
      iommu: IOMMU Groups · d72e31c9
      Alex Williamson committed
      IOMMU device groups are currently a rather vague associative notion
      with assembly required by the user or user level driver provider to
      do anything useful.  This patch intends to grow the IOMMU group concept
      into something a bit more consumable.
      
      To do this, we first create an object representing the group, struct
      iommu_group.  This structure is allocated (iommu_group_alloc) and
      filled (iommu_group_add_device) by the iommu driver.  The iommu driver
      is free to add devices to the group using its own set of policies.
      This allows inclusion of devices based on physical hardware or topology
      limitations of the platform, as well as soft requirements, such as
      multi-function trust levels or peer-to-peer protection of the
      interconnects.  Each device may only belong to a single iommu group,
      which is linked from struct device.iommu_group.  IOMMU groups are
      maintained using kobject reference counting, allowing for automatic
      removal of empty, unreferenced groups.  It is the responsibility of
      the iommu driver to remove devices from the group
      (iommu_group_remove_device).
      
      IOMMU groups also include a userspace representation in sysfs under
      /sys/kernel/iommu_groups.  When allocated, each group is given a
      dynamically assigned ID (int).  The ID is managed by the core IOMMU group
      code to support multiple heterogeneous iommu drivers, which could
      potentially collide in group naming/numbering.  This also keeps group
      IDs to small, easily managed values.  A directory is created under
      /sys/kernel/iommu_groups for each group.  A further subdirectory named
      "devices" contains links to each device within the group.  The iommu_group
      file in the device's sysfs directory, which formerly contained a group
      number when read, is now a link to the iommu group.  Example:
      
      $ ls -l /sys/kernel/iommu_groups/26/devices/
      total 0
      lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 ->
      		../../../../devices/pci0000:00/0000:00:1e.0
      lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 ->
      		../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0
      lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 ->
      		../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1
      
      $ ls -l  /sys/kernel/iommu_groups/26/devices/*/iommu_group
      [truncating perms/owner/timestamp]
      /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group ->
      					../../../kernel/iommu_groups/26
      /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group ->
      					../../../../kernel/iommu_groups/26
      /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group ->
      					../../../../kernel/iommu_groups/26
      
      Groups also include several exported functions for use by user level
      driver providers, for example VFIO.  These include:
      
      iommu_group_get(): Acquires a reference to a group from a device
      iommu_group_put(): Releases reference
      iommu_group_for_each_dev(): Iterates over group devices using callback
      iommu_group_[un]register_notifier(): Allows notification of device add
              and remove operations relevant to the group
      iommu_group_id(): Return the group number
      
      This patch also extends the IOMMU API to allow attaching groups to
      domains.  This is currently a simple wrapper for iterating through
      devices within a group, but it's expected that the IOMMU API may
      eventually make groups a more integral part of domains.
      
      Groups intentionally do not try to manage group ownership.  A user
      level driver provider must independently acquire ownership for each
      device within a group before making use of the group as a whole.
      This may change in the future if group usage becomes more pervasive
      across both DMA and IOMMU ops.
      
      Groups intentionally do not provide a mechanism for driver locking
      or otherwise manipulating driver matching/probing of devices within
      the group.  Such interfaces are generic to devices and beyond the
      scope of IOMMU groups.  If implemented, user level providers have
      ready access via iommu_group_for_each_dev and group notifiers.
      
      iommu_device_group() is removed here as it has no users.  The
      replacement is:
      
      	group = iommu_group_get(dev);
      	id = iommu_group_id(group);
      	iommu_group_put(group);
      
      AMD-Vi & Intel VT-d support re-added in following patches.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
  21. 14 Jun, 2012 1 commit
  22. 26 May, 2012 2 commits
  23. 07 May, 2012 3 commits
  24. 28 Mar, 2012 1 commit
  25. 06 Mar, 2012 2 commits
  26. 06 Feb, 2012 1 commit