1. 23 10月, 2007 26 次提交
  2. 22 10月, 2007 14 次提交
    • L
      KVM: Use new smp_call_function_mask() in kvm_flush_remote_tlbs() · 49d3bd7e
      Laurent Vivier 提交于
      In kvm_flush_remote_tlbs(), replace a loop using smp_call_function_single()
      by a single call to smp_call_function_mask() (which is new for x86_64).
      Signed-off-by: NLaurent Vivier <Laurent.Vivier@bull.net>
      Signed-off-by: NAvi Kivity <avi@qumranet.com>
      49d3bd7e
    • F
      intel-iommu sg chaining support · c03ab37c
      FUJITA Tomonori 提交于
      x86_64 defines ARCH_HAS_SG_CHAIN. So if IOMMU implementations don't
      support sg chaining, we will get data corruption.
      Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Acked-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c03ab37c
    • K
      intel-iommu: fix for IOMMU early crash · 358dd8ac
      Keshavamurthy, Anil S 提交于
      pci_dev's->sysdata is highly overloaded and currently IOMMU is broken due
      to IOMMU code depending on this field.
      
      This patch introduces new field in pci_dev's dev.archdata struct to hold
      IOMMU specific per device IOMMU private data.
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Jeff Garzik <jeff@garzik.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      358dd8ac
    • K
      intel-iommu: optimize sg map/unmap calls · f76aec76
      Keshavamurthy, Anil S 提交于
      This patch adds PageSelectiveInvalidation support replacing existing
      DomainSelectiveInvalidation for intel_{map/unmap}_sg() calls and also
      enables to mapping one big contiguous DMA virtual address which is mapped
      to discontiguous physical address for SG map/unmap calls.
      
      "Doamin selective invalidations" wipes out the IOMMU address translation
      cache based on domain ID where as "Page selective invalidations" wipes out
      the IOMMU address translation cache for that address mask range which is
      more cache friendly when compared to Domain selective invalidations.
      
      Here is how it is done.
      1) changes to iova.c
      alloc_iova() now takes a bool size_aligned argument, which
      when when set, returns the io virtual address that is
      naturally aligned to 2 ^ x, where x is the order
      of the size requested.
      
      Returning this io vitual address which is naturally
      aligned helps iommu to do the "page selective
      invalidations" which is IOMMU cache friendly
      over "domain selective invalidations".
      
      2) Changes to driver/pci/intel-iommu.c
      Clean up intel_{map/unmap}_{single/sg} () calls so that
      s/g map/unamp calls is no more dependent on
      intel_{map/unmap}_single()
      
      intel_map_sg() now computes the total DMA virtual address
      required and allocates the size aligned total DMA virtual address
      and maps the discontiguous physical address to the allocated
      contiguous DMA virtual address.
      
      In the intel_unmap_sg() case since the DMA virtual address
      is contiguous and size_aligned, PageSelectiveInvalidation
      is used replacing earlier DomainSelectiveInvalidations.
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Suresh B <suresh.b.siddha@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f76aec76
    • K
      Intel IOMMU: Iommu floppy workaround · 49a0429e
      Keshavamurthy, Anil S 提交于
      This config option (DMAR_FLPY_WA) sets up 1:1 mapping for the floppy device so
      that the floppy device which does not use DMA api's will continue to work.
      
      Once the floppy driver starts using DMA api's this config option can be turn
      off or this patch can be yanked out of kernel at that time.
      
      [akpm@linux-foundation.org: cleanups, rename things, build fix]
      [jengelh@computergmbh.de: Kconfig fixes]
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: NJan Engelhardt <jengelh@gmx.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49a0429e
    • K
      Intel IOMMU: Iommu Gfx workaround · e820482c
      Keshavamurthy, Anil S 提交于
      When we fix all the opensource gfx drivers to use the DMA api's, at that time
      we can yank this config options out.
      
      [jengelh@computergmbh.de: Kconfig fixes]
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: NJan Engelhardt <jengelh@gmx.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e820482c
    • K
      Intel IOMMU: DMAR fault handling support · 3460a6d9
      Keshavamurthy, Anil S 提交于
      MSI interrupt handler registrations and fault handling support for Intel-IOMMU
      hadrware.
      
      This patch enables the MSI interrupts for the DMA remapping units and in the
      interrupt handler read the fault cause and outputs the same on to the console.
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3460a6d9
    • K
      Intel IOMMU: Intel iommu cmdline option - forcedac · 7d3b03ce
      Keshavamurthy, Anil S 提交于
      Introduce intel_iommu=forcedac commandline option.  This option is helpful to
      verify the pci device capability of handling physical dma'able address greater
      than 4G.
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7d3b03ce
    • K
      Intel IOMMU: Avoid memory allocation failures in dma map api calls · eb3fa7cb
      Keshavamurthy, Anil S 提交于
      Intel IOMMU driver needs memory during DMA map calls to setup its internal
      page tables and for other data structures.  As we all know that these DMA map
      calls are mostly called in the interrupt context or with the spinlock held by
      the upper level drivers(network/storage drivers), so in order to avoid any
      memory allocation failure due to low memory issues, this patch makes memory
      allocation by temporarily setting PF_MEMALLOC flags for the current task
      before making memory allocation calls.
      
      We evaluated mempools as a backup when kmem_cache_alloc() fails
      and found that mempools are really not useful here because
       1) We don't know for sure how much to reserve in advance
       2) And mempools are not useful for GFP_ATOMIC case (as we call
          memory alloc functions with GFP_ATOMIC)
      
      (akpm: point 2 is wrong...)
      
      With PF_MEMALLOC flag set in the current->flags, the VM subsystem avoids any
      watermark checks before allocating memory thus guarantee'ing the memory till
      the last free page.  Further, looking at the code in mm/page_alloc.c in
      __alloc_pages() function, looks like this flag is useful only in the
      non-interrupt context.
      
      If we are in the interrupt context and memory allocation in IOMMU driver fails
      for some reason, then the DMA map api's will return failure and it is up to
      the higher level drivers to retry.  Suppose, if upper level driver programs
      the controller with the buggy DMA virtual address, the IOMMU will block that
      DMA transaction when that happens thus preventing any corruption to main
      memory.
      
      So far in our test scenario, we were unable to create any memory allocation
      failure inside dma map api calls.
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eb3fa7cb
    • K
      Intel IOMMU: Intel IOMMU driver · ba395927
      Keshavamurthy, Anil S 提交于
      Actual intel IOMMU driver.  Hardware spec can be found at:
      http://www.intel.com/technology/virtualization
      
      This driver sets X86_64 'dma_ops', so hook into standard DMA APIs.  In this
      way, PCI driver will get virtual DMA address.  This change is transparent to
      PCI drivers.
      
      [akpm@linux-foundation.org: remove unneeded cast]
      [akpm@linux-foundation.org: build fix]
      [bunk@stusta.de: fix duplicate CONFIG_DMAR Makefile line]
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ba395927
    • K
      Intel IOMMU: IOVA allocation and management routines · f8de50eb
      Keshavamurthy, Anil S 提交于
      This code implements a generic IOVA allocation and management.  As per Dave's
      suggestion we are now allocating IO virtual address from Higher DMA limit
      address rather than lower end address and this eliminated the need to preserve
      the IO virtual address for multiple devices sharing the same domain virtual
      address.
      
      Also this code uses red black trees to store the allocated and reserved iova
      nodes.  This showed a good performance improvements over previous linear
      linked list.
      
      [akpm@linux-foundation.org: remove inlines]
      [akpm@linux-foundation.org: coding style fixes]
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f8de50eb
    • K
      Intel IOMMU: PCI generic helper function · 994a65e2
      Keshavamurthy, Anil S 提交于
      When devices are under a p2p bridge, upstream transactions get replaced by the
      device id of the bridge as it owns the PCIE transaction.  Hence its necessary
      to setup translations on behalf of the bridge as well.  Due to this limitation
      all devices under a p2p share the same domain in a DMAR.
      
      We just cache the type of device, if its a native PCIe device
      or not for later use.
      
      [akpm@linux-foundation.org: BUG_ON -> WARN_ON+recover]
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      994a65e2
    • K
      Intel IOMMU: DMAR detection and parsing logic · 10e5247f
      Keshavamurthy, Anil S 提交于
      This patch supports the upcomming Intel IOMMU hardware a.k.a.  Intel(R)
      Virtualization Technology for Directed I/O Architecture and the hardware spec
      for the same can be found here
      http://www.intel.com/technology/virtualization/index.htm
      
      FAQ! (questions from akpm, answers from ak)
      
      > So...  what's all this code for?
      >
      > I assume that the intent here is to speed things up under Xen, etc?
      
      Yes in some cases, but not this code.  That would be the Xen version of this
      code that could potentially assign whole devices to guests.  I expect this to
      be only useful in some special cases though because most hardware is not
      virtualizable and you typically want an own instance for each guest.
      
      Ok at some point KVM might implement this too; i likely would use this code
      for this.
      
      > Do we
      > have any benchmark results to help us to decide whether a merge would be
      > justified?
      
      The main advantage for doing it in the normal kernel is not performance, but
      more safety.  Broken devices won't be able to corrupt memory by doing random
      DMA.
      
      Unfortunately that doesn't work for graphics yet, for that need user space
      interfaces for the X server are needed.
      
      There are some potential performance benefits too:
      
      - When you have a device that cannot address the complete address range an
        IOMMU can remap its memory instead of bounce buffering.  Remapping is likely
        cheaper than copying.
      
      - The IOMMU can merge sg lists into a single virtual block.  This could
        potentially speed up SG IO when the device is slow walking SG lists.  [I
        long ago benchmarked 5% on some block benchmark with an old MPT Fusion; but
        it probably depends a lot on the HBA]
      
      And you get better driver debugging because unexpected memory accesses from
      the devices will cause a trappable event.
      
      >
      > Does it slow anything down?
      
      It adds more overhead to each IO so yes.
      
      This patch:
      
      Add support for early detection and parsing of DMAR's (DMA Remapping) reported
      to OS via ACPI tables.
      
      DMA remapping(DMAR) devices support enables independent address translations
      for Direct Memory Access(DMA) from Devices.  These DMA remapping devices are
      reported via ACPI tables and includes pci device scope covered by these DMA
      remapping device.
      
      For detailed info on the specification of "Intel(R) Virtualization Technology
      for Directed I/O Architecture" please see
      http://www.intel.com/technology/virtualization/index.htmSigned-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Len Brown <lenb@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      10e5247f
    • Y
      memory hotplug: rearrange memory hotplug notifier · 7b78d335
      Yasunori Goto 提交于
      Current memory notifier has some defects yet.  (Fortunately, nothing uses
      it.) This patch is to fix and rearrange for them.
      
        - Add information of start_pfn, nr_pages, and node id if node status is
          changes from/to memoryless node for callback functions.
          Callbacks can't do anything without those information.
        - Add notification going-online status.
          It is necessary for creating per node structure before the node's
          pages are available.
        - Move GOING_OFFLINE status notification after page isolation.
          It is good place for return memory like cache for callback,
          because returned page is not used again.
        - Make CANCEL events for rollingback when error occurs.
        - Delete MEM_MAPPING_INVALID notification. It will be not used.
        - Fix compile error of (un)register_memory_notifier().
      Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7b78d335