1. 16 Jan, 2016 2 commits
    • K
      x86/PCI: Add driver for Intel Volume Management Device (VMD) · 185a383a
      Keith Busch committed
      The Intel Volume Management Device (VMD) is a Root Complex Integrated
      Endpoint that acts as a host bridge to a secondary PCIe domain.  BIOS can
      reassign one or more Root Ports to appear within a VMD domain instead of
      the primary domain.  The immediate benefit is that additional PCIe domains
      allow more than 256 buses in a system by letting bus numbers be reused
      across different domains.
      
      VMD domains do not define ACPI _SEG, so to avoid domain clashing with host
      bridges defining this segment, VMD domains start at 0x10000, which is
      greater than the highest possible 16-bit ACPI defined _SEG.
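The numbering rule above can be sketched roughly as follows. ACPI _SEG is a 16-bit value, so firmware-defined segments fall in 0..0xFFFF, and the sketch hands out VMD domain numbers starting at 0x10000. The names `VMD_DOMAIN_BASE`, `vmd_pick_domain()` and `domain_is_outside_acpi_range()` are hypothetical and are not taken from the driver.

```c
#include <stdint.h>

#define ACPI_SEG_MAX     0xFFFFu   /* largest value a 16-bit _SEG can hold */
#define VMD_DOMAIN_BASE  0x10000u  /* first number outside the _SEG range */

/* Hypothetical helper: map the n-th VMD instance to a PCI domain number. */
static uint32_t vmd_pick_domain(uint32_t vmd_index)
{
    return VMD_DOMAIN_BASE + vmd_index;
}

/* Nonzero when a domain number cannot collide with an ACPI segment. */
static int domain_is_outside_acpi_range(uint32_t domain)
{
    return domain > ACPI_SEG_MAX;
}
```

Under this scheme any domain number the helper returns is, by construction, larger than every possible _SEG value.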
      
      This driver enumerates and enables the domain using the root bus
      configuration interface provided by the PCI subsystem.  The driver provides
      configuration space accessor functions (pci_ops), bus and memory resources,
      an MSI IRQ domain with irq_chip implementation, and DMA operations
      necessary to use devices through the VMD endpoint's interface.
      
      VMD routes I/O as follows:
      
         1) Configuration Space: BAR 0 ("CFGBAR") of VMD provides the base
         address and size for configuration space register access to VMD-owned
         root ports.  It works similarly to MMCONFIG for extended configuration
         space.  Bus numbering is independent and does not conflict with the
         primary domain.
      
         2) MMIO Space: BARs 2 and 4 ("MEMBAR1" and "MEMBAR2") of VMD provide the
         base address, size, and type for MMIO register access.  These addresses
         are not translated by VMD hardware; they are simply reservations to be
         distributed to root ports' memory base/limit registers and subdivided
         among devices downstream.
      
         3) DMA: To interact appropriately with an IOMMU, the source ID of DMA read
         and write requests is translated to the bus-device-function of the VMD
         endpoint.  Otherwise, DMA operates normally without VMD-specific address
         translation.
      
         4) Interrupts: Part of VMD's BAR 4 is reserved for VMD's MSI-X Table and
         PBA.  MSIs from VMD domain devices and ports are remapped to appear as
         if they were issued using one of VMD's MSI-X table entries.  Each MSI
         and MSI-X address of VMD-owned devices and ports has a special format
         where the address refers to specific entries in the VMD's MSI-X table.
         As with DMA, the interrupt source ID is translated to VMD's
         bus-device-function.
      
         The driver provides its own MSI and MSI-X configuration functions
         specific to how MSI messages are used within the VMD domain, and
         provides an irq_chip for independent IRQ allocation to relay interrupts
         from VMD's interrupt handler to the appropriate device driver's handler.
      
         5) Errors: PCIe error messages are intercepted by the root ports normally
         (e.g., AER), except with VMD, system errors (i.e., firmware first) are
         disabled by default.  AER and hotplug interrupts are translated in the
         same way as endpoint interrupts.
      
         6) VMD does not support INTx interrupts or IO ports.  Devices or drivers
         requiring these features should either not be placed below VMD-owned
         root ports, or VMD should be disabled by BIOS for such endpoints.
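Item 1 says CFGBAR works similarly to MMCONFIG. A minimal sketch of that kind of address computation, assuming CFGBAR follows the standard ECAM layout (4 KiB per function), might look like this. The function names and the bounds check are illustrative assumptions, not the driver's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed ECAM-style layout: 4 KiB of configuration space per function,
 * 8 functions per device, 32 devices per bus.
 */
static uint64_t cfg_offset(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t reg)
{
    return ((uint64_t)bus << 20) | ((uint64_t)(dev & 0x1f) << 15) |
           ((uint64_t)(fn & 0x7) << 12) | (reg & 0xfff);
}

/*
 * Hypothetical accessor: return a pointer into the mapped CFGBAR window,
 * or NULL when an access of `len` bytes would fall outside it.
 */
static volatile void *cfgbar_addr(volatile uint8_t *cfgbar, uint64_t cfgbar_size,
                                  uint8_t bus, uint8_t dev, uint8_t fn,
                                  uint16_t reg, size_t len)
{
    uint64_t off = cfg_offset(bus, dev, fn, reg);

    if (off + len > cfgbar_size)
        return NULL;
    return cfgbar + off;
}
```

Because the offset is relative to CFGBAR rather than to the platform's MMCONFIG base, the bus number used here is independent of bus numbers in the primary domain.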
      
      [bhelgaas: add VMD BAR #defines, factor out vmd_cfg_addr(), rework VMD
      resource setup, whitespace, changelog]
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de> (IRQ-related parts)
      185a383a
    • K
      x86/PCI: Allow DMA ops specific to a PCI domain · d9c3d6ff
      Keith Busch committed
      The Intel Volume Management Device (VMD) is a PCIe endpoint that acts as a
      host bridge to another PCI domain.  When devices below the VMD perform DMA,
      the VMD replaces their DMA source IDs with its own source ID.  Therefore,
      those devices require special DMA ops.
      
      Add interfaces to allow the VMD driver to set up dma_ops for the devices
      below it.
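One way to picture such an interface is a small registry keyed by PCI domain number, consulted when a device is set up. The sketch below is a user-space illustration only: every type and function name in it is hypothetical, and it omits the locking a real kernel implementation would need.

```c
#include <stddef.h>

/* Stand-in for the kernel's DMA operations table; contents are irrelevant here. */
struct dma_ops_sketch {
    const char *name;
};

/* One registration: "devices in PCI domain domain_nr use these ops". */
struct dma_domain_sketch {
    int domain_nr;
    const struct dma_ops_sketch *ops;
    struct dma_domain_sketch *next;
};

static struct dma_domain_sketch *dma_domains; /* head of a singly linked list */

/* Register a domain-specific ops table (what a VMD-like driver would call). */
static void add_dma_domain_sketch(struct dma_domain_sketch *d)
{
    d->next = dma_domains;
    dma_domains = d;
}

/* Pick the ops for a device: a domain-specific table if one exists, else the default. */
static const struct dma_ops_sketch *
ops_for_domain(int domain_nr, const struct dma_ops_sketch *default_ops)
{
    const struct dma_domain_sketch *d;

    for (d = dma_domains; d; d = d->next)
        if (d->domain_nr == domain_nr)
            return d->ops;
    return default_ops;
}
```

Devices in domains with no registration keep the default ops, so only devices below the VMD are affected.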
      
      [bhelgaas: remove "extern", add "static", changelog]
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      d9c3d6ff
  2. 19 Nov, 2015 2 commits
  3. 14 Nov, 2015 1 commit
  4. 12 Nov, 2015 5 commits
    • H
      perf/x86/intel/rapl: Remove the unused RAPL_EVENT_DESC() macro · 41ac18eb
      Huang Rui committed
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Li <tony.li@amd.com>
      Link: http://lkml.kernel.org/r/1446630233-3166-1-git-send-email-ray.huang@amd.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      41ac18eb
    • H
      x86/fpu: Fix get_xsave_addr() behavior under virtualization · a05917b6
      Huaitong Han committed
      KVM uses the get_xsave_addr() function in a different fashion from
      the native kernel, in that the 'xsave' parameter belongs to guest vcpu,
      not the currently running task.
      
      But 'xsave' is replaced with current task's (host) xsave structure, so
      get_xsave_addr() will incorrectly return the bad xsave address to KVM.
      
      Fix it so that the passed in 'xsave' address is used - as intended
      originally.
      Signed-off-by: Huaitong Han <huaitong.han@intel.com>
      Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave.hansen@intel.com
      Link: http://lkml.kernel.org/r/1446800423-21622-1-git-send-email-huaitong.han@intel.com
      [ Tidied up the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a05917b6
    • D
      x86/fpu: Fix 32-bit signal frame handling · ab6b5294
      Dave Hansen committed
      (This should have gone to LKML originally. Sorry for the extra
       noise, folks on the cc.)
      
      Background:
      
      Signal frames on x86 have two formats:
      
        1. For 32-bit executables (whether on a real 32-bit kernel or
           under 32-bit emulation on a 64-bit kernel) we have a
          'fpregset_t' that includes the "FSAVE" registers.
      
        2. For 64-bit executables (on 64-bit kernels obviously), the
           'fpregset_t' is smaller and does not contain the "FSAVE"
           state.
      
      When creating the signal frame, we have to be aware of whether
      we are running a 32 or 64-bit executable so we create the
      correct format signal frame.
      
      Problem:
      
      save_xstate_epilog() uses 'fx_sw_reserved_ia32' whenever it is
      called for a 32-bit executable.  This is for real 32-bit and
      ia32 emulation.
      
      But, fpu__init_prepare_fx_sw_frame() only initializes
      'fx_sw_reserved_ia32' when emulation is enabled, *NOT* for real
      32-bit kernels.
      
      This leads to really weird situations where 32-bit programs
      lose their extended state when returning from a signal handler.
      The kernel copies the uninitialized (zero) 'fx_sw_reserved_ia32'
      out to userspace in save_xstate_epilog().  But when returning
      from the signal, the kernel errors out in check_for_xstate()
      when it does not see FP_XSTATE_MAGIC1 present (because it was
      zeroed).  This leads to the FPU/XSAVE state being initialized.
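The failure mode described above can be illustrated with a reduced sketch of the magic-number check. The struct below is a simplified stand-in for the software-reserved bytes of the signal frame, and `check_for_xstate_sketch()` only mimics the one test mentioned in the text; it is not the kernel's check_for_xstate().

```c
#include <stdint.h>

#define FP_XSTATE_MAGIC1 0x46505853u /* value from the x86 uapi sigcontext header */

/* Reduced stand-in for the software-reserved bytes of the signal frame. */
struct fx_sw_sketch {
    uint32_t magic1;
    uint32_t extended_size;
};

/* Returns 0 when the frame advertises extended state, -1 otherwise. */
static int check_for_xstate_sketch(const struct fx_sw_sketch *sw)
{
    return sw->magic1 == FP_XSTATE_MAGIC1 ? 0 : -1;
}
```

A frame written from an uninitialized (all-zero) template fails this test on signal return, which matches the symptom reported here: the extended state is reinitialized rather than restored.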
      
      For MPX, this leads to the most permissive state and means we
      silently lose bounds violations.  I think this would also mean
      that we could lose *ANY* FPU/SSE/AVX state.  I'm not sure why
      no one has spotted this bug.
      
      I believe this was broken by:
      
      	72a671ce ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")
      
      way back in 2012.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@sr71.net
      Cc: fenghua.yu@intel.com
      Cc: yu-cheng.yu@intel.com
      Link: http://lkml.kernel.org/r/20151111002354.A0799571@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ab6b5294
    • D
      x86/mpx: Fix 32-bit address space calculation · f3119b83
      Dave Hansen committed
      I received a bug report that running 32-bit MPX binaries on
      64-bit kernels was broken.  I traced it down to this little code
      snippet.  We were switching our "number of bounds directory
      entries" calculation correctly.  But, we didn't switch the other
      side of the calculation: the virtual space size.
      
      This meant that we were calculating an absurd size for
      bd_entry_virt_space() on 32-bit because we used the 64-bit
      virt_space.
      
      This was _also_ broken for 32-bit kernels running on 64-bit
      hardware since boot_cpu_data.x86_virt_bits=48 even when running
      in 32-bit mode.
      
      Correct that and properly handle all 3 possible cases:
      
       1. 32-bit binary on 64-bit kernel
       2. 64-bit binary on 64-bit kernel
       3. 32-bit binary on 32-bit kernel
      
      This manifested in having bounds tables not properly unmapped.
      It "leaked" memory but had no functional impact otherwise.
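The arithmetic behind the bug can be sketched as below. The bounds-directory sizes are assumptions taken from the published MPX layout (a 4 MiB directory of 4-byte entries for 32-bit binaries, a 2 GiB directory of 8-byte entries for 64-bit binaries); the commit message does not state them, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed MPX bounds-directory geometry (not stated in the commit message). */
#define BD_SIZE_BYTES_32   (1ULL << 22)  /* 4 MiB directory, 4-byte entries */
#define BD_SIZE_BYTES_64   (1ULL << 31)  /* 2 GiB directory, 8-byte entries */

/* Number of bounds-directory entries for the given kind of binary. */
static uint64_t bd_entries(int is_64bit)
{
    return is_64bit ? BD_SIZE_BYTES_64 / 8 : BD_SIZE_BYTES_32 / 4;
}

/* Virtual address space size must follow the binary, not the hardware. */
static uint64_t virt_space(int is_64bit)
{
    return is_64bit ? (1ULL << 48) : (1ULL << 32);
}

/* Bytes of virtual address space covered by one bounds-directory entry. */
static uint64_t bd_entry_virt_space_sketch(int is_64bit)
{
    return virt_space(is_64bit) / bd_entries(is_64bit);
}
```

Under these assumed numbers, a 32-bit binary gets 4 KiB per directory entry and a 64-bit binary gets 1 MiB. Dividing the 48-bit space by the 32-bit entry count instead yields 256 MiB per entry, which is the kind of absurd size the message describes.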
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20151111181934.FA7FAC34@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f3119b83
    • D
      x86/mpx: Do proper get_user() when running 32-bit binaries on 64-bit kernels · 46561c39
      Dave Hansen committed
      When you call get_user(foo, bar), you effectively do a
      
      	copy_from_user(&foo, bar, sizeof(*bar));
      
      Note that the sizeof() is implicit.
      
      When we reach out to userspace to try to zap an entire "bounds
      table" we need to go read a "bounds directory entry" in order to
      locate the table's address.  The size of a "directory entry"
      depends on the binary being run and is always the size of a
      pointer.
      
      But, when we have a 64-bit kernel and a 32-bit application, the
      directory entry is still only 32 bits long, but we fetch it with
      a 64-bit pointer, which makes get_user() do a 64-bit fetch.
      Reading 4 extra bytes isn't harmful, unless we are at the end of
      the table and run off it.  It might also cause the zero page to get
      faulted in unnecessarily even if you are not at the end.
      
      Fix it up by doing a special 32-bit get_user() via a cast when
      we have 32-bit userspace.
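The idea of matching the fetch width to the entry size can be shown in user space. In this sketch memcpy() stands in for get_user(), and `read_bd_entry()` is a hypothetical helper rather than the kernel function that was patched.

```c
#include <stdint.h>
#include <string.h>

/*
 * Read one bounds-directory entry from `src`, fetching only as many bytes
 * as the entry really has: 4 for a 32-bit binary, 8 for a 64-bit one.
 * memcpy() stands in for get_user(); no real user-space access happens here.
 */
static uint64_t read_bd_entry(const void *src, int is_64bit_binary)
{
    if (is_64bit_binary) {
        uint64_t v;
        memcpy(&v, src, sizeof(v));
        return v;
    } else {
        uint32_t v;
        memcpy(&v, src, sizeof(v));
        return v;
    }
}
```

With the width chosen this way, reading the last 4-byte entry of a table never touches the bytes that follow it.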
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20151111181931.3ACF6822@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      46561c39
  5. 10 Nov, 2015 18 commits
  6. 07 Nov, 2015 12 commits