1. 30 1月, 2016 1 次提交
  2. 23 1月, 2016 1 次提交
    • R
      pmem: add wb_cache_pmem() to the PMEM API · 3f4a2670
      Ross Zwisler 提交于
      __arch_wb_cache_pmem() was already an internal implementation detail of
      the x86 PMEM API, but this functionality needs to be exported as part of
      the general PMEM API to handle the fsync/msync case for DAX mmaps.
      
      One thing worth noting is that we really do want this to be part of the
      PMEM API as opposed to a stand-alone function like clflush_cache_range()
      because of ordering restrictions.  By having wb_cache_pmem() as part of
      the PMEM API we can leave it unordered, call it multiple times to write
      back large amounts of memory, and then order the multiple calls with a
      single wmb_pmem().
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3f4a2670
  3. 21 1月, 2016 1 次提交
    • C
      dma-mapping: always provide the dma_map_ops based implementation · e1c7e324
      Christoph Hellwig 提交于
      Move the generic implementation to <linux/dma-mapping.h> now that all
      architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
      that everyone supports them.
      
      [valentinrothberg@gmail.com: remove leftovers in Kconfig]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: NValentin Rothberg <valentinrothberg@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e1c7e324
  4. 20 1月, 2016 2 次提交
  5. 19 1月, 2016 3 次提交
    • J
      x86/asm: Add C versions of frame pointer macros · ec518655
      Josh Poimboeuf 提交于
      Add C versions of the frame pointer macros which can be used to
      create a stack frame in inline assembly.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/f6786a282bf232ede3e2866414eae3cf02c7d662.1450442274.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ec518655
    • J
      x86/asm: Clean up frame pointer macros · 997963ed
      Josh Poimboeuf 提交于
      The asm macros for setting up and restoring the frame pointer
      aren't currently being used.  However, they will be needed soon
      to help asm functions to comply with stacktool.
      
      Rename FRAME/ENDFRAME to FRAME_BEGIN/FRAME_END for more
      symmetry.  Also make the code more readable and improve the
      comments.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3f488a8e3bfc8ac7d4d3d350953e664e7182b044.1450442274.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      997963ed
    • B
      x86/cpufeature: Add AMD AVIC bit · a1ff5726
      Borislav Petkov 提交于
      CPUID Fn8000_000A_EDX[13] denotes support for AMD's Virtual
      Interrupt controller, i.e., APIC virtualization.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Kaplan <david.kaplan@amd.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Link: http://lkml.kernel.org/r/1452938292-12327-1-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a1ff5726
  6. 16 1月, 2016 10 次提交
    • D
      mm, x86: get_user_pages() for dax mappings · 3565fce3
      Dan Williams 提交于
      A dax mapping establishes a pte with _PAGE_DEVMAP set when the driver
      has established a devm_memremap_pages() mapping, i.e.  when the pfn_t
      return from ->direct_access() has PFN_DEV and PFN_MAP set.  Later, when
      encountering _PAGE_DEVMAP during a page table walk we lookup and pin a
      struct dev_pagemap instance to keep the result of pfn_to_page() valid
      until put_page().
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Tested-by: NLogan Gunthorpe <logang@deltatee.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3565fce3
    • D
      mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd · 5c7fb56e
      Dan Williams 提交于
      A dax-huge-page mapping while it uses some thp helpers is ultimately not
      a transparent huge page.  The distinction is especially important in the
      get_user_pages() path.  pmd_devmap() is used to distinguish dax-pmds
      from pmd_huge() and pmd_trans_huge() which have slightly different
      semantics.
      
      Explicitly mark the pmd_trans_huge() helpers that dax needs by adding
      pmd_devmap() checks.
      
      [kirill.shutemov@linux.intel.com: fix regression in handling mlocked pages in  __split_huge_pmd()]
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5c7fb56e
    • D
      mm, dax: convert vmf_insert_pfn_pmd() to pfn_t · f25748e3
      Dan Williams 提交于
      Similar to the conversion of vm_insert_mixed() use pfn_t in the
      vmf_insert_pfn_pmd() to tag the resulting pte with _PAGE_DEVICE when the
      pfn is backed by a devm_memremap_pages() mapping.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f25748e3
    • D
      mm, dax, gpu: convert vm_insert_mixed to pfn_t · 01c8f1c4
      Dan Williams 提交于
      Convert the raw unsigned long 'pfn' argument to pfn_t for the purpose of
      evaluating the PFN_MAP and PFN_DEV flags.  When both are set it triggers
      _PAGE_DEVMAP to be set in the resulting pte.
      
      There are no functional changes to the gpu drivers as a result of this
      conversion.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: David Airlie <airlied@linux.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01c8f1c4
    • D
      x86, mm: introduce _PAGE_DEVMAP · 69660fd7
      Dan Williams 提交于
      _PAGE_DEVMAP is a hardware-unused pte bit that will later be used in the
      get_user_pages() path to identify pfns backed by the dynamic allocation
      established by devm_memremap_pages.  Upon seeing that bit the gup path
      will lookup and pin the allocation while the pages are in use.
      
      Since the _PAGE_DEVMAP bit is > 32 it must be cast to u64 instead of a
      pteval_t to allow pmd_flags() usage in the realmode boot code to build.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69660fd7
    • D
      pmem, dax: clean up clear_pmem() · 52db400f
      Dan Williams 提交于
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given the memmap array for a
      large pool of persistent may exhaust available DRAM introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 25):
      
      Both __dax_pmd_fault, and clear_pmem() were taking special steps to
      clear memory a page at a time to take advantage of non-temporal
      clear_page() implementations.  However, x86_64 does not use non-temporal
      instructions for clear_page(), and arch_clear_pmem() was always
      incurring the cost of __arch_wb_cache_pmem().
      
      Clean up the assumption that doing clear_pmem() a page at a time is more
      performant.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      52db400f
    • M
      arch/x86/include/asm/pgtable.h: add pmd_[dirty|mkclean] for THP · 590a471c
      Minchan Kim 提交于
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
      support.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttil <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      590a471c
    • K
      x86, thp: remove infrastructure for handling splitting PMDs · 1f19617d
      Kirill A. Shutemov 提交于
      With new refcounting we don't need to mark PMDs splitting.  Let's drop
      code to handle this.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Acked-by: NJerome Marchand <jmarchan@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f19617d
    • K
      x86/PCI: Add driver for Intel Volume Management Device (VMD) · 185a383a
      Keith Busch 提交于
      The Intel Volume Management Device (VMD) is a Root Complex Integrated
      Endpoint that acts as a host bridge to a secondary PCIe domain.  BIOS can
      reassign one or more Root Ports to appear within a VMD domain instead of
      the primary domain.  The immediate benefit is that additional PCIe domains
      allow more than 256 buses in a system by letting bus numbers be reused
      across different domains.
      
      VMD domains do not define ACPI _SEG, so to avoid domain clashing with host
      bridges defining this segment, VMD domains start at 0x10000, which is
      greater than the highest possible 16-bit ACPI defined _SEG.
      
      This driver enumerates and enables the domain using the root bus
      configuration interface provided by the PCI subsystem.  The driver provides
      configuration space accessor functions (pci_ops), bus and memory resources,
      an MSI IRQ domain with irq_chip implementation, and DMA operations
      necessary to use devices through the VMD endpoint's interface.
      
      VMD routes I/O as follows:
      
         1) Configuration Space: BAR 0 ("CFGBAR") of VMD provides the base
         address and size for configuration space register access to VMD-owned
         root ports.  It works similarly to MMCONFIG for extended configuration
         space.  Bus numbering is independent and does not conflict with the
         primary domain.
      
         2) MMIO Space: BARs 2 and 4 ("MEMBAR1" and "MEMBAR2") of VMD provide the
         base address, size, and type for MMIO register access.  These addresses
         are not translated by VMD hardware; they are simply reservations to be
         distributed to root ports' memory base/limit registers and subdivided
         among devices downstream.
      
         3) DMA: To interact appropriately with an IOMMU, the source ID DMA read
         and write requests are translated to the bus-device-function of the VMD
         endpoint.  Otherwise, DMA operates normally without VMD-specific address
         translation.
      
         4) Interrupts: Part of VMD's BAR 4 is reserved for VMD's MSI-X Table and
         PBA.  MSIs from VMD domain devices and ports are remapped to appear as
         if they were issued using one of VMD's MSI-X table entries.  Each MSI
         and MSI-X address of VMD-owned devices and ports has a special format
         where the address refers to specific entries in the VMD's MSI-X table.
         As with DMA, the interrupt source ID is translated to VMD's
         bus-device-function.
      
         The driver provides its own MSI and MSI-X configuration functions
         specific to how MSI messages are used within the VMD domain, and
         provides an irq_chip for independent IRQ allocation to relay interrupts
         from VMD's interrupt handler to the appropriate device driver's handler.
      
         5) Errors: PCIe error message are intercepted by the root ports normally
         (e.g., AER), except with VMD, system errors (i.e., firmware first) are
         disabled by default.  AER and hotplug interrupts are translated in the
         same way as endpoint interrupts.
      
         6) VMD does not support INTx interrupts or IO ports.  Devices or drivers
         requiring these features should either not be placed below VMD-owned
         root ports, or VMD should be disabled by BIOS for such endpoints.
      
      [bhelgaas: add VMD BAR #defines, factor out vmd_cfg_addr(), rework VMD
      resource setup, whitespace, changelog]
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de> (IRQ-related parts)
      185a383a
    • K
      x86/PCI: Allow DMA ops specific to a PCI domain · d9c3d6ff
      Keith Busch 提交于
      The Intel Volume Management Device (VMD) is a PCIe endpoint that acts as a
      host bridge to another PCI domain.  When devices below the VMD perform DMA,
      the VMD replaces their DMA source IDs with its own source ID.  Therefore,
      those devices require special DMA ops.
      
      Add interfaces to allow the VMD driver to set up dma_ops for the devices
      below it.
      
      [bhelgaas: remove "extern", add "static", changelog]
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      d9c3d6ff
  7. 13 1月, 2016 3 次提交
  8. 12 1月, 2016 6 次提交
    • R
      lguest: Map switcher text R/O · e27d90e8
      Rusty Russell 提交于
      Pavel noted that lguest maps the switcher code executable and
      read-write.  This is a bad idea for any kernel text, but
      particularly for text mapped at a fixed address.
      
      Create two vmas, one for the text (PAGE_KERNEL_RX) and another
      for the stacks (PAGE_KERNEL).  Use VM_NO_GUARD to map them
      adjacent (as expected by the rest of the code).
      Reported-by: NPavel Machek <pavel@ucw.cz>
      Tested-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e27d90e8
    • A
      x86/vdso: Disallow vvar access to vclock IO for never-used vclocks · bd902c53
      Andy Lutomirski 提交于
      It makes me uncomfortable that even modern systems grant every
      process direct read access to the HPET.
      
      While fixing this for real without regressing anything is a mess
      (unmapping the HPET is tricky because we don't adequately track
      all the mappings), we can do almost as well by tracking which
      vclocks have ever been used and only allowing pages associated
      with used vclocks to be faulted in.
      
      This will cause rogue programs that try to peek at the HPET to
      get SIGBUS instead on most systems.
      
      We can't restrict faults to vclock pages that are associated
      with the currently selected vclock due to a race: a process
      could start to access the HPET for the first time and race
      against a switch away from the HPET as the current clocksource.
      We can't segfault the process trying to peek at the HPET in this
      case, even though the process isn't going to do anything useful
      with the data.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/e79d06295625c02512277737ab55085a498ac5d8.1451446564.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bd902c53
    • A
      x86/vdso: Use .fault for the vDSO text mapping · 05ef76b2
      Andy Lutomirski 提交于
      The old scheme for mapping the vDSO text is rather complicated.
      vdso2c generates a struct vm_special_mapping and a blank .pages
      array of the correct size for each vdso image.  Init code in
      vdso/vma.c populates the .pages array for each vDSO image, and
      the mapping code selects the appropriate struct
      vm_special_mapping.
      
      With .fault, we can use a less roundabout approach: vdso_fault()
      just returns the appropriate page for the selected vDSO image.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/f886954c186bafd74e1b967c8931d852ae199aa2.1451446564.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      05ef76b2
    • A
      x86/vdso: Track each mm's loaded vDSO image as well as its base · 352b78c6
      Andy Lutomirski 提交于
      As we start to do more intelligent things with the vDSO at
      runtime (as opposed to just at mm initialization time), we'll
      need to know which vDSO is in use.
      
      In principle, we could guess based on the mm type, but that's
      over-complicated and error-prone.  Instead, just track it in the
      mmu context.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/c99ac48681bad709ca7ad5ee899d9042a3af6b00.1451446564.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      352b78c6
    • Y
      x86/fpu: Disable AVX when eagerfpu is off · 394db20c
      yu-cheng yu 提交于
      When "eagerfpu=off" is given as a command-line input, the kernel
      should disable AVX support.
      
      The Task Switched bit used for lazy context switching does not
      support AVX. If AVX is enabled without eagerfpu context
      switching, one task's AVX state could become corrupted or leak
      to other tasks. This is a bug and has bad security implications.
      
      This only affects systems that have AVX/AVX2/AVX512 and this
      issue will be found only when one actually uses AVX/AVX2/AVX512
      _AND_ does eagerfpu=off.
      
      Reference: Intel Software Developer's Manual Vol. 3A
      
      Sec. 2.5 Control Registers:
      TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the
      x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch
      to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4
      instruction is actually executed by the new task.
      
      Sec. 13.4.1 Using the TS Flag to Control the Saving of the X87
      FPU and SSE State
      When the TS flag is set, the processor monitors the instruction
      stream for x87 FPU, MMX, SSE instructions. When the processor
      detects one of these instructions, it raises a
      device-not-available exeception (#NM) prior to executing the
      instruction.
      Signed-off-by: NYu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/1452119094-7252-5-git-send-email-yu-cheng.yu@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      394db20c
    • Y
      x86/fpu: Disable MPX when eagerfpu is off · a5fe93a5
      yu-cheng yu 提交于
      This issue is a fallout from the command-line parsing move.
      
      When "eagerfpu=off" is given as a command-line input, the kernel
      should disable MPX support. The decision for turning off MPX was
      made in fpu__init_system_ctx_switch(), which is after the
      selection of the XSAVE format. This patch fixes it by getting
      that decision done earlier in fpu__init_system_xstate().
      Signed-off-by: NYu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/1452119094-7252-4-git-send-email-yu-cheng.yu@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a5fe93a5
  9. 11 1月, 2016 2 次提交
    • H
      x86/boot: Double BOOT_HEAP_SIZE to 64KB · 8c31902c
      H.J. Lu 提交于
      When decompressing kernel image during x86 bootup, malloc memory
      for ELF program headers may run out of heap space, which leads
      to system halt.  This patch doubles BOOT_HEAP_SIZE to 64KB.
      
      Tested with 32-bit kernel which failed to boot without this patch.
      Signed-off-by: NH.J. Lu <hjl.tools@gmail.com>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      8c31902c
    • A
      x86/mm: Add barriers and document switch_mm()-vs-flush synchronization · 71b3c126
      Andy Lutomirski 提交于
      When switch_mm() activates a new PGD, it also sets a bit that
      tells other CPUs that the PGD is in use so that TLB flush IPIs
      will be sent.  In order for that to work correctly, the bit
      needs to be visible prior to loading the PGD and therefore
      starting to fill the local TLB.
      
      Document all the barriers that make this work correctly and add
      a couple that were missing.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      71b3c126
  10. 09 1月, 2016 1 次提交
  11. 07 1月, 2016 1 次提交
  12. 06 1月, 2016 1 次提交
  13. 21 12月, 2015 1 次提交
  14. 20 12月, 2015 2 次提交
    • J
      x86/irq: Export functions to allow MSI domains in modules · c8f3e518
      Jake Oshins 提交于
      The Linux kernel already has the concept of IRQ domain, wherein a
      component can expose a set of IRQs which are managed by a particular
      interrupt controller chip or other subsystem. The PCI driver exposes
      the notion of an IRQ domain for Message-Signaled Interrupts (MSI) from
      PCI Express devices. This patch exposes the functions which are
      necessary for creating a MSI IRQ domain within a module.
      
      [ tglx: Split it into x86 and core irq parts ]
      Signed-off-by: NJake Oshins <jakeo@microsoft.com>
      Cc: gregkh@linuxfoundation.org
      Cc: kys@microsoft.com
      Cc: devel@linuxdriverproject.org
      Cc: olaf@aepfle.de
      Cc: apw@canonical.com
      Cc: vkuznets@redhat.com
      Cc: haiyangz@microsoft.com
      Cc: marc.zyngier@arm.com
      Cc: bhelgaas@google.com
      Link: http://lkml.kernel.org/r/1449769983-12948-4-git-send-email-jakeo@microsoft.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      c8f3e518
    • D
      x86/paravirt: Prevent rtc_cmos platform device init on PV guests · d8c98a1d
      David Vrabel 提交于
      Adding the rtc platform device in non-privileged Xen PV guests causes
      an IRQ conflict because these guests do not have legacy PIC and may
      allocate irqs in the legacy range.
      
      In a single VCPU Xen PV guest we should have:
      
      /proc/interrupts:
                 CPU0
        0:       4934  xen-percpu-virq      timer0
        1:          0  xen-percpu-ipi       spinlock0
        2:          0  xen-percpu-ipi       resched0
        3:          0  xen-percpu-ipi       callfunc0
        4:          0  xen-percpu-virq      debug0
        5:          0  xen-percpu-ipi       callfuncsingle0
        6:          0  xen-percpu-ipi       irqwork0
        7:        321   xen-dyn-event     xenbus
        8:         90   xen-dyn-event     hvc_console
        ...
      
      But hvc_console cannot get its interrupt because it is already in use
      by rtc0 and the console does not work.
      
        genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
      
      We can avoid this problem by realizing that unprivileged PV guests (both
      Xen and lguests) are not supposed to have rtc_cmos device and so
      adding it is not necessary.
      
      Privileged guests (i.e. Xen's dom0) do use it but they should not have
      irq conflicts since they allocate irqs above legacy range (above
      gsi_top, in fact).
      
      Instead of explicitly testing whether the guest is privileged we can
      extend pv_info structure to include information about guest's RTC
      support.
      Reported-and-tested-by: NSander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: vkuznets@redhat.com
      Cc: xen-devel@lists.xenproject.org
      Cc: konrad.wilk@oracle.com
      Cc: stable@vger.kernel.org # 4.2+
      Link: http://lkml.kernel.org/r/1449842873-2613-1-git-send-email-boris.ostrovsky@oracle.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      d8c98a1d
  15. 19 12月, 2015 5 次提交