1. 16 1月, 2016 7 次提交
    • D
      x86, mm: introduce vmem_altmap to augment vmemmap_populate() · 4b94ffdc
      Dan Williams 提交于
      In support of providing struct page for large persistent memory
      capacities, use struct vmem_altmap to change the default policy for
      allocating memory for the memmap array.  The default vmemmap_populate()
      allocates page table storage area from the page allocator.  Given
      persistent memory capacities relative to DRAM it may not be feasible to
      store the memmap in 'System Memory'.  Instead vmem_altmap represents
      pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
      requests.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4b94ffdc
    • D
      kvm: rename pfn_t to kvm_pfn_t · ba049e93
      Dan Williams 提交于
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given the memmap array for a
      large pool of persistent may exhaust available DRAM introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 18):
      
      The core has developed a need for a "pfn_t" type [1].  Move the existing
      pfn_t in KVM to kvm_pfn_t [2].
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.htmlSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ba049e93
    • D
      pmem, dax: clean up clear_pmem() · 52db400f
      Dan Williams 提交于
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given the memmap array for a
      large pool of persistent may exhaust available DRAM introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 25):
      
      Both __dax_pmd_fault, and clear_pmem() were taking special steps to
      clear memory a page at a time to take advantage of non-temporal
      clear_page() implementations.  However, x86_64 does not use non-temporal
      instructions for clear_page(), and arch_clear_pmem() was always
      incurring the cost of __arch_wb_cache_pmem().
      
      Clean up the assumption that doing clear_pmem() a page at a time is more
      performant.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      52db400f
    • M
      arch/x86/include/asm/pgtable.h: add pmd_[dirty|mkclean] for THP · 590a471c
      Minchan Kim 提交于
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
      support.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttil <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      590a471c
    • K
      x86, thp: remove infrastructure for handling splitting PMDs · 1f19617d
      Kirill A. Shutemov 提交于
      With new refcounting we don't need to mark PMDs splitting.  Let's drop
      code to handle this.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Acked-by: NJerome Marchand <jmarchan@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f19617d
    • K
      mm: drop tail page refcounting · ddc58f27
      Kirill A. Shutemov 提交于
      Tail page refcounting is utterly complicated and painful to support.
      
      It uses ->_mapcount on tail pages to store how many times this page is
      pinned.  get_page() bumps ->_mapcount on tail page in addition to
      ->_count on head.  This information is required by split_huge_page() to
      be able to distribute pins from head of compound page to tails during
      the split.
      
      We will need ->_mapcount to account PTE mappings of subpages of the
      compound page.  We eliminate need in current meaning of ->_mapcount in
      tail pages by forbidding split entirely if the page is pinned.
      
      The only user of tail page refcounting is THP which is marked BROKEN for
      now.
      
      Let's drop all this mess.  It makes get_page() and put_page() much
      simpler.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NJerome Marchand <jmarchan@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ddc58f27
    • K
      thp: rename split_huge_page_pmd() to split_huge_pmd() · 78ddc534
      Kirill A. Shutemov 提交于
      We are going to decouple splitting THP PMD from splitting underlying
      compound page.
      
      This patch renames split_huge_page_pmd*() functions to split_huge_pmd*()
      to reflect the fact that it doesn't imply page splitting, only PMD.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NJerome Marchand <jmarchan@redhat.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      78ddc534
  2. 15 1月, 2016 1 次提交
    • D
      x86: mm: support ARCH_MMAP_RND_BITS · 9e08f57d
      Daniel Cashman 提交于
      x86: arch_mmap_rnd() uses hard-coded values, 8 for 32-bit and 28 for
      64-bit, to generate the random offset for the mmap base address.  This
      value represents a compromise between increased ASLR effectiveness and
      avoiding address-space fragmentation.  Replace it with a Kconfig option,
      which is sensibly bounded, so that platform developers may choose where
      to place this compromise.  Keep default values as new minimums.
      Signed-off-by: NDaniel Cashman <dcashman@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hector Marco-Gisbert <hecmargi@upv.es>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9e08f57d
  3. 14 1月, 2016 1 次提交
  4. 13 1月, 2016 2 次提交
  5. 12 1月, 2016 10 次提交
    • M
      x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[] · 2f0c0b2d
      Mario Kleiner 提交于
      Without the reboot=pci method, the iMac 10,1 simply
      hangs after printing "Restarting system" at the point
      when it should reboot. This fixes it.
      Signed-off-by: NMario Kleiner <mario.kleiner.de@gmail.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1450466646-26663-1-git-send-email-mario.kleiner.de@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2f0c0b2d
    • R
      lguest: Map switcher text R/O · e27d90e8
      Rusty Russell 提交于
      Pavel noted that lguest maps the switcher code executable and
      read-write.  This is a bad idea for any kernel text, but
      particularly for text mapped at a fixed address.
      
      Create two vmas, one for the text (PAGE_KERNEL_RX) and another
      for the stacks (PAGE_KERNEL).  Use VM_NO_GUARD to map them
      adjacent (as expected by the rest of the code).
      Reported-by: NPavel Machek <pavel@ucw.cz>
      Tested-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e27d90e8
    • B
      x86/boot: Hide local labels in verify_cpu() · aa042141
      Borislav Petkov 提交于
      ... from the final ELF image's symbol table as they're not
      really needed there.
      
      Before:
      
      $ readelf -a vmlinux | grep verify_cpu
          43: ffffffff810001a9     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu
          45: ffffffff8100028f     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_no_longmode
          46: ffffffff810001de     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_noamd
          47: ffffffff8100022b     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_check
          48: ffffffff8100021c     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_clear_xd
          49: ffffffff81000263     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_sse_test
          50: ffffffff81000296     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_sse_ok
      
      After:
      
      $ readelf -a vmlinux | grep verify_cpu
          43: ffffffff810001a9     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu
      
      No functionality change.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1451860733-21163-1-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      aa042141
    • Y
      x86/fpu: Disable AVX when eagerfpu is off · 394db20c
      yu-cheng yu 提交于
      When "eagerfpu=off" is given as a command-line input, the kernel
      should disable AVX support.
      
      The Task Switched bit used for lazy context switching does not
      support AVX. If AVX is enabled without eagerfpu context
      switching, one task's AVX state could become corrupted or leak
      to other tasks. This is a bug and has bad security implications.
      
      This only affects systems that have AVX/AVX2/AVX512 and this
      issue will be found only when one actually uses AVX/AVX2/AVX512
      _AND_ does eagerfpu=off.
      
      Reference: Intel Software Developer's Manual Vol. 3A
      
      Sec. 2.5 Control Registers:
      TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the
      x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch
      to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4
      instruction is actually executed by the new task.
      
      Sec. 13.4.1 Using the TS Flag to Control the Saving of the X87
      FPU and SSE State
      When the TS flag is set, the processor monitors the instruction
      stream for x87 FPU, MMX, SSE instructions. When the processor
      detects one of these instructions, it raises a
      device-not-available exeception (#NM) prior to executing the
      instruction.
      Signed-off-by: NYu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/1452119094-7252-5-git-send-email-yu-cheng.yu@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      394db20c
    • Y
      x86/fpu: Disable MPX when eagerfpu is off · a5fe93a5
      yu-cheng yu 提交于
      This issue is a fallout from the command-line parsing move.
      
      When "eagerfpu=off" is given as a command-line input, the kernel
      should disable MPX support. The decision for turning off MPX was
      made in fpu__init_system_ctx_switch(), which is after the
      selection of the XSAVE format. This patch fixes it by getting
      that decision done earlier in fpu__init_system_xstate().
      Signed-off-by: NYu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/1452119094-7252-4-git-send-email-yu-cheng.yu@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a5fe93a5
    • Y
      x86/fpu: Disable XGETBV1 when no XSAVE · eb7c5f87
      yu-cheng yu 提交于
      When "noxsave" is given as a command-line input, the kernel
      should disable XGETBV1. This issue currently does not cause any
      actual problems. XGETBV1 is only useful if we have something
      using the 'init optimization' (i.e. xsaveopt, xsaves). We
      already clear both of those in fpu__xstate_clear_all_cpu_caps().
      But this is good for completeness.
      Signed-off-by: NYu-cheng Yu <yu-cheng.yu@intel.com>
      Reviewed-by: NDave Hansen <dave.hansen@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/1452119094-7252-3-git-send-email-yu-cheng.yu@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      eb7c5f87
    • Y
      x86/fpu: Fix early FPU command-line parsing · 4f81cbaf
      yu-cheng yu 提交于
      The function fpu__init_system() is executed before
      parse_early_param(). This causes wrong FPU configuration. This
      patch fixes this issue by parsing boot_command_line in the
      beginning of fpu__init_system().
      
      With all four patches in this series, each parameter disables
      features as the following:
      
      eagerfpu=off: eagerfpu, avx, avx2, avx512, mpx
      no387: fpu
      nofxsr: fxsr, fxsropt, xmm
      noxsave: xsave, xsaveopt, xsaves, xsavec, avx, avx2, avx512,
      mpx, xgetbv1 noxsaveopt: xsaveopt
      noxsaves: xsaves
      Signed-off-by: NYu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/1452119094-7252-2-git-send-email-yu-cheng.yu@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4f81cbaf
    • H
      kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROL · 45bdbcfd
      Huaitong Han 提交于
      vmx_cpuid_tries to update SECONDARY_VM_EXEC_CONTROL in the VMCS, but
      it will cause a vmwrite error on older CPUs because the code does not
      check for the presence of CPU_BASED_ACTIVATE_SECONDARY_CONTROLS.
      
      This will get rid of the following trace on e.g. Core2 6600:
      
      vmwrite error: reg 401e value 10 (err 12)
      Call Trace:
      [<ffffffff8116e2b9>] dump_stack+0x40/0x57
      [<ffffffffa020b88d>] vmx_cpuid_update+0x5d/0x150 [kvm_intel]
      [<ffffffffa01d8fdc>] kvm_vcpu_ioctl_set_cpuid2+0x4c/0x70 [kvm]
      [<ffffffffa01b8363>] kvm_arch_vcpu_ioctl+0x903/0xfa0 [kvm]
      
      Fixes: feda805f
      Cc: stable@vger.kernel.org
      Reported-by: NZdenek Kaspar <zkaspar82@gmail.com>
      Signed-off-by: NHuaitong Han <huaitong.han@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      45bdbcfd
    • K
      x86/mm: Use PAGE_ALIGNED instead of IS_ALIGNED · b500f77b
      Kefeng Wang 提交于
      Use PAGE_ALIGEND macro in <linux/mm.h> to simplify code.
      Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com>
      Cc: <guohanjun@huawei.com>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1452565170-11083-1-git-send-email-wangkefeng.wang@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b500f77b
    • D
      x86/mm/pat: Make split_page_count() check for empty levels to fix /proc/meminfo output · c9e0d391
      Dave Jones 提交于
      In CONFIG_PAGEALLOC_DEBUG=y builds, we disable 2M pages.
      
      Unfortunatly when we split up mappings during boot,
      split_page_count() doesn't take this into account, and
      starts decrementing an empty direct_pages_count[] level.
      
      This results in /proc/meminfo showing crazy things like:
      
        DirectMap2M:    18446744073709543424 kB
      Signed-off-by: NDave Jones <davej@codemonkey.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c9e0d391
  6. 11 1月, 2016 5 次提交
    • H
      x86/boot: Double BOOT_HEAP_SIZE to 64KB · 8c31902c
      H.J. Lu 提交于
      When decompressing kernel image during x86 bootup, malloc memory
      for ELF program headers may run out of heap space, which leads
      to system halt.  This patch doubles BOOT_HEAP_SIZE to 64KB.
      
      Tested with 32-bit kernel which failed to boot without this patch.
      Signed-off-by: NH.J. Lu <hjl.tools@gmail.com>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      8c31902c
    • A
      x86/mm: Add barriers and document switch_mm()-vs-flush synchronization · 71b3c126
      Andy Lutomirski 提交于
      When switch_mm() activates a new PGD, it also sets a bit that
      tells other CPUs that the PGD is in use so that TLB flush IPIs
      will be sent.  In order for that to work correctly, the bit
      needs to be visible prior to loading the PGD and therefore
      starting to fill the local TLB.
      
      Document all the barriers that make this work correctly and add
      a couple that were missing.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      71b3c126
    • M
      um: Fix build error and kconfig for i386 · 42d91f61
      Mickaël Salaün 提交于
      Fix build error by generating elfcore.o only when ELF_CORE (depending on
      COREDUMP) is selected:
      
      arch/x86/um/built-in.o: In function `elf_core_write_extra_phdrs':
      (.text+0x3e62): undefined reference to `dump_emit'
      arch/x86/um/built-in.o: In function `elf_core_write_extra_data':
      (.text+0x3eef): undefined reference to `dump_emit'
      
      Fixes: 5d2acfc7 ("kconfig: make allnoconfig disable options behind EMBEDDED and EXPERT")
      Signed-off-by: NMickaël Salaün <mic@digikod.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      42d91f61
    • M
      um: Add full asm/syscall.h support · d8f8b844
      Mickaël Salaün 提交于
      Add subarchitecture-independent implementation of asm-generic/syscall.h
      allowing access to user system call parameters and results:
      * syscall_get_nr()
      * syscall_rollback()
      * syscall_get_error()
      * syscall_get_return_value()
      * syscall_set_return_value()
      * syscall_get_arguments()
      * syscall_set_arguments()
      * syscall_get_arch() provided by arch/x86/um/asm/syscall.h
      
      This provides the necessary syscall helpers needed by
      HAVE_ARCH_SECCOMP_FILTER plus syscall_get_error().
      
      This is inspired from Meredydd Luff's patch
      (https://gerrit.chromium.org/gerrit/21425).
      Signed-off-by: NMickaël Salaün <mic@digikod.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Meredydd Luff <meredydd@senatehouse.org>
      Cc: David Drysdale <drysdale@google.com>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      Acked-by: NKees Cook <keescook@chromium.org>
      d8f8b844
    • M
      um: Fix ptrace GETREGS/SETREGS bugs · e04c989e
      Mickaël Salaün 提交于
      This fix two related bugs:
      * PTRACE_GETREGS doesn't get the right orig_ax (syscall) value
      * PTRACE_SETREGS can't set the orig_ax value (erased by initial value)
      
      Get rid of the now useless and error-prone get_syscall().
      
      Fix inconsistent behavior in the ptrace implementation for i386 when
      updating orig_eax automatically update the syscall number as well. This
      is now updated in handle_syscall().
      Signed-off-by: NMickaël Salaün <mic@digikod.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Thomas Meyer <thomas@m3y3r.de>
      Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: Anton Ivanov <aivanov@brocade.com>
      Cc: Meredydd Luff <meredydd@senatehouse.org>
      Cc: David Drysdale <drysdale@google.com>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      Acked-by: NKees Cook <keescook@chromium.org>
      e04c989e
  7. 09 1月, 2016 12 次提交
  8. 07 1月, 2016 2 次提交