1. 27 10月, 2017 1 次提交
    • I
      Revert "x86/mm: Limit mmap() of /dev/mem to valid physical addresses" · 90edaac6
      Ingo Molnar 提交于
      This reverts commit ce56a86e.
      
      There's unanticipated interaction with some boot parameters like 'mem=',
      which now cause the new checks via valid_mmap_phys_addr_range() to be too
      restrictive, crashing a Qemu bootup in fact, as reported by Fengguang Wu.
      
      So while the motivation of the change is still entirely valid, we
      need a few more rounds of testing to get it right - it's way too late
      after -rc6, so revert it for now.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NCraig Bergstrom <craigb@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dsafonov@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: oleg@redhat.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      90edaac6
  2. 24 10月, 2017 1 次提交
  3. 23 10月, 2017 2 次提交
  4. 22 10月, 2017 1 次提交
  5. 20 10月, 2017 1 次提交
    • C
      x86/mm: Limit mmap() of /dev/mem to valid physical addresses · ce56a86e
      Craig Bergstrom 提交于
      Currently, it is possible to mmap() any offset from /dev/mem.  If a
      program mmaps() /dev/mem offsets outside of the addressable limits
      of a system, the page table can be corrupted by setting reserved bits.
      
      For example if you mmap() offset 0x0001000000000000 of /dev/mem on an
      x86_64 system with a 48-bit bus, the page fault handler will be called
      with error_code set to RSVD.  The kernel then crashes with a page table
      corruption error.
      
      This change prevents this page table corruption on x86 by refusing
      to mmap offsets higher than the highest valid address in the system.
      Signed-off-by: NCraig Bergstrom <craigb@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dsafonov@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: oleg@redhat.com
      Link: http://lkml.kernel.org/r/20171019192856.39672-1-craigb@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ce56a86e
  6. 18 10月, 2017 4 次提交
  7. 17 10月, 2017 1 次提交
  8. 16 10月, 2017 1 次提交
  9. 14 10月, 2017 2 次提交
    • B
      x86/microcode: Do the family check first · 1f161f67
      Borislav Petkov 提交于
      On CPUs like AMD's Geode, for example, we shouldn't even try to load
      microcode because they do not support the modern microcode loading
      interface.
      
      However, we do the family check *after* the other checks whether the
      loader has been disabled on the command line or whether we're running in
      a guest.
      
      So move the family checks first in order to exit early if we're being
      loaded on an unsupported family.
      Reported-and-tested-by: NSven Glodowski <glodi1@arcor.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # 4.11..
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://bugzilla.suse.com/show_bug.cgi?id=1061396
      Link: http://lkml.kernel.org/r/20171012112316.977-1-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1f161f67
    • A
      x86/mm: Flush more aggressively in lazy TLB mode · b956575b
      Andy Lutomirski 提交于
      Since commit:
      
        94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      
      x86's lazy TLB mode has been all the way lazy: when running a kernel thread
      (including the idle thread), the kernel keeps using the last user mm's
      page tables without attempting to maintain user TLB coherence at all.
      
      From a pure semantic perspective, this is fine -- kernel threads won't
      attempt to access user pages, so having stale TLB entries doesn't matter.
      
      Unfortunately, I forgot about a subtlety.  By skipping TLB flushes,
      we also allow any paging-structure caches that may exist on the CPU
      to become incoherent.  This means that we can have a
      paging-structure cache entry that references a freed page table, and
      the CPU is within its rights to do a speculative page walk starting
      at the freed page table.
      
      I can imagine this causing two different problems:
      
       - A speculative page walk starting from a bogus page table could read
         IO addresses.  I haven't seen any reports of this causing problems.
      
       - A speculative page walk that involves a bogus page table can install
         garbage in the TLB.  Such garbage would always be at a user VA, but
         some AMD CPUs have logic that triggers a machine check when it notices
         these bogus entries.  I've seen a couple reports of this.
      
      Boris further explains the failure mode:
      
      > It is actually more of an optimization which assumes that paging-structure
      > entries are in WB DRAM:
      >
      > "TlbCacheDis: cacheable memory disable. Read-write. 0=Enables
      > performance optimization that assumes PML4, PDP, PDE, and PTE entries
      > are in cacheable WB-DRAM; memory type checks may be bypassed, and
      > addresses outside of WB-DRAM may result in undefined behavior or NB
      > protocol errors. 1=Disables performance optimization and allows PML4,
      > PDP, PDE and PTE entries to be in any memory type. Operating systems
      > that maintain page tables in memory types other than WB- DRAM must set
      > TlbCacheDis to insure proper operation."
      >
      > The MCE generated is an NB protocol error to signal that
      >
      > "Link: A specific coherent-only packet from a CPU was issued to an
      > IO link. This may be caused by software which addresses page table
      > structures in a memory type other than cacheable WB-DRAM without
      > properly configuring MSRC001_0015[TlbCacheDis]. This may occur, for
      > example, when page table structure addresses are above top of memory. In
      > such cases, the NB will generate an MCE if it sees a mismatch between
      > the memory operation generated by the core and the link type."
      >
      > I'm assuming coherent-only packets don't go out on IO links, thus the
      > error.
      
      To fix this, reinstate TLB coherence in lazy mode.  With this patch
      applied, we do it in one of two ways:
      
       - If we have PCID, we simply switch back to init_mm's page tables
         when we enter a kernel thread -- this seems to be quite cheap
         except for the cost of serializing the CPU.
      
       - If we don't have PCID, then we set a flag and switch to init_mm
         the first time we would otherwise need to flush the TLB.
      
      The /sys/kernel/debug/x86/tlb_use_lazy_mode debug switch can be changed
      to override the default mode for benchmarking.
      
      In theory, we could optimize this better by only flushing the TLB in
      lazy CPUs when a page table is freed.  Doing that would require
      auditing the mm code to make sure that all page table freeing goes
      through tlb_remove_page() as well as reworking some data structures
      to implement the improved flush logic.
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Reported-by: NAdam Borowski <kilobyte@angband.pl>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Johannes Hirte <johannes.hirte@datenkhaos.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      Link: http://lkml.kernel.org/r/20171009170231.fkpraqokz6e4zeco@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b956575b
  10. 12 10月, 2017 4 次提交
  11. 11 10月, 2017 1 次提交
  12. 10 10月, 2017 11 次提交
  13. 09 10月, 2017 2 次提交
  14. 06 10月, 2017 1 次提交
  15. 05 10月, 2017 3 次提交
    • J
      x86/kvm: Move kvm_fastop_exception to .fixup section · f26e6016
      Josh Poimboeuf 提交于
      When compiling the kernel with the '-frecord-gcc-switches' flag, objtool
      complains:
      
        arch/x86/kvm/emulate.o: warning: objtool: .GCC.command.line+0x0: special: can't find new instruction
      
      And also the kernel fails to link.
      
      The problem is that the 'kvm_fastop_exception' code gets placed into the
      throwaway '.GCC.command.line' section instead of '.text'.
      
      Exception fixup code is conventionally placed in the '.fixup' section,
      so put it there where it belongs.
      Reported-and-tested-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      f26e6016
    • B
      x86/mce: Hide mca_cfg · 262e6811
      Borislav Petkov 提交于
      Now that lguest is gone, put it in the internal header which should be
      used only by MCA/RAS code.
      
      Add missing header guards while at it.
      
      No functional change.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20171002092836.22971-3-bp@alien8.de
      262e6811
    • B
      kvm/x86: Avoid async PF preempting the kernel incorrectly · a2b7861b
      Boqun Feng 提交于
      Currently, in PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could call
      schedule() to reschedule in some cases.  This could result in
      accidentally ending the current RCU read-side critical section early,
      causing random memory corruption in the guest, or otherwise preempting
      the currently running task inside between preempt_disable and
      preempt_enable.
      
      The difficulty to handle this well is because we don't know whether an
      async PF delivered in a preemptible section or RCU read-side critical section
      for PREEMPT_COUNT=n, since preempt_disable()/enable() and rcu_read_lock/unlock()
      are both no-ops in that case.
      
      To cure this, we treat any async PF interrupting a kernel context as one
      that cannot be preempted, preventing kvm_async_pf_task_wait() from choosing
      the schedule() path in that case.
      
      To do so, a second parameter for kvm_async_pf_task_wait() is introduced,
      so that we know whether it's called from a context interrupting the
      kernel, and the parameter is set properly in all the callsites.
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NBoqun Feng <boqun.feng@gmail.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      a2b7861b
  16. 04 10月, 2017 3 次提交
  17. 30 9月, 2017 1 次提交