1. 01 Nov 2017, 1 commit
    • x86/mm: fix use-after-free of vma during userfaultfd fault · cb0631fd
      Committed by Vlastimil Babka
      Syzkaller with KASAN has reported a use-after-free of vma->vm_flags in
      __do_page_fault() with the following reproducer:
      
        mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
        mmap(&(0x7f0000011000/0x3000)=nil, 0x3000, 0x1, 0x32, 0xffffffffffffffff, 0x0)
        r0 = userfaultfd(0x0)
        ioctl$UFFDIO_API(r0, 0xc018aa3f, &(0x7f0000002000-0x18)={0xaa, 0x0, 0x0})
        ioctl$UFFDIO_REGISTER(r0, 0xc020aa00, &(0x7f0000019000)={{&(0x7f0000012000/0x2000)=nil, 0x2000}, 0x1, 0x0})
        r1 = gettid()
        syz_open_dev$evdev(&(0x7f0000013000-0x12)="2f6465762f696e7075742f6576656e742300", 0x0, 0x0)
        tkill(r1, 0x7)
      
      The vma should be pinned by mmap_sem, but handle_userfault() might (in a
      return-to-userspace scenario) release it and then acquire it again, so when
      we return to __do_page_fault() (with a result other than VM_FAULT_RETRY),
      the vma might be gone.
      
      Specifically, per Andrea the scenario is
       "A return to userland to repeat the page fault later with a
        VM_FAULT_NOPAGE retval (potentially after handling any pending signal
        during the return to userland). The return to userland is identified
        whenever FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in
        vmf->flags"
      
      However, since commit a3c4fb7c ("x86/mm: Fix fault error path using
      unsafe vma pointer") there is a vma_pkey() read of vma->vm_flags after
      that point, which can thus become use-after-free.  Fix this by moving
      the read before calling handle_mm_fault().
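
      A minimal sketch of that reordering, simplified from __do_page_fault()
      (not the literal diff):

        /*
         * Read the protection key while mmap_sem is held and the vma is
         * known to be valid; only the value is needed later.
         */
        pkey = vma_pkey(vma);

        fault = handle_mm_fault(vma, address, flags);

        /*
         * handle_userfault() may have dropped and re-acquired mmap_sem,
         * so the vma pointer must not be dereferenced from here on.
         */
        if (unlikely(fault & VM_FAULT_ERROR)) {
                mm_fault_error(regs, error_code, address, &pkey, fault);
                return;
        }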
      Reported-by: syzbot <bot+6a5269ce759a7bb12754ed9622076dc93f65a1f6@syzkaller.appspotmail.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
      Fixes: a3c4fb7c9c2e ("x86/mm: Fix fault error path using unsafe vma pointer")
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cb0631fd
  2. 27 Oct 2017, 1 commit
    • Revert "x86/mm: Limit mmap() of /dev/mem to valid physical addresses" · 90edaac6
      Committed by Ingo Molnar
      This reverts commit ce56a86e.
      
      There's an unanticipated interaction with some boot parameters like 'mem=',
      which now causes the new checks via valid_mmap_phys_addr_range() to be too
      restrictive, in fact crashing a QEMU bootup, as reported by Fengguang Wu.
      
      So while the motivation of the change is still entirely valid, we
      need a few more rounds of testing to get it right - it's way too late
      after -rc6, so revert it for now.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Craig Bergstrom <craigb@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dsafonov@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: oleg@redhat.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      90edaac6
  3. 20 Oct 2017, 1 commit
    • x86/mm: Limit mmap() of /dev/mem to valid physical addresses · ce56a86e
      Committed by Craig Bergstrom
      Currently, it is possible to mmap() any offset from /dev/mem.  If a
      program mmaps() /dev/mem offsets outside of the addressable limits
      of a system, the page table can be corrupted by setting reserved bits.
      
      For example if you mmap() offset 0x0001000000000000 of /dev/mem on an
      x86_64 system with a 48-bit bus, the page fault handler will be called
      with error_code set to RSVD.  The kernel then crashes with a page table
      corruption error.
      
      This change prevents this page table corruption on x86 by refusing
      to mmap offsets higher than the highest valid address in the system.
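
      A hedged sketch of the kind of range check described (helper placement
      and details differ in the actual patch):

        /*
         * Refuse /dev/mem mappings whose physical range the CPU cannot
         * address; such offsets would set reserved bits in the page tables.
         */
        int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
        {
                phys_addr_t addr = (phys_addr_t)pfn << PAGE_SHIFT;

                return !((addr + count - 1) >> boot_cpu_data.x86_phys_bits);
        }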
      Signed-off-by: Craig Bergstrom <craigb@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dsafonov@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: oleg@redhat.com
      Link: http://lkml.kernel.org/r/20171019192856.39672-1-craigb@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ce56a86e
  4. 18 Oct 2017, 3 commits
  5. 14 Oct 2017, 1 commit
    • x86/mm: Flush more aggressively in lazy TLB mode · b956575b
      Committed by Andy Lutomirski
      Since commit:
      
        94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      
      x86's lazy TLB mode has been all the way lazy: when running a kernel thread
      (including the idle thread), the kernel keeps using the last user mm's
      page tables without attempting to maintain user TLB coherence at all.
      
      From a pure semantic perspective, this is fine -- kernel threads won't
      attempt to access user pages, so having stale TLB entries doesn't matter.
      
      Unfortunately, I forgot about a subtlety.  By skipping TLB flushes,
      we also allow any paging-structure caches that may exist on the CPU
      to become incoherent.  This means that we can have a
      paging-structure cache entry that references a freed page table, and
      the CPU is within its rights to do a speculative page walk starting
      at the freed page table.
      
      I can imagine this causing two different problems:
      
       - A speculative page walk starting from a bogus page table could read
         IO addresses.  I haven't seen any reports of this causing problems.
      
       - A speculative page walk that involves a bogus page table can install
         garbage in the TLB.  Such garbage would always be at a user VA, but
         some AMD CPUs have logic that triggers a machine check when it notices
         these bogus entries.  I've seen a couple reports of this.
      
      Boris further explains the failure mode:
      
      > It is actually more of an optimization which assumes that paging-structure
      > entries are in WB DRAM:
      >
      > "TlbCacheDis: cacheable memory disable. Read-write. 0=Enables
      > performance optimization that assumes PML4, PDP, PDE, and PTE entries
      > are in cacheable WB-DRAM; memory type checks may be bypassed, and
      > addresses outside of WB-DRAM may result in undefined behavior or NB
      > protocol errors. 1=Disables performance optimization and allows PML4,
      > PDP, PDE and PTE entries to be in any memory type. Operating systems
      > that maintain page tables in memory types other than WB- DRAM must set
      > TlbCacheDis to insure proper operation."
      >
      > The MCE generated is an NB protocol error to signal that
      >
      > "Link: A specific coherent-only packet from a CPU was issued to an
      > IO link. This may be caused by software which addresses page table
      > structures in a memory type other than cacheable WB-DRAM without
      > properly configuring MSRC001_0015[TlbCacheDis]. This may occur, for
      > example, when page table structure addresses are above top of memory. In
      > such cases, the NB will generate an MCE if it sees a mismatch between
      > the memory operation generated by the core and the link type."
      >
      > I'm assuming coherent-only packets don't go out on IO links, thus the
      > error.
      
      To fix this, reinstate TLB coherence in lazy mode.  With this patch
      applied, we do it in one of two ways:
      
       - If we have PCID, we simply switch back to init_mm's page tables
         when we enter a kernel thread -- this seems to be quite cheap
         except for the cost of serializing the CPU.
      
       - If we don't have PCID, then we set a flag and switch to init_mm
         the first time we would otherwise need to flush the TLB.
      
      The /sys/kernel/debug/x86/tlb_use_lazy_mode debug switch can be changed
      to override the default mode for benchmarking.
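
      A conceptual sketch of the two cases above (the flag and helper names
      here are illustrative, not the patch's):

        static DEFINE_PER_CPU(bool, tlb_lazy_pending);     /* illustrative */

        static void enter_lazy_tlb_sketch(struct mm_struct *mm)
        {
                if (static_cpu_has(X86_FEATURE_PCID)) {
                        /* PCID preserves the old TLB entries, so switching
                         * to init_mm right away is cheap. */
                        switch_mm_irqs_off(NULL, &init_mm, NULL);
                } else {
                        /* No PCID: just remember we are lazy and switch to
                         * init_mm the first time a flush would be needed. */
                        this_cpu_write(tlb_lazy_pending, true);
                }
        }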
      
      In theory, we could optimize this better by only flushing the TLB in
      lazy CPUs when a page table is freed.  Doing that would require
      auditing the mm code to make sure that all page table freeing goes
      through tlb_remove_page() as well as reworking some data structures
      to implement the improved flush logic.
      Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
      Reported-by: Adam Borowski <kilobyte@angband.pl>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Johannes Hirte <johannes.hirte@datenkhaos.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      Link: http://lkml.kernel.org/r/20171009170231.fkpraqokz6e4zeco@pd.tnic
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b956575b
  6. 11 Oct 2017, 1 commit
  7. 30 Sep 2017, 2 commits
  8. 26 Sep 2017, 1 commit
    • x86/fpu: Rename fpu::fpstate_active to fpu::initialized · e4a81bfc
      Committed by Ingo Molnar
      The x86 FPU code used to have a complex state machine where both the FPU
      registers and the FPU state context could be 'active' (or inactive)
      independently of each other - which enabled features like lazy FPU restore.
      
      Much of this complexity is gone in the current code: now we basically can
      have FPU-less tasks (kernel threads) that don't use (and save/restore) FPU
      state at all, plus full FPU users that save/restore directly with no laziness
      whatsoever.
      
      But the fpu::fpstate_active still carries bits of the old complexity - meanwhile
      this flag has become a simple flag that shows whether the FPU context saving
      area in the thread struct is initialized and used, or not.
      
      Rename it to fpu::initialized to express this simplicity in the name as well.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-30-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e4a81bfc
  9. 25 Sep 2017, 2 commits
    • x86/mm: Fix fault error path using unsafe vma pointer · a3c4fb7c
      Committed by Laurent Dufour
      commit 7b2d0dba ("x86/mm/pkeys: Pass VMA down in to fault signal
      generation code") passes down a vma pointer to the error path, but that is
      done once the mmap_sem is released when calling mm_fault_error() from
      __do_page_fault().
      
      This is dangerous as the vma structure is no longer safe to use once the
      mmap_sem has been released. As only the protection key value is required in
      the error processing, we can simply pass that value down.
      
      Fix it by passing a pointer to a protection key value down to the fault
      signal generation code. Using a pointer keeps the check that generates a
      warning message in fill_sig_info_pkey() when the vma was not known. If the
      pointer is valid, the protection key value can be accessed by dereferencing
      it.
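
      A sketch of the resulting check, assuming a signature along these lines
      (the pointer is NULL when the vma, and hence the pkey, was not known):

        static void fill_sig_info_pkey(int si_code, siginfo_t *info, u32 *pkey)
        {
                if (si_code != SEGV_PKUERR)
                        return;

                /* Keep warning when the fault path could not supply a pkey. */
                if (WARN_ON_ONCE(!pkey))
                        return;

                info->si_pkey = *pkey;  /* dereference only a valid pointer */
        }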
      
      [ tglx: Made *pkey u32 as that's the type which is passed in siginfo ]
      
      Fixes: 7b2d0dba ("x86/mm/pkeys: Pass VMA down in to fault signal generation code")
      Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1504513935-12742-1-git-send-email-ldufour@linux.vnet.ibm.com
      a3c4fb7c
    • x86/fpu: Reinitialize FPU registers if restoring FPU state fails · d5c8028b
      Committed by Eric Biggers
      Userspace can change the FPU state of a task using the ptrace() or
      rt_sigreturn() system calls.  Because reserved bits in the FPU state can
      cause the XRSTOR instruction to fail, the kernel has to carefully
      validate that no reserved bits or other invalid values are being set.
      
      Unfortunately, there have been bugs in this validation code.  For
      example, we were not checking that the 'xcomp_bv' field in the
      xstate_header was 0.  As-is, such bugs are exploitable to read the FPU
      registers of other processes on the system.  To do so, an attacker can
      create a task, assign to it an invalid FPU state, then spin in a loop
      and monitor the values of the FPU registers.  Because the task's FPU
      registers are not being restored, sometimes the FPU registers will have
      the values from another process.
      
      This is likely to continue to be a problem in the future because the
      validation done by the CPU instructions like XRSTOR is not immediately
      visible to kernel developers.  Nor will invalid FPU states ever be
      encountered during ordinary use --- they will only be seen during
      fuzzing or exploits.  There can even be reserved bits outside the
      xstate_header which are easy to forget about.  For example, the MXCSR
      register contains reserved bits, which were not validated by the
      KVM_SET_XSAVE ioctl until commit a575813b ("KVM: x86: Fix load
      damaged SSEx MXCSR register").
      
      Therefore, mitigate this class of vulnerability by restoring the FPU
      registers from init_fpstate if restoring from the task's state fails.
      
      We actually used to do this, but it was (perhaps unwisely) removed by
      commit 9ccc27a5 ("x86/fpu: Remove error return values from
      copy_kernel_to_*regs() functions").  This new patch is also a bit
      different.  First, it only clears the registers, not also the bad
      in-memory state; this is simpler and makes it easier to make the
      mitigation cover all callers of __copy_kernel_to_fpregs().  Second, it
      does the register clearing in an exception handler so that no extra
      instructions are added to context switches.  In fact, we *remove*
      instructions, since previously we were always zeroing the register
      containing 'err' even if CONFIG_X86_DEBUG_FPU was disabled.
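
      Conceptually, the fixup looks roughly like this (an illustrative sketch;
      the real handler's name and signature may differ):

        static bool ex_handler_fprestore_sketch(const struct exception_table_entry *fixup,
                                                struct pt_regs *regs)
        {
                regs->ip = ex_fixup_addr(fixup);

                WARN_ONCE(1, "Bad FPU state detected, reinitializing FPU registers\n");
                /* Fall back to the known-good init state instead of leaving
                 * another task's values live in the registers. */
                copy_kernel_to_fpregs(&init_fpstate);
                return true;
        }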
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170922174156.16780-4-ebiggers3@gmail.com
      Link: http://lkml.kernel.org/r/20170923130016.21448-27-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d5c8028b
  10. 24 Sep 2017, 2 commits
    • x86/fpu: Change fpu->fpregs_active users to fpu->fpstate_active · f1c8cd01
      Committed by Ingo Molnar
      We want to simplify the FPU state machine by eliminating fpu->fpregs_active,
      and we can do that because the two state flags (::fpregs_active and
      ::fpstate_active) are set essentially together.
      
      The old lazy FPU switching code used to make a distinction - but there's
      no lazy switching code anymore, we always switch in an 'eager' fashion.
      
      Do this by first changing all substantial uses of fpu->fpregs_active
      to fpu->fpstate_active and adding a few debug checks to double check
      our assumption is correct.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-19-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f1c8cd01
    • x86/fpu: Simplify fpu->fpregs_active use · b3a16308
      Committed by Ingo Molnar
      The fpregs_active() inline function is pretty pointless - in almost
      all the callsites it can be replaced with a direct fpu->fpregs_active
      access.
      
      Do so and eliminate the extra layer of obfuscation.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-16-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b3a16308
  11. 23 Sep 2017, 1 commit
    • x86/asm: Fix inline asm call constraints for Clang · f5caf621
      Committed by Josh Poimboeuf
      For inline asm statements which have a CALL instruction, we list the
      stack pointer as a constraint to convince GCC to ensure the frame
      pointer is set up first:
      
        static inline void foo(void)
        {
                register void *__sp asm(_ASM_SP);
                asm("call bar" : "+r" (__sp));
        }
      
      Unfortunately, that pattern causes Clang to corrupt the stack pointer.
      
      The fix is easy: convert the stack pointer register variable to a global
      variable.
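
      A sketch of the converted form (the final macro name in the tree may
      differ slightly):

        /* One global register variable instead of a local one per call site. */
        register unsigned long current_stack_pointer asm(_ASM_SP);
        #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

        static inline void foo(void)
        {
                asm("call bar" : ASM_CALL_CONSTRAINT);
        }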
      
      It should be noted that the end result is different based on the GCC
      version.  With GCC 6.4, this patch has exactly the same result as
      before:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9820389		9491555		8816046		8516940
       after	9820389		9491555		8816046		8516940
      
      With GCC 7.2, however, GCC's behavior has changed.  It now changes its
      behavior based on the conversion of the register variable to a global.
      That somehow convinces it to *always* set up the frame pointer before
      inserting *any* inline asm.  (Therefore, listing the variable as an
      output constraint is a no-op and is no longer necessary.)  It's a bit
      overkill, but the performance impact should be negligible.  And in fact,
      there's a nice improvement with frame pointers disabled:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9796316		9468236		9076191		8790305
       after	9796957		9464267		9076381		8785949
      
      So in summary, while listing the stack pointer as an output constraint
      is no longer necessary for newer versions of GCC, it's still needed for
      older versions.
      Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: Matthias Kaehlcke <mka@chromium.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f5caf621
  12. 18 Sep 2017, 1 commit
  13. 13 Sep 2017, 3 commits
    • x86/paravirt: Remove no longer used paravirt functions · 87930019
      Committed by Juergen Gross
      With removal of lguest some of the paravirt functions are no longer
      needed:
      
      	->read_cr4()
      	->store_idt()
      	->set_pmd_at()
      	->set_pud_at()
      	->pte_update()
      
      Remove them.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akataria@vmware.com
      Cc: boris.ostrovsky@oracle.com
      Cc: chrisw@sous-sol.org
      Cc: jeremy@goop.org
      Cc: rusty@rustcorp.com.au
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/20170904102527.25409-1-jgross@suse.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      87930019
    • x86/mm/64: Initialize CR4.PCIDE early · c7ad5ad2
      Committed by Andy Lutomirski
      cpu_init() is weird: it's called rather late (after early
      identification and after most MMU state is initialized) on the boot
      CPU but is called extremely early (before identification) on secondary
      CPUs.  It's called just late enough on the boot CPU that its CR4 value
      isn't propagated to mmu_cr4_features.
      
      Even if we put CR4.PCIDE into mmu_cr4_features, we'd hit two
      problems.  First, we'd crash in the trampoline code.  That's
      fixable, and I tried that.  It turns out that mmu_cr4_features is
      totally ignored by secondary_startup_64(), though, so even with the
      trampoline code fixed, it wouldn't help.
      
      This means that we don't currently have CR4.PCIDE reliably initialized
      before we start playing with cpu_tlbstate.  This is very fragile and
      tends to cause boot failures if I make even small changes to the TLB
      handling code.
      
      Make it more robust: initialize CR4.PCIDE earlier on the boot CPU
      and propagate it to secondary CPUs in start_secondary().
      
      ( Yes, this is ugly.  I think we should have improved mmu_cr4_features
        to actually control CR4 during secondary bootup, but that would be
        fairly intrusive at this stage. )
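
      A hedged sketch of the enable step itself (the interesting part of the
      patch is where and when this gets called, which is omitted here):

        static void sketch_enable_pcid(void)
        {
                if (!boot_cpu_has(X86_FEATURE_PCID))
                        return;

                /* CR4.PCIDE may only be set in long mode with CR3[11:0] == 0. */
                if (IS_ENABLED(CONFIG_X86_64))
                        cr4_set_bits(X86_CR4_PCIDE);
        }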
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Fixes: 660da7c9 ("x86/mm: Enable CR4.PCIDE on supported systems")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c7ad5ad2
    • x86/mm: Get rid of VM_BUG_ON in switch_tlb_irqs_off() · a376e7f9
      Committed by Andy Lutomirski
      If we hit the VM_BUG_ON(), we're detecting a genuinely bad situation,
      but we're very unlikely to get a useful call trace.
      
      Make it a warning instead.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3b4e06bbb382ca54a93218407c93925ff5871546.1504847163.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a376e7f9
  14. 11 Sep 2017, 1 commit
  15. 09 Sep 2017, 1 commit
    • mm/memory_hotplug: introduce add_pages · 3072e413
      Committed by Michal Hocko
      There are new users of memory hotplug emerging.  Some of them require
      different subset of arch_add_memory.  There are some which only require
      allocation of struct pages without mapping those pages to the kernel
      address space.  We currently have __add_pages for that purpose.  But this
      is rather lowlevel and not very suitable for the code outside of the
      memory hotplug.  E.g.  x86_64 wants to update max_pfn which should be done
      by the caller.  Introduce add_pages() which should care about those
      details if they are needed.  Each architecture should define its
      implementation and select CONFIG_ARCH_HAS_ADD_PAGES.  All others use the
      currently existing __add_pages.
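
      The fallback wiring would look roughly like this (a sketch; the signature
      simply mirrors __add_pages() as it existed at the time):

        #ifndef CONFIG_ARCH_HAS_ADD_PAGES
        static inline int add_pages(int nid, unsigned long start_pfn,
                                    unsigned long nr_pages, bool want_memblock)
        {
                return __add_pages(nid, start_pfn, nr_pages, want_memblock);
        }
        #endif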
      
      Link: http://lkml.kernel.org/r/20170817000548.32038-7-jglisse@redhat.com
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Nellans <dnellans@nvidia.com>
      Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mark Hairgrove <mhairgrove@nvidia.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Sherry Cheung <SCheung@nvidia.com>
      Cc: Subhash Gutti <sgutti@nvidia.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Bob Liu <liubo95@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3072e413
  16. 07 Sep 2017, 2 commits
  17. 31 Aug 2017, 2 commits
  18. 29 Aug 2017, 3 commits
  19. 17 Aug 2017, 3 commits
    • locking/refcounts, x86/asm: Implement fast refcount overflow protection · 7a46ec0e
      Committed by Kees Cook
      This implements refcount_t overflow protection on x86 without a noticeable
      performance impact, though without the fuller checking of REFCOUNT_FULL.
      
      This is done by duplicating the existing atomic_t refcount implementation
      but with normally a single instruction added to detect if the refcount
      has gone negative (e.g. wrapped past INT_MAX or below zero). When detected,
      the handler saturates the refcount_t to INT_MIN / 2. With this overflow
      protection, the erroneous reference release that would follow a wrap back
      to zero is blocked from happening, avoiding the class of refcount-overflow
      use-after-free vulnerabilities entirely.
      
      Only the overflow case of refcounting can be perfectly protected, since
      it can be detected and stopped before the reference is freed and left to
      be abused by an attacker. There isn't a way to block early decrements,
      and while REFCOUNT_FULL stops increment-from-zero cases (which would
      be the state _after_ an early decrement and stops potential double-free
      conditions), this fast implementation does not, since it would require
      the more expensive cmpxchg loops. Since the overflow case is much more
      common (e.g. missing a "put" during an error path), this protection
      provides real-world protection. For example, the two public refcount
      overflow use-after-free exploits published in 2016 would have been
      rendered unexploitable:
      
        http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/
      
        http://cyseclabs.com/page?n=02012016
      
      This implementation does, however, notice an unchecked decrement to zero
      (i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it
      resulted in a zero). Decrements under zero are noticed (since they will
      have resulted in a negative value), though this only indicates that a
      use-after-free may have already happened. Such notifications are likely
      avoidable by an attacker that has already exploited a use-after-free
      vulnerability, but it's better to have them reported than allow such
      conditions to remain universally silent.
      
      On first overflow detection, the refcount value is reset to INT_MIN / 2
      (which serves as a saturation value) and a report and stack trace are
      produced. When operations detect only negative value results (such as
      changing an already saturated value), saturation still happens but no
      notification is performed (since the value was already saturated).
      
      On the matter of races, since the entire range beyond INT_MAX but before
      0 is negative, every operation at INT_MIN / 2 will trap, leaving no
      overflow-only race condition.
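
      In plain C, the saturation semantics described above amount to roughly
      the following model (the real implementation is the single-'js' asm fast
      path shown further down, not a separate read-modify-write):

        static inline void refcount_inc_model(atomic_t *r)
        {
                int old = atomic_fetch_add(1, r);

                if (unlikely(old < 0 || old + 1 < 0)) {
                        /* Wrapped past INT_MAX, or was already saturated:
                         * pin the value at the saturation point and report. */
                        atomic_set(r, INT_MIN / 2);
                        WARN_ONCE(1, "refcount overflow detected\n");
                }
        }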
      
      As for performance, this implementation adds a single "js" instruction
      to the regular execution flow of a copy of the standard atomic_t refcount
      operations. (The non-"and_test" refcount_dec() function, which is uncommon
      in regular refcount design patterns, has an additional "jz" instruction
      to detect reaching exactly zero.) Since this is a forward jump, it is by
      default the non-predicted path, which will be reinforced by dynamic branch
      prediction. The result is this protection having virtually no measurable
      change in performance over standard atomic_t operations. The error path,
      located in .text.unlikely, saves the refcount location and then uses UD0
      to fire a refcount exception handler, which resets the refcount, handles
      reporting, and returns to regular execution. This keeps the changes to
      .text size minimal, avoiding return jumps and open-coded calls to the
      error reporting routine.
      
      Example assembly comparison:
      
      refcount_inc() before:
      
        .text:
        ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
      
      refcount_inc() after:
      
        .text:
        ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
        ffffffff8154614d:       0f 88 80 d5 17 00       js     ffffffff816c36d3
        ...
        .text.unlikely:
        ffffffff816c36d3:       48 8d 4d f4             lea    -0xc(%rbp),%rcx
        ffffffff816c36d7:       0f ff                   (bad)
      
      These are the cycle counts comparing a loop of refcount_inc() from 1
      to INT_MAX and back down to 0 (via refcount_dec_and_test()), between
      unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL
      (refcount_t-full), and this overflow-protected refcount (refcount_t-fast):
      
        2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:
      		    cycles		protections
        atomic_t           82249267387	none
        refcount_t-fast    82211446892	overflow, untested dec-to-zero
        refcount_t-full   144814735193	overflow, untested dec-to-zero, inc-from-zero
      
      This code is a modified version of the x86 PAX_REFCOUNT atomic_t
      overflow defense from the last public patch of PaX/grsecurity, based
      on my understanding of the code. Changes or omissions from the original
      code are mine and don't reflect the original grsecurity/PaX code. Thanks
      to PaX Team for various suggestions for improvement for repurposing this
      code to be a refcount-only protection.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Elena Reshetova <elena.reshetova@intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Hans Liljestrand <ishkamiel@gmail.com>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Serge E. Hallyn <serge@hallyn.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arozansk@redhat.com
      Cc: axboe@kernel.dk
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-arch <linux-arch@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20170815161924.GA133115@beast
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7a46ec0e
    • x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks · 01578e36
      Committed by Oleg Nesterov
      The ADDR_NO_RANDOMIZE checks in stack_maxrandom_size() and
      randomize_stack_top() are not required.
      
      PF_RANDOMIZE is set by load_elf_binary() only if ADDR_NO_RANDOMIZE is not
      set, no need to re-check after that.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: stable@vger.kernel.org
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170815154011.GB1076@redhat.com
      01578e36
    • x86: Fix norandmaps/ADDR_NO_RANDOMIZE · 47ac5484
      Committed by Oleg Nesterov
      Documentation/admin-guide/kernel-parameters.txt says:
      
          norandmaps  Don't use address space randomization. Equivalent
                      to echo 0 > /proc/sys/kernel/randomize_va_space
      
      but it doesn't work because arch_rnd(), which is used to randomize
      mm->mmap_base, returns a random value unconditionally. And as Kirill
      pointed out, ADDR_NO_RANDOMIZE is broken for the same reason.
      
      Just shift the PF_RANDOMIZE check from arch_mmap_rnd() to arch_rnd().
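
      The described change amounts to roughly this (a sketch of arch_rnd()
      with the check moved into it):

        static unsigned long arch_rnd(unsigned int rndbits)
        {
                /* norandmaps / ADDR_NO_RANDOMIZE clear PF_RANDOMIZE, so no
                 * random offset must be generated in that case. */
                if (!(current->flags & PF_RANDOMIZE))
                        return 0;

                return (get_random_long() & ((1UL << rndbits) - 1)) << PAGE_SHIFT;
        }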
      
      Fixes: 1b028f78 ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: stable@vger.kernel.org
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20170815153952.GA1076@redhat.com
      47ac5484
  20. 11 Aug 2017, 1 commit
  21. 30 Jul 2017, 1 commit
    • x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads · 99504819
      Committed by Andy Lutomirski
      Now that pt_regs properly defines segment fields as 16-bit on 32-bit
      CPUs, there's no need to mask off the high word.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      99504819
  22. 25 Jul 2017, 2 commits
    • x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y · 04b67022
      Committed by Andrey Ryabinin
      KASAN fills kernel page tables with repeated values to map several
      TBs of the virtual memory to the single kasan_zero_page:
        kasan_zero_p4d ->
          kasan_zero_pud ->
              kasan_zero_pmd->
                  kasan_zero_pte->
                      kasan_zero_page
      
      Walking the whole KASAN shadow range takes a lot of time, especially
      with 5-level page tables. Since we already know that all KASAN page tables
      eventually point to the kasan_zero_page, we can call note_page() right away
      and avoid walking the lower levels of the page tables.
      This will not affect the output of the kernel_page_tables file,
      but it lets us avoid spending time in the page table walkers:
      
      Before:
      
        $ time cat /sys/kernel/debug/kernel_page_tables > /dev/null
      
        real    0m55.855s
        user    0m0.000s
        sys     0m55.840s
      
      After:
      
        $ time cat /sys/kernel/debug/kernel_page_tables > /dev/null
      
        real    0m0.054s
        user    0m0.000s
        sys     0m0.054s
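
      A sketch of the short-circuit (helper name and level number are
      illustrative):

        static bool kasan_shadow_table_sketch(struct seq_file *m,
                                              struct pg_state *st, void *pt)
        {
                if (__pa(pt) != __pa(kasan_zero_pmd) &&
                    __pa(pt) != __pa(kasan_zero_pud))
                        return false;

                /* The whole range maps kasan_zero_page: record it with one
                 * note_page() call and skip descending further. */
                note_page(m, st, __pgprot(_KERNPG_TABLE), 5);
                return true;
        }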
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170724152558.24689-1-aryabinin@virtuozzo.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      04b67022
    • x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID · 10af6235
      Committed by Andy Lutomirski
      PCID is a "process context ID" -- it's what other architectures call
      an address space ID.  Every non-global TLB entry is tagged with a
      PCID, only TLB entries that match the currently selected PCID are
      used, and we can switch PGDs without flushing the TLB.  x86's
      PCID is 12 bits.
      
      This is an unorthodox approach to using PCID.  x86's PCID is far too
      short to uniquely identify a process, and we can't even really
      uniquely identify a running process because there are monster
      systems with over 4096 CPUs.  To make matters worse, past attempts
      to use all 12 PCID bits have resulted in slowdowns instead of
      speedups.
      
      This patch uses PCID differently.  We use a PCID to identify a
      recently-used mm on a per-cpu basis.  An mm has no fixed PCID
      binding at all; instead, we give it a fresh PCID each time it's
      loaded except in cases where we want to preserve the TLB, in which
      case we reuse a recent value.
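
      Sketched out, the per-CPU reuse decision looks something like this (the
      field names follow the description above and may not match the tree
      exactly):

        static void choose_new_asid_sketch(struct mm_struct *next, u64 next_tlb_gen,
                                           u16 *new_asid, bool *need_flush)
        {
                u16 asid;

                for (asid = 0; asid < TLB_NR_DYN_ASIDS; asid++) {
                        if (this_cpu_read(cpu_tlbstate.ctxs[asid].ctx_id) !=
                            next->context.ctx_id)
                                continue;

                        /* Recently used on this CPU: reuse its PCID, and only
                         * flush if it has missed invalidations meanwhile. */
                        *new_asid = asid;
                        *need_flush = (this_cpu_read(cpu_tlbstate.ctxs[asid].tlb_gen) <
                                       next_tlb_gen);
                        return;
                }

                /* Not seen recently on this CPU: hand out a fresh PCID and flush. */
                *new_asid = this_cpu_add_return(cpu_tlbstate.next_asid, 1) %
                            TLB_NR_DYN_ASIDS;
                *need_flush = true;
        }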
      
      Here are some benchmark results, done on a Skylake laptop at 2.3 GHz
      (turbo off, intel_pstate requesting max performance) under KVM with
      the guest using idle=poll (to avoid artifacts when bouncing between
      CPUs).  I haven't done any real statistics here -- I just ran them
      in a loop and picked the fastest results that didn't look like
      outliers.  Unpatched means commit a4eb8b99, so all the
      bookkeeping overhead is gone.
      
      ping-pong between two mms on the same CPU using eventfd:
      
        patched:         1.22µs
        patched, nopcid: 1.33µs
        unpatched:       1.34µs
      
      Same ping-pong, but now touch 512 pages (all zero-page to minimize
      cache misses) each iteration.  dTLB misses are measured by
      dtlb_load_misses.miss_causes_a_walk:
      
        patched:         1.8µs  11M  dTLB misses
        patched, nopcid: 6.2µs, 207M dTLB misses
        unpatched:       6.1µs, 190M dTLB misses
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/9ee75f17a81770feed616358e6860d98a2a5b1e7.1500957502.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      10af6235
  23. 21 Jul 2017, 4 commits
    • x86/mm: Prepare to expose larger address space to userspace · b569bab7
      Committed by Kirill A. Shutemov
      On x86, 5-level paging enables 56-bit userspace virtual address space.
      Not all user space is ready to handle wide addresses. It's known that
      at least some JIT compilers use higher bits in pointers to encode their
      information. It collides with valid pointers with 5-level paging and
      leads to crashes.
      
      To mitigate this, we are not going to allocate virtual address space
      above 47-bit by default.
      
      But userspace can ask for allocation from full address space by
      specifying hint address (with or without MAP_FIXED) above 47-bits.
      
      If the hint address is set above 47-bit but MAP_FIXED is not specified, we
      try to find an unmapped area at the specified address. If it's already
      occupied, we look for an unmapped area in the *full* address space, rather
      than in the 47-bit window.
      
      A high hint address would only affect the allocation in question, but not
      any future mmap()s.
      
      Specifying a high hint address on an older kernel or on a machine without
      5-level paging support is safe. The hint will be ignored and the kernel
      will fall back to allocating from the 47-bit address space.
      
      This approach makes it easy for an application's memory allocator to become
      aware of the large address space without manually tracking the allocated
      virtual address space.
      
      The patch puts all the machinery in place, but does not yet allow userspace to
      have mappings above 47-bit -- TASK_SIZE_MAX has to be raised to get the effect.
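
      Once enabled, the userspace opt-in would look like this (illustrative;
      on kernels without the feature the high hint is simply ignored):

        #include <sys/mman.h>
        #include <stdio.h>

        int main(void)
        {
                /* A hint above the 47-bit window opts this one mmap() in to
                 * the full address space; later mmap()s are unaffected. */
                void *hint = (void *)(1UL << 48);
                void *p = mmap(hint, 4096, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

                printf("mapped at %p\n", p);
                return 0;
        }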
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20170716225954.74185-7-kirill.shutemov@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b569bab7
    • x86/mpx: Do not allow MPX if we have mappings above 47-bit · 44b04912
      Committed by Kirill A. Shutemov
      MPX (without MAWA extension) cannot handle addresses above 47 bits, so we
      need to make sure that MPX cannot be enabled if we already have a VMA above
      the boundary and forbid creating such VMAs once MPX is enabled.
      
      The patch implements mpx_unmapped_area_check(), which is called from all
      variants of get_unmapped_area() to check whether the requested address range
      is usable with MPX.
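
      A simplified sketch of such a check (error handling and the retry path
      are trimmed):

        unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len,
                                              unsigned long flags)
        {
                if (!kernel_managing_mpx_tables(current->mm))
                        return addr;                    /* MPX not in use */
                if (addr + len <= DEFAULT_MAP_WINDOW)
                        return addr;                    /* fits below 47-bit */
                if (flags & MAP_FIXED)
                        return -ENOMEM;                 /* cannot be honoured */

                /* Otherwise retry the search within the MPX-addressable window. */
                return 0;
        }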
      
      On enabling MPX, we check whether we already have any VMA above the 47-bit
      boundary and forbid enabling MPX if we do.
      
      As long as DEFAULT_MAP_WINDOW is equal to TASK_SIZE_MAX, the change is a
      no-op. It will change once we allow userspace to have mappings above
      47 bits.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20170716225954.74185-6-kirill.shutemov@linux.intel.com
      [ Readability edits. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      44b04912
    • x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit() · e8f01a8d
      Committed by Kirill A. Shutemov
      Rename these helpers to be consistent with spelling of TASK_SIZE and
      related constants.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20170716225954.74185-5-kirill.shutemov@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e8f01a8d
    • x86/mm/dump_pagetables: Fix printout of p4d level · 45dcd209
      Committed by Kirill A. Shutemov
      Modify printk_prot() and callers to print out additional page table
      level correctly.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20170716225954.74185-3-kirill.shutemov@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      45dcd209