1. 26 4月, 2017 5 次提交
    • A
      x86/mm: Fix flush_tlb_page() on Xen · dbd68d8e
      Andy Lutomirski 提交于
      flush_tlb_page() passes a bogus range to flush_tlb_others() and
      expects the latter to fix it up.  native_flush_tlb_others() has the
      fixup but Xen's version doesn't.  Move the fixup to
      flush_tlb_others().
      
      AFAICS the only real effect is that, without this fix, Xen would
      flush everything instead of just the one page on remote vCPUs in
      when flush_tlb_page() was called.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: e7b52ffd ("x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range")
      Link: http://lkml.kernel.org/r/10ed0e4dfea64daef10b87fb85df1746999b4dba.1492844372.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      dbd68d8e
    • A
      x86/mm: Make flush_tlb_mm_range() more predictable · ce27374f
      Andy Lutomirski 提交于
      I'm about to rewrite the function almost completely, but first I
      want to get a functional change out of the way.  Currently, if
      flush_tlb_mm_range() does not flush the local TLB at all, it will
      never do individual page flushes on remote CPUs.  This seems to be
      an accident, and preserving it will be awkward.  Let's change it
      first so that any regressions in the rewrite will be easier to
      bisect and so that the rewrite can attempt to change no visible
      behavior at all.
      
      The fix is simple: we can simply avoid short-circuiting the
      calculation of base_pages_to_flush.
      
      As a side effect, this also eliminates a potential corner case: if
      tlb_single_page_flush_ceiling == TLB_FLUSH_ALL, flush_tlb_mm_range()
      could have ended up flushing the entire address space one page at a
      time.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Acked-by: NDave Hansen <dave.hansen@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/4b29b771d9975aad7154c314534fec235618175a.1492844372.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ce27374f
    • A
      x86/mm: Remove flush_tlb() and flush_tlb_current_task() · 29961b59
      Andy Lutomirski 提交于
      I was trying to figure out what how flush_tlb_current_task() would
      possibly work correctly if current->mm != current->active_mm, but I
      realized I could spare myself the effort: it has no callers except
      the unused flush_tlb() macro.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/e52d64c11690f85e9f1d69d7b48cc2269cd2e94b.1492844372.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      29961b59
    • A
      x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly() · 9ccee237
      Andy Lutomirski 提交于
      mark_screen_rdonly() is the last remaining caller of flush_tlb().
      flush_tlb_mm_range() is potentially faster and isn't obsolete.
      
      Compile-tested only because I don't know whether software that uses
      this mechanism even exists.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/791a644076fc3577ba7f7b7cafd643cc089baa7d.1492844372.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9ccee237
    • K
      x86/mm/64: Fix crash in remove_pagetable() · e6ab9c4d
      Kirill A. Shutemov 提交于
      remove_pagetable() does page walk using p*d_page_vaddr() plus cast.
      It's not canonical approach -- we usually use p*d_offset() for that.
      
      It works fine as long as all page table levels are present. We broke the
      invariant by introducing folded p4d page table level.
      
      As result, remove_pagetable() interprets PMD as PUD and it leads to
      crash:
      
      	BUG: unable to handle kernel paging request at ffff880300000000
      	IP: memchr_inv+0x60/0x110
      	PGD 317d067
      	P4D 317d067
      	PUD 3180067
      	PMD 33f102067
      	PTE 8000000300000060
      
      Let's fix this by using p*d_offset() instead of p*d_page_vaddr() for
      page walk.
      Reported-by: NDan Williams <dan.j.williams@intel.com>
      Tested-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Fixes: f2a6a705 ("x86: Convert the rest of the code to support p4d_t")
      Link: http://lkml.kernel.org/r/20170425092557.21852-1-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e6ab9c4d
  2. 23 4月, 2017 1 次提交
    • I
      Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" · 6dd29b3d
      Ingo Molnar 提交于
      This reverts commit 2947ba05.
      
      Dan Williams reported dax-pmem kernel warnings with the following signature:
      
         WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
         percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic
      
      ... and bisected it to this commit, which suggests possible memory corruption
      caused by the x86 fast-GUP conversion.
      
      He also pointed out:
      
       "
        This is similar to the backtrace when we were not properly handling
        pud faults and was fixed with this commit: 220ced16 "mm: fix
        get_user_pages() vs device-dax pud mappings"
      
        I've found some missing _devmap checks in the generic
        get_user_pages_fast() path, but this does not fix the regression
        [...]
       "
      
      So given that there are known bugs, and a pretty robust looking bisection
      points to this commit suggesting that are unknown bugs in the conversion
      as well, revert it for the time being - we'll re-try in v4.13.
      Reported-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dann.frazier@canonical.com
      Cc: dave.hansen@intel.com
      Cc: steve.capper@linaro.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6dd29b3d
  3. 14 4月, 2017 1 次提交
  4. 13 4月, 2017 1 次提交
  5. 12 4月, 2017 1 次提交
    • J
      x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space · 5ed386ec
      Joerg Roedel 提交于
      When this function fails it just sends a SIGSEGV signal to
      user-space using force_sig(). This signal is missing
      essential information about the cause, e.g. the trap_nr or
      an error code.
      
      Fix this by propagating the error to the only caller of
      mpx_handle_bd_fault(), do_bounds(), which sends the correct
      SIGSEGV signal to the process.
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: fe3d197f ('x86, mpx: On-demand kernel allocation of bounds tables')
      Link: http://lkml.kernel.org/r/1491488362-27198-1-git-send-email-joro@8bytes.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5ed386ec
  6. 08 4月, 2017 1 次提交
  7. 04 4月, 2017 7 次提交
  8. 03 4月, 2017 2 次提交
  9. 01 4月, 2017 1 次提交
  10. 31 3月, 2017 5 次提交
    • D
      x86/mm: Make in_compat_syscall() work during exec · ada26481
      Dmitry Safonov 提交于
      The x86 mmap() code selects the mmap base for an allocation depending on
      the bitness of the syscall. For 64bit sycalls it select mm->mmap_base and
      for 32bit mm->mmap_compat_base.
      
      On execve the registers of the task invoking exec() are copied to the child
      pt_regs. So child->pt_regs->orig_ax contains the execve syscall number of the
      parent.
      
      exec() calls mmap() which in turn uses in_compat_syscall() to check whether
      the mapping is for a 32bit or a 64bit task. The decision is made on the
      following criteria:
      
        ia32	  child->thread.status & TS_COMPAT
         x32	  child->pt_regs.orig_ax & __X32_SYSCALL_BIT
        ia64	  !ia32 && !x32 
      
      child->thread.status is corretly set up in set_personality_*(), but the
      syscall number in child->pt_regs.orig_ax is left unmodified.
      
      Therefore the parent/child combinations work or fail in the following way:
      
      Parent Child Child->thread_status  child->pt_regs.orig_ax  in_compat()  Works
      ia64    ia64   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 0     false       Y
      ia64    ia32   TS_COMPAT == 1	   __X32_SYSCALL_BIT == 0     true        Y
      ia64     x32   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 0     false       N
      ia32    ia64   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 0     false       Y
      ia32    ia32   TS_COMPAT == 1	   __X32_SYSCALL_BIT == 0     true        Y
      ia32     x32   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 0     false       N
       x32    ia64   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 1     true        N
       x32    ia32   TS_COMPAT == 1	   __X32_SYSCALL_BIT == 1     true        Y
       x32     x32   TS_COMPAT == 0	   __X32_SYSCALL_BIT == 1     true        Y
      
      Make set_personality_*() store the syscall number incl. __X32_SYSCALL_BIT
      which corresponds to the newly started ELF executable in the childs
      pt_regs, i.e. pretend that the exec was invoked from a task with the same
      executable format.
      
      So both thread.status and pt_regs.orig_ax correspond to the new ELF format
      and in_compat_syscall() returns the correct result.
      
      [ tglx: Rewrote changelog ]
      
      Fixes: commit 1b028f78 ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
      Reported-by: NAdam Borowski <kilobyte@angband.pl>
      Suggested-by: NH. Peter Anvin <hpa@zytor.com>
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NDmitry Safonov <dsafonov@virtuozzo.com>
      Cc: 0x7f454c46@gmail.com
      Cc: linux-mm@kvack.org
      Cc: Andrei Vagin <avagin@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170331111137.28170-1-dsafonov@virtuozzo.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      ada26481
    • Z
      x86/boot: Include missing header file · 6b1cc946
      Zhengyi Shen 提交于
      Sparse complains about missing forward declarations:
      
      arch/x86/boot/compressed/error.c:8:6:
      	warning: symbol 'warn' was not declared. Should it be static?
      arch/x86/boot/compressed/error.c:15:6:
      	warning: symbol 'error' was not declared. Should it be static?
      
      Include the missing header file.
      Signed-off-by: NZhengyi Shen <shenzhengyi@gmail.com>
      Acked-by: NKess Cook <keescook@chromium.org>
      Link: http://lkml.kernel.org/r/1490770820-24472-1-git-send-email-shenzhengyi@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      6b1cc946
    • Y
      x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs · 29f72ce3
      Yazen Ghannam 提交于
      MCA bank 3 is reserved on systems pre-Fam17h, so it didn't have a name.
      However, MCA bank 3 is defined on Fam17h systems and can be accessed
      using legacy MSRs. Without a name we get a stack trace on Fam17h systems
      when trying to register sysfs files for bank 3 on kernels that don't
      recognize Scalable MCA.
      
      Call MCA bank 3 "decode_unit" since this is what it represents on
      Fam17h. This will allow kernels without SMCA support to see this bank on
      Fam17h+ and prevent the stack trace. This will not affect older systems
      since this bank is reserved on them, i.e. it'll be ignored.
      
      Tested on AMD Fam15h and Fam17h systems.
      
        WARNING: CPU: 26 PID: 1 at lib/kobject.c:210 kobject_add_internal
        kobject: (ffff88085bb256c0): attempted to be registered with empty name!
        ...
        Call Trace:
         kobject_add_internal
         kobject_add
         kobject_create_and_add
         threshold_create_device
         threshold_init_device
      Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/1490102285-3659-1-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      29f72ce3
    • Z
      x86/boot: Fix Sparse warning by including required header file · 5af21843
      Zhengyi Shen 提交于
      Include declarations for various symbols defined in the error.h header file
      to fix the following Sparse warnings:
      
        arch/x86/boot/compressed/error.c:8:6:
      	warning: symbol 'warn' was not declared. Should it be static?
        arch/x86/boot/compressed/error.c:15:6:
      	warning: symbol 'error' was not declared. Should it be static?
      Signed-off-by: NZhengyi Shen <shenzhengyi@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1490770820-24472-1-git-send-email-shenzhengyi@gmail.com
      [ Fixed/enhanced the changelog. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      5af21843
    • B
      x86/boot/32: Flip the logic in test_wp_bit() · 952a6c2c
      Borislav Petkov 提交于
      ... to have a natural "likely()" in the code flow and thus have the
      success case with a branch 99.999% of the times non-taken and function
      return code following it instead of jumping to it each time.
      
      This puts the panic() call at the end of the function - it is going to
      be practically unreachable anyway.
      
      The C code is a bit more readable too.
      
      No functionality change.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: thgarnie@google.com
      Link: http://lkml.kernel.org/r/20170330080101.ywsf5rg6ilzu4itk@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      952a6c2c
  11. 30 3月, 2017 3 次提交
    • J
      x86/build: Mostly disable '-maccumulate-outgoing-args' · 3f135e57
      Josh Poimboeuf 提交于
      The GCC '-maccumulate-outgoing-args' flag is enabled for most configs,
      mostly because of issues which are no longer relevant.  For most
      configs, and with most recent versions of GCC, it's no longer needed.
      
      Clarify which cases need it, and only enable it for those cases.  Also
      produce a compile-time error for the ftrace graph + mcount + '-Os' case,
      which will otherwise cause runtime failures.
      
      The main benefit of '-maccumulate-outgoing-args' is that it prevents an
      ugly prologue for functions which have aligned stacks.  But removing the
      option also has some benefits: more readable argument saves, smaller
      text size, and (presumably) slightly improved performance.
      
      Here are the object size savings for 32-bit and 64-bit defconfig
      kernels:
      
            text	   data	    bss	     dec	    hex	filename
        10006710	3543328	1773568	15323606	 e9d1d6	vmlinux.x86-32.before
         9706358	3547424	1773568	15027350	 e54c96	vmlinux.x86-32.after
      
            text	   data	    bss	     dec	    hex	filename
        10652105	4537576	 843776	16033457	 f4a6b1	vmlinux.x86-64.before
        10639629	4537576	 843776	16020981	 f475f5	vmlinux.x86-64.after
      
      That comes out to a 3% text size improvement on x86-32 and a 0.1% text
      size improvement on x86-64.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170316193133.zrj6gug53766m6nn@trebleSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3f135e57
    • A
      x86/boot/32: Rewrite test_wp_bit() · 4af17110
      Andy Lutomirski 提交于
      This code seems to be very old and has gotten only minor updates.
      It's overcomplicated and has a bunch of comments that are, at best,
      of purely historical interest.  Nowadays we have a shiny function
      probe_kernel_write() that does more or less exactly what we need.
      Use it.
      
      I switched the page that we test from swapper_pg_dir to
      empty_zero_page because writing zero to empty_zero_page is more
      obviously safe than writing to the paging structures.  (It's
      extremely unlikely that any of this would cause problems in practice
      because the write will fail on any supported CPU.)
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/0b9e64ab0236de30e7572213cea77bf95ae2e990.1490831211.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4af17110
    • K
      x86/dump_pagetables: Add support for 5-level paging · fdd3d8ce
      Kirill A. Shutemov 提交于
      Simple extension to support one more page table level.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20170328104806.41711-1-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fdd3d8ce
  12. 28 3月, 2017 2 次提交
  13. 27 3月, 2017 6 次提交
  14. 24 3月, 2017 4 次提交
    • B
      x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization · a46f60d7
      Baoquan He 提交于
      Currently KASLR is enabled on three regions: the direct mapping of physical
      memory, vamlloc and vmemmap. However the EFI region is also mistakenly
      included for VA space randomization because of misusing EFI_VA_START macro
      and assuming EFI_VA_START < EFI_VA_END.
      
      (This breaks kexec and possibly other things that rely on stable addresses.)
      
      The EFI region is reserved for EFI runtime services virtual mapping which
      should not be included in KASLR ranges. In Documentation/x86/x86_64/mm.txt,
      we can see:
      
        ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
      
      EFI uses the space from -4G to -64G thus EFI_VA_START > EFI_VA_END,
      Here EFI_VA_START = -4G, and EFI_VA_END = -64G.
      
      Changing EFI_VA_START to EFI_VA_END in mm/kaslr.c fixes this problem.
      Signed-off-by: NBaoquan He <bhe@redhat.com>
      Reviewed-by: NBhupesh Sharma <bhsharma@redhat.com>
      Acked-by: NDave Young <dyoung@redhat.com>
      Acked-by: NThomas Garnier <thgarnie@google.com>
      Cc: <stable@vger.kernel.org> #4.8+
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1490331592-31860-1-git-send-email-bhe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a46f60d7
    • W
      KVM: VMX: Fix enable VPID conditions · 08d839c4
      Wanpeng Li 提交于
      This can be reproduced by running L2 on L1, and disable VPID on L0
      if w/o commit "KVM: nVMX: Fix nested VPID vmx exec control", the L2
      crash as below:
      
      KVM: entry failed, hardware error 0x7
      EAX=00000000 EBX=00000000 ECX=00000000 EDX=000306c3
      ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
      EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
      ES =0000 00000000 0000ffff 00009300
      CS =f000 ffff0000 0000ffff 00009b00
      SS =0000 00000000 0000ffff 00009300
      DS =0000 00000000 0000ffff 00009300
      FS =0000 00000000 0000ffff 00009300
      GS =0000 00000000 0000ffff 00009300
      LDT=0000 00000000 0000ffff 00008200
      TR =0000 00000000 0000ffff 00008b00
      GDT=     00000000 0000ffff
      IDT=     00000000 0000ffff
      CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
      DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
      DR6=00000000ffff0ff0 DR7=0000000000000400
      EFER=0000000000000000
      
      Reference SDM 30.3 INVVPID:
      
      Protected Mode Exceptions
      - #UD
        - If not in VMX operation.
        - If the logical processor does not support VPIDs (IA32_VMX_PROCBASED_CTLS2[37]=0).
        - If the logical processor supports VPIDs (IA32_VMX_PROCBASED_CTLS2[37]=1) but does
          not support the INVVPID instruction (IA32_VMX_EPT_VPID_CAP[32]=0).
      
      So we should check both VPID enable bit in vmx exec control and INVVPID support bit
      in vmx capability MSRs to enable VPID. This patch adds the guarantee to not enable
      VPID if either INVVPID or single-context/all-context invalidation is not exposed in
      vmx capability MSRs.
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      08d839c4
    • W
      KVM: nVMX: Fix nested VPID vmx exec control · 63cb6d5f
      Wanpeng Li 提交于
      This can be reproduced by running kvm-unit-tests/vmx.flat on L0 w/ vpid disabled.
      
      Test suite: VPID
      Unhandled exception 6 #UD at ip 00000000004051a6
      error_code=0000      rflags=00010047      cs=00000008
      rax=0000000000000000 rcx=0000000000000001 rdx=0000000000000047 rbx=0000000000402f79
      rbp=0000000000456240 rsi=0000000000000001 rdi=0000000000000000
      r8=000000000000000a  r9=00000000000003f8 r10=0000000080010011 r11=0000000000000000
      r12=0000000000000003 r13=0000000000000708 r14=0000000000000000 r15=0000000000000000
      cr0=0000000080010031 cr2=0000000000000000 cr3=0000000007fff000 cr4=0000000000002020
      cr8=0000000000000000
      STACK: @4051a6 40523e 400f7f 402059 40028f
      
      We should hide and forbid VPID in L1 if it is disabled on L0. However, nested VPID
      enable bit is set unconditionally during setup nested vmx exec controls though VPID
      is not exposed through nested VMX capablity. This patch fixes it by don't set nested
      VPID enable bit if it is disabled on L0.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 5c614b35 (KVM: nVMX: nested VPID emulation)
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      63cb6d5f
    • W
      KVM: x86: correct async page present tracepoint · 24dccf83
      Wanpeng Li 提交于
      After async pf setup successfully, there is a broadcast wakeup w/ special
      token 0xffffffff which tells vCPU that it should wake up all processes
      waiting for APFs though there is no real process waiting at the moment.
      
      The async page present tracepoint print prematurely and fails to catch the
      special token setup. This patch fixes it by moving the async page present
      tracepoint after the special token setup.
      
      Before patch:
      
      qemu-system-x86-8499  [006] ...1  5973.473292: kvm_async_pf_ready: token 0x0 gva 0x0
      
      After patch:
      
      qemu-system-x86-8499  [006] ...1  5973.473292: kvm_async_pf_ready: token 0xffffffff gva 0x0
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      24dccf83