1. 17 4月, 2019 30 次提交
    • A
      x86/irq/64: Remap the IRQ stack with guard pages · 18b7a6be
      Andy Lutomirski 提交于
      The IRQ stack lives in percpu space, so an IRQ handler that overflows it
      will overwrite other data structures.
      
      Use vmap() to remap the IRQ stack so that it will have the usual guard
      pages that vmap()/vmalloc() allocations have. With this, the kernel will
      panic immediately on an IRQ stack overflow.
      
      [ tglx: Move the map code to a proper place and invoke it only when a CPU
        	is about to be brought online. No point in installing the map at
        	early boot for all possible CPUs. Fail the CPU bringup if the vmap()
        	fails as done for all other preparatory stages in CPU hotplug. ]
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160146.363733568@linutronix.de
      18b7a6be
    • A
      x86/irq/64: Split the IRQ stack into its own pages · e6401c13
      Andy Lutomirski 提交于
      Currently, the IRQ stack is hardcoded as the first page of the percpu
      area, and the stack canary lives on the IRQ stack. The former gets in
      the way of adding an IRQ stack guard page, and the latter is a potential
      weakness in the stack canary mechanism.
      
      Split the IRQ stack into its own private percpu pages.
      
      [ tglx: Make 64 and 32 bit share struct irq_stack ]
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Jordan Borgner <mail@jordan-borgner.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Maran Wilson <maran.wilson@oracle.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: "Rafael Ávila de Espíndola" <rafael@espindo.la>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: x86-ml <x86@kernel.org>
      Cc: xen-devel@lists.xenproject.org
      Link: https://lkml.kernel.org/r/20190414160146.267376656@linutronix.de
      e6401c13
    • T
      x86/irq/64: Init hardirq_stack_ptr during CPU hotplug · 0ac26104
      Thomas Gleixner 提交于
      Preparatory change for disentangling the irq stack union as a
      prerequisite for irq stacks with guard pages.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Cc: Yi Wang <wang.yi59@zte.com.cn>
      Link: https://lkml.kernel.org/r/20190414160146.177558566@linutronix.de
      0ac26104
    • T
      x86/irq/32: Handle irq stack allocation failure proper · 66c7ceb4
      Thomas Gleixner 提交于
      irq_ctx_init() crashes hard on page allocation failures. While that's ok
      during early boot, it's just wrong in the CPU hotplug bringup code.
      
      Check the page allocation failure and return -ENOMEM and handle it at the
      call sites. On early boot the only way out is to BUG(), but on CPU hotplug
      there is no reason to crash, so just abort the operation.
      
      Rename the function to something more sensible while at it.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Alison Schofield <alison.schofield@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: x86-ml <x86@kernel.org>
      Cc: xen-devel@lists.xenproject.org
      Cc: Yazen Ghannam <yazen.ghannam@amd.com>
      Cc: Yi Wang <wang.yi59@zte.com.cn>
      Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      Link: https://lkml.kernel.org/r/20190414160146.089060584@linutronix.de
      66c7ceb4
    • T
      x86/irq/32: Invoke irq_ctx_init() from init_IRQ() · 451f743a
      Thomas Gleixner 提交于
      irq_ctx_init() is invoked from native_init_IRQ() or from xen_init_IRQ()
      code. There is no reason to have this split. The interrupt stacks must be
      allocated no matter what.
      
      Invoke it from init_IRQ() before invoking the native or XEN init
      implementation.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Abraham <j.abraham1776@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: x86-ml <x86@kernel.org>
      Cc: xen-devel@lists.xenproject.org
      Link: https://lkml.kernel.org/r/20190414160146.001162606@linutronix.de
      451f743a
    • T
      x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr · 758a2e31
      Thomas Gleixner 提交于
      Preparatory patch to share code with 32bit.
      
      No functional changes.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pingfan Liu <kernelfans@gmail.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160145.912584074@linutronix.de
      758a2e31
    • T
      x86/irq/32: Rename hard/softirq_stack to hard/softirq_stack_ptr · a754fe2b
      Thomas Gleixner 提交于
      The percpu storage holds a pointer to the stack not the stack
      itself. Rename it before sharing struct irq_stack with 64-bit.
      
      No functional changes.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160145.824805922@linutronix.de
      a754fe2b
    • T
      x86/irq/32: Make irq stack a character array · 231c4846
      Thomas Gleixner 提交于
      There is no reason to have an u32 array in struct irq_stack. The only
      purpose of the array is to size the struct properly.
      
      Preparatory change for sharing struct irq_stack with 64-bit.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Pingfan Liu <kernelfans@gmail.com>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160145.736241969@linutronix.de
      231c4846
    • T
      x86/irq/32: Define IRQ_STACK_SIZE · aa641c28
      Thomas Gleixner 提交于
      On 32-bit IRQ_STACK_SIZE is the same as THREAD_SIZE.
      
      To allow sharing struct irq_stack with 32-bit, define IRQ_STACK_SIZE for
      32-bit and use it for struct irq_stack.
      
      No functional change.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160145.632513987@linutronix.de
      aa641c28
    • T
      x86/dumpstack/64: Speedup in_exception_stack() · c450c8f5
      Thomas Gleixner 提交于
      The current implementation of in_exception_stack() iterates over the
      exception stacks array. Most of the time this is an useless exercise, but
      even for the actual use cases (perf and ftrace) it takes at least 2
      iterations to get to the NMI stack.
      
      As the exception stacks and the guard pages are page aligned the loop can
      be avoided completely.
      
      Add a initial check whether the stack pointer is inside the full exception
      stack area and leave early if not.
      
      Create a lookup table which describes the stack area. The table index is
      the page offset from the beginning of the exception stacks. So for any
      given stack pointer the page offset is computed and a lookup in the
      description table is performed. If it is inside a guard page, return. If
      not, use the descriptor to fill in the info structure.
      
      The table is filled at compile time and for the !KASAN case the interesting
      page descriptors exactly fit into a single cache line. Just the last guard
      page descriptor is in the next cacheline, but that should not be accessed
      in the regular case.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160145.543320386@linutronix.de
      c450c8f5
    • T
      x86/exceptions: Split debug IST stack · 2a594d4c
      Thomas Gleixner 提交于
      The debug IST stack is actually two separate debug stacks to handle #DB
      recursion. This is required because the CPU starts always at top of stack
      on exception entry, which means on #DB recursion the second #DB would
      overwrite the stack of the first.
      
      The low level entry code therefore adjusts the top of stack on entry so a
      secondary #DB starts from a different stack page. But the stack pages are
      adjacent without a guard page between them.
      
      Split the debug stack into 3 stacks which are separated by guard pages. The
      3rd stack is never mapped into the cpu_entry_area and is only there to
      catch triple #DB nesting:
      
            --- top of DB_stack	<- Initial stack
            --- end of DB_stack
            	  guard page
      
            --- top of DB1_stack	<- Top of stack after entering first #DB
            --- end of DB1_stack
            	  guard page
      
            --- top of DB2_stack	<- Top of stack after entering second #DB
            --- end of DB2_stack
            	  guard page
      
      If DB2 would not act as the final guard hole, a second #DB would point the
      top of #DB stack to the stack below #DB1 which would be valid and not catch
      the not so desired triple nesting.
      
      The backing store does not allocate any memory for DB2 and its guard page
      as it is not going to be mapped into the cpu_entry_area.
      
       - Adjust the low level entry code so it adjusts top of #DB with the offset
         between the stacks instead of exception stack size.
      
       - Make the dumpstack code aware of the new stacks.
      
       - Adjust the in_debug_stack() implementation and move it into the NMI code
         where it belongs. As this is NMI hotpath code, it just checks the full
         area between top of DB_stack and bottom of DB1_stack without checking
         for the guard page. That's correct because the NMI cannot hit a
         stackpointer pointing to the guard page between DB and DB1 stack.  Even
         if it would, then the NMI operation still is unaffected, but the resume
         of the debug exception on the topmost DB stack will crash by touching
         the guard page.
      
        [ bp: Make exception_stack_names static const char * const ]
      Suggested-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: linux-doc@vger.kernel.org
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160145.439944544@linutronix.de
      2a594d4c
    • T
      x86/exceptions: Enable IST guard pages · 1bdb67e5
      Thomas Gleixner 提交于
      All usage sites which expected that the exception stacks in the CPU entry
      area are mapped linearly are fixed up. Enable guard pages between the
      IST stacks.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160145.349862042@linutronix.de
      1bdb67e5
    • T
      x86/exceptions: Disconnect IST index and stack order · 32074269
      Thomas Gleixner 提交于
      The entry order of the TSS.IST array and the order of the stack
      storage/mapping are not required to be the same.
      
      With the upcoming split of the debug stack this is going to fall apart as
      the number of TSS.IST array entries stays the same while the actual stacks
      are increasing.
      
      Make them separate so that code like dumpstack can just utilize the mapping
      order. The IST index is solely required for the actual TSS.IST array
      initialization.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160145.241588113@linutronix.de
      32074269
    • T
      x86/cpu: Remove orig_ist array · 4d68c3d0
      Thomas Gleixner 提交于
      All users gone.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pingfan Liu <kernelfans@gmail.com>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160145.151435667@linutronix.de
      4d68c3d0
    • T
      x86/cpu: Prepare TSS.IST setup for guard pages · f6ef7322
      Thomas Gleixner 提交于
      Convert the TSS.IST setup code to use the cpu entry area information
      directly instead of assuming a linear mapping of the IST stacks.
      
      The store to orig_ist[] is no longer required as there are no users
      anymore.
      
      This is the last preparatory step towards IST guard pages.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160145.061686012@linutronix.de
      f6ef7322
    • T
      x86/dumpstack/64: Use cpu_entry_area instead of orig_ist · afcd21da
      Thomas Gleixner 提交于
      The orig_ist[] array is a shadow copy of the IST array in the TSS. The
      reason why it exists is that older kernels used two TSS variants with
      different pointers into the debug stack. orig_ist[] contains the real
      starting points.
      
      There is no point anymore to do so because the same information can be
      retrieved using the base address of the cpu entry area mapping and the
      offsets of the various exception stacks.
      
      No functional change. Preparation for removing orig_ist.
      
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.974900463@linutronix.de
      afcd21da
    • T
      x86/irq/64: Use cpu entry area instead of orig_ist · bf5882ab
      Thomas Gleixner 提交于
      The orig_ist[] array is a shadow copy of the IST array in the TSS. The
      reason why it exists is that older kernels used two TSS variants with
      different pointers into the debug stack. orig_ist[] contains the real
      starting points.
      
      There is no point anymore to do so because the same information can be
      retrieved using the base address of the cpu entry area mapping and the
      offsets of the various exception stacks.
      
      No functional change. Preparation for removing orig_ist.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.885741626@linutronix.de
      bf5882ab
    • T
      x86/traps: Use cpu_entry_area instead of orig_ist · d876b673
      Thomas Gleixner 提交于
      The orig_ist[] array is a shadow copy of the IST array in the TSS. The
      reason why it exists is that older kernels used two TSS variants with
      different pointers into the debug stack. orig_ist[] contains the real
      starting points.
      
      There is no point anymore to do so because the same information can be
      retrieved using the base address of the cpu entry area mapping and the
      offsets of the various exception stacks.
      
      No functional change. Preparation for removing orig_ist.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.784487230@linutronix.de
      d876b673
    • T
      x86/cpu_entry_area: Provide exception stack accessor · 7623f37e
      Thomas Gleixner 提交于
      Store a pointer to the per cpu entry area exception stack mappings to allow
      fast retrieval.
      
      Required for converting various places from using the shadow IST array to
      directly doing address calculations on the actual mapping address.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.680960459@linutronix.de
      7623f37e
    • T
      x86/cpu_entry_area: Prepare for IST guard pages · a4af767a
      Thomas Gleixner 提交于
      To allow guard pages between the IST stacks each stack needs to be
      mapped individually.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.592691557@linutronix.de
      a4af767a
    • T
      x86/exceptions: Add structs for exception stacks · 019b17b3
      Thomas Gleixner 提交于
      At the moment everything assumes a full linear mapping of the various
      exception stacks. Adding guard pages to the cpu entry area mapping of the
      exception stacks will break that assumption.
      
      As a preparatory step convert both the real storage and the effective
      mapping in the cpu entry area from character arrays to structures.
      
      To ensure that both arrays have the same ordering and the same size of the
      individual stacks fill the members with a macro. The guard size is the only
      difference between the two resulting structures. For now both have guard
      size 0 until the preparation of all usage sites is done.
      
      Provide a couple of helper macros which are used in the following
      conversions.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.506807893@linutronix.de
      019b17b3
    • T
      x86/cpu_entry_area: Cleanup setup functions · 881a463c
      Thomas Gleixner 提交于
      No point in retrieving the entry area pointer over and over. Do it once
      and use unsigned int for 'cpu' everywhere.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.419653165@linutronix.de
      881a463c
    • T
      x86/exceptions: Make IST index zero based · 8f34c5b5
      Thomas Gleixner 提交于
      The defines for the exception stack (IST) array in the TSS are using the
      SDM convention IST1 - IST7. That causes all sorts of code to subtract 1 for
      array indices related to IST. That's confusing at best and does not provide
      any value.
      
      Make the indices zero based and fixup the usage sites. The only code which
      needs to adjust the 0 based index is the interrupt descriptor setup which
      needs to add 1 now.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: linux-doc@vger.kernel.org
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.331772825@linutronix.de
      8f34c5b5
    • T
      x86/exceptions: Remove unused stack defines on 32bit · 30842211
      Thomas Gleixner 提交于
      Nothing requires those for 32bit builds.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.227822695@linutronix.de
      30842211
    • T
      x86/64: Remove stale CURRENT_MASK · 6f36bd8d
      Thomas Gleixner 提交于
      Nothing uses that and before people get the wrong ideas, get rid of it.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.139284839@linutronix.de
      6f36bd8d
    • T
      x86/idt: Remove unused macro SISTG · 99d33451
      Thomas Gleixner 提交于
      Commit
      
        d8ba61ba ("x86/entry/64: Don't use IST entry for #BP stack")
      
      removed the last user but left the macro around.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NAndy Lutomirski <luto@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.050689789@linutronix.de
      99d33451
    • T
      x86/irq/64: Sanitize the top/bottom confusion · df835e70
      Thomas Gleixner 提交于
      On x86, stacks go top to bottom, but the stack overflow check uses it
      the other way round, which is just confusing. Clean it up and sanitize
      the warning string a bit.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160143.961241397@linutronix.de
      df835e70
    • A
      x86/irq/64: Remove a hardcoded irq_stack_union access · 4f44b8f0
      Andy Lutomirski 提交于
      stack_overflow_check() is using both irq_stack_ptr and irq_stack_union
      to find the IRQ stack. That's going to break when vmapped irq stacks are
      introduced.
      
      Change it to just use irq_stack_ptr.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160143.872549191@linutronix.de
      4f44b8f0
    • A
      x86/dumpstack: Fix off-by-one errors in stack identification · fa332154
      Andy Lutomirski 提交于
      The get_stack_info() function is off-by-one when checking whether an
      address is on a IRQ stack or a IST stack. This prevents an overflowed
      IRQ or IST stack from being dumped properly.
      
      [ tglx: Do the same for 32-bit ]
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160143.785651055@linutronix.de
      fa332154
    • T
      x86/irq/64: Limit IST stack overflow check to #DB stack · 7dbcf2b0
      Thomas Gleixner 提交于
      Commit
      
        37fe6a42 ("x86: Check stack overflow in detail")
      
      added a broad check for the full exception stack area, i.e. it considers
      the full exception stack area as valid.
      
      That's wrong in two aspects:
      
       1) It does not check the individual areas one by one
      
       2) #DF, NMI and #MCE are not enabling interrupts which means that a
          regular device interrupt cannot happen in their context. In fact if a
          device interrupt hits one of those IST stacks that's a bug because some
          code path enabled interrupts while handling the exception.
      
      Limit the check to the #DB stack and consider all other IST stacks as
      'overflow' or invalid.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160143.682135110@linutronix.de
      7dbcf2b0
  2. 10 4月, 2019 1 次提交
    • L
      x86/perf/amd: Remove need to check "running" bit in NMI handler · 3966c3fe
      Lendacky, Thomas 提交于
      Spurious interrupt support was added to perf in the following commit, almost
      a decade ago:
      
        63e6be6d ("perf, x86: Catch spurious interrupts after disabling counters")
      
      The two previous patches (resolving the race condition when disabling a
      PMC and NMI latency mitigation) allow for the removal of this older
      spurious interrupt support.
      
      Currently in x86_pmu_stop(), the bit for the PMC in the active_mask bitmap
      is cleared before disabling the PMC, which sets up a race condition. This
      race condition was mitigated by introducing the running bitmap. That race
      condition can be eliminated by first disabling the PMC, waiting for PMC
      reset on overflow and then clearing the bit for the PMC in the active_mask
      bitmap. The NMI handler will not re-enable a disabled counter.
      
      If x86_pmu_stop() is called from the perf NMI handler, the NMI latency
      mitigation support will guard against any unhandled NMI messages.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # 4.14.x-
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/Message-ID:
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      3966c3fe
  3. 06 4月, 2019 5 次提交
    • A
      x86/asm: Use stricter assembly constraints in bitops · 5b77e95d
      Alexander Potapenko 提交于
      There's a number of problems with how arch/x86/include/asm/bitops.h
      is currently using assembly constraints for the memory region
      bitops are modifying:
      
      1) Use memory clobber in bitops that touch arbitrary memory
      
      Certain bit operations that read/write bits take a base pointer and an
      arbitrarily large offset to address the bit relative to that base.
      Inline assembly constraints aren't expressive enough to tell the
      compiler that the assembly directive is going to touch a specific memory
      location of unknown size, therefore we have to use the "memory" clobber
      to indicate that the assembly is going to access memory locations other
      than those listed in the inputs/outputs.
      
      To indicate that BTR/BTS instructions don't necessarily touch the first
      sizeof(long) bytes of the argument, we also move the address to assembly
      inputs.
      
      This particular change leads to size increase of 124 kernel functions in
      a defconfig build. For some of them the diff is in NOP operations, other
      end up re-reading values from memory and may potentially slow down the
      execution. But without these clobbers the compiler is free to cache
      the contents of the bitmaps and use them as if they weren't changed by
      the inline assembly.
      
      2) Use byte-sized arguments for operations touching single bytes.
      
      Passing a long value to ANDB/ORB/XORB instructions makes the compiler
      treat sizeof(long) bytes as being clobbered, which isn't the case. This
      may theoretically lead to worse code in the case of heavy optimization.
      
      Practical impact:
      
      I've built a defconfig kernel and looked through some of the functions
      generated by GCC 7.3.0 with and without this clobber, and didn't spot
      any miscompilations.
      
      However there is a (trivial) theoretical case where this code leads to
      miscompilation:
      
        https://lkml.org/lkml/2019/3/28/393
      
      using just GCC 8.3.0 with -O2.  It isn't hard to imagine someone writes
      such a function in the kernel someday.
      
      So the primary motivation is to fix an existing misuse of the asm
      directive, which happens to work in certain configurations now, but
      isn't guaranteed to work under different circumstances.
      
      [ --mingo: Added -stable tag because defconfig only builds a fraction
        of the kernel and the trivial testcase looks normal enough to
        be used in existing or in-development code. ]
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: James Y Knight <jyknight@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com
      [ Edited the changelog, tidied up one of the defines. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      5b77e95d
    • M
      KVM: x86: nVMX: fix x2APIC VTPR read intercept · c73f4c99
      Marc Orr 提交于
      Referring to the "VIRTUALIZING MSR-BASED APIC ACCESSES" chapter of the
      SDM, when "virtualize x2APIC mode" is 1 and "APIC-register
      virtualization" is 0, a RDMSR of 808H should return the VTPR from the
      virtual APIC page.
      
      However, for nested, KVM currently fails to disable the read intercept
      for this MSR. This means that a RDMSR exit takes precedence over
      "virtualize x2APIC mode", and KVM passes through L1's TPR to L2,
      instead of sourcing the value from L2's virtual APIC page.
      
      This patch fixes the issue by disabling the read intercept, in VMCS02,
      for the VTPR when "APIC-register virtualization" is 0.
      
      The issue described above and fix prescribed here, were verified with
      a related patch in kvm-unit-tests titled "Test VMX's virtualize x2APIC
      mode w/ nested".
      Signed-off-by: NMarc Orr <marcorr@google.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Fixes: c992384b ("KVM: vmx: speed up MSR bitmap merge")
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c73f4c99
    • M
      KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887) · acff7847
      Marc Orr 提交于
      The nested_vmx_prepare_msr_bitmap() function doesn't directly guard the
      x2APIC MSR intercepts with the "virtualize x2APIC mode" MSR. As a
      result, we discovered the potential for a buggy or malicious L1 to get
      access to L0's x2APIC MSRs, via an L2, as follows.
      
      1. L1 executes WRMSR(IA32_SPEC_CTRL, 1). This causes the spec_ctrl
      variable, in nested_vmx_prepare_msr_bitmap() to become true.
      2. L1 disables "virtualize x2APIC mode" in VMCS12.
      3. L1 enables "APIC-register virtualization" in VMCS12.
      
      Now, KVM will set VMCS02's x2APIC MSR intercepts from VMCS12, and then
      set "virtualize x2APIC mode" to 0 in VMCS02. Oops.
      
      This patch closes the leak by explicitly guarding VMCS02's x2APIC MSR
      intercepts with VMCS12's "virtualize x2APIC mode" control.
      
      The scenario outlined above and fix prescribed here, were verified with
      a related patch in kvm-unit-tests titled "Add leak scenario to
      virt_x2apic_mode_test".
      
      Note, it looks like this issue may have been introduced inadvertently
      during a merge---see 15303ba5.
      Signed-off-by: NMarc Orr <marcorr@google.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      acff7847
    • D
      KVM: SVM: prevent DBG_DECRYPT and DBG_ENCRYPT overflow · b86bc285
      David Rientjes 提交于
      This ensures that the address and length provided to DBG_DECRYPT and
      DBG_ENCRYPT do not cause an overflow.
      
      At the same time, pass the actual number of pages pinned in memory to
      sev_unpin_memory() as a cleanup.
      Reported-by: NCfir Cohen <cfir@google.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b86bc285
    • D
      kvm: svm: fix potential get_num_contig_pages overflow · ede885ec
      David Rientjes 提交于
      get_num_contig_pages() could potentially overflow int so make its type
      consistent with its usage.
      Reported-by: NCfir Cohen <cfir@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ede885ec
  4. 05 4月, 2019 3 次提交
    • S
      syscalls: Remove start and number from syscall_set_arguments() args · 32d92586
      Steven Rostedt (VMware) 提交于
      After removing the start and count arguments of syscall_get_arguments() it
      seems reasonable to remove them from syscall_set_arguments(). Note, as of
      today, there are no users of syscall_set_arguments(). But we are told that
      there will be soon. But for now, at least make it consistent with
      syscall_get_arguments().
      
      Link: http://lkml.kernel.org/r/20190327222014.GA32540@altlinux.org
      
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Dave Martin <dave.martin@arm.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: x86@kernel.org
      Cc: linux-snps-arc@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-c6x-dev@linux-c6x.org
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      Cc: linux-hexagon@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-mips@vger.kernel.org
      Cc: nios2-dev@lists.rocketboards.org
      Cc: openrisc@lists.librecores.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-riscv@lists.infradead.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Cc: linux-um@lists.infradead.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: linux-arch@vger.kernel.org
      Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes
      Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86
      Reviewed-by: NDmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      32d92586
    • S
      syscalls: Remove start and number from syscall_get_arguments() args · b35f549d
      Steven Rostedt (Red Hat) 提交于
      At Linux Plumbers, Andy Lutomirski approached me and pointed out that the
      function call syscall_get_arguments() implemented in x86 was horribly
      written and not optimized for the standard case of passing in 0 and 6 for
      the starting index and the number of system calls to get. When looking at
      all the users of this function, I discovered that all instances pass in only
      0 and 6 for these arguments. Instead of having this function handle
      different cases that are never used, simply rewrite it to return the first 6
      arguments of a system call.
      
      This should help out the performance of tracing system calls by ptrace,
      ftrace and perf.
      
      Link: http://lkml.kernel.org/r/20161107213233.754809394@goodmis.org
      
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Dave Martin <dave.martin@arm.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: x86@kernel.org
      Cc: linux-snps-arc@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-c6x-dev@linux-c6x.org
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      Cc: linux-hexagon@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-mips@vger.kernel.org
      Cc: nios2-dev@lists.rocketboards.org
      Cc: openrisc@lists.librecores.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-riscv@lists.infradead.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Cc: linux-um@lists.infradead.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: linux-arch@vger.kernel.org
      Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts
      Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes
      Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86
      Reviewed-by: NDmitry V. Levin <ldv@altlinux.org>
      Reported-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      b35f549d
    • D
      xen: Prevent buffer overflow in privcmd ioctl · 42d8644b
      Dan Carpenter 提交于
      The "call" variable comes from the user in privcmd_ioctl_hypercall().
      It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32)
      elements.  We need to put an upper bound on it to prevent an out of
      bounds access.
      
      Cc: stable@vger.kernel.org
      Fixes: 1246ae0b ("xen: add variable hypercall caller")
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      42d8644b
  5. 03 4月, 2019 1 次提交
    • L
      x86/perf/amd: Resolve NMI latency issues for active PMCs · 6d3edaae
      Lendacky, Thomas 提交于
      On AMD processors, the detection of an overflowed PMC counter in the NMI
      handler relies on the current value of the PMC. So, for example, to check
      for overflow on a 48-bit counter, bit 47 is checked to see if it is 1 (not
      overflowed) or 0 (overflowed).
      
      When the perf NMI handler executes it does not know in advance which PMC
      counters have overflowed. As such, the NMI handler will process all active
      PMC counters that have overflowed. NMI latency in newer AMD processors can
      result in multiple overflowed PMC counters being processed in one NMI and
      then a subsequent NMI, that does not appear to be a back-to-back NMI, not
      finding any PMC counters that have overflowed. This may appear to be an
      unhandled NMI resulting in either a panic or a series of messages,
      depending on how the kernel was configured.
      
      To mitigate this issue, add an AMD handle_irq callback function,
      amd_pmu_handle_irq(), that will invoke the common x86_pmu_handle_irq()
      function and upon return perform some additional processing that will
      indicate if the NMI has been handled or would have been handled had an
      earlier NMI not handled the overflowed PMC. Using a per-CPU variable, a
      minimum value of the number of active PMCs or 2 will be set whenever a
      PMC is active. This is used to indicate the possible number of NMIs that
      can still occur. The value of 2 is used for when an NMI does not arrive
      at the LAPIC in time to be collapsed into an already pending NMI. Each
      time the function is called without having handled an overflowed counter,
      the per-CPU value is checked. If the value is non-zero, it is decremented
      and the NMI indicates that it handled the NMI. If the value is zero, then
      the NMI indicates that it did not handle the NMI.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # 4.14.x-
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/Message-ID:
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6d3edaae