1. 05 10月, 2017 1 次提交
    • B
      kvm/x86: Avoid async PF preempting the kernel incorrectly · a2b7861b
      Boqun Feng 提交于
      Currently, in PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could call
      schedule() to reschedule in some cases.  This could result in
      accidentally ending the current RCU read-side critical section early,
      causing random memory corruption in the guest, or otherwise preempting
      the currently running task inside between preempt_disable and
      preempt_enable.
      
      The difficulty to handle this well is because we don't know whether an
      async PF delivered in a preemptible section or RCU read-side critical section
      for PREEMPT_COUNT=n, since preempt_disable()/enable() and rcu_read_lock/unlock()
      are both no-ops in that case.
      
      To cure this, we treat any async PF interrupting a kernel context as one
      that cannot be preempted, preventing kvm_async_pf_task_wait() from choosing
      the schedule() path in that case.
      
      To do so, a second parameter for kvm_async_pf_task_wait() is introduced,
      so that we know whether it's called from a context interrupting the
      kernel, and the parameter is set properly in all the callsites.
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NBoqun Feng <boqun.feng@gmail.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      a2b7861b
  2. 30 9月, 2017 1 次提交
  3. 29 9月, 2017 1 次提交
    • B
      kvm/x86: Handle async PF in RCU read-side critical sections · b862789a
      Boqun Feng 提交于
      Sasha Levin reported a WARNING:
      
      | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
      | rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
      | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
      | rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
      ...
      | CPU: 0 PID: 6974 Comm: syz-fuzzer Not tainted 4.13.0-next-20170908+ #246
      | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      | 1.10.1-1ubuntu1 04/01/2014
      | Call Trace:
      ...
      | RIP: 0010:rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
      | RIP: 0010:rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
      | RSP: 0018:ffff88003b2debc8 EFLAGS: 00010002
      | RAX: 0000000000000001 RBX: 1ffff1000765bd85 RCX: 0000000000000000
      | RDX: 1ffff100075d7882 RSI: ffffffffb5c7da20 RDI: ffff88003aebc410
      | RBP: ffff88003b2def30 R08: dffffc0000000000 R09: 0000000000000001
      | R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003b2def08
      | R13: 0000000000000000 R14: ffff88003aebc040 R15: ffff88003aebc040
      | __schedule+0x201/0x2240 kernel/sched/core.c:3292
      | schedule+0x113/0x460 kernel/sched/core.c:3421
      | kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158
      | do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271
      | async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
      | RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996
      | RSP: 0018:ffff88003b2df520 EFLAGS: 00010283
      | RAX: 000000000000003f RBX: ffffffffb5d1e141 RCX: ffff88003b2df670
      | RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffb5d1e140
      | RBP: ffff88003b2df560 R08: dffffc0000000000 R09: 0000000000000000
      | R10: ffff88003b2df718 R11: 0000000000000000 R12: ffff88003b2df5d8
      | R13: 0000000000000064 R14: ffffffffb5d1e140 R15: 0000000000000000
      | vsnprintf+0x173/0x1700 lib/vsprintf.c:2136
      | sprintf+0xbe/0xf0 lib/vsprintf.c:2386
      | proc_self_get_link+0xfb/0x1c0 fs/proc/self.c:23
      | get_link fs/namei.c:1047 [inline]
      | link_path_walk+0x1041/0x1490 fs/namei.c:2127
      ...
      
      This happened when the host hit a page fault, and delivered it as in an
      async page fault, while the guest was in an RCU read-side critical
      section.  The guest then tries to reschedule in kvm_async_pf_task_wait(),
      but rcu_preempt_note_context_switch() would treat the reschedule as a
      sleep in RCU read-side critical section, which is not allowed (even in
      preemptible RCU).  Thus the WARN.
      
      To cure this, make kvm_async_pf_task_wait() go to the halt path if the
      PF happens in a RCU read-side critical section.
      Reported-by: NSasha Levin <levinsasha928@gmail.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NBoqun Feng <boqun.feng@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b862789a
  4. 28 9月, 2017 1 次提交
  5. 27 9月, 2017 1 次提交
  6. 26 9月, 2017 35 次提交
    • E
      x86/fpu: Use using_compacted_format() instead of open coded X86_FEATURE_XSAVES · 738f48cb
      Eric Biggers 提交于
      This is the canonical method to use.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-11-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      738f48cb
    • E
      x86/fpu: Use validate_xstate_header() to validate the xstate_header in copy_user_to_xstate() · 98c0fad9
      Eric Biggers 提交于
      Tighten the checks in copy_user_to_xstate().
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-10-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      98c0fad9
    • E
      x86/fpu: Eliminate the 'xfeatures' local variable in copy_user_to_xstate() · 3d703477
      Eric Biggers 提交于
      We now have this field in hdr.xfeatures.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-9-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3d703477
    • E
      x86/fpu: Copy the full header in copy_user_to_xstate() · af2c4322
      Eric Biggers 提交于
      This is in preparation to verify the full xstate header as supplied by user-space.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-8-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      af2c4322
    • E
      x86/fpu: Use validate_xstate_header() to validate the xstate_header in copy_kernel_to_xstate() · af95774b
      Eric Biggers 提交于
      Tighten the checks in copy_kernel_to_xstate().
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-7-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      af95774b
    • E
      x86/fpu: Eliminate the 'xfeatures' local variable in copy_kernel_to_xstate() · b89eda48
      Eric Biggers 提交于
      We have this information in the xstate_header.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-6-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b89eda48
    • E
      x86/fpu: Copy the full state_header in copy_kernel_to_xstate() · 80d8ae86
      Eric Biggers 提交于
      This is in preparation to verify the full xstate header as supplied by user-space.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-5-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      80d8ae86
    • E
      x86/fpu: Use validate_xstate_header() to validate the xstate_header in __fpu__restore_sig() · b11e2e18
      Eric Biggers 提交于
      Tighten the checks in __fpu__restore_sig() and update comments.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-4-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b11e2e18
    • E
      x86/fpu: Use validate_xstate_header() to validate the xstate_header in xstateregs_set() · cf9df81b
      Eric Biggers 提交于
      Tighten the checks in xstateregs_set().
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-3-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cf9df81b
    • E
      x86/fpu: Introduce validate_xstate_header() · e63e5d5c
      Eric Biggers 提交于
      Move validation of user-supplied xstate_header into a helper function,
      in preparation of calling it from both the ptrace and sigreturn syscall
      paths.
      
      The new function also considers it to be an error if *any* reserved bits
      are set, whereas before we were just clearing most of them silently.
      
      This should reduce the chance of bugs that fail to correctly validate
      user-supplied XSAVE areas.  It also will expose any broken userspace
      programs that set the other reserved bits; this is desirable because
      such programs will lose compatibility with future CPUs and kernels if
      those bits are ever used for anything.  (There shouldn't be any such
      programs, and in fact in the case where the compacted format is in use
      we were already validating xfeatures.  But you never know...)
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-2-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e63e5d5c
    • I
      x86/fpu: Rename fpu__activate_fpstate_read/write() to fpu__prepare_[read|write]() · 369a036d
      Ingo Molnar 提交于
      As per the new nomenclature we don't 'activate' the FPU state
      anymore, we initialize it. So drop the _activate_fpstate name
      from these functions, which were a bit of a mouthful anyway,
      and name them:
      
      	fpu__prepare_read()
      	fpu__prepare_write()
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      369a036d
    • I
      x86/fpu: Rename fpu__activate_curr() to fpu__initialize() · 2ce03d85
      Ingo Molnar 提交于
      Rename this function to better express that it's all about
      initializing the FPU state of a task which goes hand in hand
      with the fpu::initialized field.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-33-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2ce03d85
    • I
      x86/fpu: Simplify and speed up fpu__copy() · e10078eb
      Ingo Molnar 提交于
      fpu__copy() has a preempt_disable()/enable() pair, which it had to do to
      be able to atomically unlazy the current task when doing an FNSAVE.
      
      But we don't unlazy tasks anymore, we always do direct saves/restores of
      FPU context.
      
      So remove both the unnecessary critical section, and update the comments.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-32-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e10078eb
    • I
      x86/fpu: Fix stale comments about lazy FPU logic · 7f1487c5
      Ingo Molnar 提交于
      We don't do any lazy restore anymore, what we have are two pieces of optimization:
      
       - no-FPU tasks that don't save/restore the FPU context (kernel threads are such)
      
       - cached FPU registers maintained via the fpu->last_cpu field. This means that
         if an FPU task context switches to a non-FPU task then we can maintain the
         FPU registers as an in-FPU copies (cache), and skip the restoration of them
         once we switch back to the original FPU-using task.
      
      Update all the comments that still referred to old 'lazy' and 'unlazy' concepts.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-31-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7f1487c5
    • I
      x86/fpu: Rename fpu::fpstate_active to fpu::initialized · e4a81bfc
      Ingo Molnar 提交于
      The x86 FPU code used to have a complex state machine where both the FPU
      registers and the FPU state context could be 'active' (or inactive)
      independently of each other - which enabled features like lazy FPU restore.
      
      Much of this complexity is gone in the current code: now we basically can
      have FPU-less tasks (kernel threads) that don't use (and save/restore) FPU
      state at all, plus full FPU users that save/restore directly with no laziness
      whatsoever.
      
      But the fpu::fpstate_active still carries bits of the old complexity - meanwhile
      this flag has become a simple flag that shows whether the FPU context saving
      area in the thread struct is initialized and used, or not.
      
      Rename it to fpu::initialized to express this simplicity in the name as well.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-30-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e4a81bfc
    • I
      x86/fpu: Remove fpu__current_fpstate_write_begin/end() · 685c930d
      Ingo Molnar 提交于
      These functions are not used anymore, so remove them.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-29-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      685c930d
    • I
      x86/fpu: Fix fpu__activate_fpstate_read() and update comments · 4618e909
      Ingo Molnar 提交于
      fpu__activate_fpstate_read() can be called for the current task
      when coredumping - or for stopped tasks when ptrace-ing.
      
      Implement this properly in the code and update the comments.
      
      This also fixes an incorrect (but harmless) warning introduced by
      one of the earlier patches.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-28-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4618e909
    • T
      x86/vector: Respect affinity mask in irq descriptor · d6ffc6ac
      Thomas Gleixner 提交于
      The interrupt descriptor has a preset affinity mask at allocation
      time, which is usually the default affinity mask.
      
      The current code does not respect that mask and places the vector at some
      random CPU, which gets corrected later by a set_affinity() call. That's
      silly because the vector allocation can respect the mask upfront and place
      the interrupt on a CPU which is in the mask. If that fails, then the
      affinity is broken and a interrupt assigned on any online CPU.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213156.431670325@linutronix.de
      d6ffc6ac
    • T
      x86/irq: Simplify hotplug vector accounting · 2cffad7b
      Thomas Gleixner 提交于
      Before a CPU is taken offline the number of active interrupt vectors on the
      outgoing CPU and the number of vectors which are available on the other
      online CPUs are counted and compared. If the active vectors are more than
      the available vectors on the other CPUs then the CPU hot-unplug operation
      is aborted. This again uses loop based search and is inaccurate.
      
      The bitmap matrix allocator has accurate accounting information and can
      tell exactly whether the vector space is sufficient or not.
      
      Emit a message when the number of globaly reserved (unallocated) vectors is
      larger than the number of available vectors after offlining a CPU because
      after that point request_irq() might fail.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213156.351193962@linutronix.de
      2cffad7b
    • T
      x86/vector: Switch IOAPIC to global reservation mode · 464d1230
      Thomas Gleixner 提交于
      IOAPICs install and allocate vectors for inactive interrupts. This results
      in problems on CPU offline and wastes vector resources for nothing.
      
      Handle inactive IOAPIC interrupts in the same way as inactive MSI
      interrupts and switch them to the global reservation mode.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213156.273454591@linutronix.de
      464d1230
    • T
      x86/vector/msi: Switch to global reservation mode · 4900be83
      Thomas Gleixner 提交于
      Devices with many queues allocate a huge number of interrupts and get
      assigned a vector for each of them, even if the queues are not active and
      the interrupts never requested. This causes problems with the decision
      whether the global vector space is sufficient for CPU hot unplug
      operations.
      
      Change it to a reservation scheme, which allows overcommitment.
      
      When the interrupt is allocated and initialized the vector assignment
      merily updates the reservation request counter in the matrix
      allocator. This counter is used to emit warnings when the reservation
      exceeds the available vector space, but does not affect CPU offline
      operations. Like the managed interrupts the corresponding MSI/DMAR/IOAPIC
      entries are directed to the special shutdown vector.
      
      When the interrupt is requested, then the activation code tries to assign a
      real vector. If that succeeds the interrupt is started up and functional.
      
      If that fails, then subsequently request_irq() fails with -ENOSPC.
      
      This allows a clear separation of inactive and active modes and simplifies
      the final decisions whether the global vector space is sufficient for CPU
      offline operations.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213156.184211133@linutronix.de
      4900be83
    • T
      x86/vector: Handle managed interrupts proper · 2db1f959
      Thomas Gleixner 提交于
      Managed interrupts need to reserve interrupt vectors permanently, but as
      long as the interrupt is deactivated, the vector should not be active.
      
      Reserve a new system vector, which can be used to initially initialize
      MSI/DMAR/IOAPIC entries. In that situation the interrupts are disabled in
      the corresponding MSI/DMAR/IOAPIC devices. So the vector should never be
      sent to any CPU.
      
      When the managed interrupt is started up, a real vector is assigned from
      the managed vector space and configured in MSI/DMAR/IOAPIC.
      
      This allows a clear separation of inactive and active modes and simplifies
      the final decisions whether the global vector space is sufficient for CPU
      offline operations.
      
      The vector space can be reserved even on offline CPUs and will survive CPU
      offline/online operations.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213156.104616625@linutronix.de
      2db1f959
    • T
      x86/io_apic: Reevaluate vector configuration on activate() · 90ad9e2d
      Thomas Gleixner 提交于
      With the upcoming reservation/management scheme, early activation will
      assign a special vector. The final activation at request_irq() assigns a
      real vector, which needs to be updated in the ioapic.
      
      Split out the reconfiguration code in set_affinity and use it for
      reactivation.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213156.025232175@linutronix.de
      90ad9e2d
    • T
      x86/apic/msi: Force reactivation of interrupts at startup time · 2a85386a
      Thomas Gleixner 提交于
      MSI(X) interrupts need a valid vector configuration early at allocation
      time, i.e. before the PCI core enables MSI(X).
      
      With managed interrupts and the new global reservation scheme, the early
      configuration will not assign a real device vector, but a special shutdown
      vector. When the irq is started up, then the interrupt must be
      reconfigured. Tell the MSI irqdomain core about it.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213155.774066582@linutronix.de
      2a85386a
    • T
      x86/vector: Untangle internal state from irq_cfg · ba224fea
      Thomas Gleixner 提交于
      The vector management state is not required to live in irq_cfg. irq_cfg is
      only relevant for the depending irq domains (IOAPIC, DMAR, MSI ...).
      
      The seperation of the vector management status allows to direct a shut down
      interrupt to a special shutdown vector w/o confusing the internal state of
      the vector management.
      
      Preparatory change for the rework of managed interrupts and the global
      vector reservation scheme.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213155.683712356@linutronix.de
      ba224fea
    • T
      x86/vector: Compile SMP only code conditionally · ba801640
      Thomas Gleixner 提交于
      No point in compiling this for UP.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213155.603191841@linutronix.de
      ba801640
    • T
      x86/apic: Remove unused callbacks · baab1e84
      Thomas Gleixner 提交于
      Now that the old allocator is gone, these apic functions are unused. Remove
      them.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213155.524662349@linutronix.de
      baab1e84
    • T
      x86/vector: Use matrix allocator for vector assignment · 69cde000
      Thomas Gleixner 提交于
      Replace the magic vector allocation code by a simple bitmap matrix
      allocator. This avoids loops and hoops over CPUs and vector arrays, so in
      case of densly used vector spaces it's way faster.
      
      This also gets rid of the magic 'spread the vectors accross priority
      levels' heuristics in the current allocator:
      
      The comment in __asign_irq_vector says:
      
         * NOTE! The local APIC isn't very good at handling
         * multiple interrupts at the same interrupt level.
         * As the interrupt level is determined by taking the
         * vector number and shifting that right by 4, we
         * want to spread these out a bit so that they don't
         * all fall in the same interrupt level.                         
      
      After doing some palaeontological research the following was found the
      following in the PPro Developer Manual Volume 3:
      
           "7.4.2. Valid Interrupts
      
           The local and I/O APICs support 240 distinct vectors in the range of 16
           to 255. Interrupt priority is implied by its vector, according to the
           following relationship: priority = vector / 16
      
           One is the lowest priority and 15 is the highest. Vectors 16 through
           31 are reserved for exclusive use by the processor. The remaining
           vectors are for general use. The processor's local APIC includes an
           in-service entry and a holding entry for each priority level. To avoid
           losing inter- rupts, software should allocate no more than 2 interrupt
           vectors per priority."
      
      The current SDM tells nothing about that, instead it states:
      
           "If more than one interrupt is generated with the same vector number,
            the local APIC can set the bit for the vector both in the IRR and the
            ISR. This means that for the Pentium 4 and Intel Xeon processors, the
            IRR and ISR can queue two interrupts for each interrupt vector: one
            in the IRR and one in the ISR. Any additional interrupts issued for
            the same interrupt vector are collapsed into the single bit in the
            IRR.
      
            For the P6 family and Pentium processors, the IRR and ISR registers
            can queue no more than two interrupts per interrupt vector and will
            reject other interrupts that are received within the same vector."
      
         Which means, that on P6/Pentium the APIC will reject a new message and
         tell the sender to retry, which increases the load on the APIC bus and
         nothing more.
      
      There is no affirmative answer from Intel on that, but it's a sane approach
      to remove that for the following reasons:
      
          1) No other (relevant Open Source) operating systems bothers to
             implement this or mentiones this at all.
      
          2) The current allocator has no enforcement for this and especially the
             legacy interrupts, which are the main source of interrupts on these
             P6 and older systmes, are allocated linearly in the same priority
             level and just work.
      
          3) The current machines have no problem with that at all as verified
             with some experiments.
      
          4) AMD at least confirmed that such an issue is unknown.
      
          5) P6 and older are dinosaurs almost 20 years EOL, so there is really
             no reason to worry about that too much.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213155.443678104@linutronix.de
      69cde000
    • T
      x86/vector: Add tracepoints for vector management · 8d1e3dca
      Thomas Gleixner 提交于
      Add tracepoints for analysing the new vector management
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213155.357986795@linutronix.de
      8d1e3dca
    • T
      x86/smpboot: Set online before setting up vectors · 8ed4f3e6
      Thomas Gleixner 提交于
      There is no reason to set the CPU online after establishing the vectors on
      the upcoming CPU. The vector space is protected by the vector lock so no
      changes can happen.
      
      Marking the CPU online before setting up the vector space makes tracing
      work in the early vector management cpu online code.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213155.264311994@linutronix.de
      8ed4f3e6
    • T
      x86/vector: Add vector domain debugfs support · 65d7ed57
      Thomas Gleixner 提交于
      Add the debug callback for the vector domain, which gives a detailed
      information about vector usage if invoked for the domain by using rhe
      matrix allocator debug function and vector/target information when invoked
      for a particular interrupt.
      
      Extra information foir the Vector domain:
      
      Online bitmaps:       32
      Global available:   6352
      Global reserved:       5
      Total allocated:      20
      System: 41: 0-19,32,50,128,238-255
       | CPU | avl | man | act | vectors
           0   183     4    19  33-48,51-53
           1   199     4     1  33
           2   199     4     0  
      
      Extra information for interrupts:
      
           Vector:    42
           Target:     4
      
      This allows a detailed analysis of the vector usage and the association to
      interrupts and devices.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213155.188137174@linutronix.de
      65d7ed57
    • T
      x86/irq/vector: Initialize matrix allocator · 0fa115da
      Thomas Gleixner 提交于
      Initialize the matrix allocator and add the proper accounting points to the
      code.
      
      No functional change, just preparation.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213155.108410660@linutronix.de
      0fa115da
    • T
      x86/apic: Add replacement for cpu_mask_to_apicid() · 9f9e3bb1
      Thomas Gleixner 提交于
      As preparation for replacing the vector allocator, provide a new function
      which takes a cpu number instead of a cpu mask to calculate/lookup the
      resulting APIC destination id.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      9f9e3bb1
    • T
      x86/vector: Move helper functions around · 99a1482d
      Thomas Gleixner 提交于
      Move the helper functions to a different place as they would end up in the
      middle of management functions.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213154.949581934@linutronix.de
      99a1482d
    • T
      x86/vector: Remove pointless pointer checks · 258d86ee
      Thomas Gleixner 提交于
      The info pointer checks in assign_irq_vector_policy() are pointless because
      the pointer cannot be NULL, otherwise the calling code would already crash.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NYu Chen <yu.c.chen@intel.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213154.859484148@linutronix.de
      258d86ee