1. 24 7月, 2020 6 次提交
  2. 09 7月, 2020 2 次提交
  3. 07 7月, 2020 1 次提交
  4. 05 7月, 2020 1 次提交
  5. 01 7月, 2020 2 次提交
  6. 13 6月, 2020 1 次提交
    • T
      x86/entry: Force rcu_irq_enter() when in idle task · 0bf3924b
      Thomas Gleixner 提交于
      The idea of conditionally calling into rcu_irq_enter() only when RCU is
      not watching turned out to be not completely thought through.
      
      Paul noticed occasional premature end of grace periods in RCU torture
      testing. Bisection led to the commit which made the invocation of
      rcu_irq_enter() conditional on !rcu_is_watching().
      
      It turned out that this conditional breaks RCU assumptions about the idle
      task when the scheduler tick happens to be a nested interrupt. Nested
      interrupts can happen when the first interrupt invokes softirq processing
      on return which enables interrupts.
      
      If that nested tick interrupt does not invoke rcu_irq_enter() then the
      RCU's irq-nesting checks will believe that this interrupt came directly
      from idle, which will cause RCU to report a quiescent state.  Because this
      interrupt instead came from a softirq handler which might have been
      executing an RCU read-side critical section, this can cause the grace
      period to end prematurely.
      
      Change the condition from !rcu_is_watching() to is_idle_task(current) which
      enforces that interrupts in the idle task unconditionally invoke
      rcu_irq_enter() independent of the RCU state.
      
      This is also correct vs. user mode entries in NOHZ full scenarios because
      user mode entries bring RCU out of EQS and force the RCU irq nesting state
      accounting to nested. As only the first interrupt can enter from user mode
      a nested tick interrupt will enter from kernel mode and as the nesting
      state accounting is forced to nesting it will not do anything stupid even
      if rcu_irq_enter() has not been invoked.
      
      Fixes: 3eeec385 ("x86/entry: Provide idtentry_entry/exit_cond_rcu()")
      Reported-by: N"Paul E. McKenney" <paulmck@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: N"Paul E. McKenney" <paulmck@kernel.org>
      Reviewed-by: N"Paul E. McKenney" <paulmck@kernel.org>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Acked-by: NFrederic Weisbecker <frederic@kernel.org>
      Link: https://lkml.kernel.org/r/87wo4cxubv.fsf@nanos.tec.linutronix.de
      0bf3924b
  7. 11 6月, 2020 12 次提交
  8. 21 3月, 2020 2 次提交
  9. 18 2月, 2020 1 次提交
  10. 16 11月, 2019 1 次提交
    • T
      x86/ioperm: Move TSS bitmap update to exit to user work · 22fe5b04
      Thomas Gleixner 提交于
      There is no point to update the TSS bitmap for tasks which use I/O bitmaps
      on every context switch. It's enough to update it right before exiting to
      user space.
      
      That reduces the context switch bitmap handling to invalidating the io
      bitmap base offset in the TSS when the outgoing task has TIF_IO_BITMAP
      set. The invaldiation is done on purpose when a task with an IO bitmap
      switches out to prevent any possible leakage of an activated IO bitmap.
      
      It also removes the requirement to update the tasks bitmap atomically in
      ioperm().
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      22fe5b04
  11. 22 7月, 2019 1 次提交
    • A
      x86/syscalls: Split the x32 syscalls into their own table · 6365b842
      Andy Lutomirski 提交于
      For unfortunate historical reasons, the x32 syscalls and the x86_64
      syscalls are not all numbered the same.  As an example, ioctl() is nr 16 on
      x86_64 but 514 on x32.
      
      This has potentially nasty consequences, since it means that there are two
      valid RAX values to do ioctl(2) and two invalid RAX values.  The valid
      values are 16 (i.e. ioctl(2) using the x86_64 ABI) and (514 | 0x40000000)
      (i.e. ioctl(2) using the x32 ABI).
      
      The invalid values are 514 and (16 | 0x40000000).  514 will enter the
      "COMPAT_SYSCALL_DEFINE3(ioctl, ...)" entry point with in_compat_syscall()
      and in_x32_syscall() returning false, whereas (16 | 0x40000000) will enter
      the native entry point with in_compat_syscall() and in_x32_syscall()
      returning true.  Both are bogus, and both will exercise code paths in the
      kernel and in any running seccomp filters that really ought to be
      unreachable.
      
      Splitting out the x32 syscalls into their own tables, allows both bogus
      invocations to return -ENOSYS.  I've checked glibc, musl, and Bionic, and
      all of them appear to call syscalls with their correct numbers, so this
      change should have no effect on them.
      
      There is an added benefit going forward: new syscalls that need special
      handling on x32 can share the same number on x32 and x86_64.  This means
      that the special syscall range 512-547 can be treated as a legacy wart
      instead of something that may need to be extended in the future.
      
      Also add a selftest to verify the new behavior.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/208024256b764312598f014ebfb0a42472c19354.1562185330.git.luto@kernel.org
      6365b842
  12. 27 6月, 2019 1 次提交
  13. 05 6月, 2019 1 次提交
  14. 13 4月, 2019 1 次提交
    • R
      x86/fpu: Defer FPU state load until return to userspace · 5f409e20
      Rik van Riel 提交于
      Defer loading of FPU state until return to userspace. This gives
      the kernel the potential to skip loading FPU state for tasks that
      stay in kernel mode, or for tasks that end up with repeated
      invocations of kernel_fpu_begin() & kernel_fpu_end().
      
      The fpregs_lock/unlock() section ensures that the registers remain
      unchanged. Otherwise a context switch or a bottom half could save the
      registers to its FPU context and the processor's FPU registers would
      became random if modified at the same time.
      
      KVM swaps the host/guest registers on entry/exit path. This flow has
      been kept as is. First it ensures that the registers are loaded and then
      saves the current (host) state before it loads the guest's registers. The
      swap is done at the very end with disabled interrupts so it should not
      change anymore before theg guest is entered. The read/save version seems
      to be cheaper compared to memcpy() in a micro benchmark.
      
      Each thread gets TIF_NEED_FPU_LOAD set as part of fork() / fpu__copy().
      For kernel threads, this flag gets never cleared which avoids saving /
      restoring the FPU state for kernel threads and during in-kernel usage of
      the FPU registers.
      
       [
         bp: Correct and update commit message and fix checkpatch warnings.
         s/register/registers/ where it is used in plural.
         minor comment corrections.
         remove unused trace_x86_fpu_activate_state() TP.
       ]
      Signed-off-by: NRik van Riel <riel@surriel.com>
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Hansen <dave.hansen@intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aubrey Li <aubrey.li@intel.com>
      Cc: Babu Moger <Babu.Moger@amd.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dmitry Safonov <dima@arista.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: x86-ml <x86@kernel.org>
      Cc: Yi Wang <wang.yi59@zte.com.cn>
      Link: https://lkml.kernel.org/r/20190403164156.19645-24-bigeasy@linutronix.de
      5f409e20
  15. 07 3月, 2019 1 次提交
  16. 03 12月, 2018 1 次提交
    • I
      x86: Fix various typos in comments · a97673a1
      Ingo Molnar 提交于
      Go over arch/x86/ and fix common typos in comments,
      and a typo in an actual function argument name.
      
      No change in functionality intended.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a97673a1
  17. 23 6月, 2018 1 次提交
    • W
      rseq: Avoid infinite recursion when delivering SIGSEGV · 784e0300
      Will Deacon 提交于
      When delivering a signal to a task that is using rseq, we call into
      __rseq_handle_notify_resume() so that the registers pushed in the
      sigframe are updated to reflect the state of the restartable sequence
      (for example, ensuring that the signal returns to the abort handler if
      necessary).
      
      However, if the rseq management fails due to an unrecoverable fault when
      accessing userspace or certain combinations of RSEQ_CS_* flags, then we
      will attempt to deliver a SIGSEGV. This has the potential for infinite
      recursion if the rseq code continuously fails on signal delivery.
      
      Avoid this problem by using force_sigsegv() instead of force_sig(), which
      is explicitly designed to reset the SEGV handler to SIG_DFL in the case
      of a recursive fault. In doing so, remove rseq_signal_deliver() from the
      internal rseq API and have an optional struct ksignal * parameter to
      rseq_handle_notify_resume() instead.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: peterz@infradead.org
      Cc: paulmck@linux.vnet.ibm.com
      Cc: boqun.feng@gmail.com
      Link: https://lkml.kernel.org/r/1529664307-983-1-git-send-email-will.deacon@arm.com
      784e0300
  18. 06 6月, 2018 1 次提交
    • M
      x86: Add support for restartable sequences · d6761b8f
      Mathieu Desnoyers 提交于
      Call the rseq_handle_notify_resume() function on return to userspace if
      TIF_NOTIFY_RESUME thread flag is set.
      
      Perform fixup on the pre-signal frame when a signal is delivered on top
      of a restartable sequence critical section.
      
      Check that system calls are not invoked from within rseq critical
      sections by invoking rseq_signal() from syscall_return_slowpath().
      With CONFIG_DEBUG_RSEQ, such behavior results in termination of the
      process with SIGSEGV.
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Chris Lameter <cl@linux.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ben Maurer <bmaurer@fb.com>
      Cc: linux-api@vger.kernel.org
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20180602124408.8430-7-mathieu.desnoyers@efficios.com
      d6761b8f
  19. 05 4月, 2018 3 次提交
    • D
      syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64 · f8781c4a
      Dominik Brodowski 提交于
      Removing CONFIG_SYSCALL_PTREGS from arch/x86/Kconfig and simply selecting
      ARCH_HAS_SYSCALL_WRAPPER unconditionally on x86-64 allows us to simplify
      several codepaths.
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180405095307.3730-7-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f8781c4a
    • D
      syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32 · ebeb8c82
      Dominik Brodowski 提交于
      Extend ARCH_HAS_SYSCALL_WRAPPER for i386 emulation and for x32 on 64-bit
      x86.
      
      For x32, all we need to do is to create an additional stub for each
      compat syscall which decodes the parameters in x86-64 ordering, e.g.:
      
      	asmlinkage long __compat_sys_x32_xyzzy(struct pt_regs *regs)
      	{
      		return c_SyS_xyzzy(regs->di, regs->si, regs->dx);
      	}
      
      For i386 emulation, we need to teach compat_sys_*() to take struct
      pt_regs as its only argument, e.g.:
      
      	asmlinkage long __compat_sys_ia32_xyzzy(struct pt_regs *regs)
      	{
      		return c_SyS_xyzzy(regs->bx, regs->cx, regs->dx);
      	}
      
      In addition, we need to create additional stubs for common syscalls
      (that is, for syscalls which have the same parameters on 32-bit and
      64-bit), e.g.:
      
      	asmlinkage long __sys_ia32_xyzzy(struct pt_regs *regs)
      	{
      		return c_sys_xyzzy(regs->bx, regs->cx, regs->dx);
      	}
      
      This approach avoids leaking random user-provided register content down
      the call chain.
      
      This patch is based on an original proof-of-concept
      
       | From: Linus Torvalds <torvalds@linux-foundation.org>
       | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      and was split up and heavily modified by me, in particular to base it on
      ARCH_HAS_SYSCALL_WRAPPER.
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180405095307.3730-6-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ebeb8c82
    • D
      syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls · fa697140
      Dominik Brodowski 提交于
      Let's make use of ARCH_HAS_SYSCALL_WRAPPER=y on pure 64-bit x86-64 systems:
      
      Each syscall defines a stub which takes struct pt_regs as its only
      argument. It decodes just those parameters it needs, e.g:
      
      	asmlinkage long sys_xyzzy(const struct pt_regs *regs)
      	{
      		return SyS_xyzzy(regs->di, regs->si, regs->dx);
      	}
      
      This approach avoids leaking random user-provided register content down
      the call chain.
      
      For example, for sys_recv() which is a 4-parameter syscall, the assembly
      now is (in slightly reordered fashion):
      
      	<sys_recv>:
      		callq	<__fentry__>
      
      		/* decode regs->di, ->si, ->dx and ->r10 */
      		mov	0x70(%rdi),%rdi
      		mov	0x68(%rdi),%rsi
      		mov	0x60(%rdi),%rdx
      		mov	0x38(%rdi),%rcx
      
      		[ SyS_recv() is automatically inlined by the compiler,
      		  as it is not [yet] used anywhere else ]
      		/* clear %r9 and %r8, the 5th and 6th args */
      		xor	%r9d,%r9d
      		xor	%r8d,%r8d
      
      		/* do the actual work */
      		callq	__sys_recvfrom
      
      		/* cleanup and return */
      		cltq
      		retq
      
      The only valid place in an x86-64 kernel which rightfully calls
      a syscall function on its own -- vsyscall -- needs to be modified
      to pass struct pt_regs onwards as well.
      
      To keep the syscall table generation working independent of
      SYSCALL_PTREGS being enabled, the stubs are named the same as the
      "original" syscall stubs, i.e. sys_*().
      
      This patch is based on an original proof-of-concept
      
       | From: Linus Torvalds <torvalds@linux-foundation.org>
       | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      and was split up and heavily modified by me, in particular to base it on
      ARCH_HAS_SYSCALL_WRAPPER, to limit it to 64-bit-only for the time being,
      and to update the vsyscall to the new calling convention.
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180405095307.3730-4-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fa697140