1. 05 4月, 2018 4 次提交
    • D
      syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64 · f8781c4a
      Dominik Brodowski 提交于
      Removing CONFIG_SYSCALL_PTREGS from arch/x86/Kconfig and simply selecting
      ARCH_HAS_SYSCALL_WRAPPER unconditionally on x86-64 allows us to simplify
      several codepaths.
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180405095307.3730-7-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f8781c4a
    • D
      syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32 · ebeb8c82
      Dominik Brodowski 提交于
      Extend ARCH_HAS_SYSCALL_WRAPPER for i386 emulation and for x32 on 64-bit
      x86.
      
      For x32, all we need to do is to create an additional stub for each
      compat syscall which decodes the parameters in x86-64 ordering, e.g.:
      
      	asmlinkage long __compat_sys_x32_xyzzy(struct pt_regs *regs)
      	{
      		return c_SyS_xyzzy(regs->di, regs->si, regs->dx);
      	}
      
      For i386 emulation, we need to teach compat_sys_*() to take struct
      pt_regs as its only argument, e.g.:
      
      	asmlinkage long __compat_sys_ia32_xyzzy(struct pt_regs *regs)
      	{
      		return c_SyS_xyzzy(regs->bx, regs->cx, regs->dx);
      	}
      
      In addition, we need to create additional stubs for common syscalls
      (that is, for syscalls which have the same parameters on 32-bit and
      64-bit), e.g.:
      
      	asmlinkage long __sys_ia32_xyzzy(struct pt_regs *regs)
      	{
      		return c_sys_xyzzy(regs->bx, regs->cx, regs->dx);
      	}
      
      This approach avoids leaking random user-provided register content down
      the call chain.
      
      This patch is based on an original proof-of-concept
      
       | From: Linus Torvalds <torvalds@linux-foundation.org>
       | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      and was split up and heavily modified by me, in particular to base it on
      ARCH_HAS_SYSCALL_WRAPPER.
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180405095307.3730-6-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ebeb8c82
    • D
      syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls · fa697140
      Dominik Brodowski 提交于
      Let's make use of ARCH_HAS_SYSCALL_WRAPPER=y on pure 64-bit x86-64 systems:
      
      Each syscall defines a stub which takes struct pt_regs as its only
      argument. It decodes just those parameters it needs, e.g:
      
      	asmlinkage long sys_xyzzy(const struct pt_regs *regs)
      	{
      		return SyS_xyzzy(regs->di, regs->si, regs->dx);
      	}
      
      This approach avoids leaking random user-provided register content down
      the call chain.
      
      For example, for sys_recv() which is a 4-parameter syscall, the assembly
      now is (in slightly reordered fashion):
      
      	<sys_recv>:
      		callq	<__fentry__>
      
      		/* decode regs->di, ->si, ->dx and ->r10 */
      		mov	0x70(%rdi),%rdi
      		mov	0x68(%rdi),%rsi
      		mov	0x60(%rdi),%rdx
      		mov	0x38(%rdi),%rcx
      
      		[ SyS_recv() is automatically inlined by the compiler,
      		  as it is not [yet] used anywhere else ]
      		/* clear %r9 and %r8, the 5th and 6th args */
      		xor	%r9d,%r9d
      		xor	%r8d,%r8d
      
      		/* do the actual work */
      		callq	__sys_recvfrom
      
      		/* cleanup and return */
      		cltq
      		retq
      
      The only valid place in an x86-64 kernel which rightfully calls
      a syscall function on its own -- vsyscall -- needs to be modified
      to pass struct pt_regs onwards as well.
      
      To keep the syscall table generation working independent of
      SYSCALL_PTREGS being enabled, the stubs are named the same as the
      "original" syscall stubs, i.e. sys_*().
      
      This patch is based on an original proof-of-concept
      
       | From: Linus Torvalds <torvalds@linux-foundation.org>
       | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      and was split up and heavily modified by me, in particular to base it on
      ARCH_HAS_SYSCALL_WRAPPER, to limit it to 64-bit-only for the time being,
      and to update the vsyscall to the new calling convention.
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180405095307.3730-4-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fa697140
    • L
      x86/syscalls: Don't pointlessly reload the system call number · dfe64506
      Linus Torvalds 提交于
      We have it in a register in the low-level asm, just pass it in as an
      argument rather than have do_syscall_64() load it back in from the
      ptregs pointer.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180405095307.3730-2-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      dfe64506
  2. 31 1月, 2018 1 次提交
  3. 30 1月, 2018 1 次提交
  4. 05 12月, 2017 1 次提交
    • M
      livepatch: send a fake signal to all blocking tasks · 43347d56
      Miroslav Benes 提交于
      Live patching consistency model is of LEAVE_PATCHED_SET and
      SWITCH_THREAD. This means that all tasks in the system have to be marked
      one by one as safe to call a new patched function. Safe means when a
      task is not (sleeping) in a set of patched functions. That is, no
      patched function is on the task's stack. Another clearly safe place is
      the boundary between kernel and userspace. The patching waits for all
      tasks to get outside of the patched set or to cross the boundary. The
      transition is completed afterwards.
      
      The problem is that a task can block the transition for quite a long
      time, if not forever. It could sleep in a set of patched functions, for
      example.  Luckily we can force the task to leave the set by sending it a
      fake signal, that is a signal with no data in signal pending structures
      (no handler, no sign of proper signal delivered). Suspend/freezer use
      this to freeze the tasks as well. The task gets TIF_SIGPENDING set and
      is woken up (if it has been sleeping in the kernel before) or kicked by
      rescheduling IPI (if it was running on other CPU). This causes the task
      to go to kernel/userspace boundary where the signal would be handled and
      the task would be marked as safe in terms of live patching.
      
      There are tasks which are not affected by this technique though. The
      fake signal is not sent to kthreads. They should be handled differently.
      They can be woken up so they leave the patched set and their
      TIF_PATCH_PENDING can be cleared thanks to stack checking.
      
      For the sake of completeness, if the task is in TASK_RUNNING state but
      not currently running on some CPU it doesn't get the IPI, but it would
      eventually handle the signal anyway. Second, if the task runs in the
      kernel (in TASK_RUNNING state) it gets the IPI, but the signal is not
      handled on return from the interrupt. It would be handled on return to
      the userspace in the future when the fake signal is sent again. Stack
      checking deals with these cases in a better way.
      
      If the task was sleeping in a syscall it would be woken by our fake
      signal, it would check if TIF_SIGPENDING is set (by calling
      signal_pending() predicate) and return ERESTART* or EINTR. Syscalls with
      ERESTART* return values are restarted in case of the fake signal (see
      do_signal()). EINTR is propagated back to the userspace program. This
      could disturb the program, but...
      
      * each process dealing with signals should react accordingly to EINTR
        return values.
      * syscalls returning EINTR happen to be quite common situation in the
        system even if no fake signal is sent.
      * freezer sends the fake signal and does not deal with EINTR anyhow.
        Thus EINTR values are returned when the system is resumed.
      
      The very safe marking is done in architectures' "entry" on syscall and
      interrupt/exception exit paths, and in a stack checking functions of
      livepatch.  TIF_PATCH_PENDING is cleared and the next
      recalc_sigpending() drops TIF_SIGPENDING. In connection with this, also
      call klp_update_patch_state() before do_signal(), so that
      recalc_sigpending() in dequeue_signal() can clear TIF_PATCH_PENDING
      immediately and thus prevent a double call of do_signal().
      
      Note that the fake signal is not sent to stopped/traced tasks. Such task
      prevents the patching to finish till it continues again (is not traced
      anymore).
      
      Last, sending the fake signal is not automatic. It is done only when
      admin requests it by writing 1 to signal sysfs attribute in livepatch
      sysfs directory.
      Signed-off-by: NMiroslav Benes <mbenes@suse.cz>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: x86@kernel.org
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      43347d56
  5. 08 11月, 2017 1 次提交
  6. 25 10月, 2017 1 次提交
    • M
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05
      Mark Rutland 提交于
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6aa7de05
  7. 08 7月, 2017 1 次提交
    • T
      x86/syscalls: Check address limit on user-mode return · 5ea0727b
      Thomas Garnier 提交于
      Ensure the address limit is a user-mode segment before returning to
      user-mode. Otherwise a process can corrupt kernel-mode memory and elevate
      privileges [1].
      
      The set_fs function sets the TIF_SETFS flag to force a slow path on
      return. In the slow path, the address limit is checked to be USER_DS if
      needed.
      
      The addr_limit_user_check function is added as a cross-architecture
      function to check the address limit.
      
      [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990Signed-off-by: NThomas Garnier <thgarnie@google.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: kernel-hardening@lists.openwall.com
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Will Drewry <wad@chromium.org>
      Cc: linux-api@vger.kernel.org
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: http://lkml.kernel.org/r/20170615011203.144108-1-thgarnie@google.com
      5ea0727b
  8. 08 3月, 2017 1 次提交
  9. 02 3月, 2017 1 次提交
  10. 25 12月, 2016 1 次提交
  11. 15 9月, 2016 2 次提交
  12. 27 7月, 2016 1 次提交
  13. 10 7月, 2016 2 次提交
  14. 15 6月, 2016 2 次提交
  15. 19 4月, 2016 1 次提交
  16. 10 3月, 2016 3 次提交
  17. 17 2月, 2016 1 次提交
  18. 30 1月, 2016 1 次提交
  19. 29 1月, 2016 1 次提交
  20. 21 12月, 2015 1 次提交
  21. 18 10月, 2015 1 次提交
  22. 09 10月, 2015 11 次提交