1. 05 December 2017, 1 commit
      livepatch: send a fake signal to all blocking tasks · 43347d56
      Miroslav Benes authored
      The live patching consistency model is a combination of LEAVE_PATCHED_SET
      and SWITCH_THREAD. This means that every task in the system has to be
      marked, one by one, as safe to call a new patched function. A task is
      safe when it is not sleeping in the set of patched functions, that is,
      when no patched function is on the task's stack. Another clearly safe
      place is the boundary between kernel and userspace. The patching waits
      for all tasks to get outside of the patched set or to cross the boundary,
      and the transition is completed afterwards.
      
      The problem is that a task can block the transition for quite a long
      time, if not forever. It could, for example, be sleeping in the set of
      patched functions. Luckily, we can force such a task to leave the set by
      sending it a fake signal, that is, a signal with no data in the signal
      pending structures (no handler, no sign of a proper signal being
      delivered). The suspend/freezer code uses the same trick to freeze tasks.
      The task gets TIF_SIGPENDING set and is woken up (if it has been sleeping
      in the kernel) or kicked by a rescheduling IPI (if it is running on
      another CPU). This sends the task to the kernel/userspace boundary, where
      the signal is handled and the task is marked as safe in terms of live
      patching.
      
      There are tasks which are not affected by this technique, though. The
      fake signal is not sent to kthreads; they have to be handled differently.
      They can be woken up so that they leave the patched set, and their
      TIF_PATCH_PENDING can then be cleared thanks to stack checking.
      
      For the sake of completeness: first, if a task is in TASK_RUNNING state
      but not currently running on any CPU, it does not get the IPI, but it
      will eventually handle the signal anyway. Second, if a task is running in
      the kernel (in TASK_RUNNING state), it gets the IPI, but the signal is
      not handled on return from the interrupt; it would only be handled on a
      future return to userspace, when the fake signal is sent again. Stack
      checking deals with both of these cases in a better way.
      
      If the task was sleeping in a syscall, it is woken by our fake signal,
      checks whether TIF_SIGPENDING is set (via the signal_pending() predicate)
      and returns ERESTART* or EINTR. Syscalls with ERESTART* return values are
      restarted in case of the fake signal (see do_signal()). EINTR is
      propagated back to the userspace program. This could disturb the program,
      but...
      
      * every process that deals with signals should react appropriately to
        EINTR return values anyway (a minimal retry sketch follows this list).
      * syscalls returning EINTR are a fairly common occurrence in the system
        even when no fake signal is sent.
      * the freezer sends the fake signal and does not deal with EINTR in any
        way either; thus EINTR values are already returned when the system is
        resumed.
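
      For illustration, a minimal userspace sketch of the EINTR handling that
      well-behaved programs are expected to do anyway (the wrapper name and its
      arguments are made up):

      ----
      #include <errno.h>
      #include <unistd.h>

      /* Retry a read() that may be interrupted by a (fake or real) signal. */
      static ssize_t read_retry(int fd, void *buf, size_t count)
      {
              ssize_t ret;

              do {
                      ret = read(fd, buf, count);
              } while (ret < 0 && errno == EINTR);

              return ret;
      }
      ----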
      
      The actual marking as safe is done in the architectures' entry code on
      the syscall and interrupt/exception exit paths, and in the stack-checking
      functions of livepatch. TIF_PATCH_PENDING is cleared and the next
      recalc_sigpending() drops TIF_SIGPENDING. In connection with this,
      klp_update_patch_state() is also called before do_signal(), so that
      recalc_sigpending() in dequeue_signal() can clear TIF_PATCH_PENDING
      immediately and thus prevent a double call of do_signal().
      
      Note that the fake signal is not sent to stopped/traced tasks. Such a
      task prevents the patching from finishing until it is continued again
      (that is, until it is no longer traced).
      
      Finally, sending the fake signal is not automatic. It is done only when
      the admin requests it by writing 1 to the signal sysfs attribute in the
      livepatch sysfs directory.
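
      A hedged sketch of what such a pass over the tasks could look like; the
      function name klp_send_fake_signals() and the exact locking here are
      illustrative assumptions, not the actual upstream implementation:

      ----
      #include <linux/sched.h>
      #include <linux/sched/signal.h>

      /*
       * Illustrative sketch: nudge every task that still blocks the transition
       * towards a safe point. kthreads are simply woken so the stack check can
       * clear TIF_PATCH_PENDING; userspace tasks get the fake signal via
       * signal_wake_up(), which sets TIF_SIGPENDING and wakes/kicks the task.
       */
      static void klp_send_fake_signals(void)
      {
              struct task_struct *g, *task;

              read_lock(&tasklist_lock);
              for_each_process_thread(g, task) {
                      if (!test_tsk_thread_flag(task, TIF_PATCH_PENDING))
                              continue;

                      if (task->flags & PF_KTHREAD) {
                              /* Wake an interruptibly sleeping kthread. */
                              wake_up_state(task, TASK_INTERRUPTIBLE);
                      } else {
                              /* Fake signal: no queued siginfo, just a wakeup. */
                              spin_lock_irq(&task->sighand->siglock);
                              signal_wake_up(task, 0);
                              spin_unlock_irq(&task->sighand->siglock);
                      }
              }
              read_unlock(&tasklist_lock);
      }
      ----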
      Signed-off-by: Miroslav Benes <mbenes@suse.cz>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: x86@kernel.org
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
  2. 08 November 2017, 1 commit
  3. 25 October 2017, 1 commit
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05
      Mark Rutland authored
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
      
      Please do not apply this to mainline directly; instead, please re-run
      the coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
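
      For example, applied to a pair of thread-flag accesses (the identifiers
      here are only illustrative), the script turns

      ----
      cached_flags = ACCESS_ONCE(ti->flags);
      ACCESS_ONCE(ti->flags) = cached_flags & ~_TIF_SIGPENDING;
      ----

      into

      ----
      cached_flags = READ_ONCE(ti->flags);
      WRITE_ONCE(ti->flags, cached_flags & ~_TIF_SIGPENDING);
      ----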
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 08 July 2017, 1 commit
      x86/syscalls: Check address limit on user-mode return · 5ea0727b
      Thomas Garnier authored
      Ensure the address limit is a user-mode segment before returning to
      user-mode. Otherwise a process can corrupt kernel-mode memory and elevate
      privileges [1].
      
      The set_fs() function sets the TIF_SETFS flag to force a slow path on
      return. In the slow path, the address limit is checked to still be
      USER_DS where needed.
      
      The addr_limit_user_check function is added as a cross-architecture
      function to check the address limit.
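
      A minimal sketch of what such a cross-architecture check could look like,
      assuming the TIF_SETFS flag named above and the get_fs()/USER_DS helpers;
      the exact reporting path is an assumption, not the upstream code:

      ----
      /* Illustrative sketch, not the exact upstream implementation. */
      static inline void addr_limit_user_check(void)
      {
              /* Only pay for the check when set_fs() was actually called. */
              if (!test_thread_flag(TIF_SETFS))
                      return;

              /* Returning to user mode with KERNEL_DS would be exploitable. */
              if (WARN(!segment_eq(get_fs(), USER_DS),
                       "Invalid address limit on user-mode return"))
                      force_sig(SIGKILL, current);

              clear_thread_flag(TIF_SETFS);
      }
      ----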
      
      [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990
      Signed-off-by: Thomas Garnier <thgarnie@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: kernel-hardening@lists.openwall.com
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Will Drewry <wad@chromium.org>
      Cc: linux-api@vger.kernel.org
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: http://lkml.kernel.org/r/20170615011203.144108-1-thgarnie@google.com
  5. 08 March 2017, 1 commit
  6. 02 March 2017, 1 commit
  7. 25 December 2016, 1 commit
  8. 15 September 2016, 2 commits
  9. 27 July 2016, 1 commit
  10. 10 July 2016, 2 commits
  11. 15 June 2016, 2 commits
  12. 19 April 2016, 1 commit
  13. 10 March 2016, 3 commits
  14. 17 February 2016, 1 commit
  15. 30 January 2016, 1 commit
  16. 29 January 2016, 1 commit
  17. 21 December 2015, 1 commit
  18. 18 October 2015, 1 commit
  19. 09 October 2015, 12 commits
  20. 07 October 2015, 1 commit
  21. 05 August 2015, 1 commit
  22. 17 July 2015, 1 commit
  23. 07 July 2015, 2 commits
      x86/entry: Add new, comprehensible entry and exit handlers written in C · c5c46f59
      Andy Lutomirski authored
      The current x86 entry and exit code, written in a mixture of assembly and
      C code, is incomprehensible due to being open-coded in a lot of places
      without coherent documentation.
      
      It appears to work primarily by luck and duct tape: obvious runtime
      failures were fixed on demand, without rethinking the design.
      
      For those reasons, our confidence in that code is low, and it is very
      difficult to improve incrementally.
      
      Add new code written in C, in preparation for simply deleting the old
      entry code.
      
      prepare_exit_to_usermode() is a new function that will handle all
      slow path exits to user mode.  It is called with IRQs disabled
      and it leaves us in a state in which it is safe to immediately
      return to user mode.  IRQs must not be re-enabled at any point
      between prepare_exit_to_usermode() returning and user mode actually
      being entered. (We can, of course, fail to enter user mode and treat
      that failure as a fresh entry to kernel mode.)
      
      All callers of do_notify_resume() will be migrated to call
      prepare_exit_to_usermode() instead; prepare_exit_to_usermode() needs
      to do everything that do_notify_resume() does today, but it also
      takes care of scheduling and context tracking.  Unlike
      do_notify_resume(), it does not need to be called in a loop.
      
      syscall_return_slowpath() is exactly what it sounds like: it will
      be called on any syscall exit slow path. It will replace
      syscall_trace_leave() and it calls prepare_exit_to_usermode() on the
      way out.
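
      A hedged sketch of the overall shape of such an exit path (the set of
      work flags handled here is abbreviated, and the helper names follow the
      usual x86 ones rather than the exact patch contents):

      ----
      /* Sketch only: called with IRQs off; returns ready to enter user mode. */
      __visible void prepare_exit_to_usermode(struct pt_regs *regs)
      {
              while (true) {
                      u32 cached_flags = READ_ONCE(current_thread_info()->flags);

                      if (!(cached_flags & (_TIF_NEED_RESCHED | _TIF_SIGPENDING |
                                            _TIF_NOTIFY_RESUME)))
                              break;

                      /* Exit-work handlers may sleep, so run them with IRQs on. */
                      local_irq_enable();

                      if (cached_flags & _TIF_NEED_RESCHED)
                              schedule();
                      if (cached_flags & _TIF_SIGPENDING)
                              do_signal(regs);
                      if (cached_flags & _TIF_NOTIFY_RESUME) {
                              clear_thread_flag(TIF_NOTIFY_RESUME);
                              tracehook_notify_resume(regs);
                      }

                      /* Disable IRQs and re-check: new work may have arrived. */
                      local_irq_disable();
              }

              user_enter();   /* context tracking: about to return to user mode */
      }
      ----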
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/c57c8b87661a4152801d7d3786eac2d1a2f209dd.1435952415.git.luto@kernel.org
      [ Improved the changelog a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      x86/entry: Add enter_from_user_mode() and use it in syscalls · feed36cd
      Andy Lutomirski authored
      Changing the x86 context tracking hooks is dangerous because
      there are no good checks that we track our context correctly.
      Add a helper to check that we're actually in CONTEXT_USER when
      we enter from user mode and wire it up for syscall entries.
      
      Subsequent patches will wire this up for all non-NMI entries as
      well.  NMIs are their own special beast and cannot currently
      switch overall context tracking state.  Instead, they have their
      own special RCU hooks.
      
      This is a tiny speedup if !CONFIG_CONTEXT_TRACKING (removes a
      branch) and a tiny slowdown if CONFIG_CONTEXT_TRACKING (adds a
      layer of indirection).  Eventually, we should fix up the core
      context tracking code to supply a function that does what we
      want (and can be much simpler than user_exit), which will enable
      us to get rid of the extra call.
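
      A minimal sketch of such a helper, assuming the context-tracking API of
      the time (ct_state(), CT_WARN_ON() and user_exit()):

      ----
      #ifdef CONFIG_CONTEXT_TRACKING
      /* Sketch: called on entry from user mode, with IRQs still off. */
      __visible void enter_from_user_mode(void)
      {
              /* Catch hook imbalance: we must still be tracked as CONTEXT_USER. */
              CT_WARN_ON(ct_state() != CONTEXT_USER);
              user_exit();
      }
      #else
      static inline void enter_from_user_mode(void) { }
      #endif
      ----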
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/853b42420066ec3fb856779cdc223a6dcb5d355b.1435952415.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>