1. 16 Jan 2016, 1 commit
  2. 20 Dec 2015, 6 commits
  3. 04 Oct 2015, 1 commit
  4. 22 Sep 2015, 1 commit
  5. 21 Jul 2015, 1 commit
  6. 20 Jul 2015, 2 commits
  7. 20 Jun 2015, 1 commit
  8. 19 May 2015, 1 commit
  9. 08 May 2015, 1 commit
    • futex: Implement lockless wakeups · 1d0dcb3a
      Davidlohr Bueso committed
      Given the overall futex architecture, any chance of reducing
      hb->lock contention is welcome. In this particular case, using
      wake-queues to enable lockless wakeups addresses very real
      world performance concerns, including soft-lockups when large
      numbers of tasks are blocked (which is not hard to produce on
      large boxes using just a handful of futexes).
      
      At the lowest level, this patch can reduce the latency of a single
      thread attempting to acquire hb->lock in highly contended scenarios
      by up to 2x. At lower counts of nr_wake there are no regressions,
      confirming, of course, that the wake_q handling overhead is practically
      non-existent. For instance, while there is a fair amount of variation,
      the extended perf-bench wakeup benchmark shows, for a 20-core machine,
      the following avg per-thread time to wake up its share of tasks:
      
      	nr_thr	ms-before	ms-after
      	16 	0.0590		0.0215
      	32 	0.0396		0.0220
      	48 	0.0417		0.0182
      	64 	0.0536		0.0236
      	80 	0.0414		0.0097
      	96 	0.0672		0.0152
      
      Naturally, this can cause spurious wakeups. However, there is no core
      code that cannot handle them as far as I can tell, and furthermore tglx
      makes the point that other events can already trigger them anyway.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Mason <clm@fb.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: George Spelvin <linux@horizon.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1430494072-30283-3-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1d0dcb3a
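      A hedged sketch of the wake-queue pattern described above, condensed from the shape of futex_wake() (not the literal patch; wake_q_add()/wake_up_q() are the real primitives, the surrounding function is simplified, and the original patch spelled the declaration WAKE_Q() rather than DEFINE_WAKE_Q()):

      /*
       * Simplified sketch: waiters are unqueued under hb->lock, but the
       * actual wake_up_process() calls are deferred until after the lock
       * is dropped, so woken tasks do not immediately contend on it.
       */
      static int futex_wake_sketch(struct futex_hash_bucket *hb,
                                   union futex_key *key, int nr_wake)
      {
              struct futex_q *this, *next;
              int ret = 0;
              DEFINE_WAKE_Q(wake_q);

              spin_lock(&hb->lock);
              plist_for_each_entry_safe(this, next, &hb->chain, list) {
                      if (match_futex(&this->key, key)) {
                              __unqueue_futex(this);
                              wake_q_add(&wake_q, this->task); /* just record the task */
                              if (++ret >= nr_wake)
                                      break;
                      }
              }
              spin_unlock(&hb->lock);

              wake_up_q(&wake_q); /* perform the wakeups without holding hb->lock */
              return ret;
      }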
  10. 22 Apr 2015, 1 commit
  11. 18 Feb 2015, 1 commit
  12. 13 Feb 2015, 1 commit
    • all arches, signal: move restart_block to struct task_struct · f56141e3
      Andy Lutomirski committed
      If an attacker can cause a controlled kernel stack overflow, overwriting
      the restart block is a very juicy exploit target.  This is because the
      restart_block is held in the same memory allocation as the kernel stack.
      
      Moving the restart block to struct task_struct prevents this exploit by
      making the restart_block harder to locate.
      
      Note that there are other fields in thread_info that are also easy
      targets, at least on some architectures.
      
      It's also a decent simplification, since the restart code is more or less
      identical on all architectures.
      
      [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Miller <davem@davemloft.net>
      Acked-by: Richard Weinberger <richard@nod.at>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f56141e3
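      A hedged sketch of what the move means for code that touches the restart block (the before/after function names here are purely illustrative; in mainline the consumer is the restart_syscall handling in kernel/signal.c):

      /*
       * Before: restart_block lived in struct thread_info, which shares an
       * allocation with the kernel stack, so a stack overflow could reach it.
       */
      static long restart_syscall_before(void)
      {
              struct restart_block *restart = &current_thread_info()->restart_block;
              return restart->fn(restart);
      }

      /*
       * After: restart_block lives in struct task_struct, a separate
       * allocation, so it is no longer adjacent to the kernel stack.
       */
      static long restart_syscall_after(void)
      {
              struct restart_block *restart = &current->restart_block;
              return restart->fn(restart);
      }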
  13. 19 Jan 2015, 1 commit
    • futex: Fix argument handling in futex_lock_pi() calls · 996636dd
      Michael Kerrisk committed
      This patch fixes two separate buglets in calls to futex_lock_pi():
      
        * Eliminate unused 'detect' argument
        * Change unused 'timeout' argument of FUTEX_TRYLOCK_PI to NULL
      
      The 'detect' argument of futex_lock_pi() seems never to have been
      used (when it was included with the initial PI mutex implementation
      in Linux 2.6.18, all checks against its value were disabled by
      ANDing against 0 (i.e., if (detect... && 0)), and with
      commit 778e9a9c, any mention of
      this argument in futex_lock_pi() went away altogether). Its presence
      now serves only to confuse readers of the code, by giving the
      impression that the futex() FUTEX_LOCK_PI operation actually does
      use the 'val' argument. This patch removes the argument.
      
      The futex_lock_pi() call that corresponds to FUTEX_TRYLOCK_PI includes
      'timeout' as one of its arguments. This misleads the reader into thinking
      that the FUTEX_TRYLOCK_PI operation does employ timeouts for some sensible
      purpose; but it does not.  Indeed, it cannot, because the checks at the
      start of sys_futex() exclude FUTEX_TRYLOCK_PI from the set of operations
      that do copy_from_user() on the timeout argument. So, in the
      FUTEX_TRYLOCK_PI futex_lock_pi() call it would be simplest to change
      'timeout' to 'NULL'. This patch does that.
      Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Reviewed-by: Darren Hart <darren@dvhart.com>
      Link: http://lkml.kernel.org/r/54B96646.8010200@gmail.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      996636dd
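      A hedged sketch of the resulting call sites, condensed from the do_futex() dispatch (argument order as after this patch, with 'detect' removed; treat the exact signature as an assumption about that era of the code):

      /* Sketch only: futex_lock_pi() now takes (uaddr, flags, timeout, trylock). */
      switch (cmd) {
      case FUTEX_LOCK_PI:
              return futex_lock_pi(uaddr, flags, timeout, 0);
      case FUTEX_TRYLOCK_PI:
              /* The timeout is never copied in for TRYLOCK_PI, so pass NULL. */
              return futex_lock_pi(uaddr, flags, NULL, 1);
      }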
  14. 26 Oct 2014, 2 commits
  15. 19 Oct 2014, 1 commit
    • futex: Ensure get_futex_key_refs() always implies a barrier · 76835b0e
      Catalin Marinas committed
      Commit b0c29f79 (futexes: Avoid taking the hb->lock if there's
      nothing to wake up) changes the futex code to avoid taking a lock when
      there are no waiters. This code has been subsequently fixed in commit
      11d4616b (futex: revert back to the explicit waiter counting code).
      Both the original commit and the fix-up rely on get_futex_key_refs() to
      always imply a barrier.
      
      However, for private futexes, none of the cases in the switch statement
      of get_futex_key_refs() would be hit and the function completes without
      a memory barrier as required before checking the "waiters" in
      futex_wake() -> hb_waiters_pending(). The consequence is a race with a
      thread waiting on a futex on another CPU, allowing the waker thread to
      read "waiters == 0" while the waiter thread has read "futex_val ==
      locked" (in kernel).
      
      Without this fix, the problem (user space deadlocks) can be seen with
      Android bionic's mutex implementation on an arm64 multi-cluster system.
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reported-by: Matteo Franchin <Matteo.Franchin@arm.com>
      Fixes: b0c29f79 (futexes: Avoid taking the hb->lock if there's nothing to wake up)
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: <stable@vger.kernel.org>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      76835b0e
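      A hedged sketch of the shape of the fix: the shared-futex cases already imply a full barrier through their reference counting, while private futexes hit none of them, so an explicit barrier is added in the default case (condensed from get_futex_key_refs(); comments are illustrative):

      static void get_futex_key_refs(union futex_key *key)
      {
              if (!key->both.ptr)
                      return;

              switch (key->both.offset & (FUT_OFF_INODE | FUT_OFF_MMSHARED)) {
              case FUT_OFF_INODE:
                      ihold(key->shared.inode);       /* implies smp_mb() */
                      break;
              case FUT_OFF_MMSHARED:
                      futex_get_mm(key);              /* implies smp_mb() */
                      break;
              default:
                      smp_mb();       /* explicit barrier for private futexes */
              }
      }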
  16. 13 Sep 2014, 1 commit
  17. 22 Jun 2014, 6 commits
  18. 06 Jun 2014, 4 commits
    • futex: Make lookup_pi_state more robust · 54a21788
      Thomas Gleixner committed
      The current implementation of lookup_pi_state has ambiguous handling of
      the TID value 0 in the user space futex.  We can get into the kernel
      even if the TID value is 0, because either there is a stale waiters bit
      or the owner died bit is set or we are called from the requeue_pi path
      or from user space just for fun.
      
      The current code avoids an explicit sanity check for pid = 0 in case
      that kernel internal state (waiters) is found for the user space
      address.  This can lead to state leakage and worse under some
      circumstances.
      
      Handle the cases explicit:
      
             Waiter | pi_state | pi->owner | uTID      | uODIED | ?
      
        [1]  NULL   | ---      | ---       | 0         | 0/1    | Valid
        [2]  NULL   | ---      | ---       | >0        | 0/1    | Valid
      
        [3]  Found  | NULL     | --        | Any       | 0/1    | Invalid
      
        [4]  Found  | Found    | NULL      | 0         | 1      | Valid
        [5]  Found  | Found    | NULL      | >0        | 1      | Invalid
      
        [6]  Found  | Found    | task      | 0         | 1      | Valid
      
        [7]  Found  | Found    | NULL      | Any       | 0      | Invalid
      
        [8]  Found  | Found    | task      | ==taskTID | 0/1    | Valid
        [9]  Found  | Found    | task      | 0         | 0      | Invalid
        [10] Found  | Found    | task      | !=taskTID | 0/1    | Invalid
      
       [1] Indicates that the kernel can acquire the futex atomically. We
           came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.
      
       [2] Valid, if TID does not belong to a kernel thread. If no matching
           thread is found then it indicates that the owner TID has died.
      
       [3] Invalid. The waiter is queued on a non-PI futex.
      
       [4] Valid state after exit_robust_list(), which sets the user space
           value to FUTEX_WAITERS | FUTEX_OWNER_DIED.
      
       [5] The user space value got manipulated between exit_robust_list()
           and exit_pi_state_list()
      
       [6] Valid state after exit_pi_state_list() which sets the new owner in
           the pi_state but cannot access the user space value.
      
       [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.
      
       [8] Owner and user space value match
      
       [9] There is no transient state which sets the user space TID to 0
           except exit_robust_list(), but this is indicated by the
           FUTEX_OWNER_DIED bit. See [4]
      
      [10] There is no transient state which leaves owner and user space
           TID out of sync.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      54a21788
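      A hedged reading of the table as code, once a waiter with kernel pi_state has been found (an interpretation of cases [4]-[10], not the literal patch; 'uval' is the user space futex word, 'pid' its TID field):

      /* Illustrative only: returns 0 when attaching to the found pi_state is valid. */
      static int validate_pi_state_sketch(u32 uval, u32 pid,
                                          struct futex_pi_state *pi_state)
      {
              if (uval & FUTEX_OWNER_DIED) {
                      if (!pi_state->owner)
                              return pid ? -EINVAL : 0;       /* [5] vs [4] */
                      if (!pid)
                              return 0;                       /* [6] */
              } else {
                      if (!pi_state->owner)
                              return -EINVAL;                 /* [7] */
              }
              /* [8] valid; [9]/[10] invalid: user space TID must match the owner. */
              return pid == task_pid_vnr(pi_state->owner) ? 0 : -EINVAL;
      }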
    • futex: Always cleanup owner tid in unlock_pi · 13fbca4c
      Thomas Gleixner committed
      If the owner died bit is set at futex_unlock_pi, we currently do not
      clean up the user space futex.  So the owner TID of the current owner
      (the unlocker) persists.  That's observable inconsistent state,
      especially when the ownership of the pi state got transferred.
      
      Clean it up unconditionally.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      13fbca4c
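      A hedged sketch of the idea (a fragment in the spirit of wake_futex_pi(), with 'uaddr' and 'new_owner' as in that function; not the literal diff): the new value for the user space word is computed and written unconditionally, so the dead owner's TID and the owner-died bit never linger once ownership is handed over.

      u32 uval, newval, curval;

      if (get_futex_value_locked(&uval, uaddr))
              return -EFAULT;

      /* Keep WAITERS, clear OWNER_DIED, install the new owner's TID. */
      newval = FUTEX_WAITERS | task_pid_vnr(new_owner);

      if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval))
              return -EFAULT;
      if (curval != uval)
              return -EINVAL;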
    • futex: Validate atomic acquisition in futex_lock_pi_atomic() · b3eaa9fc
      Thomas Gleixner committed
      We need to protect the atomic acquisition in the kernel against rogue
      user space which sets the user space futex to 0, so the kernel side
      acquisition succeeds while there is existing state in the kernel
      associated with the real owner.
      
      Verify whether the futex has waiters associated with kernel state.  If
      it has, return -EINVAL.  The state is corrupted already, so no point in
      cleaning it up.  Subsequent calls will fail as well.  Not our problem.
      
      [ tglx: Use futex_top_waiter() and explain why we do not need to try
        	restoring the already corrupted user space state. ]
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Will Drewry <wad@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b3eaa9fc
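      A hedged sketch of the check (futex_top_waiter() is the real helper for finding kernel state for a key; the 'lock_taken' flag is assumed here to mirror the local of that era which records an acquisition against a zeroed futex word):

      /*
       * Sketch, not the literal diff: if we only "won" the futex because
       * user space zeroed the word, but the hash bucket already holds a
       * waiter (i.e. kernel state exists for this key), the state is
       * corrupted; return -EINVAL instead of trying to repair it.
       */
      if (unlikely(lock_taken) && futex_top_waiter(hb, key))
              return -EINVAL;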
    • futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1) · e9c243a5
      Thomas Gleixner committed
      
      If uaddr == uaddr2, then we have broken the rule of only requeueing from
      a non-pi futex to a pi futex with this call.  If we attempt this, then
      dangling pointers may be left for rt_waiter resulting in an exploitable
      condition.
      
      This change brings futex_requeue() in line with futex_wait_requeue_pi()
      which performs the same check as per commit 6f7b0a2a ("futex: Forbid
      uaddr == uaddr2 in futex_wait_requeue_pi()")
      
      [ tglx: Compare the resulting keys as well, as uaddrs might be
        	different depending on the mapping ]
      
      Fixes CVE-2014-3153.
      
      Reported-by: Pinkie Pie
      Signed-off-by: Will Drewry <wad@chromium.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Darren Hart <dvhart@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e9c243a5
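      A hedged sketch of the added guard, condensed from futex_requeue() (match_futex() is the real key comparison; the key check covers mappings where different uaddrs alias the same futex):

      /* Sketch, not the literal diff. */
      if (requeue_pi) {
              /* Cheap early check on the raw user space addresses. */
              if (uaddr1 == uaddr2)
                      return -EINVAL;
      }

      /* ... later, once get_futex_key() has resolved both keys ... */
      if (requeue_pi && match_futex(&key1, &key2)) {
              ret = -EINVAL;
              goto out_put_keys;
      }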
  19. 19 May 2014, 2 commits
    • futex: Prevent attaching to kernel threads · f0d71b3d
      Thomas Gleixner committed
      We happily allow userspace to declare a random kernel thread to be the
      owner of a user space PI futex.
      
      Found while analysing the fallout of Dave Jones' syscall fuzzer.
      
      We also should validate the thread group for private futexes and find
      some fast way to validate whether the "alleged" owner has RW access on
      the file which backs the SHM, but that's a separate issue.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Darren Hart <darren@dvhart.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Carlos ODonell <carlos@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Link: http://lkml.kernel.org/r/20140512201701.194824402@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      f0d71b3d
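      A hedged sketch of the refusal (the lookup helper existed as futex_find_get_task() in that era; whether the patch tested PF_KTHREAD or the absence of an mm is treated as an assumption here, the effect being the same):

      p = futex_find_get_task(pid);   /* task user space claims as owner */
      if (!p)
              return -ESRCH;

      /* Refuse to attach PI state to a kernel thread. */
      if (unlikely(p->flags & PF_KTHREAD)) {
              put_task_struct(p);
              return -EPERM;
      }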
    • futex: Add another early deadlock detection check · 866293ee
      Thomas Gleixner committed
      Dave Jones' trinity syscall fuzzer exposed an issue in the deadlock
      detection code of rtmutex:
        http://lkml.kernel.org/r/20140429151655.GA14277@redhat.com
      
      That underlying issue has been fixed with a patch to the rtmutex code,
      but the futex code must not call into rtmutex in that case because
          - it can detect that issue early
          - it avoids a different and more complex fixup for backing out
      
      If the user space variable got manipulated to 0x80000000, which means
      no lock holder but the waiters bit set, and an active pi_state is
      found in the kernel, we can figure out the recursive locking issue
      by looking at the pi_state owner. If that is the current task, then
      we can safely return -EDEADLK.
      
      The check should have been added in commit 59fa6245 (futex: Handle
      futex_pi OWNER_DIED take over correctly) already, but I did not see
      the above issue caused by user space manipulation back then.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Darren Hart <darren@dvhart.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Carlos ODonell <carlos@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Link: http://lkml.kernel.org/r/20140512201701.097349971@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      866293ee
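      A hedged sketch of the early check (not the literal diff; 'task' is the task attempting the lock, i.e. current in the futex_lock_pi() path):

      /*
       * User space shows no owner TID (only FUTEX_WAITERS set), yet kernel
       * pi_state exists. If that pi_state is owned by the locking task,
       * this is recursive locking: fail early instead of calling into the
       * rtmutex code and needing a complex fixup on the way out.
       */
      if (!(uval & FUTEX_TID_MASK) && pi_state->owner == task)
              return -EDEADLK;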
  20. 18 Apr 2014, 1 commit
  21. 13 Apr 2014, 1 commit
  22. 09 Apr 2014, 1 commit
    • futex: avoid race between requeue and wake · 69cd9eba
      Linus Torvalds committed
      Jan Stancek reported:
       "pthread_cond_broadcast/4-1.c testcase from openposix testsuite (LTP)
        occasionally fails, because some threads fail to wake up.
      
        Testcase creates 5 threads, which are all waiting on same condition.
        Main thread then calls pthread_cond_broadcast() without holding mutex,
        which calls:
      
            futex(uaddr1, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, uaddr2, ..)
      
        This immediately wakes up single thread A, which unlocks mutex and
        tries to wake up another thread:
      
            futex(uaddr2, FUTEX_WAKE_PRIVATE, 1)
      
        If thread A manages to call futex_wake() before any waiters are
        requeued for uaddr2, no other thread is woken up"
      
      The ordering constraints for the hash bucket waiter counting are that
      the waiter counts have to be incremented _before_ getting the spinlock
      (because the spinlock acts as part of the memory barrier), but the
      "requeue" operation didn't honor those rules, and nobody had even
      thought about that case.
      
      This fairly simple patch just increments the waiter count for the target
      hash bucket (hb2) when requeueing a futex, before taking the locks.  It
      then decrements them again after releasing the lock - the code that
      actually moves the futex(es) between hash buckets will do the additional
      required waiter count housekeeping.
      Reported-and-tested-by: Jan Stancek <jstancek@redhat.com>
      Acked-by: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org # 3.14
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      69cd9eba
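      A hedged sketch of the pattern, condensed from futex_requeue() (hb_waiters_inc()/hb_waiters_dec() and double_lock_hb()/double_unlock_hb() are the real helpers):

      /*
       * Bump the target bucket's waiter count before taking the locks, so a
       * concurrent futex_wake() on uaddr2 cannot observe "no waiters" while
       * waiters are in flight; drop it again once the locks are released
       * (the moved futex_qs carry their own accounting from then on).
       */
      hb_waiters_inc(hb2);
      double_lock_hb(hb1, hb2);

      /* ... move the waiters from hb1 to hb2 under both locks ... */

      double_unlock_hb(hb1, hb2);
      hb_waiters_dec(hb2);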
  23. 21 Mar 2014, 1 commit
    • futex: revert back to the explicit waiter counting code · 11d4616b
      Linus Torvalds committed
      Srikar Dronamraju reports that commit b0c29f79 ("futexes: Avoid
      taking the hb->lock if there's nothing to wake up") causes java threads
      to get stuck on futexes when running specjbb on a power7 numa box.
      
      The cause appears to be that the powerpc spinlocks aren't using the same
      ticket lock model that we use on x86 (and other) architectures, which in
      turn results in the "spin_is_locked()" test in hb_waiters_pending()
      occasionally reporting an unlocked spinlock even when there are pending
      waiters.
      
      So this reinstates Davidlohr Bueso's original explicit waiter counting
      code, which I had convinced Davidlohr to drop in favor of figuring out
      the pending waiters by just using the existing state of the spinlock and
      the wait queue.
      Reported-and-tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Original-code-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      11d4616b
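      A hedged sketch of the reinstated explicit counting (shape as in mainline kernel/futex.c; the barrier macro name follows current kernels, while the original commit used the smp_mb__after_atomic_inc() spelling of that era):

      /* An atomic per-bucket counter replaces the spin_is_locked() heuristic. */
      static inline void hb_waiters_inc(struct futex_hash_bucket *hb)
      {
      #ifdef CONFIG_SMP
              atomic_inc(&hb->waiters);
              /* Full barrier: pairs with the barrier in the wake path. */
              smp_mb__after_atomic();
      #endif
      }

      static inline void hb_waiters_dec(struct futex_hash_bucket *hb)
      {
      #ifdef CONFIG_SMP
              atomic_dec(&hb->waiters);
      #endif
      }

      static inline int hb_waiters_pending(struct futex_hash_bucket *hb)
      {
      #ifdef CONFIG_SMP
              return atomic_read(&hb->waiters);
      #else
              return 1;
      #endif
      }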
  24. 03 Mar 2014, 1 commit