1. 25 Oct, 2016: 3 commits
    • locking/mutex: Restructure wait loop · 5bbd7e64
      Peter Zijlstra committed
      Doesn't really matter yet, but pull the HANDOFF and trylock out from
      under the wait_lock.
      
      The intention is to add an optimistic spin loop here, which requires
      we do not hold the wait_lock, so shuffle code around in preparation.
      
      Also clarify the purpose of taking the wait_lock in the wait loop: it's
      tempting to want to avoid it altogether, but the cancellation cases
      need it to avoid losing wakeups.

      Suggested-by: Waiman Long <waiman.long@hpe.com>
      Tested-by: Jason Low <jason.low2@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/mutex: Add lock handoff to avoid starvation · 9d659ae1
      Peter Zijlstra committed
      Implement lock handoff to avoid lock starvation.
      
      Lock starvation is possible because mutex_lock() allows lock stealing,
      where a running (or optimistic spinning) task beats the woken waiter
      to the acquire.
      
      Lock stealing is an important performance optimization because waiting
      for a waiter to wake up and get runtime can take a significant time,
      during which everybody would stall on the lock.
      
      The down-side is of course that it allows for starvation.
      
      This patch has the waiter requesting a handoff if it fails to acquire
      the lock upon waking. This re-introduces some of the wait time,
      because once we do a handoff we have to wait for the waiter to wake up
      again.
      
      A future patch will add a round of optimistic spinning to attempt to
      alleviate this penalty, but if that turns out to not be enough, we can
      add a counter and only request handoff after multiple failed wakeups.
      
      There are a few tricky implementation details:
      
       - accepting a handoff must only be done in the wait-loop. Since the
         handoff condition is owner == current, it can easily cause
         recursive locking trouble.
      
       - accepting the handoff must be careful to provide the ACQUIRE
         semantics.
      
       - having the HANDOFF bit set on unlock requires care: we must not
         clear the owner.
      
       - we must be careful to not leave HANDOFF set after we've acquired
         the lock. The tricky scenario is setting the HANDOFF bit on an
         unlocked mutex.

      Tested-by: Jason Low <jason.low2@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Waiman Long <Waiman.Long@hpe.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
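      The handoff rules above can be modelled in user space. The following is a minimal single-threaded sketch, not the kernel code. It assumes C11 <stdatomic.h>, a toy struct task in place of task_struct, and flag values 0x01 (WAITERS) and 0x02 (HANDOFF). The helpers toy_trylock(), toy_unlock() and toy_request_handoff() are hypothetical. The sketch shows four things: a handoff is accepted only when the caller says it is in the wait loop; an explicit acquire fence is issued when it is; unlock writes the waiter into the owner field rather than clearing it; and HANDOFF is dropped when it was set on an unlocked mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: flag values modelled on MUTEX_FLAG_WAITERS/HANDOFF. */
#define FLAG_WAITERS ((uintptr_t)0x01)
#define FLAG_HANDOFF ((uintptr_t)0x02)
#define FLAG_MASK    ((uintptr_t)0x03)

/* Stand-in for task_struct; 8-byte alignment keeps the low address bits zero. */
struct task { _Alignas(8) int id; };

struct toy_mutex { _Atomic uintptr_t owner; };   /* task pointer | flags */

static struct task *owner_task(uintptr_t v) { return (struct task *)(v & ~FLAG_MASK); }

/* A waiter that failed to get the lock after waking asks for a handoff. */
static void toy_request_handoff(struct toy_mutex *m)
{
    atomic_fetch_or_explicit(&m->owner, FLAG_WAITERS | FLAG_HANDOFF,
                             memory_order_relaxed);
}

/* 'handoff' is true only in the wait loop, the one place a handoff may be accepted. */
static bool toy_trylock(struct toy_mutex *m, struct task *curr, bool handoff)
{
    uintptr_t owner = atomic_load_explicit(&m->owner, memory_order_relaxed);

    for (;;) {   /* loop: we can race against a concurrent flag update */
        uintptr_t flags = owner & FLAG_MASK;

        if (owner_task(owner)) {
            if (handoff && owner_task(owner) == curr) {
                /* 'owner' came from a relaxed load or failed CAS: add ACQUIRE. */
                atomic_thread_fence(memory_order_acquire);
                return true;
            }
            return false;   /* held by another task, or a recursive attempt */
        }

        /* We may have set HANDOFF on an unlocked mutex; drop it as we take the lock. */
        if (handoff)
            flags &= ~FLAG_HANDOFF;

        if (atomic_compare_exchange_weak_explicit(&m->owner, &owner,
                (uintptr_t)curr | flags, memory_order_acquire, memory_order_relaxed))
            return true;
        /* A failed CAS reloaded 'owner'; re-examine it. */
    }
}

/* With HANDOFF set, write the waiter into the owner field instead of clearing it. */
static void toy_unlock(struct toy_mutex *m, struct task *top_waiter)
{
    uintptr_t owner = atomic_load_explicit(&m->owner, memory_order_relaxed);

    for (;;) {
        uintptr_t new_val;

        if (owner & FLAG_HANDOFF) {
            assert(top_waiter != NULL);   /* HANDOFF implies a queued waiter */
            new_val = (uintptr_t)top_waiter | (owner & FLAG_WAITERS);
        } else {
            new_val = owner & FLAG_MASK;  /* release: clear the task, keep flags */
        }
        if (atomic_compare_exchange_weak_explicit(&m->owner, &owner, new_val,
                memory_order_release, memory_order_relaxed))
            return;
    }
}
```

The toy leaves out the wait queue and the wait_lock. The caller therefore passes in whichever task it treats as the top waiter.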
    • locking/mutex: Rework mutex::owner · 3ca0ff57
      Peter Zijlstra committed
      The current mutex implementation has an atomic lock word and a
      non-atomic owner field.
      
      This disparity leads to a number of issues with the current mutex code
      as it means that we can have a locked mutex without an explicit owner
      (because the owner field has not been set, or already cleared).
      
      This leads to a number of weird corner cases, especially between the
      optimistic spinning and debug code: the optimistic spinning code
      needs the owner field updated inside the locked region, whereas the
      debug code is more relaxed because the whole lock is serialized by
      the wait_lock.
      
      Also, the spinning code itself has a few corner cases where we need to
      deal with a held lock without an owner field.
      
      Furthermore, it becomes even more of a problem when trying to fix
      starvation cases in the current code. We end up stacking special case
      on special case.
      
      To solve this, rework the basic mutex implementation to be a single
      atomic word that contains the owner and uses the low bits for extra
      state.
      
      This matches how PI futexes and rt_mutex already work. By making the
      owner an integral part of the lock state, a lot of the problems
      disappear, and we get a better option for dealing with starvation
      cases: direct owner handoff.
      
      Changing the basic mutex does however invalidate all the arch specific
      mutex code; this patch leaves that unused in-place, a later patch will
      remove that.

      Tested-by: Jason Low <jason.low2@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
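      Here is a minimal sketch of the single-word layout. It assumes C11 atomics and a toy struct task aligned to 8 bytes, so the low bits of its address are always zero. The flag value, mask and helper names are hypothetical. The owner and the lock state change together in one compare-and-swap, so there is no moment at which the word says "locked" without also naming the owner.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state bits in the low bits of the owner word. */
#define FLAG_WAITERS ((uintptr_t)0x01)
#define FLAG_MASK    ((uintptr_t)0x03)

/* Stand-in for task_struct; 8-byte alignment keeps the low address bits zero. */
struct task { _Alignas(8) int id; };

/* One word holds both the owner and the extra state: 0 means unlocked. */
struct owner_mutex { _Atomic uintptr_t owner; };

static struct task *owner_task(uintptr_t v)  { return (struct task *)(v & ~FLAG_MASK); }
static uintptr_t    owner_flags(uintptr_t v) { return v & FLAG_MASK; }

/* Acquire and record the owner in a single CAS, preserving any flags. */
static bool owner_trylock(struct owner_mutex *m, struct task *curr)
{
    uintptr_t old = atomic_load_explicit(&m->owner, memory_order_relaxed);

    while (owner_task(old) == NULL) {
        if (atomic_compare_exchange_weak_explicit(&m->owner, &old,
                (uintptr_t)curr | owner_flags(old),
                memory_order_acquire, memory_order_relaxed))
            return true;
        /* 'old' was reloaded by the failed CAS; re-check for an owner. */
    }
    return false;
}

/* Release and clear the owner in the same atomic step, keeping the flags. */
static void owner_unlock(struct owner_mutex *m)
{
    uintptr_t old = atomic_load_explicit(&m->owner, memory_order_relaxed);

    while (!atomic_compare_exchange_weak_explicit(&m->owner, &old, owner_flags(old),
                memory_order_release, memory_order_relaxed))
        ;   /* 'old' was reloaded; retry */
}
```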
  2. 24 Jun, 2016: 1 commit
  3. 03 Jun, 2016: 1 commit
  4. 29 Feb, 2016: 1 commit
  5. 06 Oct, 2015: 1 commit
  6. 09 Apr, 2015: 1 commit
    • locking/mutex: Further simplify mutex_spin_on_owner() · 01ac33c1
      Jason Low committed
      Similar to what Linus suggested for rwsem_spin_on_owner(), in
      mutex_spin_on_owner(), instead of using while (true) and breaking
      out of the spin loop when lock->owner != owner, we can have the loop
      condition check while (lock->owner == owner) directly, which improves
      the readability of the code.
      
      It also shrinks the code a bit:
      
         text    data     bss     dec     hex filename
         3721       0       0    3721     e89 mutex.o.before
         3705       0       0    3705     e79 mutex.o.after

      Signed-off-by: Jason Low <jason.low2@hp.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: http://lkml.kernel.org/r/1428521960-5268-2-git-send-email-jason.low2@hp.com
      [ Added code generation info. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
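      The simplified loop can be exercised with a single-threaded simulation. In this sketch, struct mutex, task_struct::on_cpu and need_resched() are replaced by hypothetical stand-ins: a plain pointer, a bool and a countdown. sim_tick() is a hook that plays the part of other CPUs. rcu_read_lock(), barrier() and cpu_relax_lowlatency() are omitted. Only the control flow around while (lock->owner == owner) is meant to follow the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for task_struct::on_cpu and struct mutex::owner. */
struct task { bool on_cpu; };
struct sim_mutex { struct task *owner; };

/* Stand-in for need_resched(): becomes true once the budget is spent. */
static int resched_budget;
static bool sim_need_resched(void) { return resched_budget-- <= 0; }

/* Optional hook simulating other CPUs; called once per spin iteration. */
static void (*sim_tick)(struct sim_mutex *);
static void sim_release_tick(struct sim_mutex *lock) { lock->owner = NULL; }

/*
 * Spin while the owner is unchanged.
 * Returns true if the owner changed, false if we should stop spinning.
 */
static bool sim_spin_on_owner(struct sim_mutex *lock, struct task *owner)
{
    bool ret = true;

    while (lock->owner == owner) {
        if (!owner->on_cpu || sim_need_resched()) {
            ret = false;
            break;
        }
        if (sim_tick)
            sim_tick(lock);
    }
    return ret;
}
```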
  7. 24 Feb, 2015: 1 commit
  8. 18 Feb, 2015: 3 commits
    • locking/rwsem: Set lock ownership ASAP · 7a215f89
      Davidlohr Bueso committed
      In order to optimize the spinning step, we need to set the lock
      owner as soon as the lock is acquired; after a successful counter
      cmpxchg operation, that is. This is particularly useful as rwsems
      need to set the owner to nil for readers, so there is a greater
      chance of falling out of the spinning. Currently we only set the
      owner much later in the game, at the more generic level; latency
      can be especially bad when waiting for a node->next pointer when
      releasing the osq in up_write calls.
      
      As such, update the owner inside rwsem_try_write_lock (when the
      lock is obtained after blocking) and rwsem_try_write_lock_unqueued
      (when the lock is obtained while spinning). This requires creating
      a new internal rwsem.h header to share the owner related calls.
      
      Also clean up some headers for mutex and rwsem.

      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: http://lkml.kernel.org/r/1422609267-15102-4-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
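      Here is a minimal sketch of publishing the owner right after the counter cmpxchg succeeds. It assumes C11 atomics and a toy count in which 0 means free and a hypothetical WRITE_LOCKED value marks a writer. It does not reproduce the real rwsem bias values, the reader path (which sets the owner to NULL), or the bodies of rwsem_try_write_lock() and rwsem_try_write_lock_unqueued(). The only point is the ordering: ownership becomes visible to spinners immediately after the lock is taken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define WRITE_LOCKED (-1L)   /* hypothetical encoding, not the kernel's bias values */

struct task { int id; };

struct toy_rwsem {
    atomic_long count;               /* 0: free, WRITE_LOCKED: held by a writer */
    _Atomic(struct task *) owner;    /* current writer, or NULL */
};

static bool toy_try_write_lock(struct toy_rwsem *sem, struct task *curr)
{
    long expected = 0;

    if (atomic_compare_exchange_strong_explicit(&sem->count, &expected, WRITE_LOCKED,
            memory_order_acquire, memory_order_relaxed)) {
        /* Set the owner as soon as the cmpxchg succeeds, so spinners see it. */
        atomic_store_explicit(&sem->owner, curr, memory_order_relaxed);
        return true;
    }
    return false;
}

static void toy_up_write(struct toy_rwsem *sem)
{
    atomic_store_explicit(&sem->owner, NULL, memory_order_relaxed);
    atomic_store_explicit(&sem->count, 0, memory_order_release);
}
```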
    • locking/mutex: Refactor mutex_spin_on_owner() · be1f7bf2
      Jason Low committed
      As suggested by Davidlohr, we could refactor mutex_spin_on_owner().
      
      Currently, the spinning logic is split between owner_running() and
      mutex_spin_on_owner(). When the owner changes, we make duplicate owner
      checks that are not necessary. It also makes the code a bit obscure, as
      we use a second check to figure out why we broke out of the loop.
      
      This patch removes the owner_running() function, and the
      mutex_spin_on_owner() loop now directly checks whether the owner
      changed, whether the owner is not running, and whether we need to
      reschedule. If the owner changes, we break out of the loop and return
      true. If the owner is not running or we need to reschedule, we break
      out of the loop and return false.

      Suggested-by: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Jason Low <jason.low2@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: chegu_vinod@hp.com
      Cc: tglx@linutronix.de
      Link: http://lkml.kernel.org/r/1422914367-5574-3-git-send-email-jason.low2@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/mutex: In mutex_spin_on_owner(), return true when owner changes · 07d2413a
      Jason Low committed
      In mutex_spin_on_owner(), we return true only if lock->owner == NULL.
      This was beneficial in situations where there were multiple threads
      simultaneously spinning for the mutex. If another thread got the lock
      while other spinner(s) were also doing mutex_spin_on_owner(), then the
      other spinners would stop spinning. This workaround helped reduce the
      chance that many spinners were simultaneously spinning for the mutex
      which can help reduce contention in highly contended cases.
      
      However, recent changes were made to the optimistic spinning code such
      that instead of having all spinners simultaneously spin for the mutex,
      we queue the spinners with an MCS lock such that only one thread spins
      for the mutex at a time. Furthermore, the OSQ optimizations ensure that
      spinners in the queue will stop waiting if they need to reschedule.
      
      Now, we don't have to worry about multiple threads spinning on the owner
      at the same time, and if lock->owner is not NULL at this point, it likely
      means another thread happened to obtain the lock in the fastpath. In this
      case, it would make sense for the spinner to continue spinning as long
      as the spinner doesn't need to schedule and the mutex owner is running.
      
      This patch changes this so that mutex_spin_on_owner() returns true when
      the lock owner changes, which means a thread will only stop spinning
      if it either needs to reschedule or if the lock owner is not running.
      
      We saw up to a 5% performance improvement in the fserver workload with
      this patch.

      Signed-off-by: Jason Low <jason.low2@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: chegu_vinod@hp.com
      Cc: tglx@linutronix.de
      Link: http://lkml.kernel.org/r/1422914367-5574-2-git-send-email-jason.low2@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
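      The behavioural difference can be shown with a scripted, single-threaded simulation of the caller's spin loop. Everything here is hypothetical: the types, spin_on_owner(), optimistic_spin(), and the script of owners that stands in for other CPUs. The OSQ/MCS queueing is left out. The new_rule flag switches between two results. The old one is "true only if the lock became free". The new one is "true whenever the owner changes".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's types. */
struct task { bool on_cpu; };
struct sim_mutex { struct task *owner; };

/* Scripted "other CPUs": each step moves the lock to the next owner (NULL = free). */
static struct task *script[8];
static int script_len, script_pos;
static int resched_budget;   /* need_resched() stand-in: true once spent */

static bool sim_need_resched(void) { return resched_budget-- <= 0; }

static void sim_other_cpus(struct sim_mutex *lock)
{
    if (script_pos < script_len)
        lock->owner = script[script_pos++];
}

/*
 * new_rule == true : return true whenever the owner changes (this commit).
 * new_rule == false: return true only if the lock became free (old behaviour).
 */
static bool spin_on_owner(struct sim_mutex *lock, struct task *owner, bool new_rule)
{
    while (lock->owner == owner) {
        if (!owner->on_cpu || sim_need_resched())
            return false;
        sim_other_cpus(lock);
    }
    return new_rule ? true : lock->owner == NULL;
}

static bool sim_trylock(struct sim_mutex *lock, struct task *curr)
{
    if (lock->owner != NULL)
        return false;
    lock->owner = curr;
    return true;
}

/* Caller loop: keep spinning for as long as spin_on_owner() says to. */
static bool optimistic_spin(struct sim_mutex *lock, struct task *curr,
                            bool new_rule, int *owner_changes)
{
    for (;;) {
        struct task *owner = lock->owner;

        if (owner) {
            if (!spin_on_owner(lock, owner, new_rule))
                return false;   /* give up; the real caller would block */
            (*owner_changes)++;
        }
        if (sim_trylock(lock, curr))
            return true;
    }
}
```

With the old rule, the spinner stops as soon as the lock passes from one owner to another through the fastpath. With the new rule, it keeps spinning and acquires the lock once the second owner releases it.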
  9. 04 Feb, 2015: 2 commits
  10. 14 Jan, 2015: 3 commits
  11. 28 Oct, 2014: 1 commit
    • locking/mutex: Don't assume TASK_RUNNING · 6f942a1f
      Peter Zijlstra committed
      We're going to make might_sleep() test for TASK_RUNNING, because
      blocking without TASK_RUNNING will destroy the task state by setting
      it to TASK_RUNNING.
      
      There are a few occasions where it's 'valid' to call blocking
      primitives (and mutex_lock in particular) without being in
      TASK_RUNNING; typically such cases are right before we set
      TASK_RUNNING anyhow.
      
      Robustify the code by not assuming this; this has the beneficial side
      effect of allowing optional code emission for fixing the above
      might_sleep() false positives.

      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: tglx@linutronix.de
      Cc: ilya.dryomov@inktank.com
      Cc: umgwanakikbuti@gmail.com
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140924082241.988560063@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 13 Aug, 2014: 4 commits
  13. 17 Jul, 2014: 1 commit
    • arch, locking: Ciao arch_mutex_cpu_relax() · 3a6bfbc9
      Davidlohr Bueso committed
      The arch_mutex_cpu_relax() function, introduced by 34b133f8, is
      hacky and ugly. It was added a few years ago to address the fact
      that common cpu_relax() calls include yielding on s390, and thus
      impact the optimistic spinning functionality of mutexes. Nowadays
      we use this function well beyond mutexes: rwsem, qrwlock, mcs and
      lockref. Since the macro that defines the call is in the mutex header,
      any users must include mutex.h and the naming is misleading as well.
      
      This patch (i) renames the call to cpu_relax_lowlatency ("relax, but
      only if you can do it with very low latency") and (ii) defines it in
      each arch's asm/processor.h local header, just like for regular cpu_relax
      functions. On all archs, except s390, cpu_relax_lowlatency is simply cpu_relax,
      and thus we can take it out of mutex.h. While this can seem redundant,
      I believe it is a good choice as it allows us to move out arch specific
      logic from generic locking primitives and enables future(?) archs to
      transparently define it, similarly to System Z.

      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bharat Bhushan <r65777@freescale.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Joseph Myers <joseph@codesourcery.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Qais Yousef <qais.yousef@imgtec.com>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Stratos Karafotis <stratosk@semaphore.gr>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Kulikov <segoon@openwall.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wolfram Sang <wsa@the-dreams.de>
      Cc: adi-buildroot-devel@lists.sourceforge.net
      Cc: linux390@de.ibm.com
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-am33-list@redhat.com
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-c6x-dev@linux-c6x.org
      Cc: linux-cris-kernel@axis.com
      Cc: linux-hexagon@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux@lists.openrisc.net
      Cc: linux-m32r-ja@ml.linux-m32r.org
      Cc: linux-m32r@ml.linux-m32r.org
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: linux-metag@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: sparclinux@vger.kernel.org
      Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
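      The per-arch arrangement amounts to a one-line definition in each asm/processor.h. The single-file sketch below imitates it under stated assumptions. A hypothetical SKETCH_ARCH_S390 macro selects the s390 variant, and cpu_relax() is a counting stub. The s390 variant is written as a plain compiler barrier in GCC/Clang inline-asm syntax. That is an assumption about what the s390 header uses; check arch/s390/include/asm/processor.h for the real definition.

```c
#include <assert.h>

/* Counting stub standing in for an arch's real cpu_relax(). */
static int relax_calls;
static inline void cpu_relax(void) { relax_calls++; }

/* Compiler barrier (GCC/Clang), the usual spelling of the kernel's barrier(). */
#define barrier() __asm__ __volatile__("" ::: "memory")

#ifdef SKETCH_ARCH_S390
/* s390-style: cpu_relax() may yield, so the low-latency variant must not. */
#define cpu_relax_lowlatency() barrier()
#else
/* Every other arch: the low-latency variant is simply cpu_relax(). */
#define cpu_relax_lowlatency() cpu_relax()
#endif

/* A generic spin loop that no longer needs anything from mutex.h. */
static void spin_n_times(int n)
{
    for (int i = 0; i < n; i++)
        cpu_relax_lowlatency();
}
```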
  14. 16 Jul, 2014: 2 commits
    • locking/spinlocks/mcs: Introduce and use init macro and function for osq locks · 4d9d951e
      Jason Low committed
      Currently, we initialize the osq lock by directly setting the lock's values. It
      would be preferable to use an init macro to do the initialization, as we do
      with other locks.
      
      This patch introduces and uses a macro and function for initializing the osq lock.

      Signed-off-by: Jason Low <jason.low2@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Link: http://lkml.kernel.org/r/1405358872-3732-4-git-send-email-jason.low2@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
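      Below is a user-space rendering of the init macro and function. C11 atomic_int stands in for atomic_t, ATOMIC_INIT() and atomic_set(). The names are modelled on include/linux/osq_lock.h and should be treated as illustrative. osq_is_locked() is included only so the example has something to check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Value of tail when no CPU is queued. */
#define OSQ_UNLOCKED_VAL 0

struct optimistic_spin_queue {
    atomic_int tail;   /* stand-in for the kernel's atomic_t */
};

/* Static initializer, in the style of other *_UNLOCKED lock initializers. */
#define OSQ_LOCK_UNLOCKED { OSQ_UNLOCKED_VAL }

/* Runtime initializer for locks embedded in dynamically allocated objects. */
static inline void osq_lock_init(struct optimistic_spin_queue *lock)
{
    atomic_store_explicit(&lock->tail, OSQ_UNLOCKED_VAL, memory_order_relaxed);
}

/* Helper for this example only: is any CPU queued? */
static inline bool osq_is_locked(struct optimistic_spin_queue *lock)
{
    return atomic_load_explicit(&lock->tail, memory_order_relaxed) != OSQ_UNLOCKED_VAL;
}
```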
    • locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overhead · 90631822
      Jason Low committed
      The cancellable MCS spinlock is currently used to queue threads that are
      doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
      the lock would access and queue the local node corresponding to the CPU that
      it's running on. Currently, the cancellable MCS lock is implemented by using
      pointers to these nodes.
      
      In this patch, instead of operating on pointers to the per-cpu nodes, we
      store, in an atomic_t, the number of the CPU to which each per-cpu node
      corresponds. A similar concept is used with the qspinlock.

      Operating on the nodes' CPU numbers in an atomic_t, instead of on pointers
      to those nodes, can reduce the overhead of the cancellable MCS spinlock
      by 32 bits (on 64-bit systems).

      Signed-off-by: Jason Low <jason.low2@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Link: http://lkml.kernel.org/r/1405358872-3732-3-git-send-email-jason.low2@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
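      This sketches the CPU-number encoding, with an ordinary array standing in for the per-cpu nodes. NR_CPUS_SKETCH, struct osq_node_sketch and osq_enqueue_sketch() are hypothetical. encode_cpu() and decode_cpu() are modelled on the helpers the osq code uses. CPU n is stored as n + 1, so 0 can keep meaning "no one queued". The tail is an int-sized atomic where it used to be a node pointer, which is 64 bits on 64-bit systems.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS_SKETCH   4   /* hypothetical CPU count */
#define OSQ_UNLOCKED_VAL 0   /* tail value when nobody is queued */

struct osq_node_sketch {
    struct osq_node_sketch *next, *prev;
    int locked;
    int cpu;                 /* encoded CPU number of this node */
};

/* Stand-in for the per-cpu node array. */
static struct osq_node_sketch nodes[NR_CPUS_SKETCH];

struct osq_sketch { atomic_int tail; };

/* CPU n is stored as n + 1 so that 0 can mean "unlocked". */
static inline int encode_cpu(int cpu_nr) { return cpu_nr + 1; }

static inline struct osq_node_sketch *decode_cpu(int encoded)
{
    return &nodes[encoded - 1];
}

/* Publish this CPU as the new tail; returns the previous tail value. */
static int osq_enqueue_sketch(struct osq_sketch *lock, int cpu_nr)
{
    int curr = encode_cpu(cpu_nr);

    nodes[cpu_nr].cpu = curr;
    return atomic_exchange(&lock->tail, curr);   /* OSQ_UNLOCKED_VAL: queue was empty */
}
```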
  15. 05 Jul, 2014: 4 commits
  16. 12 Mar, 2014: 1 commit
  17. 11 Mar, 2014: 6 commits
  18. 14 Feb, 2014: 1 commit
  19. 28 Jan, 2014: 2 commits
  20. 11 Nov, 2013: 1 commit