1. 14 Jan 2017, 1 commit
    • locking/mutex: Fix mutex handoff · e274795e
      Authored by Peter Zijlstra
      While reviewing the ww_mutex patches, I noticed that it was still
      possible to (incorrectly) succeed for (incorrect) code like:
      
      	mutex_lock(&a);
      	mutex_lock(&a);
      
      This was possible if the second mutex_lock() would block (as expected)
      but then receive a spurious wakeup. At that point it would find itself
      at the front of the queue, request a handoff and instantly claim
      ownership and continue, since owner would point to itself.
      
      Avoid this scenario and simplify the code by introducing a third low
      bit to signal handoff pickup. So once we request handoff, unlock
      clears the handoff bit and sets the pickup bit along with the new
      owner.
      
      This also removes the need for the .handoff argument to
      __mutex_trylock(), since that becomes superfluous with PICKUP.
      
      In order to guarantee enough low bits, ensure task_struct alignment is
      at least L1_CACHE_BYTES (which seems a good idea regardless).
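      
      A minimal sketch of the resulting layout, assuming the flag names below
      (the exact helpers live in kernel/locking/mutex.c and may differ in detail):
      
      	/* The owner field packs a task_struct pointer plus three low bits. */
      	#define MUTEX_FLAG_WAITERS	0x01	/* waiters present; unlock must take the slowpath */
      	#define MUTEX_FLAG_HANDOFF	0x02	/* the top waiter asked for a handoff */
      	#define MUTEX_FLAG_PICKUP	0x04	/* handoff granted, waiting to be picked up */
      	#define MUTEX_FLAGS		0x07
      
      	/* Unlock-side sketch: convert a requested handoff into PICKUP + new owner,
      	 * so a spuriously woken waiter can never hand the lock to itself. */
      	static void __mutex_handoff_sketch(struct mutex *lock, struct task_struct *top_waiter)
      	{
      		unsigned long new = (unsigned long)top_waiter | MUTEX_FLAG_PICKUP;
      
      		/* the real code preserves MUTEX_FLAG_WAITERS and uses a cmpxchg loop */
      		atomic_long_set(&lock->owner, new);
      	}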
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 9d659ae1 ("locking/mutex: Add lock handoff to avoid starvation")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e274795e
  2. 16 Nov 2016, 1 commit
    • locking/mutex: Don't mark mutex_trylock_recursive() as deprecated, temporarily · 43496d35
      Authored by Ingo Molnar
      Until the DRM drivers are fixed to not use mutex_trylock_recursive(),
      allyes/modconfig builds will emit an API deprecation warning:
      
       drivers/gpu/drm/i915/i915_gem_shrinker.c: In function ‘i915_gem_shrinker_lock’:
       drivers/gpu/drm/i915/i915_gem_shrinker.c:230:2: warning: ‘mutex_trylock_recursive’ is deprecated [-Wdeprecated-declarations]
         switch (mutex_trylock_recursive(&dev->struct_mutex)) {
      	    ^
      
      Don't pollute the kernel log until the DRM code is fixed. Hopefully
      the checkpatch warning is enough to keep people from using this new
      API, and we'll be NAK-ing new users as well.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Ding Tianhong <dingtianhong@huawei.com>
      Cc: Imre Deak <imre.deak@intel.com>
      Cc: Jason Low <jason.low2@hpe.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rob Clark <robdclark@gmail.com>
      Cc: Terry Rudd <terry.rudd@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Will Deacon <Will.Deacon@arm.com>
      Cc: dri-devel@lists.freedesktop.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      43496d35
  3. 15 Nov 2016, 1 commit
    • locking/mutex, drm: Introduce mutex_trylock_recursive() · 0f5225b0
      Authored by Peter Zijlstra
      By popular DRM demand, introduce mutex_trylock_recursive() to fix up the
      two GEM users.
      
      Without this it is very easy for these drivers to get stuck in
      low-memory situations and trigger OOM. Work is in progress to remove the
      need for this in at least i915.
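      
      As a hedged sketch, the intended tri-state call pattern looks roughly like
      the warning output in the previous entry (the surrounding GEM code and the
      unlock flag are illustrative, not the exact i915 call site):
      
      	bool unlock;
      
      	switch (mutex_trylock_recursive(&dev->struct_mutex)) {
      	case MUTEX_TRYLOCK_FAILED:
      		return false;		/* somebody else holds the lock: back off */
      	case MUTEX_TRYLOCK_SUCCESS:
      		unlock = true;		/* we acquired it and must unlock later */
      		break;
      	case MUTEX_TRYLOCK_RECURSIVE:
      		unlock = false;		/* we already held it: do not unlock */
      		break;
      	}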
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Ding Tianhong <dingtianhong@huawei.com>
      Cc: Imre Deak <imre.deak@intel.com>
      Cc: Jason Low <jason.low2@hpe.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rob Clark <robdclark@gmail.com>
      Cc: Terry Rudd <terry.rudd@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Will Deacon <Will.Deacon@arm.com>
      Cc: dri-devel@lists.freedesktop.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0f5225b0
  4. 25 Oct 2016, 1 commit
    • locking/mutex: Rework mutex::owner · 3ca0ff57
      Authored by Peter Zijlstra
      The current mutex implementation has an atomic lock word and a
      non-atomic owner field.
      
      This disparity leads to a number of issues with the current mutex code
      as it means that we can have a locked mutex without an explicit owner
      (because the owner field has not been set, or already cleared).
      
      This leads to a number of weird corner cases, esp. between the
      optimistic spinning and debug code. Where the optimistic spinning
      code needs the owner field updated inside the lock region, the debug
      code is more relaxed because the whole lock is serialized by the
      wait_lock.
      
      Also, the spinning code itself has a few corner cases where we need to
      deal with a held lock without an owner field.
      
      Furthermore, it becomes even more of a problem when trying to fix
      starvation cases in the current code. We end up stacking special case
      on special case.
      
      To solve this rework the basic mutex implementation to be a single
      atomic word that contains the owner and uses the low bits for extra
      state.
      
      This matches how PI futexes and rt_mutex already work. By making the
      owner an integral part of the lock state, a lot of the problems
      disappear and we get a better option for dealing with starvation
      cases: direct owner handoff.
      
      Changing the basic mutex does, however, invalidate all the arch-specific
      mutex code; this patch leaves that unused in place, and a later patch
      will remove it.
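      
      Roughly, the reworked lock then looks like the sketch below; the flag mask
      is illustrative here and only gets its final meaning in the later handoff
      patch:
      
      	struct mutex {
      		atomic_long_t		owner;		/* task_struct pointer | low state bits */
      		spinlock_t		wait_lock;
      	#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
      		struct optimistic_spin_queue osq;	/* MCS-like queue for spinners */
      	#endif
      		struct list_head	wait_list;
      	};
      
      	static inline struct task_struct *__mutex_owner(struct mutex *lock)
      	{
      		/* mask off the low state bits to recover the owning task */
      		return (struct task_struct *)(atomic_long_read(&lock->owner) & ~0x07UL);
      	}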
      Tested-by: Jason Low <jason.low2@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3ca0ff57
  5. 15 Feb 2015, 1 commit
  6. 13 Aug 2014, 2 commits
  7. 17 Jul 2014, 1 commit
    • arch, locking: Ciao arch_mutex_cpu_relax() · 3a6bfbc9
      Authored by Davidlohr Bueso
      The arch_mutex_cpu_relax() function, introduced by 34b133f8, is
      hacky and ugly. It was added a few years ago to address the fact
      that common cpu_relax() calls include yielding on s390, and thus
      impact the optimistic spinning functionality of mutexes. Nowadays
      we use this function well beyond mutexes: rwsem, qrwlock, mcs and
      lockref. Since the macro that defines the call is in the mutex header,
      any users must include mutex.h and the naming is misleading as well.
      
      This patch (i) renames the call to cpu_relax_lowlatency ("relax, but
      only if you can do it with very low latency") and (ii) defines it in
      each arch's asm/processor.h local header, just like the regular cpu_relax
      functions. On all archs except s390, cpu_relax_lowlatency is simply cpu_relax,
      and thus we can take it out of mutex.h. While this can seem redundant,
      I believe it is a good choice as it allows us to move arch-specific
      logic out of the generic locking primitives and enables future(?) archs to
      transparently define it, similarly to System Z.
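      
      A sketch of what each architecture ends up providing (s390 keeps a
      barrier-only variant so the virtual CPU is not yielded; everyone else
      simply maps it to cpu_relax()):
      
      	/* most architectures, in asm/processor.h */
      	#define cpu_relax_lowlatency()	cpu_relax()
      
      	/* s390, in asm/processor.h: compiler barrier only, no yield */
      	#define cpu_relax_lowlatency()	barrier()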
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bharat Bhushan <r65777@freescale.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Joseph Myers <joseph@codesourcery.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Qais Yousef <qais.yousef@imgtec.com>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Stratos Karafotis <stratosk@semaphore.gr>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Kulikov <segoon@openwall.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wolfram Sang <wsa@the-dreams.de>
      Cc: adi-buildroot-devel@lists.sourceforge.net
      Cc: linux390@de.ibm.com
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-am33-list@redhat.com
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-c6x-dev@linux-c6x.org
      Cc: linux-cris-kernel@axis.com
      Cc: linux-hexagon@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux@lists.openrisc.net
      Cc: linux-m32r-ja@ml.linux-m32r.org
      Cc: linux-m32r@ml.linux-m32r.org
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: linux-metag@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: sparclinux@vger.kernel.org
      Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3a6bfbc9
  8. 16 Jul 2014, 2 commits
    • locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overhead · 90631822
      Authored by Jason Low
      The cancellable MCS spinlock is currently used to queue threads that are
      doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
      the lock would access and queue the local node corresponding to the CPU that
      it's running on. Currently, the cancellable MCS lock is implemented by using
      pointers to these nodes.
      
      In this patch, instead of operating on pointers to the per-cpu nodes, we
      store in an atomic_t the CPU numbers to which the per-cpu nodes correspond.
      A similar concept is used with the qspinlock.
      
      By operating on the CPU numbers of the nodes using atomic_t instead of
      pointers to those nodes, we can reduce the overhead of the cancellable MCS
      spinlock by 32 bits (on 64-bit systems).
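      
      A hedged sketch of the encoding: CPU numbers are stored off by one so that
      zero can mean "unlocked", and the per-cpu node is recovered on demand (the
      per-cpu variable name below is an assumption):
      
      	struct optimistic_spin_queue {
      		/* encoded CPU number of the tail node, or OSQ_UNLOCKED_VAL */
      		atomic_t tail;
      	};
      
      	#define OSQ_UNLOCKED_VAL	(0)
      
      	static inline int encode_cpu(int cpu_nr)
      	{
      		return cpu_nr + 1;	/* reserve 0 for "no tail queued" */
      	}
      
      	static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
      	{
      		int cpu_nr = encoded_cpu_val - 1;
      
      		return per_cpu_ptr(&osq_node, cpu_nr);	/* osq_node: assumed per-cpu node */
      	}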
      Signed-off-by: Jason Low <jason.low2@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Link: http://lkml.kernel.org/r/1405358872-3732-3-git-send-email-jason.low2@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      90631822
    • locking/spinlocks/mcs: Rename optimistic_spin_queue() to optimistic_spin_node() · 046a619d
      Authored by Jason Low
      Currently, the per-cpu nodes structure for the cancellable MCS spinlock is
      named "optimistic_spin_queue". However, in a follow up patch in the series
      we will be introducing a new structure that serves as the new "handle" for
      the lock. It would make more sense if that structure is named
      "optimistic_spin_queue". Additionally, since the current use of the
      "optimistic_spin_queue" structure are  "nodes", it might be better if we
      rename them to "node" anyway.
      
      This preparatory patch renames all current "optimistic_spin_queue"
      to "optimistic_spin_node".
      Signed-off-by: Jason Low <jason.low2@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Link: http://lkml.kernel.org/r/1405358872-3732-2-git-send-email-jason.low2@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      046a619d
  9. 11 Mar 2014, 1 commit
  10. 28 Jan 2014, 1 commit
  11. 11 Nov 2013, 1 commit
  12. 28 Sep 2013, 1 commit
  13. 12 Jul 2013, 1 commit
  14. 26 Jun 2013, 2 commits
    • mutex: Add w/w mutex slowpath debugging · 23010027
      Authored by Daniel Vetter
      Injects EDEADLK conditions at pseudo-random interval, with
      exponential backoff up to UINT_MAX (to ensure that every lock
      operation still completes in a reasonable time).
      
      This way we can test the wound slowpath even for ww mutex users
      where contention is never expected, and the ww deadlock
      avoidance algorithm is only needed for correctness against
      malicious userspace. An example would be protecting kernel
      modesetting properties, which thanks to single-threaded X isn't
      really expected to contend, ever.
      
      I've looked into using the CONFIG_FAULT_INJECTION
      infrastructure, but decided against it for two reasons:
      
      - EDEADLK handling is mandatory for ww mutex users and should
        never affect the outcome of a syscall. This is in contrast to -ENOMEM
        injection. So fine configurability isn't required.
      
      - The fault injection framework only allows setting a simple
        probability of failure. Now the probability that a ww mutex acquire
        stage with N locks will never complete (due to too many injected
        EDEADLK backoffs) is zero. But the expected number of ww_mutex_lock
        operations for the completely uncontended case would be O(exp(N)).
        The per-acquire-ctx exponential backoff solution chosen here only
        results in O(log N) overhead due to injection and so O(log N * N)
        lock operations. This way we can fail with high probability (and so
        have good test coverage even for fancy backoff and lock acquisition
        paths) without running into pathological cases.
      
      Note that EDEADLK will only ever be injected when we managed to
      acquire the lock. This prevents any behaviour changes for users
      which rely on the EALREADY semantics.
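      
      A sketch of the per-acquire-context injection, assuming the countdown and
      interval field names used here (the real helper sits behind the slowpath
      debug Kconfig option):
      
      	static int ww_mutex_deadlock_injection(struct ww_mutex *lock,
      					       struct ww_acquire_ctx *ctx)
      	{
      	#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
      		if (ctx->deadlock_inject_countdown-- == 0) {
      			unsigned int tmp = ctx->deadlock_inject_interval;
      
      			/* exponential backoff, capped so the interval never wraps */
      			if (tmp > UINT_MAX / 4)
      				tmp = UINT_MAX;
      			else
      				tmp = tmp * 2 + tmp + tmp / 2;
      
      			ctx->deadlock_inject_interval = tmp;
      			ctx->deadlock_inject_countdown = tmp;
      			ctx->contending_lock = lock;
      
      			ww_mutex_unlock(lock);
      			return -EDEADLK;
      		}
      	#endif
      		return 0;
      	}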
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: dri-devel@lists.freedesktop.org
      Cc: linaro-mm-sig@lists.linaro.org
      Cc: rostedt@goodmis.org
      Cc: daniel@ffwll.ch
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20130620113117.4001.21681.stgit@patser
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      23010027
    • mutex: Add support for wound/wait style locks · 040a0a37
      Authored by Maarten Lankhorst
      Wound/wait mutexes are used when multiple lock acquisitions of a
      similar type can be done in an arbitrary order. The deadlock handling
      used here is called wait/wound in the RDBMS literature: the older task
      waits until it can acquire the contended lock. The younger task needs
      to back off and drop all the locks it is currently holding, i.e. the
      younger task is wounded.
      
      For full documentation please read Documentation/ww-mutex-design.txt.
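      
      A minimal caller-side sketch of the wait/wound pattern for two objects of
      one ww_class (struct obj, its ->lock member and demo_ww_class are assumed
      names for illustration):
      
      	static int lock_two(struct obj *a, struct obj *b, struct ww_acquire_ctx *ctx)
      	{
      		struct obj *contended = NULL;
      		int ret;
      
      	retry:
      		if (contended)
      			/* we backed off last round; now sleep until it is our turn */
      			ww_mutex_lock_slow(&contended->lock, ctx);
      
      		if (a != contended) {
      			ret = ww_mutex_lock(&a->lock, ctx);
      			if (ret) {	/* -EDEADLK: we are the younger, wounded context */
      				if (contended == b)
      					ww_mutex_unlock(&b->lock);
      				contended = a;
      				goto retry;
      			}
      		}
      		if (b != contended) {
      			ret = ww_mutex_lock(&b->lock, ctx);
      			if (ret) {
      				ww_mutex_unlock(&a->lock);
      				contended = b;
      				goto retry;
      			}
      		}
      		return 0;
      	}
      
      	/* usage: ww_acquire_init(&ctx, &demo_ww_class); lock_two(a, b, &ctx);
      	 * ww_acquire_done(&ctx); ... unlock both; ww_acquire_fini(&ctx); */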
      
      References: https://lwn.net/Articles/548909/
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Acked-by: Rob Clark <robdclark@gmail.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: dri-devel@lists.freedesktop.org
      Cc: linaro-mm-sig@lists.linaro.org
      Cc: rostedt@goodmis.org
      Cc: daniel@ffwll.ch
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/51C8038C.9000106@canonical.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      040a0a37
  15. 19 Apr 2013, 1 commit
    • mutex: Queue mutex spinners with MCS lock to reduce cacheline contention · 2bd2c92c
      Authored by Waiman Long
      The current mutex spinning code (with the MUTEX_SPIN_ON_OWNER option
      turned on) allows multiple tasks to spin on a single mutex
      concurrently. A potential problem with the current approach is
      that when the mutex becomes available, all the spinning tasks
      will try to acquire the mutex more or less simultaneously. As a
      result, there will be a lot of cacheline bouncing, especially on
      systems with a large number of CPUs.
      
      This patch tries to reduce this kind of contention by putting
      the mutex spinners into a queue so that only the first one in
      the queue will try to acquire the mutex. This will reduce
      contention and allow all the tasks to move forward faster.
      
      The queuing of mutex spinners is done using an MCS lock based
      implementation, which further reduces contention on the mutex
      cacheline compared to a similar ticket-spinlock-based implementation.
      This patch adds a new field to the mutex data structure
      for holding the MCS lock. This expands the mutex size by 8 bytes
      for 64-bit systems and 4 bytes for 32-bit systems. This overhead
      is avoided if the MUTEX_SPIN_ON_OWNER option is turned off.
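      
      A hedged sketch of the spinner queue described above, using the mspin_node
      naming from this patch (error paths and the unlock side are omitted):
      
      	struct mspin_node {
      		struct mspin_node *next;
      		int		   locked;	/* set to 1 when the lock is passed to us */
      	};
      
      	static void mspin_lock(struct mspin_node **lock, struct mspin_node *node)
      	{
      		struct mspin_node *prev;
      
      		node->next   = NULL;
      		node->locked = 0;
      
      		prev = xchg(lock, node);	/* atomically make ourselves the new tail */
      		if (!prev)
      			return;			/* queue was empty: we may spin on the mutex now */
      
      		ACCESS_ONCE(prev->next) = node;
      		smp_wmb();
      		while (!ACCESS_ONCE(node->locked))
      			arch_mutex_cpu_relax();	/* spin only on our own cacheline */
      	}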
      
      The following table shows the jobs per minute (JPM) scalability
      data on an 8-node 80-core Westmere box with a 3.7.10 kernel. The
      numactl command is used to restrict the running of the fserver
      workloads to 1/2/4/8 nodes with hyperthreading off.
      
      +-----------------+-----------+-----------+-------------+----------+
      |  Configuration  | Mean JPM  | Mean JPM  |  Mean JPM   | % Change |
      |                 | w/o patch | patch 1   | patches 1&2 |  1->1&2  |
      +-----------------+------------------------------------------------+
      |                 |              User Range 1100 - 2000            |
      +-----------------+------------------------------------------------+
      | 8 nodes, HT off |  227972   |  227237   |   305043    |  +34.2%  |
      | 4 nodes, HT off |  393503   |  381558   |   394650    |   +3.4%  |
      | 2 nodes, HT off |  334957   |  325240   |   338853    |   +4.2%  |
      | 1 node , HT off |  198141   |  197972   |   198075    |   +0.1%  |
      +-----------------+------------------------------------------------+
      |                 |              User Range 200 - 1000             |
      +-----------------+------------------------------------------------+
      | 8 nodes, HT off |  282325   |  312870   |   332185    |   +6.2%  |
      | 4 nodes, HT off |  390698   |  378279   |   393419    |   +4.0%  |
      | 2 nodes, HT off |  336986   |  326543   |   340260    |   +4.2%  |
      | 1 node , HT off |  197588   |  197622   |   197582    |    0.0%  |
      +-----------------+-----------+-----------+-------------+----------+
      
      At low user range 10-100, the JPM differences were within +/-1%.
      So they are not that interesting.
      
      The fserver workload uses mutex spinning extensively. With just
      the mutex change in the first patch, there is no noticeable
      change in performance; rather, there is a slight drop in
      performance. This mutex spinning patch more than recovers the
      lost performance and shows a significant increase of +30% at high
      user load with the full 8 nodes. Similar improvements were also
      seen in a 3.8 kernel.
      
      The table below shows the %time spent by different kernel
      functions as reported by perf when running the fserver workload
      at 1500 users with all 8 nodes.
      
      +-----------------------+-----------+---------+-------------+
      |        Function       |  % time   | % time  |   % time    |
      |                       | w/o patch | patch 1 | patches 1&2 |
      +-----------------------+-----------+---------+-------------+
      | __read_lock_failed    |  34.96%   | 34.91%  |   29.14%    |
      | __write_lock_failed   |  10.14%   | 10.68%  |    7.51%    |
      | mutex_spin_on_owner   |   3.62%   |  3.42%  |    2.33%    |
      | mspin_lock            |    N/A    |   N/A   |    9.90%    |
      | __mutex_lock_slowpath |   1.46%   |  0.81%  |    0.14%    |
      | _raw_spin_lock        |   2.25%   |  2.50%  |    1.10%    |
      +-----------------------+-----------+---------+-------------+
      
      The fserver workload for an 8-node system is dominated by the
      contention in the read/write lock. Mutex contention also plays a
      role. With the first patch only, mutex contention is down (as
      shown by the __mutex_lock_slowpath figure), which helps a little
      bit. We saw only a few percent improvement with that.
      
      By applying patch 2 as well, the single mutex_spin_on_owner
      figure is now split out into an additional mspin_lock figure.
      The time increases from 3.42% to 11.23%. It shows a great
      reduction in contention among the spinners, leading to a 30%
      improvement. The time ratio 9.9/2.33=4.3 indicates that there
      are on average 4+ spinners waiting in the spin_lock loop for
      each spinner in the mutex_spin_on_owner loop. Contention in
      other locking functions also goes down by quite a lot.
      
      The table below shows the performance change of both patches 1 &
      2 over patch 1 alone in other AIM7 workloads (at 8 nodes,
      hyperthreading off).
      
      +--------------+---------------+----------------+-----------------+
      |   Workload   | mean % change | mean % change  | mean % change   |
      |              | 10-100 users  | 200-1000 users | 1100-2000 users |
      +--------------+---------------+----------------+-----------------+
      | alltests     |      0.0%     |     -0.8%      |     +0.6%       |
      | five_sec     |     -0.3%     |     +0.8%      |     +0.8%       |
      | high_systime |     +0.4%     |     +2.4%      |     +2.1%       |
      | new_fserver  |     +0.1%     |    +14.1%      |    +34.2%       |
      | shared       |     -0.5%     |     -0.3%      |     -0.4%       |
      | short        |     -1.7%     |     -9.8%      |     -8.3%       |
      +--------------+---------------+----------------+-----------------+
      
      The short workload is the only one that shows a decline in
      performance probably due to the spinner locking and queuing
      overhead.
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Reviewed-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chandramouleeswaran Aswin <aswin@hp.com>
      Cc: Norton Scott J <scott.norton@hp.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1366226594-5506-4-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2bd2c92c
  16. 27 Jul 2011, 1 commit
  17. 21 Jul 2011, 1 commit
  18. 25 May 2011, 1 commit
    • lockdep, mutex: provide mutex_lock_nest_lock · e4c70a66
      Authored by Peter Zijlstra
      In order to convert i_mmap_lock to a mutex we need a mutex equivalent to
      spin_lock_nest_lock(), thus provide the mutex_lock_nest_lock() annotation.
      
      As with spin_lock_nest_lock(), mutex_lock_nest_lock() allows annotation of
      the locking pattern where an outer lock serializes the acquisition order
      of nested locks.  That is, if every time you lock multiple locks A, say A1
      and A2, you first acquire N, then the order of acquiring A1 and A2 is
      irrelevant.
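      
      A hedged usage sketch with invented lock names: while the outer mutex n is
      held, the nested locks of the same class may be taken in either order and
      lockdep will not complain:
      
      	mutex_lock(&n);				/* the outer lock serializes the sequence */
      	mutex_lock_nest_lock(&a2->lock, &n);
      	mutex_lock_nest_lock(&a1->lock, &n);	/* a1/a2 order is irrelevant under n */
      	/* ... */
      	mutex_unlock(&a1->lock);
      	mutex_unlock(&a2->lock);
      	mutex_unlock(&n);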
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e4c70a66
  19. 14 Apr 2011, 1 commit
  20. 05 Jan 2011, 1 commit
    • [S390] mutex: Introduce arch_mutex_cpu_relax() · 34b133f8
      Authored by Gerald Schaefer
      The spinning mutex implementation uses cpu_relax() in busy loops as a
      compiler barrier. Depending on the architecture, cpu_relax() may do more
      than needed in this specific mutex spin loops. On System z we also give
      up the time slice of the virtual cpu in cpu_relax(), which prevents
      effective spinning on the mutex.
      
      This patch replaces cpu_relax() in the spinning mutex code with
      arch_mutex_cpu_relax(), which can be defined by each architecture that
      selects HAVE_ARCH_MUTEX_CPU_RELAX. The default is still cpu_relax(), so
      this patch should not affect other architectures than System z for now.
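      
      A rough sketch of the opt-in mechanism (the generic fallback sits in
      mutex.h; the s390 variant reduces to a plain compiler barrier so the
      virtual CPU is not given up):
      
      	/* include/linux/mutex.h */
      	#ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
      	#define arch_mutex_cpu_relax()	cpu_relax()
      	#endif
      
      	/* arch/s390/include/asm/mutex.h */
      	#define arch_mutex_cpu_relax()	barrier()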
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1290437256.7455.4.camel@thinkpad>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      34b133f8
  21. 26 Nov 2010, 1 commit
    • mutexes, sched: Introduce arch_mutex_cpu_relax() · 335d7afb
      Authored by Gerald Schaefer
      The spinning mutex implementation uses cpu_relax() in busy loops as a
      compiler barrier. Depending on the architecture, cpu_relax() may do more
      than needed in this specific mutex spin loops. On System z we also give
      up the time slice of the virtual cpu in cpu_relax(), which prevents
      effective spinning on the mutex.
      
      This patch replaces cpu_relax() in the spinning mutex code with
      arch_mutex_cpu_relax(), which can be defined by each architecture that
      selects HAVE_ARCH_MUTEX_CPU_RELAX. The default is still cpu_relax(), so
      this patch should not affect other architectures than System z for now.
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1290437256.7455.4.camel@thinkpad>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      335d7afb
  22. 03 Sep 2010, 1 commit
  23. 30 Apr 2009, 1 commit
    • mutex: add atomic_dec_and_mutex_lock(), fix · a511e3f9
      Authored by Andrew Morton
      include/linux/mutex.h:136: warning: 'mutex_lock' declared inline after being called
       include/linux/mutex.h:136: warning: previous declaration of 'mutex_lock' was here
      
      uninline it.
      
      [ Impact: clean up and uninline, address compiler warning ]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <200904292318.n3TNIsi6028340@imap1.linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a511e3f9
  24. 29 Apr 2009, 1 commit
  25. 06 Apr 2009, 1 commit
  26. 15 Jan 2009, 1 commit
    • mutex: implement adaptive spinning · 0d66bf6d
      Authored by Peter Zijlstra
      Change mutex contention behaviour such that it will sometimes busy wait on
      acquisition - moving its behaviour closer to that of spinlocks.
      
      This concept got ported to mainline from the -rt tree, where it was originally
      implemented for rtmutexes by Steven Rostedt, based on work by Gregory Haskins.
      
      Testing with Ingo's test-mutex application (http://lkml.org/lkml/2006/1/8/50)
      gave a 345% boost for VFS scalability on my testbox:
      
       # ./test-mutex-shm V 16 10 | grep "^avg ops"
       avg ops/sec:               296604
      
       # ./test-mutex-shm V 16 10 | grep "^avg ops"
       avg ops/sec:               85870
      
      The key criteria for the busy wait is that the lock owner has to be running on
      a (different) cpu. The idea is that as long as the owner is running, there is a
      fair chance it'll release the lock soon, and thus we'll be better off spinning
      instead of blocking/scheduling.
      
      Since regular mutexes (as opposed to rtmutexes) do not atomically track the
      owner, we add the owner in a non-atomic fashion and deal with the races in
      the slowpath.
      
      Furthermore, to ease the testing of the performance impact of this new code,
      there is a means to disable this behaviour at runtime (without having to reboot
      the system), when scheduler debugging is enabled (CONFIG_SCHED_DEBUG=y),
      by issuing the following command:
      
       # echo NO_OWNER_SPIN > /debug/sched_features
      
      This command re-enables spinning again (this is also the default):
      
       # echo OWNER_SPIN > /debug/sched_features
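      
      A conceptual sketch of the spin condition (owner_running() is a hypothetical
      helper standing in for the "is the owner still on a CPU?" check; the real
      code also revalidates the non-atomic owner field in the slowpath):
      
      	for (;;) {
      		struct task_struct *owner = ACCESS_ONCE(lock->owner);	/* non-atomic hint */
      
      		if (!owner)
      			break;			/* lock may be free: try to grab it */
      		if (!owner_running(lock, owner))
      			break;			/* owner is blocked: stop spinning, sleep */
      		if (need_resched())
      			break;			/* we should give up the CPU ourselves */
      
      		cpu_relax();			/* brief pause before rechecking */
      	}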
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0d66bf6d
  27. 31 Oct 2008, 1 commit
  28. 09 Feb 2008, 1 commit
  29. 07 Dec 2007, 1 commit
  30. 17 Oct 2007, 1 commit
  31. 12 Oct 2007, 1 commit
  32. 10 May 2007, 1 commit
  33. 27 Jan 2007, 1 commit
  34. 09 Dec 2006, 1 commit
  35. 08 Dec 2006, 1 commit
  36. 04 Jul 2006, 2 commits