1. 16 9月, 2014 1 次提交
  2. 04 9月, 2014 1 次提交
  3. 13 8月, 2014 7 次提交
  4. 17 7月, 2014 2 次提交
    • D
      arch, locking: Ciao arch_mutex_cpu_relax() · 3a6bfbc9
      Davidlohr Bueso 提交于
      The arch_mutex_cpu_relax() function, introduced by 34b133f8, is
      hacky and ugly. It was added a few years ago to address the fact
      that common cpu_relax() calls include yielding on s390, and thus
      impact the optimistic spinning functionality of mutexes. Nowadays
      we use this function well beyond mutexes: rwsem, qrwlock, mcs and
      lockref. Since the macro that defines the call is in the mutex header,
      any users must include mutex.h and the naming is misleading as well.
      
      This patch (i) renames the call to cpu_relax_lowlatency  ("relax, but
      only if you can do it with very low latency") and (ii) defines it in
      each arch's asm/processor.h local header, just like for regular cpu_relax
      functions. On all archs, except s390, cpu_relax_lowlatency is simply cpu_relax,
      and thus we can take it out of mutex.h. While this can seem redundant,
      I believe it is a good choice as it allows us to move out arch specific
      logic from generic locking primitives and enables future(?) archs to
      transparently define it, similarly to System Z.
      Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bharat Bhushan <r65777@freescale.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Joseph Myers <joseph@codesourcery.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Qais Yousef <qais.yousef@imgtec.com>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Stratos Karafotis <stratosk@semaphore.gr>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Kulikov <segoon@openwall.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wolfram Sang <wsa@the-dreams.de>
      Cc: adi-buildroot-devel@lists.sourceforge.net
      Cc: linux390@de.ibm.com
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-am33-list@redhat.com
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-c6x-dev@linux-c6x.org
      Cc: linux-cris-kernel@axis.com
      Cc: linux-hexagon@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux@lists.openrisc.net
      Cc: linux-m32r-ja@ml.linux-m32r.org
      Cc: linux-m32r@ml.linux-m32r.org
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: linux-metag@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: sparclinux@vger.kernel.org
      Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3a6bfbc9
    • A
      locking/lockdep: Only ask for /proc/lock_stat output when available · acf59377
      Andreas Gruenbacher 提交于
      When lockdep turns itself off, the following message is logged:
      
        Please attach the output of /proc/lock_stat to the bug report
      
      Omit this message when CONFIG_LOCK_STAT is off, and /proc/lock_stat
      doesn't exist.
      Signed-off-by: NAndreas Gruenbacher <andreas.gruenbacher@gmail.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1405451452-3824-1-git-send-email-andreas.gruenbacher@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      acf59377
  5. 16 7月, 2014 7 次提交
    • D
      locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER · 5db6c6fe
      Davidlohr Bueso 提交于
      Just like with mutexes (CONFIG_MUTEX_SPIN_ON_OWNER),
      encapsulate the dependencies for rwsem optimistic spinning.
      No logical changes here as it continues to depend on both
      SMP and the XADD algorithm variant.
      Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com>
      Acked-by: NJason Low <jason.low2@hp.com>
      [ Also make it depend on ARCH_SUPPORTS_ATOMIC_RMW. ]
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1405112406-13052-2-git-send-email-davidlohr@hp.com
      Cc: aswin@hp.com
      Cc: Chris Mason <clm@fb.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      5db6c6fe
    • P
      locking/rwsem: Rename 'activity' to 'count' · 13b9a962
      Peter Zijlstra 提交于
      There are two definitions of struct rw_semaphore, one in linux/rwsem.h
      and one in linux/rwsem-spinlock.h.
      
      For some reason they have different names for the initial field. This
      makes it impossible to use C99 named initialization for
      __RWSEM_INITIALIZER() -- or we have to duplicate that entire thing
      along with the structure definitions.
      
      The simpler patch is renaming the rwsem-spinlock variant to match the
      regular rwsem.
      
      This allows us to switch to C99 named initialization.
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-bmrZolsbGmautmzrerog27io@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      13b9a962
    • J
      locking/spinlocks/mcs: Micro-optimize osq_unlock() · 33ecd208
      Jason Low 提交于
      In the unlock function of the cancellable MCS spinlock, the first
      thing we do is to retrive the current CPU's osq node. However, due to
      the changes made in the previous patch, in the common case where the
      lock is not contended, we wouldn't need to access the current CPU's
      osq node anymore.
      
      This patch optimizes this by only retriving this CPU's osq node
      after we attempt the initial cmpxchg to unlock the osq and found
      that its contended.
      Signed-off-by: NJason Low <jason.low2@hp.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1405358872-3732-5-git-send-email-jason.low2@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      33ecd208
    • J
      locking/spinlocks/mcs: Introduce and use init macro and function for osq locks · 4d9d951e
      Jason Low 提交于
      Currently, we initialize the osq lock by directly setting the lock's values. It
      would be preferable if we use an init macro to do the initialization like we do
      with other locks.
      
      This patch introduces and uses a macro and function for initializing the osq lock.
      Signed-off-by: NJason Low <jason.low2@hp.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Link: http://lkml.kernel.org/r/1405358872-3732-4-git-send-email-jason.low2@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4d9d951e
    • J
      locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overhead · 90631822
      Jason Low 提交于
      The cancellable MCS spinlock is currently used to queue threads that are
      doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
      the lock would access and queue the local node corresponding to the CPU that
      it's running on. Currently, the cancellable MCS lock is implemented by using
      pointers to these nodes.
      
      In this patch, instead of operating on pointers to the per-cpu nodes, we
      store the CPU numbers in which the per-cpu nodes correspond to in atomic_t.
      A similar concept is used with the qspinlock.
      
      By operating on the CPU # of the nodes using atomic_t instead of pointers
      to those nodes, this can reduce the overhead of the cancellable MCS spinlock
      by 32 bits (on 64 bit systems).
      Signed-off-by: NJason Low <jason.low2@hp.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Link: http://lkml.kernel.org/r/1405358872-3732-3-git-send-email-jason.low2@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      90631822
    • J
      locking/spinlocks/mcs: Rename optimistic_spin_queue() to optimistic_spin_node() · 046a619d
      Jason Low 提交于
      Currently, the per-cpu nodes structure for the cancellable MCS spinlock is
      named "optimistic_spin_queue". However, in a follow up patch in the series
      we will be introducing a new structure that serves as the new "handle" for
      the lock. It would make more sense if that structure is named
      "optimistic_spin_queue". Additionally, since the current use of the
      "optimistic_spin_queue" structure are  "nodes", it might be better if we
      rename them to "node" anyway.
      
      This preparatory patch renames all current "optimistic_spin_queue"
      to "optimistic_spin_node".
      Signed-off-by: NJason Low <jason.low2@hp.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Link: http://lkml.kernel.org/r/1405358872-3732-2-git-send-email-jason.low2@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      046a619d
    • J
      locking/rwsem: Allow conservative optimistic spinning when readers have lock · 37e95624
      Jason Low 提交于
      Commit 4fc828e2 ("locking/rwsem: Support optimistic spinning")
      introduced a major performance regression for workloads such as
      xfs_repair which mix read and write locking of the mmap_sem across
      many threads. The result was xfs_repair ran 5x slower on 3.16-rc2
      than on 3.15 and using 20x more system CPU time.
      
      Perf profiles indicate in some workloads that significant time can
      be spent spinning on !owner. This is because we don't set the lock
      owner when readers(s) obtain the rwsem.
      
      In this patch, we'll modify rwsem_can_spin_on_owner() such that we'll
      return false if there is no lock owner. The rationale is that if we
      just entered the slowpath, yet there is no lock owner, then there is
      a possibility that a reader has the lock. To be conservative, we'll
      avoid spinning in these situations.
      
      This patch reduced the total run time of the xfs_repair workload from
      about 4 minutes 24 seconds down to approximately 1 minute 26 seconds,
      back to close to the same performance as on 3.15.
      
      Retesting of AIM7, which were some of the workloads used to test the
      original optimistic spinning code, confirmed that we still get big
      performance gains with optimistic spinning, even with this additional
      regression fix. Davidlohr found that while the 'custom' workload took
      a performance hit of ~-14% to throughput for >300 users with this
      additional patch, the overall gain with optimistic spinning is
      still ~+45%. The 'disk' workload even improved by ~+15% at >1000 users.
      Tested-by: NDave Chinner <dchinner@redhat.com>
      Acked-by: NDavidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: NJason Low <jason.low2@hp.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1404532172.2572.30.camel@j-VirtualBoxSigned-off-by: NIngo Molnar <mingo@kernel.org>
      37e95624
  6. 05 7月, 2014 4 次提交
  7. 22 6月, 2014 9 次提交
  8. 16 6月, 2014 1 次提交
    • T
      rtmutex: Plug slow unlock race · 27e35715
      Thomas Gleixner 提交于
      When the rtmutex fast path is enabled the slow unlock function can
      create the following situation:
      
      spin_lock(foo->m->wait_lock);
      foo->m->owner = NULL;
      	    			rt_mutex_lock(foo->m); <-- fast path
      				free = atomic_dec_and_test(foo->refcnt);
      				rt_mutex_unlock(foo->m); <-- fast path
      				if (free)
      				   kfree(foo);
      
      spin_unlock(foo->m->wait_lock); <--- Use after free.
      
      Plug the race by changing the slow unlock to the following scheme:
      
           while (!rt_mutex_has_waiters(m)) {
           	    /* Clear the waiters bit in m->owner */
      	    clear_rt_mutex_waiters(m);
            	    owner = rt_mutex_owner(m);
            	    spin_unlock(m->wait_lock);
            	    if (cmpxchg(m->owner, owner, 0) == owner)
            	       return;
            	    spin_lock(m->wait_lock);
           }
      
      So in case of a new waiter incoming while the owner tries the slow
      path unlock we have two situations:
      
       unlock(wait_lock);
      					lock(wait_lock);
       cmpxchg(p, owner, 0) == owner
       	    	   			mark_rt_mutex_waiters(lock);
      	 				acquire(lock);
      
      Or:
      
       unlock(wait_lock);
      					lock(wait_lock);
      	 				mark_rt_mutex_waiters(lock);
       cmpxchg(p, owner, 0) != owner
      					enqueue_waiter();
      					unlock(wait_lock);
       lock(wait_lock);
       wakeup_next waiter();
       unlock(wait_lock);
      					lock(wait_lock);
      					acquire(lock);
      
      If the fast path is disabled, then the simple
      
         m->owner = NULL;
         unlock(m->wait_lock);
      
      is sufficient as all access to m->owner is serialized via
      m->wait_lock;
      
      Also document and clarify the wakeup_next_waiter function as suggested
      by Oleg Nesterov.
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140611183852.937945560@linutronix.de
      Cc: stable@vger.kernel.org
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      27e35715
  9. 07 6月, 2014 2 次提交
    • T
      rtmutex: Detect changes in the pi lock chain · 82084984
      Thomas Gleixner 提交于
      When we walk the lock chain, we drop all locks after each step. So the
      lock chain can change under us before we reacquire the locks. That's
      harmless in principle as we just follow the wrong lock path. But it
      can lead to a false positive in the dead lock detection logic:
      
      T0 holds L0
      T0 blocks on L1 held by T1
      T1 blocks on L2 held by T2
      T2 blocks on L3 held by T3
      T4 blocks on L4 held by T4
      
      Now we walk the chain
      
      lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> 
           lock T2 ->  adjust T2 ->  drop locks
      
      T2 times out and blocks on L0
      
      Now we continue:
      
      lock T2 -> lock L0 -> deadlock detected, but it's not a deadlock at all.
      
      Brad tried to work around that in the deadlock detection logic itself,
      but the more I looked at it the less I liked it, because it's crystal
      ball magic after the fact.
      
      We actually can detect a chain change very simple:
      
      lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 ->
      
           next_lock = T2->pi_blocked_on->lock;
      
      drop locks
      
      T2 times out and blocks on L0
      
      Now we continue:
      
      lock T2 -> 
      
           if (next_lock != T2->pi_blocked_on->lock)
           	   return;
      
      So if we detect that T2 is now blocked on a different lock we stop the
      chain walk. That's also correct in the following scenario:
      
      lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 ->
      
           next_lock = T2->pi_blocked_on->lock;
      
      drop locks
      
      T3 times out and drops L3
      T2 acquires L3 and blocks on L4 now
      
      Now we continue:
      
      lock T2 -> 
      
           if (next_lock != T2->pi_blocked_on->lock)
           	   return;
      
      We don't have to follow up the chain at that point, because T2
      propagated our priority up to T4 already.
      
      [ Folded a cleanup patch from peterz ]
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reported-by: NBrad Mouring <bmouring@ni.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140605152801.930031935@linutronix.de
      Cc: stable@vger.kernel.org
      82084984
    • T
      rtmutex: Handle deadlock detection smarter · 3d5c9340
      Thomas Gleixner 提交于
      Even in the case when deadlock detection is not requested by the
      caller, we can detect deadlocks. Right now the code stops the lock
      chain walk and keeps the waiter enqueued, even on itself. Silly not to
      yell when such a scenario is detected and to keep the waiter enqueued.
      
      Return -EDEADLK unconditionally and handle it at the call sites.
      
      The futex calls return -EDEADLK. The non futex ones dequeue the
      waiter, throw a warning and put the task into a schedule loop.
      
      Tagged for stable as it makes the code more robust.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Brad Mouring <bmouring@ni.com>
      Link: http://lkml.kernel.org/r/20140605152801.836501969@linutronix.de
      Cc: stable@vger.kernel.org
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      3d5c9340
  10. 06 6月, 2014 1 次提交
  11. 05 6月, 2014 2 次提交
    • A
      locking/rwsem: Fix checkpatch.pl warnings · 0cc3d011
      Andrew Morton 提交于
      WARNING: line over 80 characters
      #205: FILE: kernel/locking/rwsem-xadd.c:275:
      +		old = cmpxchg(&sem->count, count, count + RWSEM_ACTIVE_WRITE_BIAS);
      
      WARNING: line over 80 characters
      #376: FILE: kernel/locking/rwsem-xadd.c:434:
      +		 * If there were already threads queued before us and there are no
      
      WARNING: line over 80 characters
      #377: FILE: kernel/locking/rwsem-xadd.c:435:
      +		 * active writers, the lock must be read owned; so we try to wake
      
      total: 0 errors, 3 warnings, 417 lines checked
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-pn6pslaplw031lykweojsn8c@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0cc3d011
    • D
      locking/rwsem: Support optimistic spinning · 4fc828e2
      Davidlohr Bueso 提交于
      We have reached the point where our mutexes are quite fine tuned
      for a number of situations. This includes the use of heuristics
      and optimistic spinning, based on MCS locking techniques.
      
      Exclusive ownership of read-write semaphores are, conceptually,
      just about the same as mutexes, making them close cousins. To
      this end we need to make them both perform similarly, and
      right now, rwsems are simply not up to it. This was discovered
      by both reverting commit 4fc3f1d6 (mm/rmap, migration: Make
      rmap_walk_anon() and try_to_unmap_anon() more scalable) and
      similarly, converting some other mutexes (ie: i_mmap_mutex) to
      rwsems. This creates a situation where users have to choose
      between a rwsem and mutex taking into account this important
      performance difference. Specifically, biggest difference between
      both locks is when we fail to acquire a mutex in the fastpath,
      optimistic spinning comes in to play and we can avoid a large
      amount of unnecessary sleeping and overhead of moving tasks in
      and out of wait queue. Rwsems do not have such logic.
      
      This patch, based on the work from Tim Chen and I, adds support
      for write-side optimistic spinning when the lock is contended.
      It also includes support for the recently added cancelable MCS
      locking for adaptive spinning. Note that is is only applicable
      to the xadd method, and the spinlock rwsem variant remains intact.
      
      Allowing optimistic spinning before putting the writer on the wait
      queue reduces wait queue contention and provided greater chance
      for the rwsem to get acquired. With these changes, rwsem is on par
      with mutex. The performance benefits can be seen on a number of
      workloads. For instance, on a 8 socket, 80 core 64bit Westmere box,
      aim7 shows the following improvements in throughput:
      
       +--------------+---------------------+-----------------+
       |   Workload   | throughput-increase | number of users |
       +--------------+---------------------+-----------------+
       | alltests     | 20%                 | >1000           |
       | custom       | 27%, 60%            | 10-100, >1000   |
       | high_systime | 36%, 30%            | >100, >1000     |
       | shared       | 58%, 29%            | 10-100, >1000   |
       +--------------+---------------------+-----------------+
      
      There was also improvement on smaller systems, such as a quad-core
      x86-64 laptop running a 30Gb PostgreSQL (pgbench) workload for up
      to +60% in throughput for over 50 clients. Additionally, benefits
      were also noticed in exim (mail server) workloads. Furthermore, no
      performance regression have been seen at all.
      
      Based-on-work-from: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com>
      [peterz: rej fixup due to comment patches, sched/rt.h header]
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: "Scott J Norton" <scott.norton@hp.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Link: http://lkml.kernel.org/r/1399055055.6275.15.camel@buesod1.americas.hpqcorp.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4fc828e2
  12. 28 5月, 2014 1 次提交
    • T
      rtmutex: Fix deadlock detector for real · 397335f0
      Thomas Gleixner 提交于
      The current deadlock detection logic does not work reliably due to the
      following early exit path:
      
      	/*
      	 * Drop out, when the task has no waiters. Note,
      	 * top_waiter can be NULL, when we are in the deboosting
      	 * mode!
      	 */
      	if (top_waiter && (!task_has_pi_waiters(task) ||
      			   top_waiter != task_top_pi_waiter(task)))
      		goto out_unlock_pi;
      
      So this not only exits when the task has no waiters, it also exits
      unconditionally when the current waiter is not the top priority waiter
      of the task.
      
      So in a nested locking scenario, it might abort the lock chain walk
      and therefor miss a potential deadlock.
      
      Simple fix: Continue the chain walk, when deadlock detection is
      enabled.
      
      We also avoid the whole enqueue, if we detect the deadlock right away
      (A-A). It's an optimization, but also prevents that another waiter who
      comes in after the detection and before the task has undone the damage
      observes the situation and detects the deadlock and returns
      -EDEADLOCK, which is wrong as the other task is not in a deadlock
      situation.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20140522031949.725272460@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      397335f0
  13. 15 5月, 2014 2 次提交