1. 08 Jun, 2016 1 commit
  2. 26 Jan, 2016 1 commit
    • rtmutex: Make wait_lock irq safe · b4abf910
      Thomas Gleixner authored
      Sasha reported a lockdep splat about a potential deadlock between RCU boosting
      rtmutex and the posix timer it_lock.
      
      CPU0					CPU1
      
      rtmutex_lock(&rcu->rt_mutex)
        spin_lock(&rcu->rt_mutex.wait_lock)
      					local_irq_disable()
      					spin_lock(&timer->it_lock)
      					spin_lock(&rcu->rt_mutex.wait_lock)
      --> Interrupt
          spin_lock(&timer->it_lock)
      
      This is caused by the following code sequence on CPU1
      
           rcu_read_lock()
           x = lookup();
           if (x)
           	spin_lock_irqsave(&x->it_lock, flags);
           rcu_read_unlock();
           return x;
      
      We could fix that in the posix timer code by keeping the RCU read lock
      held across the spinlocked, irq-disabled section, but the above
      sequence is common and there is no reason not to support it.
      
      Making rt_mutex.wait_lock irq safe prevents the deadlock.
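      
      A minimal sketch of the conversion in the slowpath, assuming the usual
      rtmutex helper names (try_to_take_rt_mutex() is real; the surrounding
      shape is illustrative):
      
          static int rt_mutex_slowtrylock(struct rt_mutex *lock)
          {
                  unsigned long flags;
                  int ret;
      
                  /* was: raw_spin_lock(&lock->wait_lock); */
                  raw_spin_lock_irqsave(&lock->wait_lock, flags);
                  ret = try_to_take_rt_mutex(lock, current, NULL);
                  raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
      
                  return ret;
          }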
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
  3. 06 Oct, 2015 1 commit
  4. 23 Sep, 2015 1 commit
  5. 20 Jul, 2015 1 commit
  6. 20 Jun, 2015 2 commits
  7. 19 Jun, 2015 1 commit
    • locking/rtmutex: Implement lockless top-waiter wakeup · 45ab4eff
      Davidlohr Bueso authored
      Mark the task for later wakeup after the wait_lock has been released.
      This way, once the next task is awoken, it will have a better chance
      of finding the wait_lock free when it continues executing in
      __rt_mutex_slowlock() while trying to acquire the rtmutex via
      try_to_take_rt_mutex(). In contended scenarios, other tasks attempting
      to take the lock may acquire it first, right after the wait_lock is
      released, but (a) this can also occur with the current code, as it
      relies on the spinlock fairness, and (b) we are dealing with the
      top-waiter anyway, so it will always take the lock next.
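      
      A sketch of the deferred wakeup, assuming the wake_q interface
      (mark_wakeup_next_waiter() moving the top waiter onto the queue is
      the assumed helper):
      
          static void rt_mutex_slowunlock(struct rt_mutex *lock)
          {
                  DEFINE_WAKE_Q(wake_q);
      
                  raw_spin_lock(&lock->wait_lock);
                  /* queue the top waiter while still holding wait_lock */
                  mark_wakeup_next_waiter(&wake_q, lock);
                  raw_spin_unlock(&lock->wait_lock);
      
                  /* the actual wakeup happens with wait_lock already free */
                  wake_up_q(&wake_q);
          }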
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1432056298-18738-2-git-send-email-dave@stgolabs.net
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  8. 14 May, 2015 1 commit
    • rtmutex: Warn if trylock is called from hard/softirq context · 6ce47fd9
      Thomas Gleixner authored
      rt_mutex_trylock() must be called from thread context. It can be
      called from atomic regions (preemption or interrupts disabled), but
      not from hard/softirq/nmi context. Add a warning to alert abusers.
      
      The reasons for this are:
      
          1) There is a potential deadlock in the slowpath
      
          2) Another cpu which blocks on the rtmutex will boost the task
             which allegedly locked the rtmutex, but that cannot work
             because the hard/softirq context borrows the task context.
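      
      A sketch of such a check, close to the description above (the fastpath
      helper name is assumed):
      
          int __sched rt_mutex_trylock(struct rt_mutex *lock)
          {
                  /* trylock from hard/softirq/NMI context is a bug: warn and fail */
                  if (WARN_ON_ONCE(in_irq() || in_nmi() || in_serving_softirq()))
                          return 0;
      
                  return rt_mutex_fasttrylock(lock, rt_mutex_slowtrylock);
          }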
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
  9. 13 May, 2015 1 commit
    • locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG · cede8841
      Sebastian Andrzej Siewior authored
      The rtmutex code is the only user of __HAVE_ARCH_CMPXCHG, and we have a
      few other users of cmpxchg() which do not care about
      __HAVE_ARCH_CMPXCHG. This define was first introduced in 23f78d4a
      ("[PATCH] pi-futex: rt mutex core"), which is v2.6.18. The generic
      cmpxchg was introduced later in 068fbad2 ("Add cmpxchg_local to
      asm-generic for per cpu atomic operations"), which is v2.6.25.
      Back then something was required to get rtmutex working with the fast
      path on architectures without cmpxchg, and this seems to be the result.
      
      It popped up recently on rt-users because ARM (v6+) does not define
      __HAVE_ARCH_CMPXCHG (even though it implements it), which results in
      slower locking performance in the fast path.
      To put some numbers on it: preempt-RT, am335x, 10 loops of
      100000 invocations of rt_spin_lock() + rt_spin_unlock() (time "total" is
      the average of the 10 loops for the 100000 invocations, "loop" is
      "total / 100000 * 1000"):
      
           cmpxchg |    slowpath used  ||    cmpxchg used
                   |   total   | loop  ||   total    | loop
           --------|-----------|-------||------------|-------
           ARMv6   | 9129.4 us | 91 ns ||  3311.9 us |  33 ns
           generic | 9360.2 us | 94 ns || 10834.6 us | 108 ns
           ----------------------------||--------------------
      
      Forcing it to the generic cmpxchg() made things worse for the slowpath
      and even worse in the cmpxchg() path. It boils down to 14 ns more per
      lock+unlock in a cache-hot loop, so it might not be that much in the
      real world.
      The last test was a substitute for a pre-ARMv6 machine, but then I was
      able to perform the comparison on imx28, which is ARMv5 and therefore
      always uses the generic cmpxchg implementation. And the numbers:
      
                    |   total     | loop
           -------- |-----------  |--------
           slowpath | 263937.2 us | 2639 ns
           cmpxchg  |  16934.2 us |  169 ns
           --------------------------------
      
      The numbers are larger since the machine is slower in general. However,
      letting rtmutex use cmpxchg() instead of the slowpath seems to improve
      things.
      
      Since, from the ARM (tested on am335x + imx28) point of view, always
      using cmpxchg() in rt_mutex_lock() + rt_mutex_unlock() makes sense, I
      would drop the define.
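      
      A sketch of the fastpath this change enables unconditionally (the
      macro mirrors the rtmutex one; the trylock wrapper is illustrative):
      
          /* acquire/release with one atomic op, no wait_lock taken */
          #define rt_mutex_cmpxchg(l, c, n)  (cmpxchg(&(l)->owner, c, n) == c)
      
          static inline int rt_mutex_fasttrylock(struct rt_mutex *lock)
          {
                  /* unlocked (owner == NULL) -> owned by current */
                  return rt_mutex_cmpxchg(lock, NULL, current);
          }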
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: will.deacon@arm.com
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/20150225175613.GE6823@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  10. 08 May, 2015 1 commit
  11. 22 Apr, 2015 1 commit
  12. 25 Mar, 2015 1 commit
  13. 01 Mar, 2015 1 commit
  14. 18 Feb, 2015 1 commit
  15. 04 Feb, 2015 1 commit
  16. 13 Aug, 2014 1 commit
    • locking/Documentation: Move locking related docs into Documentation/locking/ · 214e0aed
      Davidlohr Bueso authored
      Specifically:
        Documentation/locking/lockdep-design.txt
        Documentation/locking/lockstat.txt
        Documentation/locking/mutex-design.txt
        Documentation/locking/rt-mutex-design.txt
        Documentation/locking/rt-mutex.txt
        Documentation/locking/spinlocks.txt
        Documentation/locking/ww-mutex-design.txt
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: jason.low2@hp.com
      Cc: aswin@hp.com
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lubomir Rintel <lkundrak@v3.sk>
      Cc: Masanari Iida <standby24x7@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: fengguang.wu@intel.com
      Link: http://lkml.kernel.org/r/1406752916-3341-6-git-send-email-davidlohr@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  17. 22 Jun, 2014 9 commits
  18. 16 Jun, 2014 1 commit
    • rtmutex: Plug slow unlock race · 27e35715
      Thomas Gleixner authored
      When the rtmutex fast path is enabled the slow unlock function can
      create the following situation:
      
      spin_lock(foo->m->wait_lock);
      foo->m->owner = NULL;
      	    			rt_mutex_lock(foo->m); <-- fast path
      				free = atomic_dec_and_test(foo->refcnt);
      				rt_mutex_unlock(foo->m); <-- fast path
      				if (free)
      				   kfree(foo);
      
      spin_unlock(foo->m->wait_lock); <--- Use after free.
      
      Plug the race by changing the slow unlock to the following scheme:
      
           while (!rt_mutex_has_waiters(m)) {
                   /* Clear the waiters bit in m->owner */
                   clear_rt_mutex_waiters(m);
                   owner = rt_mutex_owner(m);
                   spin_unlock(m->wait_lock);
                   if (cmpxchg(m->owner, owner, 0) == owner)
                           return;
                   spin_lock(m->wait_lock);
           }
      
      So in case of a new waiter incoming while the owner tries the slow
      path unlock we have two situations:
      
       unlock(wait_lock);
      					lock(wait_lock);
       cmpxchg(p, owner, 0) == owner
       	    	   			mark_rt_mutex_waiters(lock);
      	 				acquire(lock);
      
      Or:
      
       unlock(wait_lock);
      					lock(wait_lock);
      	 				mark_rt_mutex_waiters(lock);
       cmpxchg(p, owner, 0) != owner
      					enqueue_waiter();
      					unlock(wait_lock);
       lock(wait_lock);
       wakeup_next waiter();
       unlock(wait_lock);
      					lock(wait_lock);
      					acquire(lock);
      
      If the fast path is disabled, then the simple
      
         m->owner = NULL;
         unlock(m->wait_lock);
      
      is sufficient as all access to m->owner is serialized via
      m->wait_lock.
      
      Also document and clarify the wakeup_next_waiter function as suggested
      by Oleg Nesterov.
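      
      A sketch of the resulting slow unlock (unlock_rt_mutex_safe() is the
      assumed helper that drops wait_lock and clears the owner with the
      cmpxchg shown above):
      
          static void rt_mutex_slowunlock(struct rt_mutex *lock)
          {
                  raw_spin_lock(&lock->wait_lock);
      
                  while (!rt_mutex_has_waiters(lock)) {
                          /* drops wait_lock and cmpxchg()es owner to NULL */
                          if (unlock_rt_mutex_safe(lock) == true)
                                  return;
                          /* a waiter slipped in: relock and re-check */
                          raw_spin_lock(&lock->wait_lock);
                  }
      
                  wakeup_next_waiter(lock);
                  raw_spin_unlock(&lock->wait_lock);
          }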
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140611183852.937945560@linutronix.de
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  19. 07 Jun, 2014 2 commits
    • rtmutex: Detect changes in the pi lock chain · 82084984
      Thomas Gleixner authored
      When we walk the lock chain, we drop all locks after each step. So the
      lock chain can change under us before we reacquire the locks. That's
      harmless in principle as we just follow the wrong lock path. But it
      can lead to a false positive in the deadlock detection logic:
      
      T0 holds L0
      T0 blocks on L1 held by T1
      T1 blocks on L2 held by T2
      T2 blocks on L3 held by T3
      T3 blocks on L4 held by T4
      
      Now we walk the chain
      
      lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 -> drop locks
      
      T2 times out and blocks on L0
      
      Now we continue:
      
      lock T2 -> lock L0 -> deadlock detected, but it's not a deadlock at all.
      
      Brad tried to work around that in the deadlock detection logic itself,
      but the more I looked at it the less I liked it, because it's crystal
      ball magic after the fact.
      
      We can actually detect a chain change very simply:
      
      lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 ->
      
           next_lock = T2->pi_blocked_on->lock;
      
      drop locks
      
      T2 times out and blocks on L0
      
      Now we continue:
      
      lock T2 -> 
      
           if (next_lock != T2->pi_blocked_on->lock)
           	   return;
      
      So if we detect that T2 is now blocked on a different lock we stop the
      chain walk. That's also correct in the following scenario:
      
      lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 ->
      
           next_lock = T2->pi_blocked_on->lock;
      
      drop locks
      
      T3 times out and drops L3
      T2 acquires L3 and blocks on L4 now
      
      Now we continue:
      
      lock T2 -> 
      
           if (next_lock != T2->pi_blocked_on->lock)
           	   return;
      
      We don't have to follow up the chain at that point, because T2
      propagated our priority up to T4 already.
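      
      A sketch of the check inside the chain walk (field names follow the
      rtmutex code; the surrounding function is rt_mutex_adjust_prio_chain()):
      
          /* recorded while we still held the locks */
          next_lock = task->pi_blocked_on->lock;
      
          /* ...all locks dropped, the chain may change here... */
      
          raw_spin_lock_irqsave(&task->pi_lock, flags);
          waiter = task->pi_blocked_on;
          if (!waiter || waiter->lock != next_lock) {
                  /* chain changed under us: stop the walk */
                  raw_spin_unlock_irqrestore(&task->pi_lock, flags);
                  return 0;
          }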
      
      [ Folded a cleanup patch from peterz ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: Brad Mouring <bmouring@ni.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140605152801.930031935@linutronix.de
      Cc: stable@vger.kernel.org
    • rtmutex: Handle deadlock detection smarter · 3d5c9340
      Thomas Gleixner authored
      Even in the case when deadlock detection is not requested by the
      caller, we can detect deadlocks. Right now the code stops the lock
      chain walk and keeps the waiter enqueued, even on itself. It is silly
      not to yell when such a scenario is detected, and to keep the waiter
      enqueued.
      
      Return -EDEADLK unconditionally and handle it at the call sites.
      
      The futex calls return -EDEADLK. The non-futex ones dequeue the
      waiter, throw a warning and put the task into a schedule loop.
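      
      A sketch of the non-futex call-site handling (close to the description
      above; the helper name is illustrative):
      
          static void rt_mutex_handle_deadlock(int res, int detect_deadlock,
                                               struct rt_mutex_waiter *w)
          {
                  /* reported to the caller, or detection was requested: done */
                  if (res != -EDEADLOCK || detect_deadlock)
                          return;
      
                  /* yell and park the task in a schedule loop */
                  WARN(1, "rtmutex deadlock detected, waiter %p\n", w);
                  while (1) {
                          set_current_state(TASK_INTERRUPTIBLE);
                          schedule();
                  }
          }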
      
      Tagged for stable as it makes the code more robust.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Brad Mouring <bmouring@ni.com>
      Link: http://lkml.kernel.org/r/20140605152801.836501969@linutronix.de
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  20. 28 May, 2014 1 commit
    • rtmutex: Fix deadlock detector for real · 397335f0
      Thomas Gleixner authored
      The current deadlock detection logic does not work reliably due to the
      following early exit path:
      
      	/*
      	 * Drop out, when the task has no waiters. Note,
      	 * top_waiter can be NULL, when we are in the deboosting
      	 * mode!
      	 */
      	if (top_waiter && (!task_has_pi_waiters(task) ||
      			   top_waiter != task_top_pi_waiter(task)))
      		goto out_unlock_pi;
      
      So this not only exits when the task has no waiters, it also exits
      unconditionally when the current waiter is not the top priority waiter
      of the task.
      
      So in a nested locking scenario, it might abort the lock chain walk
      and therefore miss a potential deadlock.
      
      Simple fix: Continue the chain walk when deadlock detection is
      enabled.
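      
      A sketch of the adjusted early-exit check (matching the fix as
      described; names follow the code fragment above):
      
          if (top_waiter) {
                  if (!task_has_pi_waiters(task))
                          goto out_unlock_pi;
                  /*
                   * Only abort the walk when deadlock detection is off and
                   * we are not the top pi waiter of the task.
                   */
                  if (!detect_deadlock && top_waiter != task_top_pi_waiter(task))
                          goto out_unlock_pi;
          }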
      
      We also avoid the whole enqueue if we detect the deadlock right away
      (A-A). It's an optimization, but it also prevents another waiter, who
      comes in after the detection and before the task has undone the
      damage, from observing the situation, detecting the deadlock and
      returning -EDEADLOCK, which would be wrong as the other task is not in
      a deadlock situation.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20140522031949.725272460@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  21. 23 Feb, 2014 1 commit
  22. 13 Jan, 2014 2 commits
    • sched/deadline: Add SCHED_DEADLINE inheritance logic · 2d3d891d
      Dario Faggioli authored
      Some method is needed to deal with rt-mutexes and make sched_dl
      interact with the current PI code; this raises non-trivial issues
      that need (according to us) to be solved with some restructuring of
      the pi-code (i.e., going toward a proxy-execution-ish implementation).
      
      This is under development. In the meanwhile, as a temporary solution,
      what this commit does is:
      
       - ensure a pi-lock owner with waiters is never throttled down. Instead,
         when it runs out of runtime, it immediately gets replenished and its
         deadline is postponed;
      
       - the scheduling parameters (relative deadline and default runtime)
         used for those replenishments --during the whole period it holds the
         pi-lock-- are the ones of the waiting task with the earliest deadline.
      
      Acting this way, we provide some kind of boosting to the lock-owner,
      still by using the existing (actually, slightly modified by the previous
      commit) pi-architecture.
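      
      A hypothetical sketch of the boosted-owner special case in the runtime
      accounting path (helper and field names are assumptions, not the exact
      patch):
      
          /* deadline runtime exhausted: throttle, unless pi-boosted */
          if (dl_runtime_exceeded(dl_se)) {
                  if (dl_se->dl_boosted) {
                          /* never throttle a boosted pi-lock owner: replenish
                           * now, using the earliest-deadline waiter's params */
                          replenish_dl_entity(dl_se, pi_se);
                  } else {
                          dl_se->dl_throttled = 1;
                          dequeue_dl_entity(dl_se);
                  }
          }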
      
      We would stress the fact that this is only a surely needed but far
      from clean solution to the problem. In the end it's only a way to
      re-start discussion within the community. So, as always, comments,
      ideas, rants, etc. are welcome! :-)
      Signed-off-by: Dario Faggioli <raistlin@linux.it>
      Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
      [ Added !RT_MUTEXES build fix. ]
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1383831828-15501-11-git-send-email-juri.lelli@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • rtmutex: Turn the plist into an rb-tree · fb00aca4
      Peter Zijlstra authored
      Turn the pi-chains from plist to rb-tree, in the rt_mutex code,
      and provide a proper comparison function for -deadline and
      -priority tasks.
      
      This is done mainly because:
       - the classical prio field of the plist is just an int, which might
         not be enough for representing a deadline;
       - manipulating such a list would become O(nr_deadline_tasks),
         which might be too much, as the number of -deadline tasks increases.
      
      Therefore, an rb-tree is used, and tasks are queued in it according
      to the following logic:
       - among two -priority (i.e., SCHED_BATCH/OTHER/RR/FIFO) tasks, the
         one with the higher (lower, actually!) prio wins;
       - among a -priority and a -deadline task, the latter always wins;
       - among two -deadline tasks, the one with the earliest deadline
         wins.
      
      Queueing and dequeueing functions are changed accordingly, for both
      the list of a task's pi-waiters and the list of tasks blocked on
      a pi-lock.
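      
      A sketch of the ordering predicate (close to rt_mutex_waiter_less()
      in the rtmutex code; field names assumed):
      
          static inline int
          rt_mutex_waiter_less(struct rt_mutex_waiter *left,
                               struct rt_mutex_waiter *right)
          {
                  /* lower prio value means higher priority; -deadline tasks
                   * carry a prio below all -priority tasks */
                  if (left->prio < right->prio)
                          return 1;
      
                  /* both -deadline: the earlier absolute deadline wins */
                  if (dl_prio(left->prio))
                          return dl_time_before(left->task->dl.deadline,
                                                right->task->dl.deadline);
      
                  return 0;
          }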
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Dario Faggioli <raistlin@linux.it>
      Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-again-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1383831828-15501-10-git-send-email-juri.lelli@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  23. 06 Nov, 2013 1 commit
  24. 28 May, 2013 1 commit
  25. 08 Feb, 2013 1 commit
  26. 12 Dec, 2011 1 commit
  27. 31 Oct, 2011 1 commit
  28. 29 Sep, 2011 1 commit
    • rcu: Permit rt_mutex_unlock() with irqs disabled · 5342e269
      Paul E. McKenney authored
      Create a separate lockdep class for the rt_mutex used for RCU priority
      boosting and enable use of rt_mutex_unlock() with irqs disabled.  This
      prevents RCU priority boosting from falling prey to deadlocks when
      someone begins an RCU read-side critical section in preemptible state,
      but releases it with an irq-disabled lock held.
      
      Unfortunately, the scheduler's runqueue and priority-inheritance locks
      still must either completely enclose or be completely enclosed by any
      overlapping RCU read-side critical section.
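      
      A sketch of the lockdep-class split (names assumed; the point is that
      the RCU-boost mutex gets its own class, so irq-disabled unlocks of it
      do not taint ordinary rt_mutex usage):
      
          static struct lock_class_key rcu_boost_class;
      
          static void rcu_init_boost_mutex(struct rt_mutex *mtx)
          {
                  rt_mutex_init(mtx);
                  /* separate class: unlock with irqs disabled is fine here */
                  lockdep_set_class(&mtx->wait_lock, &rcu_boost_class);
          }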
      
      This version removes a redundant local_irq_restore() noted by
      Yong Zhang.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  29. 08 Jul, 2011 1 commit