1. 21 July 2015 (1 commit)
  2. 20 June 2015 (2 commits)
  3. 19 June 2015 (5 commits)
    • sched/stop_machine: Fix deadlock between multiple stop_two_cpus() · b17718d0
      Authored by Peter Zijlstra
      Jiri reported a machine stuck in multi_cpu_stop() with
      migrate_swap_stop() as the stop function and with the following src,dst cpu
      pairs: {11,  4} {13, 11} { 4, 13}
      
                              4       11      13
      
      cpuM: queue(4 ,13)
                              *Ma
      cpuN: queue(13,11)
                                      *N      Na
                              *M              Mb
      cpuO: queue(11, 4)
                              *O      Oa
                                      *Nb
                              *Ob
      
      Where *X denotes the cpu running the queueing of cpu-X and X[ab] denotes
      the first/second queued work.
      
      You'll observe that the top of the workqueue for cpus 4, 11 and 13 is work
      queued by cpus M, O and N respectively. In other words: deadlock.
      
      Do away with the queueing trickery and introduce lg_double_lock() to
      lock both CPUs and fully serialize the stop_two_cpus() callers instead
      of the partial (and buggy) serialization we have now.
      Reported-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150605153023.GH19282@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b17718d0
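
      A minimal sketch of the serialization this introduces, assuming the lglock internals
      of that kernel (a per-CPU arch_spinlock_t); the point is that both per-CPU locks are
      always taken in ascending CPU order, so two stop_two_cpus() callers can no longer
      queue their works in conflicting orders:

        static void lg_double_lock(struct lglock *lg, int cpu1, int cpu2)
        {
                BUG_ON(cpu1 == cpu2);

                /* lock in cpu order, just like lg_global_lock() */
                if (cpu2 < cpu1)
                        swap(cpu1, cpu2);

                preempt_disable();
                arch_spin_lock(per_cpu_ptr(lg->lock, cpu1));
                arch_spin_lock(per_cpu_ptr(lg->lock, cpu2));
        }
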
    • locking/qrwlock: Don't contend with readers when setting _QW_WAITING · 405963b6
      Authored by Waiman Long
      The current cmpxchg() loop for setting the _QW_WAITING flag for writers
      in queue_write_lock_slowpath() will contend with incoming readers,
      possibly causing extra cmpxchg() operations that are wasteful. This
      patch changes the code to do a byte cmpxchg() to eliminate contention
      with new readers.
      
      A multithreaded microbenchmark running a 5M read_lock/write_lock loop
      on an 8-socket 80-core Westmere-EX machine running a 4.0-based kernel
      with the qspinlock patch has the following execution times (in ms)
      with and without the patch:
      
      With R:W ratio = 5:1
      
      	Threads	   w/o patch	with patch	% change
      	-------	   ---------	----------	--------
      	   2	     990	    895		  -9.6%
      	   3	    2136	   1912		 -10.5%
      	   4	    3166	   2830		 -10.6%
      	   5	    3953	   3629		  -8.2%
      	   6	    4628	   4405		  -4.8%
      	   7	    5344	   5197		  -2.8%
      	   8	    6065	   6004		  -1.0%
      	   9	    6826	   6811		  -0.2%
      	  10	    7599	   7599		   0.0%
      	  15	    9757	   9766		  +0.1%
      	  20	   13767	  13817		  +0.4%
      
      With a small number of contending threads, this patch can improve
      locking performance by up to 10%. With more contending threads,
      however, the gain diminishes.
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1433863153-30722-3-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      405963b6
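
      A sketch of the writer-side change (struct field names follow the qrwlock
      byte overlay; treat them as assumptions): instead of a cmpxchg() loop over the
      whole lock word, only the writer-mode byte is cmpxchg()ed, so readers bumping
      the reader count elsewhere in the word no longer make the writer's cmpxchg() fail:

        /* Set the waiting flag to notify readers that a writer is pending */
        for (;;) {
                struct __qrwlock *l = (struct __qrwlock *)lock;

                if (!READ_ONCE(l->wmode) &&
                    (cmpxchg(&l->wmode, 0, _QW_WAITING) == 0))
                        break;

                cpu_relax_lowlatency();
        }
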
    • lockdep: Implement lock pinning · a24fc60d
      Authored by Peter Zijlstra
      Add a lockdep annotation that WARNs if you 'accidentally' unlock a
      lock.
      
      This is especially helpful for code with callbacks, where the upper
      layer assumes a lock remains taken but a lower layer thinks it might
      be able to drop and reacquire the lock.
      
      By unwittingly breaking up the lock, races can be introduced.
      
      Lock pinning is a lockdep annotation that helps with this: when you
      lockdep_pin_lock() a held lock, any unlock without a
      lockdep_unpin_lock() will produce a WARN. Think of this as a relative
      of lockdep_assert_held(), except you don't only assert that it is held
      now, but ensure it stays held until you release your assertion.
      
      RFC: a possible alternative API would be something like:
      
        int cookie = lockdep_pin_lock(&foo);
        ...
        lockdep_unpin_lock(&foo, cookie);
      
      Where we pick a random number for the pin_count; this makes it
      impossible to sneak a lock break in without also passing the right
      cookie along.
      
      I've not done this because it ends up generating code for !LOCKDEP,
      esp. if you need to pass the cookie around for some reason.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.906731065@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a24fc60d
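
      A short usage sketch (the rq->lock region and do_callbacks() are illustrative,
      not from this commit): pin the lock around a region that hands control to lower
      layers, and any unlock of it inside that region will trigger the new WARN:

        raw_spin_lock(&rq->lock);
        lockdep_pin_lock(&rq->lock);    /* from here on, unlocking rq->lock WARNs */

        do_callbacks(rq);               /* lower layers must not drop rq->lock */

        lockdep_unpin_lock(&rq->lock);  /* assertion released */
        raw_spin_unlock(&rq->lock);
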
    • lockdep: Simplify lock_release() · e0f56fd7
      Authored by Peter Zijlstra
      lock_release() takes a 'nested' argument that's mostly pointless
      these days; remove the implementation but leave the argument as a
      vestige for now.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.840411606@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      e0f56fd7
    • locking/rtmutex: Implement lockless top-waiter wakeup · 45ab4eff
      Authored by Davidlohr Bueso
      Mark the task for later wakeup after the wait_lock has been released.
      This way, once the next task is awoken, it will have a better chance
      of finding the wait_lock free when it continues executing in
      __rt_mutex_slowlock() while trying to acquire the rtmutex, calling
      try_to_take_rt_mutex(). In contended scenarios, other tasks attempting
      to take the lock may acquire it first, right after the wait_lock is released,
      but (a) this can also occur with the current code, as it relies on the
      spinlock fairness, and (b) we are dealing with the top waiter anyway,
      so it will always take the lock next.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1432056298-18738-2-git-send-email-dave@stgolabs.net
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      45ab4eff
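
      A rough sketch of the idea, assuming the lockless wake-queue API of that kernel
      (WAKE_Q()/wake_q_add()/wake_up_q()); the unlock helper shown here is a simplified
      stand-in for the rtmutex slow unlock path:

        static void rt_mutex_slowunlock_sketch(struct rt_mutex *lock)
        {
                WAKE_Q(wake_q);

                raw_spin_lock(&lock->wait_lock);

                /* only *mark* the top waiter for wakeup while holding wait_lock */
                wake_q_add(&wake_q, rt_mutex_top_waiter(lock)->task);

                raw_spin_unlock(&lock->wait_lock);

                /* the actual wakeup happens after wait_lock has been released */
                wake_up_q(&wake_q);
        }
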
  4. 07 June 2015 (1 commit)
  5. 03 June 2015 (1 commit)
    • lockdep: Do not break user-visible string · 92ae1837
      Authored by Borislav Petkov
      Remove the line-break in the user-visible string and add the
      missing space in this error message:
      
        WARNING: lockdep init error! lock-(console_sem).lock was acquiredbefore lockdep_init
      
      Also:
      
        - don't yell, it's just a debug warning
      
        - denote references to function calls with '()'
      
        - standardize the lock name quoting
      
        - and finish the sentence.
      
      The result:
      
        WARNING: lockdep init error: lock '(console_sem).lock' was acquired before lockdep_init().
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150602133827.GD19887@pd.tnic
      [ Added a few more stylistic tweaks to the error message. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      92ae1837
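
      An illustrative before/after (a hypothetical printk, not the exact lockdep code),
      showing why user-visible strings should not be broken across source lines: the
      concatenation silently loses the separating space and makes the message ungreppable:

        /* before: two string fragments, missing space at the join */
        printk("lock %s was acquired"
               "before lockdep_init\n", name);

        /* after: keep the user-visible string on one line, even if it is long */
        printk("WARNING: lockdep init error: lock '%s' was acquired before lockdep_init().\n",
               name);
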
  6. 28 May 2015 (2 commits)
  7. 19 May 2015 (1 commit)
  8. 14 May 2015 (1 commit)
    • rtmutex: Warn if trylock is called from hard/softirq context · 6ce47fd9
      Authored by Thomas Gleixner
      rt_mutex_trylock() must be called from thread context. It can be
      called from atomic regions (preemption or interrupts disabled), but
      not from hard/softirq/nmi context. Add a warning to alert abusers.
      
      The reasons for this are:
      
          1) There is a potential deadlock in the slowpath
      
          2) Another cpu which blocks on the rtmutex will boost the task
             which allegedly locked the rtmutex, but that cannot work
             because the hard/softirq context borrows the task context.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      6ce47fd9
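
      A sketch of the added check (treat the exact predicate as an assumption): fail the
      trylock and warn when called from interrupt context, where PI boosting cannot work:

        int __sched rt_mutex_trylock(struct rt_mutex *lock)
        {
                /* trylock from hard/softirq or NMI context is a bug: warn and fail */
                if (WARN_ON_ONCE(in_irq() || in_nmi() || in_serving_softirq()))
                        return 0;

                return rt_mutex_fasttrylock(lock, rt_mutex_slowtrylock);
        }
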
  9. 13 May 2015 (1 commit)
    • locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG · cede8841
      Authored by Sebastian Andrzej Siewior
      The rtmutex code is the only user of __HAVE_ARCH_CMPXCHG, and we have a few
      other users of cmpxchg() which do not care about __HAVE_ARCH_CMPXCHG. This
      define was first introduced in 23f78d4a ("[PATCH] pi-futex: rt mutex core"),
      which is v2.6.18. The generic cmpxchg was introduced later in 068fbad2
      ("Add cmpxchg_local to asm-generic for per cpu atomic operations"), which is
      v2.6.25.
      Back then something was required to get rtmutex working with the fast
      path on architectures without cmpxchg, and this seems to be the result.
      
      It popped up recently on rt-users because ARM (v6+) does not define
      __HAVE_ARCH_CMPXCHG (even though it implements it), which results in slower
      locking performance in the fast path.
      To put some numbers on it: preempt-RT, am335x, 10 loops of
      100000 invocations of rt_spin_lock() + rt_spin_unlock() (time "total" is
      the average of the 10 loops for the 100000 invocations, "loop" is
      "total / 100000 * 1000"):
      
           cmpxchg |    slowpath used  ||    cmpxchg used
                   |   total   | loop  ||   total    | loop
           --------|-----------|-------||------------|-------
           ARMv6   | 9129.4 us | 91 ns ||  3311.9 us |  33 ns
           generic | 9360.2 us | 94 ns || 10834.6 us | 108 ns
           ----------------------------||--------------------
      
      Forcing it to the generic cmpxchg() made things worse for the slowpath and
      even worse in the cmpxchg() path. It boils down to 14 ns more per lock+unlock
      in a cache-hot loop, so it might not be that much in the real world.
      The last test was a substitute for a pre-ARMv6 machine, but then I was able
      to perform the comparison on imx28, which is ARMv5 and is therefore
      always using the generic cmpxchg implementation. And the numbers:
      
                    |   total     | loop
           -------- |-----------  |--------
           slowpath | 263937.2 us | 2639 ns
           cmpxchg  |  16934.2 us |  169 ns
           --------------------------------
      
      The numbers are larger since the machine is slower in general. However,
      letting rtmutex use cmpxchg() instead of the slowpath seems to improve things.
      
      Since from the ARM (tested on am335x + imx28) point of view always
      using cmpxchg() in rt_mutex_lock() + rt_mutex_unlock() makes sense, I
      would drop the define.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: will.deacon@arm.com
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/20150225175613.GE6823@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      cede8841
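
      For reference, a sketch of the fast path the define used to gate: with cmpxchg()
      always available (natively or via the generic implementation), taking an
      uncontended rtmutex is a single compare-and-exchange of the owner field:

        /* NULL owner means "unlocked"; acquire by installing ourselves as owner */
        #define rt_mutex_cmpxchg(l, c, n)      (cmpxchg(&(l)->owner, c, n) == c)

        static inline int rt_mutex_fasttrylock_sketch(struct rt_mutex *lock)
        {
                return rt_mutex_cmpxchg(lock, NULL, current);
        }
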
  10. 12 May 2015 (1 commit)
  11. 11 May 2015 (2 commits)
  12. 08 May 2015 (9 commits)
    • locking/pvqspinlock: Implement simple paravirt support for the qspinlock · a23db284
      Authored by Waiman Long
      Provide a separate (second) version of the spin_lock_slowpath for
      paravirt along with a special unlock path.
      
      The second slowpath is generated by adding a few pv hooks to the
      normal slowpath, but where those will compile away for the native
      case, they expand into special wait/wake code for the pv version.
      
      The actual MCS queue can use extra storage in the mcs_nodes[] array to
      keep track of state and therefore uses directed wakeups.
      
      The head contender has no such storage directly visible to the
      unlocker.  So the unlocker searches a hash table with open addressing
      using a simple binary Galois linear feedback shift register.
      Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1429901803-29771-9-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a23db284
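
      Purely illustrative: one step of a binary Galois LFSR of the kind the changelog
      mentions for probing the unlock hash table (the tap mask here is chosen arbitrarily
      for the example, not taken from the kernel):

        static inline u32 lfsr_step(u32 state)
        {
                /* shift right; if the bit that fell out was set, apply the tap mask */
                return (state >> 1) ^ (-(state & 1u) & 0x80200003u);
        }
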
    • locking/qspinlock: Revert to test-and-set on hypervisors · 2aa79af6
      Authored by Peter Zijlstra (Intel)
      When we detect a hypervisor (!paravirt, see qspinlock paravirt support
      patches), revert to a simple test-and-set lock to avoid the horrors
      of queue preemption.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-8-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2aa79af6
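
      A sketch of the fallback (function and feature-test names follow the x86 side of
      this series; treat the details as assumptions): when running as a guest without
      paravirt support, bypass the queue entirely and use a simple test-and-set:

        static __always_inline bool virt_queued_spin_lock(struct qspinlock *lock)
        {
                if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
                        return false;           /* bare metal: use the normal queued path */

                /* queue preemption hurts badly under a hypervisor; just test-and-set */
                while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0)
                        cpu_relax();

                return true;
        }
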
    • locking/qspinlock: Use a simple write to grab the lock · 2c83e8e9
      Authored by Waiman Long
      Currently, atomic_cmpxchg() is used to get the lock. However, this
      is not really necessary if there is more than one task in the queue
      and the queue head doesn't need to reset the tail code. For that case,
      a simple write to set the lock bit is enough, as the queue head will
      be the only one eligible to get the lock as long as it checks that
      both the lock and pending bits are not set. The current pending-bit
      waiting code will ensure that the bit will not be set once the
      tail code in the lock is set.
      
      With that change, there is some slight improvement in the performance
      of the queued spinlock in the 5M-loop micro-benchmark run on a 4-socket
      Westmere-EX machine, as shown in the tables below.
      
      		[Standalone/Embedded - same node]
        # of tasks	Before patch	After patch	%Change
        ----------	-----------	----------	-------
             3	 2324/2321	2248/2265	 -3%/-2%
             4	 2890/2896	2819/2831	 -2%/-2%
             5	 3611/3595	3522/3512	 -2%/-2%
             6	 4281/4276	4173/4160	 -3%/-3%
             7	 5018/5001	4875/4861	 -3%/-3%
             8	 5759/5750	5563/5568	 -3%/-3%
      
      		[Standalone/Embedded - different nodes]
        # of tasks	Before patch	After patch	%Change
        ----------	-----------	----------	-------
             3	12242/12237	12087/12093	 -1%/-1%
             4	10688/10696	10507/10521	 -2%/-2%
      
      It was also found that this change produced a much bigger performance
      improvement in the newer IvyBridge-EX chip and essentially closed
      the performance gap between the ticket spinlock and the queued spinlock.
      
      The disk workload of the AIM7 benchmark was run on a 4-socket
      Westmere-EX machine with both ext4 and xfs RAM disks at 3000 users
      on a 3.14 based kernel. The results of the test runs were:
      
                      AIM7 XFS Disk Test
        kernel                 JPM    Real Time   Sys Time    Usr Time
        -----                  ---    ---------   --------    --------
        ticketlock            5678233    3.17       96.61       5.81
        qspinlock             5750799    3.13       94.83       5.97
      
                      AIM7 EXT4 Disk Test
        kernel                 JPM    Real Time   Sys Time    Usr Time
        -----                  ---    ---------   --------    --------
        ticketlock            1114551   16.15      509.72       7.11
        qspinlock             2184466    8.24      232.99       6.01
      
      The ext4 filesystem run had a much higher spinlock contention than
      the xfs filesystem run.
      
      The "ebizzy -m" test was also run with the following results:
      
        kernel               records/s  Real Time   Sys Time    Usr Time
        -----                ---------  ---------   --------    --------
        ticketlock             2075       10.00      216.35       3.49
        qspinlock              3023       10.00      198.20       4.80
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-7-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2c83e8e9
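
      A sketch of the core of the change (the __qspinlock overlay that exposes the locked
      byte is assumed): once the queue head knows it may take the lock, a plain byte store
      replaces the atomic_cmpxchg():

        static __always_inline void set_locked(struct qspinlock *lock)
        {
                struct __qspinlock *l = (void *)lock;

                /* only the queue head can get here, so an ordinary store suffices */
                WRITE_ONCE(l->locked, _Q_LOCKED_VAL);
        }
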
    • locking/qspinlock: Optimize for smaller NR_CPUS · 69f9cae9
      Authored by Peter Zijlstra (Intel)
      When we allow for a max NR_CPUS < 2^14 we can optimize the pending
      wait-acquire and the xchg_tail() operations.
      
      By growing the pending bit to a byte, we reduce the tail to 16 bits.
      This means we can use xchg16 for the tail part and do away with all
      the repeated cmpxchg() operations.
      
      This in turn allows us to unconditionally acquire; the locked state
      as observed by the wait loops cannot change. And because both locked
      and pending are now a full byte we can use simple stores for the
      state transition, obviating one atomic operation entirely.
      
      This optimization is needed to make the qspinlock achieve performance
      parity with ticket spinlock at light load.
      
      All this is horribly broken on Alpha pre EV56 (and any other arch that
      cannot do single-copy atomic byte stores).
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-6-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      69f9cae9
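
      A sketch of the small-NR_CPUS tail exchange (overlay field names as assumptions):
      with the tail confined to the upper 16 bits, publishing our MCS node becomes a
      single xchg() on that halfword instead of a cmpxchg() loop over the whole word:

        static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
        {
                struct __qspinlock *l = (void *)lock;

                /* returns the previous tail, shifted back into lock-word position */
                return (u32)xchg(&l->tail, tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
        }
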
    • locking/qspinlock: Extract out code snippets for the next patch · 6403bd7d
      Authored by Waiman Long
      This is a preparatory patch that extracts out the following 2 code
      snippets to prepare for the next performance optimization patch.
      
       1) the logic for the exchange of new and previous tail code words
          into a new xchg_tail() function.
       2) the logic for clearing the pending bit and setting the locked bit
          into a new clear_pending_set_locked() function.
      
      This patch also simplifies the trylock operation before queuing by
      calling queued_spin_trylock() directly.
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-5-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6403bd7d
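
      A sketch of one of the extracted helpers in its generic form (the companion
      xchg_tail() is sketched under the "Optimize for smaller NR_CPUS" entry above):
      clearing the pending bit and setting the locked byte becomes a single atomic add:

        /* *,1,0 -> *,0,1 : clear the pending bit and set the locked byte at once */
        static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
        {
                atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
        }
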
    • locking/qspinlock: Add pending bit · c1fb159d
      Authored by Peter Zijlstra (Intel)
      Because the qspinlock needs to touch a second cacheline (the per-cpu
      mcs_nodes[]), add a pending bit and allow a single in-word spinner
      before we punt to the second cacheline.
      
      It is possible to observe the pending bit without the locked bit when
      the last owner has just released but the pending owner has not yet
      taken ownership.
      
      In this case we would normally queue -- because the pending bit is
      already taken. However, in this case the pending bit is guaranteed
      to be released 'soon', therefore wait for it and avoid queueing.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-4-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c1fb159d
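
      A condensed sketch of the pending-bit path (failure and queueing cases omitted;
      helper and constant names follow the qspinlock code): a single in-word spinner sets
      the pending bit, waits for the locked byte to clear, then takes the lock without
      ever touching the per-cpu mcs_nodes[] cacheline:

        static inline bool try_pending_sketch(struct qspinlock *lock)
        {
                /* 0,0,1 -> 0,1,1 : claim the pending bit while the lock is held */
                if (atomic_cmpxchg(&lock->val, _Q_LOCKED_VAL,
                                   _Q_LOCKED_VAL | _Q_PENDING_VAL) != _Q_LOCKED_VAL)
                        return false;           /* contended: caller must queue */

                /* wait for the current owner to drop the locked byte */
                while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
                        cpu_relax();

                clear_pending_set_locked(lock); /* 0,1,0 -> 0,0,1 */
                return true;
        }
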
    • locking/qspinlock: Introduce a simple generic 4-byte queued spinlock · a33fda35
      Authored by Waiman Long
      This patch introduces a new generic queued spinlock implementation that
      can serve as an alternative to the default ticket spinlock. The queued
      spinlock should be almost as fair as the ticket spinlock. It has about
      the same speed in the single-threaded case and can be much faster under
      high contention, especially when the spinlock is embedded within the
      data structure to be protected.
      
      Only in light to moderate contention where the average queue depth
      is around 1-3 will this queued spinlock be potentially a bit slower
      due to the higher slowpath overhead.
      
      This queued spinlock is especially suited to NUMA machines with a large
      number of cores, as the chance of spinlock contention is much higher
      on those machines. The cost of contention is also higher because of
      slower inter-node memory traffic.
      
      Due to the fact that spinlocks are acquired with preemption disabled,
      the process will not be migrated to another CPU while it is trying
      to get a spinlock. Ignoring interrupt handling, a CPU can only be
      contending on one spinlock at any one time. Counting soft IRQ, hard
      IRQ and NMI, a CPU can only have a maximum of 4 concurrent lock-waiting
      activities. By allocating a set of per-cpu queue nodes and using them
      to form a waiting queue, we can encode the queue node address into a
      much smaller 24-bit size (including CPU number and queue node index),
      leaving one byte for the lock.
      
      Please note that the queue node is only needed when waiting for the
      lock. Once the lock is acquired, the queue node can be released to
      be used later.
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-2-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a33fda35
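
      A sketch of the tail encoding described above (offsets as defined by the qspinlock
      header): CPU number plus per-cpu node index fit well within 24 bits, leaving the low
      byte of the 32-bit lock word for the lock itself:

        static inline u32 encode_tail(int cpu, int idx)
        {
                u32 tail;

                tail  = (cpu + 1) << _Q_TAIL_CPU_OFFSET;  /* 0 means "no tail" */
                tail |= idx << _Q_TAIL_IDX_OFFSET;        /* 0..3: nesting level */

                return tail;
        }
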
    • locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write() · 59aabfc7
      Authored by Waiman Long
      In up_write()/up_read(), rwsem_wake() will be called whenever it
      detects that some writers/readers are waiting. The rwsem_wake()
      function will take the wait_lock and call __rwsem_do_wake() to do the
      real wakeup.  For a heavily contended rwsem, doing a spin_lock() on
      wait_lock will cause further contention on the heavily contended rwsem
      cacheline resulting in delay in the completion of the up_read/up_write
      operations.
      
      This patch makes the wait_lock taking and the call to __rwsem_do_wake()
      optional if at least one spinning writer is present. The spinning
      writer will be able to take the rwsem and call rwsem_wake() later
      when it calls up_write(). With the presence of a spinning writer,
      rwsem_wake() will now try to acquire the lock using trylock. If that
      fails, it will just quit.
      Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
      Acked-by: Jason Low <jason.low2@hp.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1430428337-16802-2-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      59aabfc7
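
      A trimmed sketch of the new rwsem_wake() entry (rwsem_has_spinner() is the helper
      this commit adds; details are assumptions): with a spinner present, only trylock
      the wait_lock and leave the wakeup to the spinner if that fails:

        struct rw_semaphore *rwsem_wake_sketch(struct rw_semaphore *sem)
        {
                unsigned long flags;

                if (rwsem_has_spinner(sem)) {
                        /* order the spinner check against the waiter list read */
                        smp_rmb();
                        if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
                                return sem;     /* the spinner will do the wakeup later */
                } else {
                        raw_spin_lock_irqsave(&sem->wait_lock, flags);
                }

                if (!list_empty(&sem->wait_list))
                        sem = __rwsem_do_wake(sem, RWSEM_WAKE_ANY);

                raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
                return sem;
        }
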
    • sched: Handle priority boosted tasks proper in setscheduler() · 0782e63b
      Authored by Thomas Gleixner
      Ronny reported that the following scenario is not handled correctly:
      
      	T1 (prio = 10)
      	   lock(rtmutex);
      
      	T2 (prio = 20)
      	   lock(rtmutex)
      	      boost T1
      
      	T1 (prio = 20)
      	   sys_set_scheduler(prio = 30)
      	   T1 prio = 30
      	   ....
      	   sys_set_scheduler(prio = 10)
      	   T1 prio = 30
      
      The last step is wrong as T1 should now be back at prio 20.
      
      Commit c365c292 ("sched: Consider pi boosting in setscheduler()")
      only handles the case where a boosted task tries to lower its
      priority.
      
      Fix it by taking the new effective priority into account when deciding
      whether a priority change is required.
      Reported-by: Ronny Meeus <ronny.meeus@gmail.com>
      Tested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@vger.kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Fixes: c365c292 ("sched: Consider pi boosting in setscheduler()")
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0782e63b
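
      Illustrative pseudo-logic only (the helper name matches the one this commit
      introduces; the wrapper itself is hypothetical): base the "did anything change?"
      decision on the priority the task would effectively run at after PI boosting,
      not on the requested value alone:

        /* Illustrative: does the requested change alter the effective priority? */
        static bool setscheduler_prio_changes_sketch(struct task_struct *p,
                                                     int oldprio, int newprio)
        {
                /* requested prio folded together with any PI boost */
                int new_effective_prio = rt_mutex_get_effective_prio(p, newprio);

                return new_effective_prio != oldprio;
        }
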
  13. 22 April 2015 (1 commit)
  14. 17 April 2015 (1 commit)
    • lockdep: Make print_lock() robust against concurrent release · d7bc3197
      Authored by Peter Zijlstra
      During sysrq's show-held-locks command it is possible that
      hlock_class() returns NULL for a given lock. The result is then (after
      the warning):
      
      	|BUG: unable to handle kernel NULL pointer dereference at 0000001c
      	|IP: [<c1088145>] get_usage_chars+0x5/0x100
      	|Call Trace:
      	| [<c1088263>] print_lock_name+0x23/0x60
      	| [<c1576b57>] print_lock+0x5d/0x7e
      	| [<c1088314>] lockdep_print_held_locks+0x74/0xe0
      	| [<c1088652>] debug_show_all_locks+0x132/0x1b0
      	| [<c1315c48>] sysrq_handle_showlocks+0x8/0x10
      
      This *might* happen because the thread on the other CPU drops the lock
      after we have looked at ->lockdep_depth, so ->held_locks no longer
      points to a lock that is held.
      
      The fix here is to simply ignore it and continue.
      Reported-by: Andreas Messerschmid <andreas@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d7bc3197
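
      A sketch of the hardened print_lock() (close to the fix, but treat the details as
      assumptions): snapshot the class index once and bail out gracefully if the lock was
      released under us:

        static void print_lock(struct held_lock *hlock)
        {
                /*
                 * We can be called locklessly through debug_show_all_locks(),
                 * so the hlock may already have been released and cleared.
                 */
                unsigned int class_idx = hlock->class_idx;

                barrier();      /* don't re-read hlock->class_idx below */

                if (!class_idx || (class_idx - 1) >= MAX_LOCKDEP_KEYS) {
                        printk("<RELEASED>\n");
                        return;
                }

                print_lock_name(lock_classes + class_idx - 1);
                printk(", at: ");
                print_ip_sym(hlock->acquire_ip);
        }
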
  15. 09 April 2015 (1 commit)
    • locking/mutex: Further simplify mutex_spin_on_owner() · 01ac33c1
      Authored by Jason Low
      Similar to what Linus suggested for rwsem_spin_on_owner(), in
      mutex_spin_on_owner() instead of having while (true) and
      breaking out of the spin loop on lock->owner != owner, we can
      have the loop directly check for while (lock->owner == owner) to
      improve the readability of the code.
      
      It also shrinks the code a bit:
      
         text    data     bss     dec     hex filename
         3721       0       0    3721     e89 mutex.o.before
         3705       0       0    3705     e79 mutex.o.after
      Signed-off-by: Jason Low <jason.low2@hp.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: http://lkml.kernel.org/r/1428521960-5268-2-git-send-email-jason.low2@hp.com
      [ Added code generation info. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      01ac33c1
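
      The resulting loop, roughly (a sketch of the post-patch shape): spin only while the
      same task still owns the mutex, and stop when the owner is off-CPU or we should
      reschedule:

        static noinline
        bool mutex_spin_on_owner_sketch(struct mutex *lock, struct task_struct *owner)
        {
                bool ret = true;

                rcu_read_lock();
                while (lock->owner == owner) {
                        barrier();      /* re-check lock->owner, not a cached value */

                        if (!owner->on_cpu || need_resched()) {
                                ret = false;
                                break;
                        }

                        cpu_relax_lowlatency();
                }
                rcu_read_unlock();

                return ret;
        }
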
  16. 25 March 2015 (1 commit)
  17. 23 March 2015 (1 commit)
    • lockdep: Fix the module unload key range freeing logic · 35a9393c
      Authored by Peter Zijlstra
      Module unload calls lockdep_free_key_range(), which removes entries
      from the data structures. Most of the lockdep code OTOH assumes the
      data structures are append-only; specifically, see the comments in
      add_lock_to_list() and look_up_lock_class().
      
      Clearly this has only worked by accident; make it work properly. The
      actual scenario that would make it go boom involves the memory freed by
      the module unload being re-allocated and re-used for a lock inside of
      an rcu-sched grace period. This is a very unlikely scenario, but still
      better to plug the hole.
      
      Use RCU list iteration in all places and amend the comments.
      
      Change lockdep_free_key_range() to issue a sync_sched() between
      removal from the lists and returning -- which results in the memory
      being freed. Further ensure the callers are placed correctly and
      comment the requirements.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Tsyvarev <tsyvarev@ispras.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      35a9393c
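
      A trimmed sketch of the shape of the fix (graph locking and the name-range check
      omitted; internal names follow the lockdep code): walk the hash chains with the RCU
      list primitives and wait for a sched-RCU grace period before the module memory can
      be reused:

        void lockdep_free_key_range_sketch(void *start, unsigned long size)
        {
                struct lock_class *class;
                unsigned long flags;
                int i;

                raw_local_irq_save(flags);
                /* ... take the lockdep graph lock ... */

                for (i = 0; i < CLASSHASH_SIZE; i++) {
                        list_for_each_entry_rcu(class, classhash_table + i, hash_entry) {
                                if (within(class->key, start, size))
                                        zap_class(class);
                        }
                }

                /* ... release the graph lock ... */
                raw_local_irq_restore(flags);

                /*
                 * Wait for concurrent RCU list walkers (look_up_lock_class())
                 * before the caller frees the memory backing these keys.
                 */
                synchronize_sched();
        }
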
  18. 07 March 2015 (1 commit)
    • locking/rwsem: Fix lock optimistic spinning when owner is not running · 9198f6ed
      Authored by Jason Low
      Ming reported soft lockups occurring when running xfstest due to
      the following tip:locking/core commit:
      
        b3fd4f03 ("locking/rwsem: Avoid deceiving lock spinners")
      
      When doing optimistic spinning in rwsem, threads should stop
      spinning when the lock owner is not running. While a thread is
      spinning on owner, if the owner reschedules, owner->on_cpu
      returns false and we stop spinning.
      
      However, this commit essentially caused the check to get
      ignored because when we break out of the spin loop due to
      !on_cpu, we continue spinning if sem->owner != NULL.
      
      This patch fixes this by making sure we stop spinning if the
      owner is not running. Furthermore, just like with mutexes,
      refactor the code such that we don't have separate checks for
      owner_running(). This makes it more straightforward why we exit
      the spin-on-owner loop and avoids having to "guess" why we broke
      out of the loop, which makes the code more readable.
      Reported-and-tested-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: Jason Low <jason.low2@hp.com>
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: http://lkml.kernel.org/r/1425714331.2475.388.camel@j-VirtualBox
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9198f6ed
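
      A sketch of the refactored spin-on-owner loop (the tail heuristic is simplified
      here; see the commit for the full check): the !on_cpu / need_resched() exit is back
      inside the loop, so a sleeping owner always stops the spin:

        static noinline
        bool rwsem_spin_on_owner_sketch(struct rw_semaphore *sem, struct task_struct *owner)
        {
                rcu_read_lock();
                while (sem->owner == owner) {
                        barrier();      /* re-check sem->owner, not a cached value */

                        if (!owner->on_cpu || need_resched()) {
                                rcu_read_unlock();
                                return false;   /* owner not running: stop spinning */
                        }

                        cpu_relax_lowlatency();
                }
                rcu_read_unlock();

                /* owner changed: keep spinning only if a new writer took over */
                return READ_ONCE(sem->owner) != NULL;
        }
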
  19. 01 March 2015 (1 commit)
  20. 24 February 2015 (1 commit)
  21. 18 February 2015 (5 commits)