1. 24 January 2017, 1 commit
  2. 15 January 2017, 1 commit
    • locktorture: Fix potential memory leak with rw lock test · f4dbba59
      Committed by Yang Shi
      When running the locktorture module with kmemleak enabled, using the commands below:
      
      $ modprobe locktorture torture_type=rw_lock_irq
      $ rmmod locktorture
      
      The below kmemleak got caught:
      
      root@10:~# echo scan > /sys/kernel/debug/kmemleak
      [  323.197029] kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      root@10:~# cat /sys/kernel/debug/kmemleak
      unreferenced object 0xffffffc07592d500 (size 128):
        comm "modprobe", pid 368, jiffies 4294924118 (age 205.824s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 c3 7b 02 00 00 00 00 00  .........{......
          00 00 00 00 00 00 00 00 d7 9b 02 00 00 00 00 00  ................
        backtrace:
          [<ffffff80081e5a88>] create_object+0x110/0x288
          [<ffffff80086c6078>] kmemleak_alloc+0x58/0xa0
          [<ffffff80081d5acc>] __kmalloc+0x234/0x318
          [<ffffff80006fa130>] 0xffffff80006fa130
          [<ffffff8008083ae4>] do_one_initcall+0x44/0x138
          [<ffffff800817e28c>] do_init_module+0x68/0x1cc
          [<ffffff800811c848>] load_module+0x1a68/0x22e0
          [<ffffff800811d340>] SyS_finit_module+0xe0/0xf0
          [<ffffff80080836f0>] el0_svc_naked+0x24/0x28
          [<ffffffffffffffff>] 0xffffffffffffffff
      unreferenced object 0xffffffc07592d480 (size 128):
        comm "modprobe", pid 368, jiffies 4294924118 (age 205.824s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 3b 6f 01 00 00 00 00 00  ........;o......
          00 00 00 00 00 00 00 00 23 6a 01 00 00 00 00 00  ........#j......
        backtrace:
          [<ffffff80081e5a88>] create_object+0x110/0x288
          [<ffffff80086c6078>] kmemleak_alloc+0x58/0xa0
          [<ffffff80081d5acc>] __kmalloc+0x234/0x318
          [<ffffff80006fa22c>] 0xffffff80006fa22c
          [<ffffff8008083ae4>] do_one_initcall+0x44/0x138
          [<ffffff800817e28c>] do_init_module+0x68/0x1cc
          [<ffffff800811c848>] load_module+0x1a68/0x22e0
          [<ffffff800811d340>] SyS_finit_module+0xe0/0xf0
          [<ffffff80080836f0>] el0_svc_naked+0x24/0x28
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      This is because cxt.lwsa and cxt.lrsa are not freed in module_exit, so free
      them in lock_torture_cleanup(), and also free writer_tasks if the memory
      allocation for reader_tasks fails.
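      
      A minimal sketch of that cleanup, assuming the field and function names from
      the commit text (the surrounding locktorture code is not reproduced here):
      
      static void lock_torture_cleanup(void)
      {
              /* ... stop writer/reader kthreads and print the final stats ... */
      
              kfree(cxt.lwsa);        /* writer statistics array allocated at init */
              cxt.lwsa = NULL;
      
              kfree(cxt.lrsa);        /* reader statistics array; NULL for write-only types */
              cxt.lrsa = NULL;
      }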
      Signed-off-by: Yang Shi <yang.shi@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
  3. 14 January 2017, 1 commit
  4. 25 December 2016, 1 commit
  5. 06 December 2016, 1 commit
    • lockdep: Fix report formatting · f943fe0f
      Committed by Dmitry Vyukov
      Since commit:
      
        4bcc595c ("printk: reinstate KERN_CONT for printing continuation lines")
      
      printk() requires KERN_CONT to continue log messages. Many printk() calls
      in lockdep.c and print_ip_sym() don't have it, and as a result lockdep
      reports are completely messed up.
      
      Add missing KERN_CONT and inline print_ip_sym() where necessary.
      
      Example of a messed up report:
      
        0-rc5+ #41 Not tainted
        -------------------------------------------------------
        syz-executor0/5036 is trying to acquire lock:
         (
        rtnl_mutex
        ){+.+.+.}
        , at:
        [<ffffffff86b3d6ac>] rtnl_lock+0x1c/0x20
        but task is already holding lock:
         (
        &net->packet.sklist_lock
        ){+.+...}
        , at:
        [<ffffffff873541a6>] packet_diag_dump+0x1a6/0x1920
        which lock already depends on the new lock.
        the existing dependency chain (in reverse order) is:
        -> #3
         (
        &net->packet.sklist_lock
        +.+...}
        ...
      
      Without this patch all scripts that parse kernel bug reports are broken.
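      
      As a hedged illustration of the kind of change involved (lock_name and
      usage_str are made-up names for this example, not lifted from lockdep.c),
      fragments meant to share one report line now need KERN_CONT:
      
      /* Before 4bcc595c these fragments were implicitly joined into one line. */
      printk(" (");
      printk("%s", lock_name);
      printk("){%s}\n", usage_str);
      
      /* Now each continuation fragment must be marked explicitly: */
      printk(KERN_CONT " (");
      printk(KERN_CONT "%s", lock_name);
      printk(KERN_CONT "){%s}\n", usage_str);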
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: andreyknvl@google.com
      Cc: aryabinin@virtuozzo.com
      Cc: joe@perches.com
      Cc: syzkaller@googlegroups.com
      Link: http://lkml.kernel.org/r/1480343083-48731-1-git-send-email-dvyukov@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 05 December 2016, 1 commit
  7. 02 December 2016, 4 commits
    • locking/rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked() · 84d82ec5
      Committed by Thomas Gleixner
      While debugging the unlock vs. dequeue race which resulted in state
      corruption of futexes, the lockless nature of rt_mutex_proxy_unlock()
      caused some confusion.
      
      Add commentary to explain why it is safe to do this lockless. Add matching
      comments to rt_mutex_init_proxy_locked() for completeness' sake.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David Daney <ddaney@caviumnetworks.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/20161130210030.591941927@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL · b5016e82
      Committed by Thomas Gleixner
      This is a leftover from the original rtmutex implementation, which used
      both bit 0 and bit 1 in the owner pointer. Commit:
      
        8161239a ("rtmutex: Simplify PI algorithm and make highest prio task get lock")
      
      ... removed the usage of bit1, but kept the extra mask around. This is
      confusing at best.
      
      Remove it and just use RT_MUTEX_HAS_WAITERS for the masking.
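      
      A hedged sketch of what the masking change amounts to (the values shown are
      illustrative; the exact definitions live in rtmutex_common.h):
      
      #define RT_MUTEX_HAS_WAITERS    1UL     /* bit 0 of lock->owner */
      #define RT_MUTEX_OWNER_MASKALL  3UL     /* bit 1 unused since 8161239a */
      
      /* Before: both low bits masked out, although bit 1 can never be set. */
      owner = (struct task_struct *)(val & ~RT_MUTEX_OWNER_MASKALL);
      
      /* After: only the waiters bit is encoded in the pointer, so mask just that. */
      owner = (struct task_struct *)(val & ~RT_MUTEX_HAS_WAITERS);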
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David Daney <ddaney@caviumnetworks.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/20161130210030.509567906@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/rtmutex: Use READ_ONCE() in rt_mutex_owner() · 1be5d4fa
      Committed by Thomas Gleixner
      While debugging the rtmutex unlock vs. dequeue race Will suggested to use
      READ_ONCE() in rt_mutex_owner() as it might race against the
      cmpxchg_release() in unlock_rt_mutex_safe().
      
      Will: "It's a minor thing which will most likely not matter in practice"
      
      A careful search did not unearth an actual problem in today's code, but it's
      better to be safe than surprised.
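      
      A hedged sketch of the resulting helper, modelled on the description above
      (the exact source may differ slightly):
      
      static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
      {
              unsigned long owner, *p = (unsigned long *) &lock->owner;
      
              /*
               * READ_ONCE() prevents the compiler from tearing or re-reading
               * the owner word while it may change under the concurrent
               * cmpxchg_release() in unlock_rt_mutex_safe().
               */
              owner = READ_ONCE(*p);
              return (struct task_struct *) (owner & ~RT_MUTEX_HAS_WAITERS);
      }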
      Suggested-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David Daney <ddaney@caviumnetworks.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20161130210030.431379999@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/rtmutex: Prevent dequeue vs. unlock race · dbb26055
      Committed by Thomas Gleixner
      David reported a futex/rtmutex state corruption. It's caused by the
      following problem:
      
      CPU0		CPU1		CPU2
      
      l->owner=T1
      		rt_mutex_lock(l)
      		lock(l->wait_lock)
      		l->owner = T1 | HAS_WAITERS;
      		enqueue(T2)
      		boost()
      		  unlock(l->wait_lock)
      		schedule()
      
      				rt_mutex_lock(l)
      				lock(l->wait_lock)
      				l->owner = T1 | HAS_WAITERS;
      				enqueue(T3)
      				boost()
      				  unlock(l->wait_lock)
      				schedule()
      		signal(->T2)	signal(->T3)
      		lock(l->wait_lock)
      		dequeue(T2)
      		deboost()
      		  unlock(l->wait_lock)
      				lock(l->wait_lock)
      				dequeue(T3)
      				  ===> wait list is now empty
      				deboost()
      				 unlock(l->wait_lock)
      		lock(l->wait_lock)
      		fixup_rt_mutex_waiters()
      		  if (wait_list_empty(l)) {
      		    owner = l->owner & ~HAS_WAITERS;
      		    l->owner = owner
      		     ==> l->owner = T1
      		  }
      
      				lock(l->wait_lock)
      rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
      				  if (wait_list_empty(l)) {
      				    owner = l->owner & ~HAS_WAITERS;
      cmpxchg(l->owner, T1, NULL)
       ===> Success (l->owner = NULL)
      				    l->owner = owner
      				     ==> l->owner = T1
      				  }
      
      That means the problem is caused by fixup_rt_mutex_waiters(), which does an
      RMW to clear the waiters bit unconditionally when there are no waiters left
      in the rtmutex's rbtree.
      
      This can be fatal: A concurrent unlock can release the rtmutex in the
      fastpath because the waiters bit is not set. If the cmpxchg() gets in the
      middle of the RMW operation, then the previous owner, which just unlocked
      the rtmutex, is set as the owner again when the write takes place after the
      successful cmpxchg().
      
      The solution is rather trivial: verify that the owner member of the rtmutex
      has the waiters bit set before clearing it. This does not require a
      cmpxchg() or other atomic operations because the waiters bit can only be
      set and cleared with the rtmutex wait_lock held. It's also safe against the
      fast path unlock attempt. The unlock attempt via cmpxchg() will either see
      the bit set and take the slowpath or see the bit cleared and release it
      atomically in the fastpath.
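      
      A hedged sketch of the fixed helper (condensed; the mainline version carries
      a much longer comment explaining the serialization rules):
      
      static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
      {
              unsigned long owner, *p = (unsigned long *) &lock->owner;
      
              if (rt_mutex_has_waiters(lock))
                      return;         /* waiters remain: keep the bit set */
      
              /*
               * Only clear the HAS_WAITERS bit if it is actually set. The bit
               * can only change under wait_lock, which we hold, so a plain
               * read-modify-write is sufficient and cannot resurrect a stale
               * owner after a concurrent fastpath unlock.
               */
              owner = READ_ONCE(*p);
              if (owner & RT_MUTEX_HAS_WAITERS)
                      WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
      }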
      
      It's remarkable that the test program provided by David triggers really
      quickly on ARM64 and MIPS64, but refuses to reproduce on x86-64, although
      the problem exists there as well. That might explain why this was not
      discovered earlier, despite the bug existing since day one of the rtmutex
      implementation more than 10 years ago.
      
      Thanks to David for meticulously instrumenting the code and providing the
      information that made it possible to decode this subtle problem.
      Reported-by: David Daney <ddaney@caviumnetworks.com>
      Tested-by: David Daney <david.daney@cavium.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@vger.kernel.org
      Fixes: 23f78d4a ("[PATCH] pi-futex: rt mutex core")
      Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 30 November 2016, 1 commit
  9. 22 November 2016, 2 commits
    • locking/mutex: Break out of expensive busy-loop on {mutex,rwsem}_spin_on_owner() when owner vCPU is preempted · 05ffc951
      Committed by Pan Xinhui
      An over-committed guest with more vCPUs than pCPUs suffers heavy overhead
      in the two *_spin_on_owner() loops. This is due to the lock holder
      preemption issue.
      
      Break out of the loop if the owner's vCPU is preempted, i.e. if
      vcpu_is_preempted(cpu) returns true (a sketch of the changed loop follows
      the profile below).
      
      test-case:
      perf record -a perf bench sched messaging -g 400 -p && perf report
      
      before patch:
      20.68%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner
       8.45%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
       4.12%  sched-messaging  [kernel.vmlinux]  [k] system_call
       3.01%  sched-messaging  [kernel.vmlinux]  [k] system_call_common
       2.83%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
       2.64%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
       2.00%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
      
      after patch:
       9.99%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
       5.28%  sched-messaging  [unknown]         [H] 0xc0000000000768e0
       4.27%  sched-messaging  [kernel.vmlinux]  [k] __copy_tofrom_user_power7
       3.77%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
       3.24%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
       3.02%  sched-messaging  [kernel.vmlinux]  [k] system_call
       2.69%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
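      
      A hedged sketch of the spin-loop exit condition, simplified from the core of
      mutex_spin_on_owner() (the real function also handles RCU protection of the
      owner task and more):
      
      while (__mutex_owner(lock) == owner) {
              /*
               * Stop spinning if the owner is no longer running, if we should
               * reschedule, or if the owner's vCPU has been preempted by the
               * host: the lock cannot be released until that vCPU runs again.
               */
              if (!owner->on_cpu || need_resched() ||
                  vcpu_is_preempted(task_cpu(owner))) {
                      ret = false;
                      break;
              }
              cpu_relax();
      }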
      Tested-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: David.Laight@ACULAB.COM
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: benh@kernel.crashing.org
      Cc: boqun.feng@gmail.com
      Cc: bsingharora@gmail.com
      Cc: dave@stgolabs.net
      Cc: kernellwp@gmail.com
      Cc: konrad.wilk@oracle.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: mpe@ellerman.id.au
      Cc: paulmck@linux.vnet.ibm.com
      Cc: paulus@samba.org
      Cc: rkrcmar@redhat.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: will.deacon@arm.com
      Cc: xen-devel-request@lists.xenproject.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1478077718-37424-4-git-send-email-xinhui.pan@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU in osq_lock() · 5aff60a1
      Committed by Pan Xinhui
      An over-committed guest with more vCPUs than pCPUs suffers heavy overhead
      in osq_lock().
      
      This is because if vCPU-A holds the osq lock and yields out, vCPU-B ends
      up waiting for per_cpu node->locked to be set. IOW, vCPU-B waits for
      vCPU-A to run and unlock the osq lock.
      
      Use the new vcpu_is_preempted(cpu) interface to detect whether a vCPU is
      currently running, and break out of the spin-loop if it is preempted (a
      sketch of the changed wait loop follows the profile below).
      
      test case:
      
       $ perf record -a perf bench sched messaging -g 400 -p && perf report
      
       before patch:
       18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
       12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
        5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
        3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
        3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
        3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
        2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
      
       after patch:
       20.68%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner
        8.45%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
        4.12%  sched-messaging  [kernel.vmlinux]  [k] system_call
        3.01%  sched-messaging  [kernel.vmlinux]  [k] system_call_common
        2.83%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
        2.64%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
        2.00%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
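      
      A hedged sketch of the corresponding change in osq_lock()'s wait loop (a
      node_cpu() helper that maps a queued MCS node back to its CPU number is
      assumed here):
      
      while (!READ_ONCE(node->locked)) {
              /*
               * Give up the spin if we need to reschedule, or if the vCPU of
               * the CPU queued ahead of us is preempted: it cannot hand the
               * lock over until the host runs it again.
               */
              if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
                      goto unqueue;   /* back out of the MCS queue */
      
              cpu_relax();
      }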
      Suggested-by: Boqun Feng <boqun.feng@gmail.com>
      Tested-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: David.Laight@ACULAB.COM
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: benh@kernel.crashing.org
      Cc: bsingharora@gmail.com
      Cc: dave@stgolabs.net
      Cc: kernellwp@gmail.com
      Cc: konrad.wilk@oracle.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: mpe@ellerman.id.au
      Cc: paulmck@linux.vnet.ibm.com
      Cc: paulus@samba.org
      Cc: rkrcmar@redhat.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: will.deacon@arm.com
      Cc: xen-devel-request@lists.xenproject.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1478077718-37424-3-git-send-email-xinhui.pan@linux.vnet.ibm.com
      [ Translated to English. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 21 November 2016, 1 commit
  11. 19 November 2016, 1 commit
  12. 16 November 2016, 1 commit
  13. 11 November 2016, 1 commit
  14. 25 October 2016, 5 commits
    • locking/mutex: Enable optimistic spinning of woken waiter · b341afb3
      Committed by Waiman Long
      This patch makes the waiter that sets the HANDOFF flag start spinning
      instead of sleeping until the handoff is complete or the owner
      sleeps. Otherwise, the handoff will cause the optimistic spinners to
      abort spinning as the handed-off owner may not be running.
      Tested-by: Jason Low <jason.low2@hpe.com>
      Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Ding Tianhong <dingtianhong@huawei.com>
      Cc: Imre Deak <imre.deak@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Will Deacon <Will.Deacon@arm.com>
      Link: http://lkml.kernel.org/r/1472254509-27508-2-git-send-email-Waiman.Long@hpe.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/mutex: Simplify some ww_mutex code in __mutex_lock_common() · a40ca565
      Committed by Waiman Long
      This patch removes some of the redundant ww_mutex code in
      __mutex_lock_common().
      Tested-by: Jason Low <jason.low2@hpe.com>
      Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Ding Tianhong <dingtianhong@huawei.com>
      Cc: Imre Deak <imre.deak@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Will Deacon <Will.Deacon@arm.com>
      Link: http://lkml.kernel.org/r/1472254509-27508-1-git-send-email-Waiman.Long@hpe.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/mutex: Restructure wait loop · 5bbd7e64
      Committed by Peter Zijlstra
      Doesn't really matter yet, but pull the HANDOFF and trylock out from
      under the wait_lock.
      
      The intention is to add an optimistic spin loop here, which requires
      we do not hold the wait_lock, so shuffle code around in preparation.
      
      Also clarify the purpose of taking the wait_lock in the wait loop: it's
      tempting to want to avoid it altogether, but the cancellation cases need it
      to avoid losing wakeups.
      Suggested-by: Waiman Long <waiman.long@hpe.com>
      Tested-by: Jason Low <jason.low2@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/mutex: Add lock handoff to avoid starvation · 9d659ae1
      Committed by Peter Zijlstra
      Implement lock handoff to avoid lock starvation.
      
      Lock starvation is possible because mutex_lock() allows lock stealing,
      where a running (or optimistic spinning) task beats the woken waiter
      to the acquire.
      
      Lock stealing is an important performance optimization because waiting
      for a waiter to wake up and get runtime can take a significant time,
      during which everybody would stall on the lock.
      
      The down-side is of course that it allows for starvation.
      
      This patch has the waiter requesting a handoff if it fails to acquire
      the lock upon waking. This re-introduces some of the wait time,
      because once we do a handoff we have to wait for the waiter to wake up
      again.
      
      A future patch will add a round of optimistic spinning to attempt to
      alleviate this penalty, but if that turns out to not be enough, we can
      add a counter and only request handoff after multiple failed wakeups.
      
      There are a few tricky implementation details:
      
       - accepting a handoff must only be done in the wait-loop. Since the
         handoff condition is owner == current, it can easily cause
         recursive locking trouble.
      
       - accepting the handoff must be careful to provide the ACQUIRE
         semantics.
      
       - having the HANDOFF bit set on unlock requires care, we must not
         clear the owner.
      
       - we must be careful to not leave HANDOFF set after we've acquired
         the lock. The tricky scenario is setting the HANDOFF bit on an
         unlocked mutex.
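      
      As a hedged sketch of the request side only (the flag name and helper follow
      the reworked owner-word scheme described in the mutex::owner commit further
      down; the unlock and acceptance paths are only summarized in the comment):
      
      #define MUTEX_FLAG_HANDOFF      0x02    /* waiter asked for a handoff */
      
      /* Woken waiter that failed the trylock asks the unlocker for a handoff. */
      static inline void __mutex_set_flag(struct mutex *lock, unsigned long flag)
      {
              atomic_long_or(flag, &lock->owner);
      }
      
      /*
       * The unlock path must then not simply clear the owner word; it hands
       * the lock to the first waiter instead, and the handoff is accepted
       * only inside that waiter's wait loop, since owner == current anywhere
       * else would look like recursive locking.
       */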
      Tested-by: Jason Low <jason.low2@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Waiman Long <Waiman.Long@hpe.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/mutex: Rework mutex::owner · 3ca0ff57
      Committed by Peter Zijlstra
      The current mutex implementation has an atomic lock word and a
      non-atomic owner field.
      
      This disparity leads to a number of issues with the current mutex code
      as it means that we can have a locked mutex without an explicit owner
      (because the owner field has not been set, or already cleared).
      
      This leads to a number of weird corner cases, esp. between the
      optimistic spinning and debug code. Where the optimistic spinning
      code needs the owner field updated inside the lock region, the debug
      code is more relaxed because the whole lock is serialized by the
      wait_lock.
      
      Also, the spinning code itself has a few corner cases where we need to
      deal with a held lock without an owner field.
      
      Furthermore, it becomes even more of a problem when trying to fix
      starvation cases in the current code. We end up stacking special case
      on special case.
      
      To solve this rework the basic mutex implementation to be a single
      atomic word that contains the owner and uses the low bits for extra
      state.
      
      This matches how PI futexes and rt_mutex already work. By making the
      owner an integral part of the lock state, a lot of the problems
      disappear and we get a better option for dealing with starvation cases:
      direct owner handoff.
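      
      A hedged sketch of that encoding (flag values and helpers approximate the
      mainline code; treat them as illustrative):
      
      #define MUTEX_FLAG_WAITERS      0x01    /* wait list non-empty: take the slowpath */
      #define MUTEX_FLAG_HANDOFF      0x02    /* unlock must hand the lock to a waiter */
      #define MUTEX_FLAGS             0x03
      
      static inline struct task_struct *__mutex_owner(struct mutex *lock)
      {
              /* The low bits are always zero in a task pointer, so mask them off. */
              return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLAGS);
      }
      
      /* Fast path: acquire by swinging the whole owner word from 0 to current. */
      static inline bool __mutex_trylock_fast(struct mutex *lock)
      {
              unsigned long curr = (unsigned long)current;
      
              return atomic_long_cmpxchg_acquire(&lock->owner, 0UL, curr) == 0UL;
      }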
      
      Changing the basic mutex does, however, invalidate all the arch-specific
      mutex code; this patch leaves that unused in place, and a later patch will
      remove it.
      Tested-by: Jason Low <jason.low2@hpe.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  15. 22 September 2016, 3 commits
  16. 18 August 2016, 3 commits
    • locking/rwsem: Scan the wait_list for readers only once · 70800c3c
      Committed by Davidlohr Bueso
      When wanting to wake up readers, __rwsem_mark_wakeup() currently
      iterates the wait_list twice while looking to wake up the first N
      queued reader-tasks. While this can be quite inefficient, it was
      there so that an awoken reader would be first and foremost
      acknowledged by the lock counter.
      
      Keeping the same logic, we can further benefit from the use of
      wake_qs and entirely avoid the first wait_list iteration that sets
      the counter: since wake_up_process() isn't going to occur right away,
      we still maintain the counter->list order of going about things.
      
      Other than saving cycles with O(n) "scanning", this change also
      nicely cleans up a good chunk of __rwsem_mark_wakeup(), making it
      both visually cleaner and less tedious to read.
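      
      A hedged sketch of the single-pass pattern (identifiers approximate
      kernel/locking/rwsem-xadd.c; the exact counter accounting is assumed):
      
      struct rwsem_waiter *waiter, *tmp;
      long woken = 0;
      DEFINE_WAKE_Q(wake_q);
      
      raw_spin_lock_irq(&sem->wait_lock);
      list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) {
              if (waiter->type == RWSEM_WAITING_FOR_WRITE)
                      break;
      
              woken++;
              list_del(&waiter->list);
              wake_q_add(&wake_q, waiter->task);      /* wakeup is deferred */
      }
      /* A single counter adjustment accounts for all 'woken' readers. */
      atomic_long_add(woken * RWSEM_ACTIVE_READ_BIAS, &sem->count);
      raw_spin_unlock_irq(&sem->wait_lock);
      
      wake_up_q(&wake_q);                             /* now actually wake them */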
      
      For example, the following improvements were seen on some will-it-scale
      microbenchmarks, on a 48-core Haswell:
      
                                             v4.7              v4.7-rwsem-v1
        Hmean    signal1-processes-8    5792691.42 (  0.00%)  5771971.04 ( -0.36%)
        Hmean    signal1-processes-12   6081199.96 (  0.00%)  6072174.38 ( -0.15%)
        Hmean    signal1-processes-21   3071137.71 (  0.00%)  3041336.72 ( -0.97%)
        Hmean    signal1-processes-48   3712039.98 (  0.00%)  3708113.59 ( -0.11%)
        Hmean    signal1-processes-79   4464573.45 (  0.00%)  4682798.66 (  4.89%)
        Hmean    signal1-processes-110  4486842.01 (  0.00%)  4633781.71 (  3.27%)
        Hmean    signal1-processes-141  4611816.83 (  0.00%)  4692725.38 (  1.75%)
        Hmean    signal1-processes-172  4638157.05 (  0.00%)  4714387.86 (  1.64%)
        Hmean    signal1-processes-203  4465077.80 (  0.00%)  4690348.07 (  5.05%)
        Hmean    signal1-processes-224  4410433.74 (  0.00%)  4687534.43 (  6.28%)
      
        Stddev   signal1-processes-8       6360.47 (  0.00%)     8455.31 ( 32.94%)
        Stddev   signal1-processes-12      4004.98 (  0.00%)     9156.13 (128.62%)
        Stddev   signal1-processes-21      3273.14 (  0.00%)     5016.80 ( 53.27%)
        Stddev   signal1-processes-48     28420.25 (  0.00%)    26576.22 ( -6.49%)
        Stddev   signal1-processes-79     22038.34 (  0.00%)    18992.70 (-13.82%)
        Stddev   signal1-processes-110    23226.93 (  0.00%)    17245.79 (-25.75%)
        Stddev   signal1-processes-141     6358.98 (  0.00%)     7636.14 ( 20.08%)
        Stddev   signal1-processes-172     9523.70 (  0.00%)     4824.75 (-49.34%)
        Stddev   signal1-processes-203    13915.33 (  0.00%)     9326.33 (-32.98%)
        Stddev   signal1-processes-224    15573.94 (  0.00%)    10613.82 (-31.85%)
      
      Other runs that saw improvements include context_switch and pipe; and
      as expected, this is particularly highlighted on larger thread counts
      as it becomes more expensive to walk the list twice.
      
      No change in wakeup ordering or semantics.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman.Long@hp.com
      Cc: dave@stgolabs.net
      Cc: jason.low2@hpe.com
      Cc: wanpeng.li@hotmail.com
      Link: http://lkml.kernel.org/r/1470384285-32163-4-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/rwsem: Remove a few useless comments · c2867bba
      Committed by Davidlohr Bueso
      Our rwsem code (xadd, at least) is rather well documented, but
      there are a few really annoying comments in there that serve
      no purpose and we shouldn't bother with them.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman.Long@hp.com
      Cc: dave@stgolabs.net
      Cc: jason.low2@hpe.com
      Cc: wanpeng.li@hotmail.com
      Link: http://lkml.kernel.org/r/1470384285-32163-3-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/rwsem: Return void in __rwsem_mark_wake() · 84b23f9b
      Committed by Davidlohr Bueso
      We currently return a rw_semaphore structure, which is the same lock
      that was passed in as the function's argument in the first place.
      While several functions choose this kind of return value so that
      callers can use it, for example for things like ERR_PTR, this is not
      the case for __rwsem_mark_wake(). In addition, this function is
      really about the lock waiters (which we know exist at this point),
      so it's somewhat odd to be returning the sem structure.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman.Long@hp.com
      Cc: dave@stgolabs.net
      Cc: jason.low2@hpe.com
      Cc: wanpeng.li@hotmail.com
      Link: http://lkml.kernel.org/r/1470384285-32163-2-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  17. 10 August 2016, 5 commits
  18. 27 June 2016, 1 commit
  19. 24 June 2016, 1 commit
  20. 16 June 2016, 3 commits
  21. 14 June 2016, 2 commits