1. 03 June 2016, 3 commits
    • locking/rwsem: Enable lockless waiter wakeup(s) · 133e89ef
      Committed by Davidlohr Bueso
      As wake_qs gain users, we can teach rwsems about them so that
      waiters can be awoken without holding the wait_lock. This applies
      to both readers and writers, the former being the ideal candidate
      as we can batch the wakeups, shortening the critical region that
      much more -- i.e. a writer task blocking a bunch of tasks waiting
      to service page-faults (mmap_sem readers).
      
      In general, applying wake_qs to rwsem (xadd) is not difficult as
      the wait_lock is intended to be released soon _anyway_. The
      exception is when the writer slowpath proactively wakes up any
      queued readers because it sees that the lock is owned by a reader;
      in that case we simply do the wakeups with the lock held (see the
      comment in __rwsem_down_write_failed_common()). A minimal sketch
      of the deferred-wakeup pattern is shown below.
      
      Similar to other locking primitives, delaying the waiter wakeup
      does, at least in theory, allow the lock to be stolen in the
      writer case; however, no harm was observed from this (in fact,
      lock stealing tends to be a _good_ thing in most workloads), and
      the window is tiny anyway.
      
      Some page-fault (pft) and mmap_sem-intensive benchmarks show a
      fairly consistent reduction in system time (by up to ~8% and
      ~10%) on a 2-socket, 12-core AMD box. In addition, on an 8-core
      Westmere doing page allocations (page_test):
      
      aim9:
      	 4.6-rc6				4.6-rc6
      						rwsemv2
      Min      page_test   378167.89 (  0.00%)   382613.33 (  1.18%)
      Min      exec_test      499.00 (  0.00%)      502.67 (  0.74%)
      Min      fork_test     3395.47 (  0.00%)     3537.64 (  4.19%)
      Hmean    page_test   395433.06 (  0.00%)   414693.68 (  4.87%)
      Hmean    exec_test      499.67 (  0.00%)      505.30 (  1.13%)
      Hmean    fork_test     3504.22 (  0.00%)     3594.95 (  2.59%)
      Stddev   page_test    17426.57 (  0.00%)    26649.92 (-52.93%)
      Stddev   exec_test        0.47 (  0.00%)        1.41 (-199.05%)
      Stddev   fork_test       63.74 (  0.00%)       32.59 ( 48.86%)
      Max      page_test   429873.33 (  0.00%)   456960.00 (  6.30%)
      Max      exec_test      500.33 (  0.00%)      507.66 (  1.47%)
      Max      fork_test     3653.33 (  0.00%)     3650.90 ( -0.07%)
      
      	     4.6-rc6     4.6-rc6
      			 rwsemv2
      User            1.12        0.04
      System          0.23        0.04
      Elapsed       727.27      721.98
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman.Long@hpe.com
      Cc: dave@stgolabs.net
      Cc: jason.low2@hp.com
      Cc: peter@hurleysoftware.com
      Link: http://lkml.kernel.org/r/1463165787-25937-2-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/ww_mutex: Report recursive ww_mutex locking early · 0422e83d
      Committed by Chris Wilson
      Recursive locking for ww_mutexes was originally conceived as an
      exception. However, it is heavily used by the DRM atomic
      modesetting code. Currently, the recursive deadlock is only
      checked after we have queued up for a busy-spin, and since we
      never release the lock, we spin until kicked, whereupon the
      deadlock is discovered and reported.
      
      A simple solution to this now-common problem is to move the
      recursive deadlock discovery to the first action taken when
      acquiring the ww_mutex, as sketched below.
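      
      The following is a hedged sketch of the shape of such an early
      check, not the verbatim kernel change; the helper name
      ww_mutex_recursion_check_sketch() is illustrative. On entry to the
      acquire path, the caller's acquire context is compared against the
      context already recorded in the ww_mutex, and -EALREADY is
      returned instead of entering the busy-spin:
      
      	static inline int ww_mutex_recursion_check_sketch(struct ww_mutex *ww,
      							   struct ww_acquire_ctx *ww_ctx)
      	{
      		/*
      		 * If the lock is already held by our own acquire context,
      		 * this is the recursive-locking bug: report it right away
      		 * rather than busy-spinning until some other task kicks us.
      		 */
      		if (ww_ctx && unlikely(ww_ctx == READ_ONCE(ww->ctx)))
      			return -EALREADY;
      
      		return 0;
      	}
      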
      Suggested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1464293297-19777-1-git-send-email-chris@chris-wilson.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/seqcount: Re-fix raw_read_seqcount_latch() · 55eed755
      Committed by Peter Zijlstra
      Commit 50755bc1 ("seqlock: fix raw_read_seqcount_latch()") broke
      raw_read_seqcount_latch().
      
      If you look at the comment that was modified, the thing that
      changes is the seq count, not the latch pointer.
      
       * void latch_modify(struct latch_struct *latch, ...)
       * {
       *	smp_wmb();	<- Ensure that the last data[1] update is visible
       *	latch->seq++;
       *	smp_wmb();	<- Ensure that the seqcount update is visible
       *
       *	modify(latch->data[0], ...);
       *
       *	smp_wmb();	<- Ensure that the data[0] update is visible
       *	latch->seq++;
       *	smp_wmb();	<- Ensure that the seqcount update is visible
       *
       *	modify(latch->data[1], ...);
       * }
       *
       * The query will have a form like:
       *
       * struct entry *latch_query(struct latch_struct *latch, ...)
       * {
       *	struct entry *entry;
       *	unsigned seq, idx;
       *
       *	do {
       *		seq = lockless_dereference(latch->seq);
      
      So here we have:
      
      		seq = READ_ONCE(latch->seq);
      		smp_read_barrier_depends();
      
      Which is exactly what we want; the new code:
      
      		seq = ({ p = READ_ONCE(latch);
      			 smp_read_barrier_depends(); p })->seq;
      
      is just wrong, because it loses the volatile read on seq, which
      can now be torn or, worse, 'optimized' away. The read-dependency
      barrier is also placed wrong; we want it after the load of seq,
      to match the data[] up-to-date wmb()s above.
      
      That way, when we dereference latch->data[] below, we are
      guaranteed to observe the right data.
      
       *
       *		idx = seq & 0x01;
       *		entry = data_query(latch->data[idx], ...);
       *
       *		smp_rmb();
       *	} while (seq != latch->seq);
       *
       *	return entry;
       * }
      
      So yes, not passing a pointer is not pretty, but the code was
      correct, and now it no longer is.
      
      Change it to an explicit READ_ONCE() + smp_read_barrier_depends()
      to avoid confusion and to allow strict lockless_dereference()
      checking; a sketch of the resulting read side follows below.
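      
      As a sketch of the intent (not necessarily the exact upstream
      diff), the corrected read side then looks roughly like:
      
      	static inline int raw_read_seqcount_latch(seqcount_t *s)
      	{
      		/* Volatile load: seq cannot be torn or optimized away. */
      		int seq = READ_ONCE(s->sequence);
      
      		/* Order the later data[] reads after the load of seq. */
      		smp_read_barrier_depends();
      		return seq;
      	}
      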
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 50755bc1 ("seqlock: fix raw_read_seqcount_latch()")
      Link: http://lkml.kernel.org/r/20160527111117.GL3192@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 02 June 2016, 2 commits
  3. 01 June 2016, 23 commits
  4. 31 May 2016, 5 commits
  5. 30 May 2016, 7 commits