提交 9d4646d1 编写于 作者: W Will Deacon 提交者: Ingo Molnar

locking/qspinlock: Elide back-to-back RELEASE operations with smp_wmb()

The qspinlock slowpath must ensure that the MCS node is fully initialised
before it can be reached by another other CPU. This is currently enforced
by using a RELEASE operation when updating the tail and also when linking
the node into the waitqueue, since the control dependency off xchg_tail
is insufficient to enforce sufficient ordering, see:

  95bcade3 ("locking/qspinlock: Ensure node is initialised before updating prev->next")

Back-to-back RELEASE operations may be expensive on some architectures,
particularly those that implement them using fences under the hood. We
can replace the two RELEASE operations with a single smp_wmb() fence and
use RELAXED operations for the subsequent publishing of the node.
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NWaiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1524738868-31318-12-git-send-email-will.deacon@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
上级 626e5fbc
...@@ -164,10 +164,10 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock) ...@@ -164,10 +164,10 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
{ {
/* /*
* Use release semantics to make sure that the MCS node is properly * We can use relaxed semantics since the caller ensures that the
* initialized before changing the tail code. * MCS node is properly initialized before updating the tail.
*/ */
return (u32)xchg_release(&lock->tail, return (u32)xchg_relaxed(&lock->tail,
tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
} }
...@@ -212,10 +212,11 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) ...@@ -212,10 +212,11 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
for (;;) { for (;;) {
new = (val & _Q_LOCKED_PENDING_MASK) | tail; new = (val & _Q_LOCKED_PENDING_MASK) | tail;
/* /*
* Use release semantics to make sure that the MCS node is * We can use relaxed semantics since the caller ensures that
* properly initialized before changing the tail code. * the MCS node is properly initialized before updating the
* tail.
*/ */
old = atomic_cmpxchg_release(&lock->val, val, new); old = atomic_cmpxchg_relaxed(&lock->val, val, new);
if (old == val) if (old == val)
break; break;
...@@ -388,12 +389,18 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) ...@@ -388,12 +389,18 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
goto release; goto release;
/* /*
* Ensure that the initialisation of @node is complete before we
* publish the updated tail via xchg_tail() and potentially link
* @node into the waitqueue via WRITE_ONCE(prev->next, node) below.
*/
smp_wmb();
/*
* Publish the updated tail.
* We have already touched the queueing cacheline; don't bother with * We have already touched the queueing cacheline; don't bother with
* pending stuff. * pending stuff.
* *
* p,*,* -> n,*,* * p,*,* -> n,*,*
*
* RELEASE, such that the stores to @node must be complete.
*/ */
old = xchg_tail(lock, tail); old = xchg_tail(lock, tail);
next = NULL; next = NULL;
...@@ -405,14 +412,8 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) ...@@ -405,14 +412,8 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
if (old & _Q_TAIL_MASK) { if (old & _Q_TAIL_MASK) {
prev = decode_tail(old); prev = decode_tail(old);
/* /* Link @node into the waitqueue. */
* We must ensure that the stores to @node are observed before WRITE_ONCE(prev->next, node);
* the write to prev->next. The address dependency from
* xchg_tail is not sufficient to ensure this because the read
* component of xchg_tail is unordered with respect to the
* initialisation of @node.
*/
smp_store_release(&prev->next, node);
pv_wait_node(node, prev); pv_wait_node(node, prev);
arch_mcs_spin_lock_contended(&node->locked); arch_mcs_spin_lock_contended(&node->locked);
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册