Commit 92400b8c, authored by Linus Torvalds

Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - Lots of tidying up changes all across the map for Linux's formal
   memory/locking-model tooling, by Alan Stern, Akira Yokosawa, Andrea
   Parri, Paul E. McKenney and SeongJae Park.

   Notable changes beyond an overall update in the tooling itself is the
   tidying up of spin_is_locked() semantics, which spills over into the
   kernel proper as well.

 - qspinlock improvements: the locking algorithm now guarantees forward
   progress whereas the previous implementation in mainline could starve
   threads indefinitely in cmpxchg() loops. Also other related cleanups
   to the qspinlock code (Will Deacon)

 - misc smaller improvements, cleanups and fixes all across the locking
   subsystem

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  locking/rwsem: Simplify the is-owner-spinnable checks
  tools/memory-model: Add reference for 'Simplifying ARM concurrency'
  tools/memory-model: Update ASPLOS information
  MAINTAINERS, tools/memory-model: Update e-mail address for Andrea Parri
  tools/memory-model: Fix coding style in 'lock.cat'
  tools/memory-model: Remove out-of-date comments and code from lock.cat
  tools/memory-model: Improve mixed-access checking in lock.cat
  tools/memory-model: Improve comments in lock.cat
  tools/memory-model: Remove duplicated code from lock.cat
  tools/memory-model: Flag "cumulativity" and "propagation" tests
  tools/memory-model: Add model support for spin_is_locked()
  tools/memory-model: Add scripts to test memory model
  tools/memory-model: Fix coding style in 'linux-kernel.def'
  tools/memory-model: Model 'smp_store_mb()'
  tools/memory-order: Update the cheat-sheet to show that smp_mb__after_atomic() orders later RMW operations
  tools/memory-order: Improve key for SELF and SV
  tools/memory-model: Fix cheat sheet typo
  tools/memory-model: Update required version of herdtools7
  tools/memory-model: Redefine rb in terms of rcu-fence
  tools/memory-model: Rename link and rcu-path to rcu-link and rb
  ...
@@ -111,7 +111,6 @@ If the compiler can prove that do_something() does not store to the
 variable a, then the compiler is within its rights transforming this to
 the following::
 
-	tmp = a;
 	if (a > 0)
 		for (;;)
 			do_something();
@@ -119,7 +118,7 @@ the following::
 If you don't want the compiler to do this (and you probably don't), then
 you should use something like the following::
 
-	while (READ_ONCE(a) < 0)
+	while (READ_ONCE(a) > 0)
 		do_something();
 
 Alternatively, you could place a barrier() call in the loop.
@@ -467,10 +466,12 @@ Like the above, except that these routines return a boolean which
 indicates whether the changed bit was set _BEFORE_ the atomic bit
 operation.
 
-WARNING! It is incredibly important that the value be a boolean,
-ie. "0" or "1". Do not try to be fancy and save a few instructions by
-declaring the above to return "long" and just returning something like
-"old_val & mask" because that will not work.
+
+.. warning::
+	It is incredibly important that the value be a boolean, ie. "0" or "1".
+	Do not try to be fancy and save a few instructions by declaring the
+	above to return "long" and just returning something like "old_val &
+	mask" because that will not work.
 
 For one thing, this return value gets truncated to int in many code
 paths using these interfaces, so on 64-bit if the bit is set in the
......
@@ -1920,9 +1920,6 @@ There are some more advanced barrier functions:
 		/* assign ownership */
 		desc->status = DEVICE_OWN;
 
-		/* force memory to sync before notifying device via MMIO */
-		wmb();
-
 		/* notify device of new descriptors */
 		writel(DESC_NOTIFY, doorbell);
 	}
@@ -1930,11 +1927,15 @@ There are some more advanced barrier functions:
      The dma_rmb() allows us guarantee the device has released ownership
      before we read the data from the descriptor, and the dma_wmb() allows
      us to guarantee the data is written to the descriptor before the device
-     can see it now has ownership. The wmb() is needed to guarantee that the
-     cache coherent memory writes have completed before attempting a write to
-     the cache incoherent MMIO region.
-
-     See Documentation/DMA-API.txt for more information on consistent memory.
+     can see it now has ownership. Note that, when using writel(), a prior
+     wmb() is not needed to guarantee that the cache coherent memory writes
+     have completed before writing to the MMIO region. The cheaper
+     writel_relaxed() does not provide this guarantee and must not be used
+     here.
+
+     See the subsection "Kernel I/O barrier effects" for more information on
+     relaxed I/O accessors and the Documentation/DMA-API.txt file for more
+     information on consistent memory.
 
 
 MMIO WRITE BARRIER
......
...@@ -36,6 +36,9 @@ Documentation/memory-barriers.txt ...@@ -36,6 +36,9 @@ Documentation/memory-barriers.txt
부분도 있고, 의도하진 않았지만 사람에 의해 쓰였다보니 불완전한 부분도 있습니다. 부분도 있고, 의도하진 않았지만 사람에 의해 쓰였다보니 불완전한 부분도 있습니다.
이 문서는 리눅스에서 제공하는 다양한 메모리 배리어들을 사용하기 위한 이 문서는 리눅스에서 제공하는 다양한 메모리 배리어들을 사용하기 위한
안내서입니다만, 뭔가 이상하다 싶으면 (그런게 많을 겁니다) 질문을 부탁드립니다. 안내서입니다만, 뭔가 이상하다 싶으면 (그런게 많을 겁니다) 질문을 부탁드립니다.
일부 이상한 점들은 공식적인 메모리 일관성 모델과 tools/memory-model/ 에 있는
관련 문서를 참고해서 해결될 수 있을 겁니다. 그러나, 이 메모리 모델조차도 그
관리자들의 의견의 집합으로 봐야지, 절대 옳은 예언자로 신봉해선 안될 겁니다.
다시 말하지만, 이 문서는 리눅스가 하드웨어에 기대하는 사항에 대한 명세서가 다시 말하지만, 이 문서는 리눅스가 하드웨어에 기대하는 사항에 대한 명세서가
아닙니다. 아닙니다.
...@@ -77,7 +80,7 @@ Documentation/memory-barriers.txt ...@@ -77,7 +80,7 @@ Documentation/memory-barriers.txt
- 메모리 배리어의 종류. - 메모리 배리어의 종류.
- 메모리 배리어에 대해 가정해선 안될 것. - 메모리 배리어에 대해 가정해선 안될 것.
- 데이터 의존성 배리어. - 데이터 의존성 배리어 (역사적).
- 컨트롤 의존성. - 컨트롤 의존성.
- SMP 배리어 짝맞추기. - SMP 배리어 짝맞추기.
- 메모리 배리어 시퀀스의 예. - 메모리 배리어 시퀀스의 예.
...@@ -255,17 +258,20 @@ CPU 에게 기대할 수 있는 최소한의 보장사항 몇가지가 있습니 ...@@ -255,17 +258,20 @@ CPU 에게 기대할 수 있는 최소한의 보장사항 몇가지가 있습니
(*) 어떤 CPU 든, 의존성이 존재하는 메모리 액세스들은 해당 CPU 자신에게 (*) 어떤 CPU 든, 의존성이 존재하는 메모리 액세스들은 해당 CPU 자신에게
있어서는 순서대로 메모리 시스템에 수행 요청됩니다. 즉, 다음에 대해서: 있어서는 순서대로 메모리 시스템에 수행 요청됩니다. 즉, 다음에 대해서:
Q = READ_ONCE(P); smp_read_barrier_depends(); D = READ_ONCE(*Q); Q = READ_ONCE(P); D = READ_ONCE(*Q);
CPU 는 다음과 같은 메모리 오퍼레이션 시퀀스를 수행 요청합니다: CPU 는 다음과 같은 메모리 오퍼레이션 시퀀스를 수행 요청합니다:
Q = LOAD P, D = LOAD *Q Q = LOAD P, D = LOAD *Q
그리고 그 시퀀스 내에서의 순서는 항상 지켜집니다. 대부분의 시스템에서 그리고 그 시퀀스 내에서의 순서는 항상 지켜집니다. 하지만, DEC Alpha 에서
smp_read_barrier_depends() 는 아무일도 안하지만 DEC Alpha 에서는 READ_ONCE() 는 메모리 배리어 명령도 내게 되어 있어서, DEC Alpha CPU 는
명시적으로 사용되어야 합니다. 보통의 경우에는 smp_read_barrier_depends() 다음과 같은 메모리 오퍼레이션들을 내놓게 됩니다:
를 직접 사용하는 대신 rcu_dereference() 같은 것들을 사용해야 함을
알아두세요. Q = LOAD P, MEMORY_BARRIER, D = LOAD *Q, MEMORY_BARRIER
DEC Alpha 에서 수행되든 아니든, READ_ONCE() 는 컴파일러로부터의 악영향
또한 제거합니다.
(*) 특정 CPU 내에서 겹치는 영역의 메모리에 행해지는 로드와 스토어 들은 해당 (*) 특정 CPU 내에서 겹치는 영역의 메모리에 행해지는 로드와 스토어 들은 해당
CPU 안에서는 순서가 바뀌지 않은 것으로 보여집니다. 즉, 다음에 대해서: CPU 안에서는 순서가 바뀌지 않은 것으로 보여집니다. 즉, 다음에 대해서:
...@@ -421,8 +427,8 @@ CPU 에게 기대할 수 있는 최소한의 보장사항 몇가지가 있습니 ...@@ -421,8 +427,8 @@ CPU 에게 기대할 수 있는 최소한의 보장사항 몇가지가 있습니
데이터 의존성 배리어는 읽기 배리어의 보다 완화된 형태입니다. 두개의 로드 데이터 의존성 배리어는 읽기 배리어의 보다 완화된 형태입니다. 두개의 로드
오퍼레이션이 있고 두번째 것이 첫번째 것의 결과에 의존하고 있을 때(예: 오퍼레이션이 있고 두번째 것이 첫번째 것의 결과에 의존하고 있을 때(예:
두번째 로드가 참조할 주소를 첫번째 로드가 읽는 경우), 두번째 로드가 읽어올 두번째 로드가 참조할 주소를 첫번째 로드가 읽는 경우), 두번째 로드가 읽어올
데이터는 첫번째 로드에 의해 그 주소가 얻어지기 전에 업데이트 되어 있음을 데이터는 첫번째 로드에 의해 그 주소가 얻어진 뒤에 업데이트 됨을 보장하기
보장하기 위해서 데이터 의존성 배리어가 필요할 수 있습니다. 위해서 데이터 의존성 배리어가 필요할 수 있습니다.
데이터 의존성 배리어는 상호 의존적인 로드 오퍼레이션들 사이의 부분적 순서 데이터 의존성 배리어는 상호 의존적인 로드 오퍼레이션들 사이의 부분적 순서
세우기입니다; 스토어 오퍼레이션들이나 독립적인 로드들, 또는 중복되는 세우기입니다; 스토어 오퍼레이션들이나 독립적인 로드들, 또는 중복되는
...@@ -570,8 +576,14 @@ ACQUIRE 는 해당 오퍼레이션의 로드 부분에만 적용되고 RELEASE ...@@ -570,8 +576,14 @@ ACQUIRE 는 해당 오퍼레이션의 로드 부분에만 적용되고 RELEASE
Documentation/DMA-API.txt Documentation/DMA-API.txt
데이터 의존성 배리어 데이터 의존성 배리어 (역사적)
-------------------- -----------------------------
리눅스 커널 v4.15 기준으로, smp_read_barrier_depends() 가 READ_ONCE() 에
추가되었는데, 이는 이 섹션에 주의를 기울여야 하는 사람들은 DEC Alpha 아키텍쳐
전용 코드를 만드는 사람들과 READ_ONCE() 자체를 만드는 사람들 뿐임을 의미합니다.
그런 분들을 위해, 그리고 역사에 관심 있는 분들을 위해, 여기 데이터 의존성
배리어에 대한 이야기를 적습니다.
데이터 의존성 배리어의 사용에 있어 지켜야 하는 사항들은 약간 미묘하고, 데이터 데이터 의존성 배리어의 사용에 있어 지켜야 하는 사항들은 약간 미묘하고, 데이터
의존성 배리어가 사용되어야 하는 상황도 항상 명백하지는 않습니다. 설명을 위해 의존성 배리어가 사용되어야 하는 상황도 항상 명백하지는 않습니다. 설명을 위해
...@@ -1787,7 +1799,7 @@ CPU 메모리 배리어 ...@@ -1787,7 +1799,7 @@ CPU 메모리 배리어
범용 mb() smp_mb() 범용 mb() smp_mb()
쓰기 wmb() smp_wmb() 쓰기 wmb() smp_wmb()
읽기 rmb() smp_rmb() 읽기 rmb() smp_rmb()
데이터 의존성 read_barrier_depends() smp_read_barrier_depends() 데이터 의존성 READ_ONCE()
데이터 의존성 배리어를 제외한 모든 메모리 배리어는 컴파일러 배리어를 데이터 의존성 배리어를 제외한 모든 메모리 배리어는 컴파일러 배리어를
...@@ -2796,8 +2808,9 @@ CPU 2 는 C/D 를 갖습니다)가 병렬로 연결되어 있는 시스템을 ...@@ -2796,8 +2808,9 @@ CPU 2 는 C/D 를 갖습니다)가 병렬로 연결되어 있는 시스템을
여기에 개입하기 위해선, 데이터 의존성 배리어나 읽기 배리어를 로드 오퍼레이션들 여기에 개입하기 위해선, 데이터 의존성 배리어나 읽기 배리어를 로드 오퍼레이션들
사이에 넣어야 합니다. 이렇게 함으로써 캐시가 다음 요청을 처리하기 전에 일관성 사이에 넣어야 합니다 (v4.15 부터는 READ_ONCE() 매크로에 의해 무조건적으로
큐를 처리하도록 강제하게 됩니다. 그렇게 됩니다). 이렇게 함으로써 캐시가 다음 요청을 처리하기 전에 일관성 큐를
처리하도록 강제하게 됩니다.
CPU 1 CPU 2 COMMENT CPU 1 CPU 2 COMMENT
=============== =============== ======================================= =============== =============== =======================================
...@@ -2826,7 +2839,10 @@ CPU 2 는 C/D 를 갖습니다)가 병렬로 연결되어 있는 시스템을 ...@@ -2826,7 +2839,10 @@ CPU 2 는 C/D 를 갖습니다)가 병렬로 연결되어 있는 시스템을
다른 CPU 들도 분할된 캐시를 가지고 있을 수 있지만, 그런 CPU 들은 평범한 메모리 다른 CPU 들도 분할된 캐시를 가지고 있을 수 있지만, 그런 CPU 들은 평범한 메모리
액세스를 위해서도 이 분할된 캐시들 사이의 조정을 해야만 합니다. Alpha 는 가장 액세스를 위해서도 이 분할된 캐시들 사이의 조정을 해야만 합니다. Alpha 는 가장
약한 메모리 순서 시맨틱 (semantic) 을 선택함으로써 메모리 배리어가 명시적으로 약한 메모리 순서 시맨틱 (semantic) 을 선택함으로써 메모리 배리어가 명시적으로
사용되지 않았을 때에는 그런 조정이 필요하지 않게 했습니다. 사용되지 않았을 때에는 그런 조정이 필요하지 않게 했으며, 이는 Alpha 가 당시에
더 높은 CPU 클락 속도를 가질 수 있게 했습니다. 하지만, (다시 말하건대, v4.15
이후부터는) Alpha 아키텍쳐 전용 코드와 READ_ONCE() 매크로 내부에서를 제외하고는
smp_read_barrier_depends() 가 사용되지 않아야 함을 알아두시기 바랍니다.
캐시 일관성 VS DMA 캐시 일관성 VS DMA
...@@ -2988,7 +3004,9 @@ Alpha CPU 의 일부 버전은 분할된 데이터 캐시를 가지고 있어서 ...@@ -2988,7 +3004,9 @@ Alpha CPU 의 일부 버전은 분할된 데이터 캐시를 가지고 있어서
메모리 일관성 시스템과 함께 두개의 캐시를 동기화 시켜서, 포인터 변경과 새로운 메모리 일관성 시스템과 함께 두개의 캐시를 동기화 시켜서, 포인터 변경과 새로운
데이터의 발견을 올바른 순서로 일어나게 하기 때문입니다. 데이터의 발견을 올바른 순서로 일어나게 하기 때문입니다.
리눅스 커널의 메모리 배리어 모델은 Alpha 에 기초해서 정의되었습니다. 리눅스 커널의 메모리 배리어 모델은 Alpha 에 기초해서 정의되었습니다만, v4.15
부터는 리눅스 커널이 READ_ONCE() 내에 smp_read_barrier_depends() 를 추가해서
Alpha 의 메모리 모델로의 영향력이 크게 줄어들긴 했습니다.
위의 "캐시 일관성" 서브섹션을 참고하세요. 위의 "캐시 일관성" 서브섹션을 참고하세요.
......
@@ -8212,7 +8212,7 @@ F:	drivers/misc/lkdtm/*
 
 LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM)
 M:	Alan Stern <stern@rowland.harvard.edu>
-M:	Andrea Parri <parri.andrea@gmail.com>
+M:	Andrea Parri <andrea.parri@amarulasolutions.com>
 M:	Will Deacon <will.deacon@arm.com>
 M:	Peter Zijlstra <peterz@infradead.org>
 M:	Boqun Feng <boqun.feng@gmail.com>
@@ -8319,6 +8319,7 @@ F:	Documentation/admin-guide/LSM/LoadPin.rst
 LOCKING PRIMITIVES
 M:	Peter Zijlstra <peterz@infradead.org>
 M:	Ingo Molnar <mingo@redhat.com>
+M:	Will Deacon <will.deacon@arm.com>
 L:	linux-kernel@vger.kernel.org
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/core
 S:	Maintained
......
@@ -122,11 +122,6 @@ static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
-	/*
-	 * Ensure prior spin_lock operations to other locks have completed
-	 * on this CPU before we test whether "lock" is locked.
-	 */
-	smp_mb(); /* ^^^ */
 	return !arch_spin_value_unlocked(READ_ONCE(*lock));
 }
 
......
@@ -7,6 +7,14 @@
 #include <asm-generic/qspinlock_types.h>
 #include <asm/paravirt.h>
 
+#define _Q_PENDING_LOOPS	(1 << 9)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __pv_init_lock_hash(void);
+extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __raw_callee_save___pv_queued_spin_unlock(struct qspinlock *lock);
+
 #define	queued_spin_unlock queued_spin_unlock
 /**
  * queued_spin_unlock - release a queued spinlock
@@ -16,15 +24,9 @@
  */
 static inline void native_queued_spin_unlock(struct qspinlock *lock)
 {
-	smp_store_release((u8 *)lock, 0);
+	smp_store_release(&lock->locked, 0);
 }
 
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
-extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
-extern void __pv_init_lock_hash(void);
-extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
-extern void __raw_callee_save___pv_queued_spin_unlock(struct qspinlock *lock);
-
 static inline void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 {
 	pv_queued_spin_lock_slowpath(lock, val);
@@ -40,11 +42,6 @@ static inline bool vcpu_is_preempted(long cpu)
 {
 	return pv_vcpu_is_preempted(cpu);
 }
-#else
-static inline void queued_spin_unlock(struct qspinlock *lock)
-{
-	native_queued_spin_unlock(lock);
-}
 #endif
 
 #ifdef CONFIG_PARAVIRT
......
@@ -22,8 +22,7 @@ PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock_slowpath);
  *
  * void __pv_queued_spin_unlock(struct qspinlock *lock)
  * {
- *	struct __qspinlock *l = (void *)lock;
- *	u8 lockval = cmpxchg(&l->locked, _Q_LOCKED_VAL, 0);
+ *	u8 lockval = cmpxchg(&lock->locked, _Q_LOCKED_VAL, 0);
  *
  *	if (likely(lockval == _Q_LOCKED_VAL))
  *		return;
......
@@ -25,6 +25,7 @@ typedef atomic64_t atomic_long_t;
 
 #define ATOMIC_LONG_INIT(i)	ATOMIC64_INIT(i)
 #define ATOMIC_LONG_PFX(x)	atomic64 ## x
+#define ATOMIC_LONG_TYPE	s64
 
 #else
 
@@ -32,6 +33,7 @@ typedef atomic_t atomic_long_t;
 
 #define ATOMIC_LONG_INIT(i)	ATOMIC_INIT(i)
 #define ATOMIC_LONG_PFX(x)	atomic ## x
+#define ATOMIC_LONG_TYPE	int
 
 #endif
 
...@@ -90,6 +92,21 @@ ATOMIC_LONG_ADD_SUB_OP(sub, _release) ...@@ -90,6 +92,21 @@ ATOMIC_LONG_ADD_SUB_OP(sub, _release)
#define atomic_long_cmpxchg(l, old, new) \ #define atomic_long_cmpxchg(l, old, new) \
(ATOMIC_LONG_PFX(_cmpxchg)((ATOMIC_LONG_PFX(_t) *)(l), (old), (new))) (ATOMIC_LONG_PFX(_cmpxchg)((ATOMIC_LONG_PFX(_t) *)(l), (old), (new)))
#define atomic_long_try_cmpxchg_relaxed(l, old, new) \
(ATOMIC_LONG_PFX(_try_cmpxchg_relaxed)((ATOMIC_LONG_PFX(_t) *)(l), \
(ATOMIC_LONG_TYPE *)(old), (ATOMIC_LONG_TYPE)(new)))
#define atomic_long_try_cmpxchg_acquire(l, old, new) \
(ATOMIC_LONG_PFX(_try_cmpxchg_acquire)((ATOMIC_LONG_PFX(_t) *)(l), \
(ATOMIC_LONG_TYPE *)(old), (ATOMIC_LONG_TYPE)(new)))
#define atomic_long_try_cmpxchg_release(l, old, new) \
(ATOMIC_LONG_PFX(_try_cmpxchg_release)((ATOMIC_LONG_PFX(_t) *)(l), \
(ATOMIC_LONG_TYPE *)(old), (ATOMIC_LONG_TYPE)(new)))
#define atomic_long_try_cmpxchg(l, old, new) \
(ATOMIC_LONG_PFX(_try_cmpxchg)((ATOMIC_LONG_PFX(_t) *)(l), \
(ATOMIC_LONG_TYPE *)(old), (ATOMIC_LONG_TYPE)(new)))
#define atomic_long_xchg_relaxed(v, new) \ #define atomic_long_xchg_relaxed(v, new) \
(ATOMIC_LONG_PFX(_xchg_relaxed)((ATOMIC_LONG_PFX(_t) *)(v), (new))) (ATOMIC_LONG_PFX(_xchg_relaxed)((ATOMIC_LONG_PFX(_t) *)(v), (new)))
#define atomic_long_xchg_acquire(v, new) \ #define atomic_long_xchg_acquire(v, new) \
@@ -244,6 +261,8 @@ static inline long atomic_long_add_unless(atomic_long_t *l, long a, long u)
 #define atomic_long_inc_not_zero(l) \
 	ATOMIC_LONG_PFX(_inc_not_zero)((ATOMIC_LONG_PFX(_t) *)(l))
 
+#define atomic_long_cond_read_relaxed(v, c) \
+	ATOMIC_LONG_PFX(_cond_read_relaxed)((ATOMIC_LONG_PFX(_t) *)(v), (c))
 #define atomic_long_cond_read_acquire(v, c) \
 	ATOMIC_LONG_PFX(_cond_read_acquire)((ATOMIC_LONG_PFX(_t) *)(v), (c))
 
......
@@ -221,18 +221,17 @@ do { \
 #endif
 
 /**
- * smp_cond_load_acquire() - (Spin) wait for cond with ACQUIRE ordering
+ * smp_cond_load_relaxed() - (Spin) wait for cond with no ordering guarantees
  * @ptr: pointer to the variable to wait on
  * @cond: boolean expression to wait for
  *
- * Equivalent to using smp_load_acquire() on the condition variable but employs
- * the control dependency of the wait to reduce the barrier on many platforms.
+ * Equivalent to using READ_ONCE() on the condition variable.
  *
  * Due to C lacking lambda expressions we load the value of *ptr into a
  * pre-named variable @VAL to be used in @cond.
  */
-#ifndef smp_cond_load_acquire
-#define smp_cond_load_acquire(ptr, cond_expr) ({		\
+#ifndef smp_cond_load_relaxed
+#define smp_cond_load_relaxed(ptr, cond_expr) ({		\
 	typeof(ptr) __PTR = (ptr);				\
 	typeof(*ptr) VAL;					\
 	for (;;) {						\
@@ -241,10 +240,26 @@ do { \
 			break;					\
 		cpu_relax();					\
 	}							\
-	smp_acquire__after_ctrl_dep();				\
 	VAL;							\
 })
 #endif
 
+/**
+ * smp_cond_load_acquire() - (Spin) wait for cond with ACQUIRE ordering
+ * @ptr: pointer to the variable to wait on
+ * @cond: boolean expression to wait for
+ *
+ * Equivalent to using smp_load_acquire() on the condition variable but employs
+ * the control dependency of the wait to reduce the barrier on many platforms.
+ */
+#ifndef smp_cond_load_acquire
+#define smp_cond_load_acquire(ptr, cond_expr) ({		\
+	typeof(*ptr) _val;					\
+	_val = smp_cond_load_relaxed(ptr, cond_expr);		\
+	smp_acquire__after_ctrl_dep();				\
+	_val;							\
+})
+#endif
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
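
The split above, where the acquire variant becomes the relaxed spin loop followed by a separate ordering step, can be approximated in userspace with C11 atomics. The following is only a sketch under stated assumptions, not the kernel implementation: the names `demo_cond_load_relaxed` and `demo_cond_load_acquire` are hypothetical, the wait condition is hard-coded to "value is non-zero" instead of an arbitrary expression over `VAL`, and `atomic_thread_fence(memory_order_acquire)` stands in for `smp_acquire__after_ctrl_dep()`.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical userspace analogue of smp_cond_load_relaxed():
 * spin until *ptr is non-zero, using relaxed loads only, and
 * return the value that satisfied the condition.
 */
static inline unsigned int demo_cond_load_relaxed(atomic_uint *ptr)
{
	unsigned int val;

	for (;;) {
		val = atomic_load_explicit(ptr, memory_order_relaxed);
		if (val)
			break;
		/* A real spin loop would place a CPU relax hint here. */
	}
	return val;
}

/*
 * Hypothetical analogue of smp_cond_load_acquire(): the relaxed
 * wait followed by an acquire fence, so that later accesses are
 * ordered after the load that ended the wait.
 */
static inline unsigned int demo_cond_load_acquire(atomic_uint *ptr)
{
	unsigned int val = demo_cond_load_relaxed(ptr);

	atomic_thread_fence(memory_order_acquire);
	return val;
}
```

A waiter would call `demo_cond_load_acquire(&flag)` while another thread publishes data and then stores a non-zero value to `flag` with release ordering. Building the acquire form on top of the relaxed one mirrors the structure of the new macros and leaves the relaxed form available to callers that need no ordering.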
@@ -26,7 +26,6 @@
  * @lock: Pointer to queued spinlock structure
  * Return: 1 if it is locked, 0 otherwise
  */
-#ifndef queued_spin_is_locked
 static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
 {
 	/*
@@ -35,7 +34,6 @@ static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
 	 */
 	return atomic_read(&lock->val);
 }
-#endif
 
 /**
  * queued_spin_value_unlocked - is the spinlock structure unlocked?
@@ -100,7 +98,7 @@ static __always_inline void queued_spin_unlock(struct qspinlock *lock)
 	/*
 	 * unlock() needs release semantics:
 	 */
-	(void)atomic_sub_return_release(_Q_LOCKED_VAL, &lock->val);
+	smp_store_release(&lock->locked, 0);
 }
 #endif
 
......
@@ -29,13 +29,41 @@
 #endif
 
 typedef struct qspinlock {
-	atomic_t	val;
+	union {
+		atomic_t val;
+
+		/*
+		 * By using the whole 2nd least significant byte for the
+		 * pending bit, we can allow better optimization of the lock
+		 * acquisition for the pending bit holder.
+		 */
+#ifdef __LITTLE_ENDIAN
+		struct {
+			u8	locked;
+			u8	pending;
+		};
+		struct {
+			u16	locked_pending;
+			u16	tail;
+		};
+#else
+		struct {
+			u16	tail;
+			u16	locked_pending;
+		};
+		struct {
+			u8	reserved[2];
+			u8	pending;
+			u8	locked;
+		};
+#endif
+	};
 } arch_spinlock_t;
 
 /*
  * Initializier
  */
-#define	__ARCH_SPIN_LOCK_UNLOCKED	{ ATOMIC_INIT(0) }
+#define	__ARCH_SPIN_LOCK_UNLOCKED	{ .val = ATOMIC_INIT(0) }
 
 /*
  * Bitfields in the atomic value:
......
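
The union added above lets code address the `locked` byte, the `pending` byte and the `tail` halfword directly. This is what allows the unlock paths elsewhere in this diff to store 0 to `lock->locked` instead of casting the lock pointer or subtracting from the whole word. A rough userspace illustration follows. It is a sketch, not kernel code: `union demo_qspinlock` and `demo_unlock()` are hypothetical names, `uint32_t` replaces `atomic_t`, the store is a plain non-atomic write with no release ordering, and the endianness test assumes the GCC/Clang predefined macro `__BYTE_ORDER__`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the new layout; uint32_t stands in for atomic_t. */
union demo_qspinlock {
	uint32_t val;
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	struct {
		uint8_t locked;		/* bits 0-7 of val */
		uint8_t pending;	/* bits 8-15 of val */
	};
	struct {
		uint16_t locked_pending;
		uint16_t tail;		/* bits 16-31 of val */
	};
#else
	struct {
		uint16_t tail;
		uint16_t locked_pending;
	};
	struct {
		uint8_t reserved[2];
		uint8_t pending;
		uint8_t locked;
	};
#endif
};

/*
 * Clear only the locked byte. The pending byte and the tail are
 * left untouched because they live in different bytes of val.
 */
static inline void demo_unlock(union demo_qspinlock *lock)
{
	lock->locked = 0;
}
```

With this layout, releasing the lock touches a single byte, so a waiter recorded in `pending` or `tail` is not disturbed by the unlock.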
@@ -654,6 +654,7 @@ static inline int atomic_dec_if_positive(atomic_t *v)
 }
 #endif
 
+#define atomic_cond_read_relaxed(v, c)	smp_cond_load_relaxed(&(v)->counter, (c))
 #define atomic_cond_read_acquire(v, c)	smp_cond_load_acquire(&(v)->counter, (c))
 
 #ifdef CONFIG_GENERIC_ATOMIC64
@@ -1075,6 +1076,7 @@ static inline long long atomic64_fetch_andnot_release(long long i, atomic64_t *v
 }
 #endif
 
+#define atomic64_cond_read_relaxed(v, c)	smp_cond_load_relaxed(&(v)->counter, (c))
 #define atomic64_cond_read_acquire(v, c)	smp_cond_load_acquire(&(v)->counter, (c))
 
 #include <asm-generic/atomic-long.h>
......
@@ -29,7 +29,7 @@
 
 #ifdef CONFIG_TASK_DELAY_ACCT
 struct task_delay_info {
-	spinlock_t	lock;
+	raw_spinlock_t	lock;
 	unsigned int	flags;	/* Private per-task flags */
 
 	/* For each stat XXX, add following, aligned appropriately
......
@@ -146,9 +146,6 @@ extern void __mutex_init(struct mutex *lock, const char *name,
  */
 static inline bool mutex_is_locked(struct mutex *lock)
 {
-	/*
-	 * XXX think about spin_is_locked
-	 */
 	return __mutex_owner(lock) != NULL;
 }
 
......
@@ -380,6 +380,24 @@ static __always_inline int spin_trylock_irq(spinlock_t *lock)
 	raw_spin_trylock_irqsave(spinlock_check(lock), flags); \
 })
 
+/**
+ * spin_is_locked() - Check whether a spinlock is locked.
+ * @lock: Pointer to the spinlock.
+ *
+ * This function is NOT required to provide any memory ordering
+ * guarantees; it could be used for debugging purposes or, when
+ * additional synchronization is needed, accompanied with other
+ * constructs (memory barriers) enforcing the synchronization.
+ *
+ * Returns: 1 if @lock is locked, 0 otherwise.
+ *
+ * Note that the function only tells you that the spinlock is
+ * seen to be locked, not that it is locked on your CPU.
+ *
+ * Further, on CONFIG_SMP=n builds with CONFIG_DEBUG_SPINLOCK=n,
+ * the return value is always 0 (see include/linux/spinlock_up.h).
+ * Therefore you should not rely heavily on the return value.
+ */
 static __always_inline int spin_is_locked(spinlock_t *lock)
 {
 	return raw_spin_is_locked(&lock->rlock);
......
@@ -44,23 +44,24 @@ void __delayacct_tsk_init(struct task_struct *tsk)
 {
 	tsk->delays = kmem_cache_zalloc(delayacct_cache, GFP_KERNEL);
 	if (tsk->delays)
-		spin_lock_init(&tsk->delays->lock);
+		raw_spin_lock_init(&tsk->delays->lock);
 }
 
 /*
  * Finish delay accounting for a statistic using its timestamps (@start),
  * accumalator (@total) and @count
  */
-static void delayacct_end(spinlock_t *lock, u64 *start, u64 *total, u32 *count)
+static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total,
+			  u32 *count)
 {
 	s64 ns = ktime_get_ns() - *start;
 	unsigned long flags;
 
 	if (ns > 0) {
-		spin_lock_irqsave(lock, flags);
+		raw_spin_lock_irqsave(lock, flags);
 		*total += ns;
 		(*count)++;
-		spin_unlock_irqrestore(lock, flags);
+		raw_spin_unlock_irqrestore(lock, flags);
 	}
 }
 
@@ -127,7 +128,7 @@ int __delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
 
 	/* zero XXX_total, non-zero XXX_count implies XXX stat overflowed */
 
-	spin_lock_irqsave(&tsk->delays->lock, flags);
+	raw_spin_lock_irqsave(&tsk->delays->lock, flags);
 	tmp = d->blkio_delay_total + tsk->delays->blkio_delay;
 	d->blkio_delay_total = (tmp < d->blkio_delay_total) ? 0 : tmp;
 	tmp = d->swapin_delay_total + tsk->delays->swapin_delay;
@@ -137,7 +138,7 @@ int __delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
 	d->blkio_count += tsk->delays->blkio_count;
 	d->swapin_count += tsk->delays->swapin_count;
 	d->freepages_count += tsk->delays->freepages_count;
-	spin_unlock_irqrestore(&tsk->delays->lock, flags);
+	raw_spin_unlock_irqrestore(&tsk->delays->lock, flags);
 
 	return 0;
 }
@@ -147,10 +148,10 @@ __u64 __delayacct_blkio_ticks(struct task_struct *tsk)
 	__u64 ret;
 	unsigned long flags;
 
-	spin_lock_irqsave(&tsk->delays->lock, flags);
+	raw_spin_lock_irqsave(&tsk->delays->lock, flags);
 	ret = nsec_to_clock_t(tsk->delays->blkio_delay +
 				tsk->delays->swapin_delay);
-	spin_unlock_irqrestore(&tsk->delays->lock, flags);
+	raw_spin_unlock_irqrestore(&tsk->delays->lock, flags);
 	return ret;
 }
 
......
@@ -561,20 +561,24 @@ static void print_lock(struct held_lock *hlock)
 	printk(KERN_CONT ", at: %pS\n", (void *)hlock->acquire_ip);
 }
 
-static void lockdep_print_held_locks(struct task_struct *curr)
+static void lockdep_print_held_locks(struct task_struct *p)
 {
-	int i, depth = curr->lockdep_depth;
+	int i, depth = READ_ONCE(p->lockdep_depth);
 
-	if (!depth) {
-		printk("no locks held by %s/%d.\n", curr->comm, task_pid_nr(curr));
+	if (!depth)
+		printk("no locks held by %s/%d.\n", p->comm, task_pid_nr(p));
+	else
+		printk("%d lock%s held by %s/%d:\n", depth,
+		       depth > 1 ? "s" : "", p->comm, task_pid_nr(p));
+	/*
+	 * It's not reliable to print a task's held locks if it's not sleeping
+	 * and it's not the current task.
+	 */
+	if (p->state == TASK_RUNNING && p != current)
 		return;
-	}
-	printk("%d lock%s held by %s/%d:\n",
-		depth, depth > 1 ? "s" : "", curr->comm, task_pid_nr(curr));
-
 	for (i = 0; i < depth; i++) {
 		printk(" #%d: ", i);
-		print_lock(curr->held_locks + i);
+		print_lock(p->held_locks + i);
 	}
 }
 
@@ -4451,8 +4455,6 @@ EXPORT_SYMBOL_GPL(debug_check_no_locks_held);
 void debug_show_all_locks(void)
 {
 	struct task_struct *g, *p;
-	int count = 10;
-	int unlock = 1;
 
 	if (unlikely(!debug_locks)) {
 		pr_warn("INFO: lockdep is turned off.\n");
@@ -4460,50 +4462,18 @@ void debug_show_all_locks(void)
 	}
 	pr_warn("\nShowing all locks held in the system:\n");
 
-	/*
-	 * Here we try to get the tasklist_lock as hard as possible,
-	 * if not successful after 2 seconds we ignore it (but keep
-	 * trying). This is to enable a debug printout even if a
-	 * tasklist_lock-holding task deadlocks or crashes.
-	 */
-retry:
-	if (!read_trylock(&tasklist_lock)) {
-		if (count == 10)
-			pr_warn("hm, tasklist_lock locked, retrying... ");
-		if (count) {
-			count--;
-			pr_cont(" #%d", 10-count);
-			mdelay(200);
-			goto retry;
-		}
-		pr_cont(" ignoring it.\n");
-		unlock = 0;
-	} else {
-		if (count != 10)
-			pr_cont(" locked it.\n");
-	}
-
-	do_each_thread(g, p) {
-		/*
-		 * It's not reliable to print a task's held locks
-		 * if it's not sleeping (or if it's not the current
-		 * task):
-		 */
-		if (p->state == TASK_RUNNING && p != current)
+	rcu_read_lock();
+	for_each_process_thread(g, p) {
+		if (!p->lockdep_depth)
 			continue;
-		if (p->lockdep_depth)
-			lockdep_print_held_locks(p);
-		if (!unlock)
-			if (read_trylock(&tasklist_lock))
-				unlock = 1;
+		lockdep_print_held_locks(p);
 		touch_nmi_watchdog();
-	} while_each_thread(g, p);
+		touch_all_softlockup_watchdogs();
+	}
+	rcu_read_unlock();
 
 	pr_warn("\n");
 	pr_warn("=============================================\n\n");
-
-	if (unlock)
-		read_unlock(&tasklist_lock);
 }
 EXPORT_SYMBOL_GPL(debug_show_all_locks);
 #endif
......
@@ -23,13 +23,15 @@ struct mcs_spinlock {
 #ifndef arch_mcs_spin_lock_contended
 /*
- * Using smp_load_acquire() provides a memory barrier that ensures
- * subsequent operations happen after the lock is acquired.
+ * Using smp_cond_load_acquire() provides the acquire semantics
+ * required so that subsequent operations happen after the
+ * lock is acquired. Additionally, some architectures such as
+ * ARM64 would like to do spin-waiting instead of purely
+ * spinning, and smp_cond_load_acquire() provides that behavior.
  */
 #define arch_mcs_spin_lock_contended(l) \
 do { \
-	while (!(smp_load_acquire(l))) \
-		cpu_relax(); \
+	smp_cond_load_acquire(l, VAL); \
 } while (0)
 #endif
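
For readers outside the kernel, the effect of the macro change above can be approximated in portable C11. This is only a sketch: `cond_load_acquire_nonzero` is a hypothetical helper name, and it busy-polls, whereas the kernel's smp_cond_load_acquire() lets an architecture substitute a more efficient wait (as the updated comment notes for ARM64).

```c
#include <stdatomic.h>

/*
 * Hypothetical user-space analogue of smp_cond_load_acquire(l, VAL):
 * spin until *l is non-zero and return the value that was observed.
 */
static inline int cond_load_acquire_nonzero(atomic_int *l)
{
	int val;

	/*
	 * The acquire load orders everything after this function against
	 * the store that made *l non-zero.
	 */
	while ((val = atomic_load_explicit(l, memory_order_acquire)) == 0)
		;	/* a real implementation would relax or wait for an event here */

	return val;
}
```

The point of folding the loop into one primitive is that the waiting strategy becomes an architecture decision rather than being hard-coded as a cpu_relax() loop at every call site.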
......
@@ -139,8 +139,9 @@ static inline bool __mutex_trylock(struct mutex *lock)
 static __always_inline bool __mutex_trylock_fast(struct mutex *lock)
 {
 	unsigned long curr = (unsigned long)current;
+	unsigned long zero = 0UL;
-	if (!atomic_long_cmpxchg_acquire(&lock->owner, 0UL, curr))
+	if (atomic_long_try_cmpxchg_acquire(&lock->owner, &zero, curr))
 		return true;
 	return false;
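
The try_cmpxchg form used above has the same shape as C11's compare-exchange: the expected value is passed by pointer and the call returns whether the swap happened. A minimal user-space sketch follows, assuming a hypothetical `struct toy_mutex` whose owner word is 0 when unlocked; it is not the kernel's struct mutex.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a mutex owner word; 0 means unlocked. */
struct toy_mutex {
	atomic_uintptr_t owner;
};

/*
 * try_cmpxchg-style fast path: succeed only if the owner word is 0,
 * installing curr as the new owner with acquire ordering.
 */
static bool toy_mutex_trylock_fast(struct toy_mutex *lock, uintptr_t curr)
{
	uintptr_t zero = 0;

	/* On failure, zero is overwritten with the owner that was found. */
	return atomic_compare_exchange_strong_explicit(&lock->owner, &zero, curr,
						       memory_order_acquire,
						       memory_order_relaxed);
}
```

Because the boolean result comes straight from the operation, the caller does not need a separate comparison of the returned old value against the expected one.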
......
@@ -12,11 +12,11 @@
  * GNU General Public License for more details.
  *
  * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P.
- * (C) Copyright 2013-2014 Red Hat, Inc.
+ * (C) Copyright 2013-2014,2018 Red Hat, Inc.
  * (C) Copyright 2015 Intel Corp.
  * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP
  *
- * Authors: Waiman Long <waiman.long@hpe.com>
+ * Authors: Waiman Long <longman@redhat.com>
  *          Peter Zijlstra <peterz@infradead.org>
  */
@@ -32,6 +32,11 @@
 #include <asm/byteorder.h>
 #include <asm/qspinlock.h>
+/*
+ * Include queued spinlock statistics code
+ */
+#include "qspinlock_stat.h"
 /*
  * The basic principle of a queue-based spinlock can best be understood
  * by studying a classic queue-based spinlock implementation called the
@@ -76,6 +81,18 @@
 #define MAX_NODES 4
 #endif
+/*
+ * The pending bit spinning loop count.
+ * This heuristic is used to limit the number of lockword accesses
+ * made by atomic_cond_read_relaxed when waiting for the lock to
+ * transition out of the "== _Q_PENDING_VAL" state. We don't spin
+ * indefinitely because there's no guarantee that we'll make forward
+ * progress.
+ */
+#ifndef _Q_PENDING_LOOPS
+#define _Q_PENDING_LOOPS 1
+#endif
 /*
  * Per-CPU queue node structures; we can never have more than 4 nested
  * contexts: task, softirq, hardirq, nmi.
@@ -114,41 +131,18 @@ static inline __pure struct mcs_spinlock *decode_tail(u32 tail)
 #define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
-/*
- * By using the whole 2nd least significant byte for the pending bit, we
- * can allow better optimization of the lock acquisition for the pending
- * bit holder.
+#if _Q_PENDING_BITS == 8
+/**
+ * clear_pending - clear the pending bit.
+ * @lock: Pointer to queued spinlock structure
  *
- * This internal structure is also used by the set_locked function which
- * is not restricted to _Q_PENDING_BITS == 8.
+ * *,1,* -> *,0,*
  */
-struct __qspinlock {
-	union {
-		atomic_t val;
-#ifdef __LITTLE_ENDIAN
-		struct {
-			u8 locked;
-			u8 pending;
-		};
-		struct {
-			u16 locked_pending;
-			u16 tail;
-		};
-#else
-		struct {
-			u16 tail;
-			u16 locked_pending;
-		};
-		struct {
-			u8 reserved[2];
-			u8 pending;
-			u8 locked;
-		};
-#endif
-	};
-};
-#if _Q_PENDING_BITS == 8
+static __always_inline void clear_pending(struct qspinlock *lock)
+{
+	WRITE_ONCE(lock->pending, 0);
+}
 /**
  * clear_pending_set_locked - take ownership and clear the pending bit.
  * @lock: Pointer to queued spinlock structure
@@ -159,9 +153,7 @@ struct __qspinlock {
  */
 static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
 {
-	struct __qspinlock *l = (void *)lock;
-	WRITE_ONCE(l->locked_pending, _Q_LOCKED_VAL);
+	WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL);
 }
 /*
@@ -176,18 +168,27 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
  */
 static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
 {
-	struct __qspinlock *l = (void *)lock;
 	/*
-	 * Use release semantics to make sure that the MCS node is properly
-	 * initialized before changing the tail code.
+	 * We can use relaxed semantics since the caller ensures that the
+	 * MCS node is properly initialized before updating the tail.
 	 */
-	return (u32)xchg_release(&l->tail,
+	return (u32)xchg_relaxed(&lock->tail,
 				 tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
 }
 #else /* _Q_PENDING_BITS == 8 */
+/**
+ * clear_pending - clear the pending bit.
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,* -> *,0,*
+ */
+static __always_inline void clear_pending(struct qspinlock *lock)
+{
+	atomic_andnot(_Q_PENDING_VAL, &lock->val);
+}
 /**
  * clear_pending_set_locked - take ownership and clear the pending bit.
  * @lock: Pointer to queued spinlock structure
@@ -216,10 +217,11 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
 	for (;;) {
 		new = (val & _Q_LOCKED_PENDING_MASK) | tail;
 		/*
-		 * Use release semantics to make sure that the MCS node is
-		 * properly initialized before changing the tail code.
+		 * We can use relaxed semantics since the caller ensures that
+		 * the MCS node is properly initialized before updating the
+		 * tail.
 		 */
-		old = atomic_cmpxchg_release(&lock->val, val, new);
+		old = atomic_cmpxchg_relaxed(&lock->val, val, new);
 		if (old == val)
 			break;
@@ -237,9 +239,7 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
  */
 static __always_inline void set_locked(struct qspinlock *lock)
 {
-	struct __qspinlock *l = (void *)lock;
-	WRITE_ONCE(l->locked, _Q_LOCKED_VAL);
+	WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
 }
@@ -294,86 +294,83 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
 void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 {
 	struct mcs_spinlock *prev, *next, *node;
-	u32 new, old, tail;
+	u32 old, tail;
 	int idx;
 	BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
 	if (pv_enabled())
-		goto queue;
+		goto pv_queue;
 	if (virt_spin_lock(lock))
 		return;
 	/*
-	 * wait for in-progress pending->locked hand-overs
+	 * Wait for in-progress pending->locked hand-overs with a bounded
+	 * number of spins so that we guarantee forward progress.
 	 *
 	 * 0,1,0 -> 0,0,1
 	 */
 	if (val == _Q_PENDING_VAL) {
-		while ((val = atomic_read(&lock->val)) == _Q_PENDING_VAL)
-			cpu_relax();
+		int cnt = _Q_PENDING_LOOPS;
+		val = atomic_cond_read_relaxed(&lock->val,
+					       (VAL != _Q_PENDING_VAL) || !cnt--);
 	}
+	/*
+	 * If we observe any contention; queue.
+	 */
+	if (val & ~_Q_LOCKED_MASK)
+		goto queue;
 	/*
 	 * trylock || pending
 	 *
 	 * 0,0,0 -> 0,0,1 ; trylock
 	 * 0,0,1 -> 0,1,1 ; pending
 	 */
-	for (;;) {
+	val = atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
+	if (!(val & ~_Q_LOCKED_MASK)) {
 		/*
-		 * If we observe any contention; queue.
+		 * We're pending, wait for the owner to go away.
+		 *
+		 * *,1,1 -> *,1,0
+		 *
+		 * this wait loop must be a load-acquire such that we match the
+		 * store-release that clears the locked bit and create lock
+		 * sequentiality; this is because not all
+		 * clear_pending_set_locked() implementations imply full
+		 * barriers.
 		 */
-		if (val & ~_Q_LOCKED_MASK)
-			goto queue;
-		new = _Q_LOCKED_VAL;
-		if (val == new)
-			new |= _Q_PENDING_VAL;
+		if (val & _Q_LOCKED_MASK) {
+			atomic_cond_read_acquire(&lock->val,
+						 !(VAL & _Q_LOCKED_MASK));
+		}
 		/*
-		 * Acquire semantic is required here as the function may
-		 * return immediately if the lock was free.
+		 * take ownership and clear the pending bit.
+		 *
+		 * *,1,0 -> *,0,1
 		 */
-		old = atomic_cmpxchg_acquire(&lock->val, val, new);
-		if (old == val)
-			break;
-		val = old;
-	}
-	/*
-	 * we won the trylock
-	 */
-	if (new == _Q_LOCKED_VAL)
+		clear_pending_set_locked(lock);
+		qstat_inc(qstat_lock_pending, true);
 		return;
+	}
 	/*
-	 * we're pending, wait for the owner to go away.
-	 *
-	 * *,1,1 -> *,1,0
-	 *
-	 * this wait loop must be a load-acquire such that we match the
-	 * store-release that clears the locked bit and create lock
-	 * sequentiality; this is because not all clear_pending_set_locked()
-	 * implementations imply full barriers.
-	 */
-	smp_cond_load_acquire(&lock->val.counter, !(VAL & _Q_LOCKED_MASK));
-	/*
-	 * take ownership and clear the pending bit.
-	 *
-	 * *,1,0 -> *,0,1
+	 * If pending was clear but there are waiters in the queue, then
+	 * we need to undo our setting of pending before we queue ourselves.
 	 */
-	clear_pending_set_locked(lock);
-	return;
+	if (!(val & _Q_PENDING_MASK))
+		clear_pending(lock);
 	/*
 	 * End of pending bit optimistic spinning and beginning of MCS
 	 * queuing.
 	 */
 queue:
+	qstat_inc(qstat_lock_slowpath, true);
+pv_queue:
 	node = this_cpu_ptr(&mcs_nodes[0]);
 	idx = node->count++;
 	tail = encode_tail(smp_processor_id(), idx);
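
The reworked pending path in the hunk above replaces a cmpxchg() retry loop with a single unconditional fetch-or, which is what gives it a bounded number of steps. The C11 sketch below illustrates just that part. It is not the kernel code: `toy_pending_acquire` and the `Q_*` constants are hypothetical names, the bit layout (locked in bits 0-7, pending in bits 8-15, tail above) is an assumption, and the initial bounded wait and contention check are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for this sketch only: locked byte, pending byte, tail. */
#define Q_LOCKED_VAL	(1U << 0)
#define Q_LOCKED_MASK	0x000000ffU
#define Q_PENDING_VAL	(1U << 8)
#define Q_PENDING_MASK	0x0000ff00U

/*
 * Returns true if the lock was taken through the pending bit,
 * false if the caller has to fall back to queueing.
 */
static bool toy_pending_acquire(_Atomic uint32_t *lock)
{
	/* One atomic step: set pending and look at what was there before. */
	uint32_t val = atomic_fetch_or_explicit(lock, Q_PENDING_VAL,
						memory_order_acquire);

	if (!(val & ~Q_LOCKED_MASK)) {
		/* We own pending: wait for the current holder, if any. */
		while (atomic_load_explicit(lock, memory_order_acquire) &
		       Q_LOCKED_MASK)
			;

		/* *,1,0 -> *,0,1: drop pending and set locked, keep the tail. */
		atomic_fetch_add_explicit(lock, Q_LOCKED_VAL - Q_PENDING_VAL,
					  memory_order_relaxed);
		return true;
	}

	/* Waiters are queued but pending was clear: undo our pending bit. */
	if (!(val & Q_PENDING_MASK))
		atomic_fetch_and_explicit(lock, ~Q_PENDING_VAL,
					  memory_order_relaxed);

	return false;
}
```

The fetch-or cannot fail, so a thread always learns the previous lock state in one step; the cost is that it may set pending when it should not have, which is why the undo at the end is needed.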
@@ -400,12 +397,18 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 		goto release;
 	/*
+	 * Ensure that the initialisation of @node is complete before we
+	 * publish the updated tail via xchg_tail() and potentially link
+	 * @node into the waitqueue via WRITE_ONCE(prev->next, node) below.
+	 */
+	smp_wmb();
+	/*
+	 * Publish the updated tail.
 	 * We have already touched the queueing cacheline; don't bother with
 	 * pending stuff.
 	 *
 	 * p,*,* -> n,*,*
-	 *
-	 * RELEASE, such that the stores to @node must be complete.
 	 */
 	old = xchg_tail(lock, tail);
 	next = NULL;
@@ -417,14 +420,8 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	if (old & _Q_TAIL_MASK) {
 		prev = decode_tail(old);
-		/*
-		 * We must ensure that the stores to @node are observed before
-		 * the write to prev->next. The address dependency from
-		 * xchg_tail is not sufficient to ensure this because the read
-		 * component of xchg_tail is unordered with respect to the
-		 * initialisation of @node.
-		 */
-		smp_store_release(&prev->next, node);
+		/* Link @node into the waitqueue. */
+		WRITE_ONCE(prev->next, node);
 		pv_wait_node(node, prev);
 		arch_mcs_spin_lock_contended(&node->locked);
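
The two hunks above trade two release operations (in xchg_tail() and on prev->next) for one write barrier placed before both publications. A rough C11 rendering of that pattern is sketched below. The names `toy_node` and `toy_enqueue` are hypothetical, the tail is a plain pointer rather than the kernel's encoded tail, and a C11 release fence is used as the closest portable stand-in for smp_wmb(), which it is not identical to.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical queue node; not the kernel's struct mcs_spinlock. */
struct toy_node {
	int locked;
	struct toy_node *_Atomic next;
};

/* Append node to the queue and return the previous tail (or NULL). */
static struct toy_node *toy_enqueue(struct toy_node *_Atomic *tail,
				    struct toy_node *node)
{
	struct toy_node *prev;

	/* Initialise the node before anyone else can see it. */
	node->locked = 0;
	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);

	/* One fence orders the initialisation before both stores below. */
	atomic_thread_fence(memory_order_release);

	/* Publish the new tail; relaxed is sufficient after the fence. */
	prev = atomic_exchange_explicit(tail, node, memory_order_relaxed);

	/* Link behind the previous tail, again with a relaxed store. */
	if (prev)
		atomic_store_explicit(&prev->next, node, memory_order_relaxed);

	return prev;
}
```

A single barrier covers both publication points, so neither the tail update nor the link store needs ordering of its own.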
@@ -453,8 +450,8 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	 *
 	 * The PV pv_wait_head_or_lock function, if active, will acquire
 	 * the lock and return a non-zero value. So we have to skip the
-	 * smp_cond_load_acquire() call. As the next PV queue head hasn't been
-	 * designated yet, there is no way for the locked value to become
+	 * atomic_cond_read_acquire() call. As the next PV queue head hasn't
+	 * been designated yet, there is no way for the locked value to become
 	 * _Q_SLOW_VAL. So both the set_locked() and the
 	 * atomic_cmpxchg_relaxed() calls will be safe.
 	 *
@@ -464,44 +461,38 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	if ((val = pv_wait_head_or_lock(lock, node)))
 		goto locked;
-	val = smp_cond_load_acquire(&lock->val.counter, !(VAL & _Q_LOCKED_PENDING_MASK));
+	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));
 locked:
 	/*
 	 * claim the lock:
 	 *
 	 * n,0,0 -> 0,0,1 : lock, uncontended
-	 * *,0,0 -> *,0,1 : lock, contended
+	 * *,*,0 -> *,*,1 : lock, contended
 	 *
-	 * If the queue head is the only one in the queue (lock value == tail),
-	 * clear the tail code and grab the lock. Otherwise, we only need
-	 * to grab the lock.
+	 * If the queue head is the only one in the queue (lock value == tail)
+	 * and nobody is pending, clear the tail code and grab the lock.
+	 * Otherwise, we only need to grab the lock.
 	 */
-	for (;;) {
-		/* In the PV case we might already have _Q_LOCKED_VAL set */
-		if ((val & _Q_TAIL_MASK) != tail) {
-			set_locked(lock);
-			break;
-		}
-		/*
-		 * The smp_cond_load_acquire() call above has provided the
-		 * necessary acquire semantics required for locking. At most
-		 * two iterations of this loop may be ran.
-		 */
-		old = atomic_cmpxchg_relaxed(&lock->val, val, _Q_LOCKED_VAL);
-		if (old == val)
-			goto release; /* No contention */
-		val = old;
-	}
+	/*
+	 * In the PV case we might already have _Q_LOCKED_VAL set.
+	 *
+	 * The atomic_cond_read_acquire() call above has provided the
+	 * necessary acquire semantics required for locking.
+	 */
+	if (((val & _Q_TAIL_MASK) == tail) &&
+	    atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
+		goto release; /* No contention */
+	/* Either somebody is queued behind us or _Q_PENDING_VAL is set */
+	set_locked(lock);
 	/*
 	 * contended path; wait for next if not observed yet, release.
 	 */
-	if (!next) {
-		while (!(next = READ_ONCE(node->next)))
-			cpu_relax();
-	}
+	if (!next)
+		next = smp_cond_load_relaxed(&node->next, (VAL));
 	arch_mcs_spin_unlock_contended(&next->locked);
 	pv_kick_node(lock, next);
......
@@ -55,11 +55,6 @@ struct pv_node {
 	u8 state;
 };
-/*
- * Include queued spinlock statistics code
- */
-#include "qspinlock_stat.h"
 /*
  * Hybrid PV queued/unfair lock
  *
@@ -87,8 +82,6 @@ struct pv_node {
 #define queued_spin_trylock(l) pv_hybrid_queued_unfair_trylock(l)
 static inline bool pv_hybrid_queued_unfair_trylock(struct qspinlock *lock)
 {
-	struct __qspinlock *l = (void *)lock;
 	/*
 	 * Stay in unfair lock mode as long as queued mode waiters are
 	 * present in the MCS wait queue but the pending bit isn't set.
@@ -97,7 +90,7 @@ static inline bool pv_hybrid_queued_unfair_trylock(struct qspinlock *lock)
 		int val = atomic_read(&lock->val);
 		if (!(val & _Q_LOCKED_PENDING_MASK) &&
-		   (cmpxchg_acquire(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
+		   (cmpxchg_acquire(&lock->locked, 0, _Q_LOCKED_VAL) == 0)) {
 			qstat_inc(qstat_pv_lock_stealing, true);
 			return true;
 		}
@@ -117,16 +110,7 @@ static inline bool pv_hybrid_queued_unfair_trylock(struct qspinlock *lock)
 #if _Q_PENDING_BITS == 8
 static __always_inline void set_pending(struct qspinlock *lock)
 {
-	struct __qspinlock *l = (void *)lock;
-	WRITE_ONCE(l->pending, 1);
-}
-static __always_inline void clear_pending(struct qspinlock *lock)
-{
-	struct __qspinlock *l = (void *)lock;
-	WRITE_ONCE(l->pending, 0);
+	WRITE_ONCE(lock->pending, 1);
 }
 /*
@@ -136,10 +120,8 @@ static __always_inline void clear_pending(struct qspinlock *lock)
  */
 static __always_inline int trylock_clear_pending(struct qspinlock *lock)
 {
-	struct __qspinlock *l = (void *)lock;
-	return !READ_ONCE(l->locked) &&
-	       (cmpxchg_acquire(&l->locked_pending, _Q_PENDING_VAL,
+	return !READ_ONCE(lock->locked) &&
+	       (cmpxchg_acquire(&lock->locked_pending, _Q_PENDING_VAL,
 				_Q_LOCKED_VAL) == _Q_PENDING_VAL);
 }
 #else /* _Q_PENDING_BITS == 8 */
@@ -148,11 +130,6 @@ static __always_inline void set_pending(struct qspinlock *lock)
 	atomic_or(_Q_PENDING_VAL, &lock->val);
 }
-static __always_inline void clear_pending(struct qspinlock *lock)
-{
-	atomic_andnot(_Q_PENDING_VAL, &lock->val);
-}
 static __always_inline int trylock_clear_pending(struct qspinlock *lock)
 {
 	int val = atomic_read(&lock->val);
@@ -384,7 +361,6 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
 static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
 {
 	struct pv_node *pn = (struct pv_node *)node;
-	struct __qspinlock *l = (void *)lock;
 	/*
 	 * If the vCPU is indeed halted, advance its state to match that of
@@ -413,7 +389,7 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
 	 * the hash table later on at unlock time, no atomic instruction is
 	 * needed.
 	 */
-	WRITE_ONCE(l->locked, _Q_SLOW_VAL);
+	WRITE_ONCE(lock->locked, _Q_SLOW_VAL);
 	(void)pv_hash(lock, pn);
 }
@@ -428,7 +404,6 @@ static u32
 pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
 {
 	struct pv_node *pn = (struct pv_node *)node;
-	struct __qspinlock *l = (void *)lock;
 	struct qspinlock **lp = NULL;
 	int waitcnt = 0;
 	int loop;
@@ -443,7 +418,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
 	/*
 	 * Tracking # of slowpath locking operations
 	 */
-	qstat_inc(qstat_pv_lock_slowpath, true);
+	qstat_inc(qstat_lock_slowpath, true);
 	for (;; waitcnt++) {
 		/*
@@ -479,13 +454,13 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
 			 *
 			 * Matches the smp_rmb() in __pv_queued_spin_unlock().
 			 */
-			if (xchg(&l->locked, _Q_SLOW_VAL) == 0) {
+			if (xchg(&lock->locked, _Q_SLOW_VAL) == 0) {
 				/*
 				 * The lock was free and now we own the lock.
 				 * Change the lock value back to _Q_LOCKED_VAL
 				 * and unhash the table.
 				 */
-				WRITE_ONCE(l->locked, _Q_LOCKED_VAL);
+				WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
 				WRITE_ONCE(*lp, NULL);
 				goto gotlock;
 			}
@@ -493,7 +468,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
 		WRITE_ONCE(pn->state, vcpu_hashed);
 		qstat_inc(qstat_pv_wait_head, true);
 		qstat_inc(qstat_pv_wait_again, waitcnt);
-		pv_wait(&l->locked, _Q_SLOW_VAL);
+		pv_wait(&lock->locked, _Q_SLOW_VAL);
 		/*
 		 * Because of lock stealing, the queue head vCPU may not be
@@ -518,7 +493,6 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
 __visible void
 __pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
 {
-	struct __qspinlock *l = (void *)lock;
 	struct pv_node *node;
 	if (unlikely(locked != _Q_SLOW_VAL)) {
@@ -547,7 +521,7 @@ __pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
 	 * Now that we have a reference to the (likely) blocked pv_node,
 	 * release the lock.
 	 */
-	smp_store_release(&l->locked, 0);
+	smp_store_release(&lock->locked, 0);
 	/*
 	 * At this point the memory pointed at by lock can be freed/reused,
@@ -573,7 +547,6 @@ __pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
 #ifndef __pv_queued_spin_unlock
 __visible void __pv_queued_spin_unlock(struct qspinlock *lock)
 {
-	struct __qspinlock *l = (void *)lock;
 	u8 locked;
 	/*
@@ -581,7 +554,7 @@ __visible void __pv_queued_spin_unlock(struct qspinlock *lock)
 	 * unhash. Otherwise it would be possible to have multiple @lock
 	 * entries, which would be BAD.
 	 */
-	locked = cmpxchg_release(&l->locked, _Q_LOCKED_VAL, 0);
+	locked = cmpxchg_release(&lock->locked, _Q_LOCKED_VAL, 0);
 	if (likely(locked == _Q_LOCKED_VAL))
 		return;
......
@@ -22,13 +22,14 @@
  * pv_kick_wake - # of vCPU kicks used for computing pv_latency_wake
  * pv_latency_kick - average latency (ns) of vCPU kick operation
  * pv_latency_wake - average latency (ns) from vCPU kick to wakeup
- * pv_lock_slowpath - # of locking operations via the slowpath
  * pv_lock_stealing - # of lock stealing operations
  * pv_spurious_wakeup - # of spurious wakeups in non-head vCPUs
  * pv_wait_again - # of wait's after a queue head vCPU kick
  * pv_wait_early - # of early vCPU wait's
  * pv_wait_head - # of vCPU wait's at the queue head
  * pv_wait_node - # of vCPU wait's at a non-head queue node
+ * lock_pending - # of locking operations via pending code
+ * lock_slowpath - # of locking operations via MCS lock queue
  *
  * Writing to the "reset_counters" file will reset all the above counter
  * values.
@@ -46,13 +47,14 @@ enum qlock_stats {
 	qstat_pv_kick_wake,
 	qstat_pv_latency_kick,
 	qstat_pv_latency_wake,
-	qstat_pv_lock_slowpath,
 	qstat_pv_lock_stealing,
 	qstat_pv_spurious_wakeup,
 	qstat_pv_wait_again,
 	qstat_pv_wait_early,
 	qstat_pv_wait_head,
 	qstat_pv_wait_node,
+	qstat_lock_pending,
+	qstat_lock_slowpath,
 	qstat_num, /* Total number of statistical counters */
 	qstat_reset_cnts = qstat_num,
 };
@@ -73,12 +75,13 @@ static const char * const qstat_names[qstat_num + 1] = {
 	[qstat_pv_spurious_wakeup] = "pv_spurious_wakeup",
 	[qstat_pv_latency_kick] = "pv_latency_kick",
 	[qstat_pv_latency_wake] = "pv_latency_wake",
-	[qstat_pv_lock_slowpath] = "pv_lock_slowpath",
 	[qstat_pv_lock_stealing] = "pv_lock_stealing",
 	[qstat_pv_wait_again] = "pv_wait_again",
 	[qstat_pv_wait_early] = "pv_wait_early",
 	[qstat_pv_wait_head] = "pv_wait_head",
 	[qstat_pv_wait_node] = "pv_wait_node",
+	[qstat_lock_pending] = "lock_pending",
+	[qstat_lock_slowpath] = "lock_slowpath",
 	[qstat_reset_cnts] = "reset_counters",
 };
......
@@ -347,6 +347,15 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 	}
 }
+static inline bool owner_on_cpu(struct task_struct *owner)
+{
+	/*
+	 * As lock holder preemption issue, we both skip spinning if
+	 * task is not on cpu or its cpu is preempted
+	 */
+	return owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
+}
 static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 {
 	struct task_struct *owner;
@@ -359,17 +368,10 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 	rcu_read_lock();
 	owner = READ_ONCE(sem->owner);
-	if (!owner || !is_rwsem_owner_spinnable(owner)) {
-		ret = !owner; /* !owner is spinnable */
-		goto done;
+	if (owner) {
+		ret = is_rwsem_owner_spinnable(owner) &&
+		      owner_on_cpu(owner);
 	}
-	/*
-	 * As lock holder preemption issue, we both skip spinning if task is not
-	 * on cpu or its cpu is preempted
-	 */
-	ret = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
-done:
 	rcu_read_unlock();
 	return ret;
 }
@@ -398,8 +400,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 		 * abort spinning when need_resched or owner is not running or
 		 * owner's cpu is preempted.
 		 */
-		if (!owner->on_cpu || need_resched() ||
-				vcpu_is_preempted(task_cpu(owner))) {
+		if (need_resched() || !owner_on_cpu(owner)) {
 			rcu_read_unlock();
 			return false;
 		}
......
@@ -37,7 +37,7 @@ struct cpu_stop_done {
 struct cpu_stopper {
 	struct task_struct *thread;
-	spinlock_t lock;
+	raw_spinlock_t lock;
 	bool enabled; /* is this stopper enabled? */
 	struct list_head works; /* list of pending works */
@@ -81,13 +81,13 @@ static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
 	unsigned long flags;
 	bool enabled;
-	spin_lock_irqsave(&stopper->lock, flags);
+	raw_spin_lock_irqsave(&stopper->lock, flags);
 	enabled = stopper->enabled;
 	if (enabled)
 		__cpu_stop_queue_work(stopper, work, &wakeq);
 	else if (work->done)
 		cpu_stop_signal_done(work->done);
-	spin_unlock_irqrestore(&stopper->lock, flags);
+	raw_spin_unlock_irqrestore(&stopper->lock, flags);
 	wake_up_q(&wakeq);
@@ -237,8 +237,8 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	DEFINE_WAKE_Q(wakeq);
 	int err;
 retry:
-	spin_lock_irq(&stopper1->lock);
-	spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);
+	raw_spin_lock_irq(&stopper1->lock);
+	raw_spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);
 	err = -ENOENT;
 	if (!stopper1->enabled || !stopper2->enabled)
@@ -261,8 +261,8 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	__cpu_stop_queue_work(stopper1, work1, &wakeq);
 	__cpu_stop_queue_work(stopper2, work2, &wakeq);
 unlock:
-	spin_unlock(&stopper2->lock);
-	spin_unlock_irq(&stopper1->lock);
+	raw_spin_unlock(&stopper2->lock);
+	raw_spin_unlock_irq(&stopper1->lock);
 	if (unlikely(err == -EDEADLK)) {
 		while (stop_cpus_in_progress)
@@ -457,9 +457,9 @@ static int cpu_stop_should_run(unsigned int cpu)
 	unsigned long flags;
 	int run;
-	spin_lock_irqsave(&stopper->lock, flags);
+	raw_spin_lock_irqsave(&stopper->lock, flags);
 	run = !list_empty(&stopper->works);
-	spin_unlock_irqrestore(&stopper->lock, flags);
+	raw_spin_unlock_irqrestore(&stopper->lock, flags);
 	return run;
 }
@@ -470,13 +470,13 @@ static void cpu_stopper_thread(unsigned int cpu)
 repeat:
 	work = NULL;
-	spin_lock_irq(&stopper->lock);
+	raw_spin_lock_irq(&stopper->lock);
 	if (!list_empty(&stopper->works)) {
 		work = list_first_entry(&stopper->works,
 					struct cpu_stop_work, list);
 		list_del_init(&work->list);
 	}
-	spin_unlock_irq(&stopper->lock);
+	raw_spin_unlock_irq(&stopper->lock);
 	if (work) {
 		cpu_stop_fn_t fn = work->fn;
@@ -550,7 +550,7 @@ static int __init cpu_stop_init(void)
 	for_each_possible_cpu(cpu) {
 		struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
-		spin_lock_init(&stopper->lock);
+		raw_spin_lock_init(&stopper->lock);
 		INIT_LIST_HEAD(&stopper->works);
 	}
......
Prior Operation Subsequent Operation Prior Operation Subsequent Operation
--------------- --------------------------- --------------- ---------------------------
C Self R W RWM Self R W DR DW RMW SV C Self R W RMW Self R W DR DW RMW SV
-- ---- - - --- ---- - - -- -- --- -- -- ---- - - --- ---- - - -- -- --- --
Store, e.g., WRITE_ONCE() Y Y Store, e.g., WRITE_ONCE() Y Y
...@@ -14,7 +14,7 @@ smp_wmb() Y W Y Y W ...@@ -14,7 +14,7 @@ smp_wmb() Y W Y Y W
smp_mb() & synchronize_rcu() CP Y Y Y Y Y Y Y Y smp_mb() & synchronize_rcu() CP Y Y Y Y Y Y Y Y
Successful full non-void RMW CP Y Y Y Y Y Y Y Y Y Y Y Successful full non-void RMW CP Y Y Y Y Y Y Y Y Y Y Y
smp_mb__before_atomic() CP Y Y Y a a a a Y smp_mb__before_atomic() CP Y Y Y a a a a Y
smp_mb__after_atomic() CP a a Y Y Y Y Y smp_mb__after_atomic() CP a a Y Y Y Y Y Y
Key: C: Ordering is cumulative Key: C: Ordering is cumulative
...@@ -26,4 +26,5 @@ Key: C: Ordering is cumulative ...@@ -26,4 +26,5 @@ Key: C: Ordering is cumulative
DR: Dependent read (address dependency) DR: Dependent read (address dependency)
DW: Dependent write (address, data, or control dependency) DW: Dependent write (address, data, or control dependency)
RMW: Atomic read-modify-write operation RMW: Atomic read-modify-write operation
SV Same-variable access SELF: Orders self, as opposed to accesses before and/or after
SV: Orders later accesses to the same variable
...@@ -27,7 +27,7 @@ Explanation of the Linux-Kernel Memory Consistency Model ...@@ -27,7 +27,7 @@ Explanation of the Linux-Kernel Memory Consistency Model
19. AND THEN THERE WAS ALPHA 19. AND THEN THERE WAS ALPHA
20. THE HAPPENS-BEFORE RELATION: hb 20. THE HAPPENS-BEFORE RELATION: hb
21. THE PROPAGATES-BEFORE RELATION: pb 21. THE PROPAGATES-BEFORE RELATION: pb
22. RCU RELATIONS: link, gp-link, rscs-link, and rcu-path 22. RCU RELATIONS: rcu-link, gp, rscs, rcu-fence, and rb
23. ODDS AND ENDS 23. ODDS AND ENDS
...@@ -1451,8 +1451,8 @@ they execute means that it cannot have cycles. This requirement is ...@@ -1451,8 +1451,8 @@ they execute means that it cannot have cycles. This requirement is
the content of the LKMM's "propagation" axiom. the content of the LKMM's "propagation" axiom.
RCU RELATIONS: link, gp-link, rscs-link, and rcu-path RCU RELATIONS: rcu-link, gp, rscs, rcu-fence, and rb
----------------------------------------------------- ----------------------------------------------------
RCU (Read-Copy-Update) is a powerful synchronization mechanism. It RCU (Read-Copy-Update) is a powerful synchronization mechanism. It
rests on two concepts: grace periods and read-side critical sections. rests on two concepts: grace periods and read-side critical sections.
...@@ -1509,8 +1509,8 @@ y, which occurs before the end of the critical section, did not ...@@ -1509,8 +1509,8 @@ y, which occurs before the end of the critical section, did not
propagate to P1 before the end of the grace period, violating the propagate to P1 before the end of the grace period, violating the
Guarantee. Guarantee.
In the kernel's implementations of RCU, the business about stores In the kernel's implementations of RCU, the requirements for stores
propagating to every CPU is realized by placing strong fences at to propagate to every CPU are fulfilled by placing strong fences at
suitable places in the RCU-related code. Thus, if a critical section suitable places in the RCU-related code. Thus, if a critical section
starts before a grace period does then the critical section's CPU will starts before a grace period does then the critical section's CPU will
execute an smp_mb() fence after the end of the critical section and execute an smp_mb() fence after the end of the critical section and
...@@ -1523,72 +1523,124 @@ executes. ...@@ -1523,72 +1523,124 @@ executes.
What exactly do we mean by saying that a critical section "starts What exactly do we mean by saying that a critical section "starts
before" or "ends after" a grace period? Some aspects of the meaning before" or "ends after" a grace period? Some aspects of the meaning
are pretty obvious, as in the example above, but the details aren't are pretty obvious, as in the example above, but the details aren't
entirely clear. The LKMM formalizes this notion by means of a entirely clear. The LKMM formalizes this notion by means of the
relation with the unfortunately generic name "link". It is a very rcu-link relation. rcu-link encompasses a very general notion of
general relation; among other things, X ->link Z includes cases where "before": Among other things, X ->rcu-link Z includes cases where X
X happens-before or is equal to some event Y which is equal to or happens-before or is equal to some event Y which is equal to or comes
comes before Z in the coherence order. Taking Y = Z, this says that before Z in the coherence order. When Y = Z this says that X ->rfe Z
X ->rfe Z implies X ->link Z, and taking Y = X, it says that X ->fr Z implies X ->rcu-link Z. In addition, when Y = X it says that X ->fr Z
and X ->co Z each imply X ->link Z. and X ->co Z each imply X ->rcu-link Z.
The formal definition of the link relation is more than a little The formal definition of the rcu-link relation is more than a little
obscure, and we won't give it here. It is closely related to the pb obscure, and we won't give it here. It is closely related to the pb
relation, and the details don't matter unless you want to comb through relation, and the details don't matter unless you want to comb through
a somewhat lengthy formal proof. Pretty much all you need to know a somewhat lengthy formal proof. Pretty much all you need to know
about link is the information in the preceding paragraph. about rcu-link is the information in the preceding paragraph.
The LKMM goes on to define the gp-link and rscs-link relations. They The LKMM also defines the gp and rscs relations. They bring grace
bring grace periods and read-side critical sections into the picture, periods and read-side critical sections into the picture, in the
in the following way: following way:
E ->gp-link F means there is a synchronize_rcu() fence event S E ->gp F means there is a synchronize_rcu() fence event S such
and an event X such that E ->po S, either S ->po X or S = X, that E ->po S and either S ->po F or S = F. In simple terms,
and X ->link F. In other words, E and F are connected by a there is a grace period po-between E and F.
grace period followed by an instance of link.
E ->rscs F means there is a critical section delimited by an
E ->rscs-link F means there is a critical section delimited by rcu_read_lock() fence L and an rcu_read_unlock() fence U, such
an rcu_read_lock() fence L and an rcu_read_unlock() fence U, that E ->po U and either L ->po F or L = F. You can think of
and an event X such that E ->po U, either L ->po X or L = X, this as saying that E and F are in the same critical section
and X ->link F. Roughly speaking, this says that some event (in fact, it also allows E to be po-before the start of the
in the same critical section as E is connected by link to F. critical section and F to be po-after the end).
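
As a rough illustration of the new gp and rscs definitions above, the following Python sketch builds both relations from one thread's events listed in program order. The event labels, the helper names po, gp and rscs, and the list-based encoding are assumptions made for this sketch; they are not part of herd7 or of the LKMM files. The order of W and X inside the critical section is also assumed.

```python
from itertools import combinations

def po(events):
    """Program order: every pair (a, b) with a listed before b."""
    return set(combinations(events, 2))

def gp(events, sync):
    """E ->gp F: E ->po S and (S ->po F or S = F), S a synchronize_rcu()."""
    order = po(events)
    return {(e, f) for e in events for f in events
            if (e, sync) in order and ((sync, f) in order or f == sync)}

def rscs(events, lock, unlock):
    """E ->rscs F: E ->po U and (L ->po F or L = F)."""
    order = po(events)
    return {(e, f) for e in events for f in events
            if (e, unlock) in order and ((lock, f) in order or f == lock)}

# Hypothetical threads: L and U delimit one critical section,
# S is a synchronize_rcu() event lying between Y and Z.
p0 = ["L", "W", "X", "U"]
p1 = ["Y", "S", "Z"]

gp_pairs = gp(p1, "S")
rscs_pairs = rscs(p0, "L", "U")
print(sorted(gp_pairs))           # Y is related to S and to Z
print(("X", "W") in rscs_pairs)   # X and W share the critical section
```

Event labels must be unique within a thread for this encoding to work, since po is derived from list positions.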
If we think of the link relation as standing for an extended "before", If we think of the rcu-link relation as standing for an extended
then E ->gp-link F says that E executes before a grace period which "before", then X ->gp Y ->rcu-link Z says that X executes before a
ends before F executes. (In fact it says more than this, because it grace period which ends before Z executes. (In fact it covers more
includes cases where E executes before a grace period and some store than this, because it also includes cases where X executes before a
propagates to F's CPU before F executes and doesn't propagate to some grace period and some store propagates to Z's CPU before Z executes
other CPU until after the grace period ends.) Similarly, but doesn't propagate to some other CPU until after the grace period
E ->rscs-link F says that E is part of (or before the start of) a ends.) Similarly, X ->rscs Y ->rcu-link Z says that X is part of (or
critical section which starts before F executes. before the start of) a critical section which starts before Z
executes.
The LKMM goes on to define the rcu-fence relation as a sequence of gp
and rscs links separated by rcu-link links, in which the number of gp
links is >= the number of rscs links. For example:
X ->gp Y ->rcu-link Z ->rscs T ->rcu-link U ->gp V
would imply that X ->rcu-fence V, because this sequence contains two
gp links and only one rscs link. (It also implies that X ->rcu-fence T
and Z ->rcu-fence V.) On the other hand:
X ->rscs Y ->rcu-link Z ->rscs T ->rcu-link U ->gp V
does not imply X ->rcu-fence V, because the sequence contains only
one gp link but two rscs links.
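
The counting rule can be written down directly. The Python sketch below is only an illustration and is not how herd7 evaluates the model: it takes the gp and rscs links of a chain in order, with the rcu-link separators left implicit, and reports whether the chain as a whole satisfies the rule. The function name and the list encoding are assumptions made for this sketch.

```python
def is_rcu_fence_chain(links):
    """Return True if a non-empty chain of 'gp'/'rscs' links, joined by
    rcu-link, has at least as many gp links as rscs links."""
    if not links or any(link not in ("gp", "rscs") for link in links):
        raise ValueError("chain must be a non-empty list of 'gp'/'rscs'")
    return links.count("gp") >= links.count("rscs")

# X ->gp Y ->rcu-link Z ->rscs T ->rcu-link U ->gp V
first = is_rcu_fence_chain(["gp", "rscs", "gp"])
# X ->rscs Y ->rcu-link Z ->rscs T ->rcu-link U ->gp V
second = is_rcu_fence_chain(["rscs", "rscs", "gp"])
print(first, second)
```

The sub-chains mentioned in the text can be checked the same way: ["gp", "rscs"] for X to T and ["rscs", "gp"] for Z to V both satisfy the rule.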
The rcu-fence relation is important because the Grace Period Guarantee
means that rcu-fence acts kind of like a strong fence. In particular,
if W is a write and we have W ->rcu-fence Z, the Guarantee says that W
will propagate to every CPU before Z executes.
To prove this in full generality requires some intellectual effort.
We'll consider just a very simple case:
W ->gp X ->rcu-link Y ->rscs Z.
This formula means that there is a grace period G and a critical
section C such that:
1. W is po-before G;
2. X is equal to or po-after G;
3. X comes "before" Y in some sense;
4. Y is po-before the end of C;
5. Z is equal to or po-after the start of C.
From 2 - 4 we deduce that the grace period G ends before the critical
section C. Then the second part of the Grace Period Guarantee says
not only that G starts before C does, but also that W (which executes
on G's CPU before G starts) must propagate to every CPU before C
starts. In particular, W propagates to every CPU before Z executes
(or finishes executing, in the case where Z is equal to the
rcu_read_lock() fence event which starts C.) This sort of reasoning
can be expanded to handle all the situations covered by rcu-fence.
Finally, the LKMM defines the RCU-before (rb) relation in terms of
rcu-fence. This is done in essentially the same way as the pb
relation was defined in terms of strong-fence. We will omit the
details; the end result is that E ->rb F implies E must execute before
F, just as E ->pb F does (and for much the same reasons).
Putting this all together, the LKMM expresses the Grace Period Putting this all together, the LKMM expresses the Grace Period
Guarantee by requiring that there are no cycles consisting of gp-link Guarantee by requiring that the rb relation does not contain a cycle.
and rscs-link connections in which the number of gp-link instances is Equivalently, this "rcu" axiom requires that there are no events E and
>= the number of rscs-link instances. It does this by defining the F with E ->rcu-link F ->rcu-fence E. Or to put it a third way, the
rcu-path relation to link events E and F whenever it is possible to axiom requires that there are no cycles consisting of gp and rscs
pass from E to F by a sequence of gp-link and rscs-link connections alternating with rcu-link, where the number of gp links is >= the
with at least as many of the former as the latter. The LKMM's "rcu" number of rscs links.
axiom then says that there are no events E such that E ->rcu-path E.
Justifying the axiom isn't easy, but it is in fact a valid
Justifying this axiom takes some intellectual effort, but it is in formalization of the Grace Period Guarantee. We won't attempt to go
fact a valid formalization of the Grace Period Guarantee. We won't through the detailed argument, but the following analysis gives a
attempt to go through the detailed argument, but the following taste of what is involved. Suppose we have a violation of the first
analysis gives a taste of what is involved. Suppose we have a part of the Guarantee: A critical section starts before a grace
violation of the first part of the Guarantee: A critical section period, and some store propagates to the critical section's CPU before
starts before a grace period, and some store propagates to the the end of the critical section but doesn't propagate to some other
critical section's CPU before the end of the critical section but CPU until after the end of the grace period.
doesn't propagate to some other CPU until after the end of the grace
period.
Putting symbols to these ideas, let L and U be the rcu_read_lock() and Putting symbols to these ideas, let L and U be the rcu_read_lock() and
rcu_read_unlock() fence events delimiting the critical section in rcu_read_unlock() fence events delimiting the critical section in
question, and let S be the synchronize_rcu() fence event for the grace question, and let S be the synchronize_rcu() fence event for the grace
period. Saying that the critical section starts before S means there period. Saying that the critical section starts before S means there
are events E and F where E is po-after L (which marks the start of the are events E and F where E is po-after L (which marks the start of the
critical section), E is "before" F in the sense of the link relation, critical section), E is "before" F in the sense of the rcu-link
and F is po-before the grace period S: relation, and F is po-before the grace period S:
L ->po E ->link F ->po S. L ->po E ->rcu-link F ->po S.
Let W be the store mentioned above, let Z come before the end of the Let W be the store mentioned above, let Z come before the end of the
critical section and witness that W propagates to the critical critical section and witness that W propagates to the critical
...@@ -1600,16 +1652,19 @@ some event X which is po-after S. Symbolically, this amounts to: ...@@ -1600,16 +1652,19 @@ some event X which is po-after S. Symbolically, this amounts to:
The fr link from Y to W indicates that W has not propagated to Y's CPU The fr link from Y to W indicates that W has not propagated to Y's CPU
at the time that Y executes. From this, it can be shown (see the at the time that Y executes. From this, it can be shown (see the
discussion of the link relation earlier) that X and Z are connected by discussion of the rcu-link relation earlier) that X and Z are related
link, yielding: by rcu-link, yielding:
S ->po X ->rcu-link Z ->po U.
The formulas say that S is po-between F and X, hence F ->gp X. They
also say that Z comes before the end of the critical section and E
comes after its start, hence Z ->rscs E. From all this we obtain:
S ->po X ->link Z ->po U. F ->gp X ->rcu-link Z ->rscs E ->rcu-link F,
These formulas say that S is po-between F and X, hence F ->gp-link Z a forbidden cycle. Thus the "rcu" axiom rules out this violation of
via X. They also say that Z comes before the end of the critical the Grace Period Guarantee.
section and E comes after its start, hence Z ->rscs-link F via E. But
now we have a forbidden cycle: F ->gp-link Z ->rscs-link F. Thus the
"rcu" axiom rules out this violation of the Grace Period Guarantee.
For something a little more down-to-earth, let's see how the axiom For something a little more down-to-earth, let's see how the axiom
works out in practice. Consider the RCU code example from above, this works out in practice. Consider the RCU code example from above, this
...@@ -1635,18 +1690,18 @@ time with statement labels added to the memory access instructions: ...@@ -1635,18 +1690,18 @@ time with statement labels added to the memory access instructions:
} }
If r2 = 0 at the end then P0's store at X overwrites the value If r2 = 0 at the end then P0's store at X overwrites the value that
that P1's load at Z reads from, so we have Z ->fre X and thus P1's load at Z reads from, so we have Z ->fre X and thus Z ->rcu-link X.
Z ->link X. In addition, there is a synchronize_rcu() between Y and In addition, there is a synchronize_rcu() between Y and Z, so therefore
Z, so therefore we have Y ->gp-link X. we have Y ->gp Z.
If r1 = 1 at the end then P1's load at Y reads from P0's store at W, If r1 = 1 at the end then P1's load at Y reads from P0's store at W,
so we have W ->link Y. In addition, W and X are in the same critical so we have W ->rcu-link Y. In addition, W and X are in the same critical
section, so therefore we have X ->rscs-link Y. section, so therefore we have X ->rscs W.
This gives us a cycle, Y ->gp-link X ->rscs-link Y, with one gp-link Then X ->rscs W ->rcu-link Y ->gp Z ->rcu-link X is a forbidden cycle,
and one rscs-link, violating the "rcu" axiom. Hence the outcome is violating the "rcu" axiom. Hence the outcome is not allowed by the
not allowed by the LKMM, as we would expect. LKMM, as we would expect.
For contrast, let's see what can happen in a more complicated example: For contrast, let's see what can happen in a more complicated example:
...@@ -1682,15 +1737,11 @@ For contrast, let's see what can happen in a more complicated example: ...@@ -1682,15 +1737,11 @@ For contrast, let's see what can happen in a more complicated example:
} }
If r0 = r1 = r2 = 1 at the end, then similar reasoning to before shows If r0 = r1 = r2 = 1 at the end, then similar reasoning to before shows
that W ->rscs-link Y via X, Y ->gp-link U via Z, and U ->rscs-link W that W ->rscs X ->rcu-link Y ->gp Z ->rcu-link U ->rscs V ->rcu-link W.
via V. And just as before, this gives a cycle: However this cycle is not forbidden, because the sequence of relations
contains fewer instances of gp (one) than of rscs (two). Consequently
W ->rscs-link Y ->gp-link U ->rscs-link W. the outcome is allowed by the LKMM. The following instruction timing
diagram shows how it might actually occur:
However, this cycle has fewer gp-link instances than rscs-link
instances, and consequently the outcome is not forbidden by the LKMM.
The following instruction timing diagram shows how it might actually
occur:
P0 P1 P2 P0 P1 P2
-------------------- -------------------- -------------------- -------------------- -------------------- --------------------
......
...@@ -63,15 +63,22 @@ o Shaked Flur, Susmit Sarkar, Christopher Pulte, Kyndylan Nienhuis, ...@@ -63,15 +63,22 @@ o Shaked Flur, Susmit Sarkar, Christopher Pulte, Kyndylan Nienhuis,
Principles of Programming Languages (POPL 2017). ACM, New York, Principles of Programming Languages (POPL 2017). ACM, New York,
NY, USA, 429–442. NY, USA, 429–442.
o Christopher Pulte, Shaked Flur, Will Deacon, Jon French,
Susmit Sarkar, and Peter Sewell. 2018. "Simplifying ARM concurrency:
multicopy-atomic axiomatic and operational models for ARMv8". In
Proceedings of the ACM on Programming Languages, Volume 2, Issue
POPL, Article No. 19. ACM, New York, NY, USA.
Linux-kernel memory model Linux-kernel memory model
========================= =========================
o Andrea Parri, Alan Stern, Luc Maranget, Paul E. McKenney, o Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
and Jade Alglave. 2017. "A formal model of Alan Stern. 2018. "Frightening small children and disconcerting
Linux-kernel memory ordering - companion webpage". grown-ups: Concurrency in the Linux kernel". In Proceedings of
http://moscova.inria.fr/∼maranget/cats7/linux/. (2017). [Online; the 23rd International Conference on Architectural Support for
accessed 30-January-2017]. Programming Languages and Operating Systems (ASPLOS 2018). ACM,
New York, NY, USA, 405-418. Webpage: http://diy.inria.fr/linux/.
o Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and o Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
Alan Stern. 2017. "A formal kernel memory-ordering model (part 1)" Alan Stern. 2017. "A formal kernel memory-ordering model (part 1)"
......
...@@ -20,7 +20,7 @@ that litmus test to be exercised within the Linux kernel. ...@@ -20,7 +20,7 @@ that litmus test to be exercised within the Linux kernel.
REQUIREMENTS REQUIREMENTS
============ ============
Version 7.48 of the "herd7" and "klitmus7" tools must be downloaded Version 7.49 of the "herd7" and "klitmus7" tools must be downloaded
separately: separately:
https://github.com/herd/herdtools7 https://github.com/herd/herdtools7
......
...@@ -5,10 +5,10 @@ ...@@ -5,10 +5,10 @@
* Copyright (C) 2017 Alan Stern <stern@rowland.harvard.edu>, * Copyright (C) 2017 Alan Stern <stern@rowland.harvard.edu>,
* Andrea Parri <parri.andrea@gmail.com> * Andrea Parri <parri.andrea@gmail.com>
* *
* An earlier version of this file appears in the companion webpage for * An earlier version of this file appeared in the companion webpage for
* "Frightening small children and disconcerting grown-ups: Concurrency * "Frightening small children and disconcerting grown-ups: Concurrency
* in the Linux kernel" by Alglave, Maranget, McKenney, Parri, and Stern, * in the Linux kernel" by Alglave, Maranget, McKenney, Parri, and Stern,
* which is to appear in ASPLOS 2018. * which appeared in ASPLOS 2018.
*) *)
"Linux-kernel memory consistency model" "Linux-kernel memory consistency model"
......
...@@ -5,10 +5,10 @@ ...@@ -5,10 +5,10 @@
* Copyright (C) 2017 Alan Stern <stern@rowland.harvard.edu>, * Copyright (C) 2017 Alan Stern <stern@rowland.harvard.edu>,
* Andrea Parri <parri.andrea@gmail.com> * Andrea Parri <parri.andrea@gmail.com>
* *
* An earlier version of this file appears in the companion webpage for * An earlier version of this file appeared in the companion webpage for
* "Frightening small children and disconcerting grown-ups: Concurrency * "Frightening small children and disconcerting grown-ups: Concurrency
* in the Linux kernel" by Alglave, Maranget, McKenney, Parri, and Stern, * in the Linux kernel" by Alglave, Maranget, McKenney, Parri, and Stern,
* which is to appear in ASPLOS 2018. * which appeared in ASPLOS 2018.
*) *)
"Linux-kernel memory consistency model" "Linux-kernel memory consistency model"
...@@ -100,22 +100,29 @@ let rscs = po ; crit^-1 ; po? ...@@ -100,22 +100,29 @@ let rscs = po ; crit^-1 ; po?
* one but two non-rf relations, but only in conjunction with an RCU * one but two non-rf relations, but only in conjunction with an RCU
* read-side critical section. * read-side critical section.
*) *)
let link = hb* ; pb* ; prop let rcu-link = hb* ; pb* ; prop
(* Chains that affect the RCU grace-period guarantee *) (*
let gp-link = gp ; link * Any sequence containing at least as many grace periods as RCU read-side
let rscs-link = rscs ; link * critical sections (joined by rcu-link) acts as a generalized strong fence.
*)
let rec rcu-fence = gp |
(gp ; rcu-link ; rscs) |
(rscs ; rcu-link ; gp) |
(gp ; rcu-link ; rcu-fence ; rcu-link ; rscs) |
(rscs ; rcu-link ; rcu-fence ; rcu-link ; gp) |
(rcu-fence ; rcu-link ; rcu-fence)
(* rb orders instructions just as pb does *)
let rb = prop ; rcu-fence ; hb* ; pb*
irreflexive rb as rcu
(* (*
* A cycle containing at least as many grace periods as RCU read-side * The happens-before, propagation, and rcu constraints are all
* critical sections is forbidden. * expressions of temporal ordering. They could be replaced by
* a single constraint on an "executes-before" relation, xb:
*
* let xb = hb | pb | rb
* acyclic xb as executes-before
*) *)
let rec rcu-path =
gp-link |
(gp-link ; rscs-link) |
(rscs-link ; gp-link) |
(rcu-path ; rcu-path) |
(gp-link ; rcu-path ; rscs-link) |
(rscs-link ; rcu-path ; gp-link)
irreflexive rcu-path as rcu
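
To see what the recursive rcu-fence definition computes, here is a minimal Python sketch that treats each relation as a set of (event, event) pairs and iterates the six alternatives until nothing new is added. It then applies the equivalent form of the "rcu" axiom given in the explanation text (no E and F with E ->rcu-link F ->rcu-fence E). The helper names and the tiny hand-written relations are assumptions for illustration: they contain only the links named in the two worked examples of the explanation text, whereas herd7 derives the full relations from a litmus test.

```python
def compose(*rels):
    """Sequential composition r1 ; r2 ; ... of relations (sets of pairs)."""
    result = rels[0]
    for rel in rels[1:]:
        result = {(a, d) for (a, b) in result for (c, d) in rel if b == c}
    return result

def rcu_fence(gp, rscs, link):
    """Least fixpoint of the recursive rcu-fence definition."""
    fence = set()
    while True:
        new = (gp
               | compose(gp, link, rscs)
               | compose(rscs, link, gp)
               | compose(gp, link, fence, link, rscs)
               | compose(rscs, link, fence, link, gp)
               | compose(fence, link, fence))
        if new == fence:
            return fence
        fence = new

def violates_rcu_axiom(gp, rscs, link):
    """True if some E has E ->rcu-link F ->rcu-fence E."""
    return any(a == b for (a, b) in compose(link, rcu_fence(gp, rscs, link)))

# X ->rscs W ->rcu-link Y ->gp Z ->rcu-link X (one gp, one rscs)
forbidden = violates_rcu_axiom(
    gp={("Y", "Z")},
    rscs={("X", "W")},
    link={("W", "Y"), ("Z", "X")})

# W ->rscs X ->rcu-link Y ->gp Z ->rcu-link U ->rscs V ->rcu-link W
allowed = not violates_rcu_axiom(
    gp={("Y", "Z")},
    rscs={("W", "X"), ("U", "V")},
    link={("X", "Y"), ("Z", "U"), ("V", "W")})

print(forbidden, allowed)
```

The loop terminates because each pass can only add pairs and the set of possible pairs over a finite event set is finite.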
// SPDX-License-Identifier: GPL-2.0+ // SPDX-License-Identifier: GPL-2.0+
// //
// An earlier version of this file appears in the companion webpage for // An earlier version of this file appeared in the companion webpage for
// "Frightening small children and disconcerting grown-ups: Concurrency // "Frightening small children and disconcerting grown-ups: Concurrency
// in the Linux kernel" by Alglave, Maranget, McKenney, Parri, and Stern, // in the Linux kernel" by Alglave, Maranget, McKenney, Parri, and Stern,
// which is to appear in ASPLOS 2018. // which appeared in ASPLOS 2018.
// ONCE // ONCE
READ_ONCE(X) __load{once}(X) READ_ONCE(X) __load{once}(X)
...@@ -14,14 +14,15 @@ smp_store_release(X,V) { __store{release}(*X,V); } ...@@ -14,14 +14,15 @@ smp_store_release(X,V) { __store{release}(*X,V); }
smp_load_acquire(X) __load{acquire}(*X) smp_load_acquire(X) __load{acquire}(*X)
rcu_assign_pointer(X,V) { __store{release}(X,V); } rcu_assign_pointer(X,V) { __store{release}(X,V); }
rcu_dereference(X) __load{once}(X) rcu_dereference(X) __load{once}(X)
smp_store_mb(X,V) { __store{once}(X,V); __fence{mb}; }
// Fences // Fences
smp_mb() { __fence{mb} ; } smp_mb() { __fence{mb}; }
smp_rmb() { __fence{rmb} ; } smp_rmb() { __fence{rmb}; }
smp_wmb() { __fence{wmb} ; } smp_wmb() { __fence{wmb}; }
smp_mb__before_atomic() { __fence{before-atomic} ; } smp_mb__before_atomic() { __fence{before-atomic}; }
smp_mb__after_atomic() { __fence{after-atomic} ; } smp_mb__after_atomic() { __fence{after-atomic}; }
smp_mb__after_spinlock() { __fence{after-spinlock} ; } smp_mb__after_spinlock() { __fence{after-spinlock}; }
// Exchange // Exchange
xchg(X,V) __xchg{mb}(X,V) xchg(X,V) __xchg{mb}(X,V)
...@@ -34,26 +35,27 @@ cmpxchg_acquire(X,V,W) __cmpxchg{acquire}(X,V,W) ...@@ -34,26 +35,27 @@ cmpxchg_acquire(X,V,W) __cmpxchg{acquire}(X,V,W)
cmpxchg_release(X,V,W) __cmpxchg{release}(X,V,W) cmpxchg_release(X,V,W) __cmpxchg{release}(X,V,W)
// Spinlocks // Spinlocks
spin_lock(X) { __lock(X) ; } spin_lock(X) { __lock(X); }
spin_unlock(X) { __unlock(X) ; } spin_unlock(X) { __unlock(X); }
spin_trylock(X) __trylock(X) spin_trylock(X) __trylock(X)
spin_is_locked(X) __islocked(X)
// RCU // RCU
rcu_read_lock() { __fence{rcu-lock}; } rcu_read_lock() { __fence{rcu-lock}; }
rcu_read_unlock() { __fence{rcu-unlock};} rcu_read_unlock() { __fence{rcu-unlock}; }
synchronize_rcu() { __fence{sync-rcu}; } synchronize_rcu() { __fence{sync-rcu}; }
synchronize_rcu_expedited() { __fence{sync-rcu}; } synchronize_rcu_expedited() { __fence{sync-rcu}; }
// Atomic // Atomic
atomic_read(X) READ_ONCE(*X) atomic_read(X) READ_ONCE(*X)
atomic_set(X,V) { WRITE_ONCE(*X,V) ; } atomic_set(X,V) { WRITE_ONCE(*X,V); }
atomic_read_acquire(X) smp_load_acquire(X) atomic_read_acquire(X) smp_load_acquire(X)
atomic_set_release(X,V) { smp_store_release(X,V); } atomic_set_release(X,V) { smp_store_release(X,V); }
atomic_add(V,X) { __atomic_op(X,+,V) ; } atomic_add(V,X) { __atomic_op(X,+,V); }
atomic_sub(V,X) { __atomic_op(X,-,V) ; } atomic_sub(V,X) { __atomic_op(X,-,V); }
atomic_inc(X) { __atomic_op(X,+,1) ; } atomic_inc(X) { __atomic_op(X,+,1); }
atomic_dec(X) { __atomic_op(X,-,1) ; } atomic_dec(X) { __atomic_op(X,-,1); }
atomic_add_return(V,X) __atomic_op_return{mb}(X,+,V) atomic_add_return(V,X) __atomic_op_return{mb}(X,+,V)
atomic_add_return_relaxed(V,X) __atomic_op_return{once}(X,+,V) atomic_add_return_relaxed(V,X) __atomic_op_return{once}(X,+,V)
......
...@@ -7,7 +7,7 @@ C IRIW+mbonceonces+OnceOnce ...@@ -7,7 +7,7 @@ C IRIW+mbonceonces+OnceOnce
* between each pairs of reads. In other words, is smp_mb() sufficient to * between each pairs of reads. In other words, is smp_mb() sufficient to
* cause two different reading processes to agree on the order of a pair * cause two different reading processes to agree on the order of a pair
* of writes, where each write is to a different variable by a different * of writes, where each write is to a different variable by a different
* process? * process? This litmus test exercises LKMM's "propagation" rule.
*) *)
{} {}
......
C MP+polockmbonce+poacquiresilsil
(*
* Result: Never
*
* Do spinlocks combined with smp_mb__after_spinlock() provide order
* to outside observers using spin_is_locked() to sense the lock-held
* state, ordered by acquire? Note that when the first spin_is_locked()
* returns false and the second true, we know that the smp_load_acquire()
* executed before the lock was acquired (loosely speaking).
*)
{
}
P0(spinlock_t *lo, int *x)
{
spin_lock(lo);
smp_mb__after_spinlock();
WRITE_ONCE(*x, 1);
spin_unlock(lo);
}
P1(spinlock_t *lo, int *x)
{
int r1;
int r2;
int r3;
r1 = smp_load_acquire(x);
r2 = spin_is_locked(lo);
r3 = spin_is_locked(lo);
}
exists (1:r1=1 /\ 1:r2=0 /\ 1:r3=1)
C MP+polockonce+poacquiresilsil
(*
* Result: Sometimes
*
* Do spinlocks provide order to outside observers using spin_is_locked()
* to sense the lock-held state, ordered by acquire? Note that when the
* first spin_is_locked() returns false and the second true, we know that
* the smp_load_acquire() executed before the lock was acquired (loosely
* speaking).
*)
{
}
P0(spinlock_t *lo, int *x)
{
spin_lock(lo);
WRITE_ONCE(*x, 1);
spin_unlock(lo);
}
P1(spinlock_t *lo, int *x)
{
int r1;
int r2;
int r3;
r1 = smp_load_acquire(x);
r2 = spin_is_locked(lo);
r3 = spin_is_locked(lo);
}
exists (1:r1=1 /\ 1:r2=0 /\ 1:r3=1)
...@@ -23,7 +23,8 @@ IRIW+mbonceonces+OnceOnce.litmus ...@@ -23,7 +23,8 @@ IRIW+mbonceonces+OnceOnce.litmus
between each pairs of reads. In other words, is smp_mb() between each pairs of reads. In other words, is smp_mb()
sufficient to cause two different reading processes to agree on sufficient to cause two different reading processes to agree on
the order of a pair of writes, where each write is to a different the order of a pair of writes, where each write is to a different
variable by a different process? variable by a different process? This litmus test is forbidden
by LKMM's propagation rule.
IRIW+poonceonces+OnceOnce.litmus IRIW+poonceonces+OnceOnce.litmus
Test of independent reads from independent writes with nothing Test of independent reads from independent writes with nothing
...@@ -63,6 +64,16 @@ LB+poonceonces.litmus ...@@ -63,6 +64,16 @@ LB+poonceonces.litmus
MP+onceassign+derefonce.litmus MP+onceassign+derefonce.litmus
As below, but with rcu_assign_pointer() and an rcu_dereference(). As below, but with rcu_assign_pointer() and an rcu_dereference().
MP+polockmbonce+poacquiresilsil.litmus
Protect the access with a lock and an smp_mb__after_spinlock()
in one process, and use an acquire load followed by a pair of
spin_is_locked() calls in the other process.
MP+polockonce+poacquiresilsil.litmus
Protect the access with a lock in one process, and use an
acquire load followed by a pair of spin_is_locked() calls
in the other process.
MP+polocks.litmus MP+polocks.litmus
As below, but with the second access of the writer process As below, but with the second access of the writer process
and the first access of reader process protected by a lock. and the first access of reader process protected by a lock.
@@ -109,8 +120,10 @@ S+wmbonceonce+poacquireonce.litmus
 WRC+poonceonces+Once.litmus
 WRC+pooncerelease+rmbonceonce+Once.litmus
-	These two are members of an extension of the MP litmus-test class
-	in which the first write is moved to a separate process.
+	These two are members of an extension of the MP litmus-test
+	class in which the first write is moved to a separate process.
+	The second is forbidden because smp_store_release() is
+	A-cumulative in LKMM.
 Z6.0+pooncelock+pooncelock+pombonce.litmus
 	Is the ordering provided by a spin_unlock() and a subsequent
......
@@ -5,7 +5,9 @@ C WRC+pooncerelease+rmbonceonce+Once
  *
  * This litmus test is an extension of the message-passing pattern, where
  * the first write is moved to a separate process.  Because it features
- * a release and a read memory barrier, it should be forbidden.
+ * a release and a read memory barrier, it should be forbidden.  More
+ * specifically, this litmus test is forbidden because smp_store_release()
+ * is A-cumulative in LKMM.
  *)
 {}
......
@@ -4,46 +4,72 @@
  * Copyright (C) 2017 Alan Stern <stern@rowland.harvard.edu>
  *)
-(* Generate coherence orders and handle lock operations *)
+(*
+ * Generate coherence orders and handle lock operations
+ *
+ * Warning: spin_is_locked() crashes herd7 versions strictly before 7.48.
+ * spin_is_locked() is functional from herd7 version 7.49.
+ *)
 include "cross.cat"
-(* From lock reads to their partner lock writes *)
-let lk-rmw = ([LKR] ; po-loc ; [LKW]) \ (po ; po)
-let rmw = rmw | lk-rmw
 (*
- * A paired LKR must always see an unlocked value; spin_lock() calls nested
- * inside a critical section (for the same lock) always deadlock.
+ * The lock-related events generated by herd are as follows:
+ *
+ *	LKR	Lock-Read: the read part of a spin_lock() or successful
+ *		spin_trylock() read-modify-write event pair
+ *	LKW	Lock-Write: the write part of a spin_lock() or successful
+ *		spin_trylock() RMW event pair
+ *	UL	Unlock: a spin_unlock() event
+ *	LF	Lock-Fail: a failed spin_trylock() event
+ *	RL	Read-Locked: a spin_is_locked() event which returns True
+ *	RU	Read-Unlocked: a spin_is_locked() event which returns False
+ *
+ * LKR and LKW events always come paired, like all RMW event sequences.
+ *
+ * LKR, LF, RL, and RU are read events; LKR has Acquire ordering.
+ * LKW and UL are write events; UL has Release ordering.
+ * LKW, LF, RL, and RU have no ordering properties.
  *)
-empty ([LKW] ; po-loc ; [domain(lk-rmw)]) \ (po-loc ; [UL] ; po-loc)
-	as lock-nest
-(* The litmus test is invalid if an LKW event is not part of an RMW pair *)
-flag ~empty LKW \ range(lk-rmw) as unpaired-LKW
-(* This will be allowed if we implement spin_is_locked() *)
-flag ~empty LKR \ domain(lk-rmw) as unpaired-LKR
-(* There should be no R or W accesses to spinlocks *)
-let ALL-LOCKS = LKR | LKW | UL | LF
+(* Backward compatibility *)
+let RL = try RL with emptyset
+let RU = try RU with emptyset
+(* Treat RL as a kind of LF: a read with no ordering properties *)
+let LF = LF | RL
+(* There should be no ordinary R or W accesses to spinlocks *)
+let ALL-LOCKS = LKR | LKW | UL | LF | RU
 flag ~empty [M \ IW] ; loc ; [ALL-LOCKS] as mixed-lock-accesses
+(* Link Lock-Reads to their RMW-partner Lock-Writes *)
+let lk-rmw = ([LKR] ; po-loc ; [LKW]) \ (po ; po)
+let rmw = rmw | lk-rmw
+(* The litmus test is invalid if an LKR/LKW event is not part of an RMW pair *)
+flag ~empty LKW \ range(lk-rmw) as unpaired-LKW
+flag ~empty LKR \ domain(lk-rmw) as unpaired-LKR
+(*
+ * An LKR must always see an unlocked value; spin_lock() calls nested
+ * inside a critical section (for the same lock) always deadlock.
+ *)
+empty ([LKW] ; po-loc ; [LKR]) \ (po-loc ; [UL] ; po-loc) as lock-nest
 (* The final value of a spinlock should not be tested *)
 flag ~empty [FW] ; loc ; [ALL-LOCKS] as lock-final
 (*
  * Put lock operations in their appropriate classes, but leave UL out of W
  * until after the co relation has been generated.
  *)
-let R = R | LKR | LF
+let R = R | LKR | LF | RU
 let W = W | LKW
 let Release = Release | UL
 let Acquire = Acquire | LKR
 (* Match LKW events to their corresponding UL events *)
 let critical = ([LKW] ; po-loc ; [UL]) \ (po-loc ; [LKW | UL] ; po-loc)
@@ -53,27 +79,48 @@ flag ~empty UL \ range(critical) as unmatched-unlock
 let UNMATCHED-LKW = LKW \ domain(critical)
 empty ([UNMATCHED-LKW] ; loc ; [UNMATCHED-LKW]) \ id as unmatched-locks
 (* rfi for LF events: link each LKW to the LF events in its critical section *)
 let rfi-lf = ([LKW] ; po-loc ; [LF]) \ ([LKW] ; po-loc ; [UL] ; po-loc)
 (* rfe for LF events *)
 let all-possible-rfe-lf =
 	(*
 	 * Given an LF event r, compute the possible rfe edges for that event
 	 * (all those starting from LKW events in other threads),
 	 * and then convert that relation to a set of single-edge relations.
 	 *)
 	let possible-rfe-lf r =
 		let pair-to-relation p = p ++ 0
 		in map pair-to-relation ((LKW * {r}) & loc & ext)
 	(* Do this for each LF event r that isn't in rfi-lf *)
 	in map possible-rfe-lf (LF \ range(rfi-lf))
 (* Generate all rf relations for LF events *)
 with rfe-lf from cross(all-possible-rfe-lf)
-let rf = rf | rfi-lf | rfe-lf
+let rf-lf = rfe-lf | rfi-lf
+(*
+ * RU, i.e., spin_is_locked() returning False, is slightly different.
+ * We rely on the memory model to rule out cases where spin_is_locked()
+ * within one of the lock's critical sections returns False.
+ *)
+(* rfi for RU events: an RU may read from the last po-previous UL *)
+let rfi-ru = ([UL] ; po-loc ; [RU]) \ ([UL] ; po-loc ; [LKW] ; po-loc)
+(* rfe for RU events: an RU may read from an external UL or the initial write *)
+let all-possible-rfe-ru =
+	let possible-rfe-ru r =
+		let pair-to-relation p = p ++ 0
+		in map pair-to-relation (((UL | IW) * {r}) & loc & ext)
+	in map possible-rfe-ru RU
+(* Generate all rf relations for RU events *)
+with rfe-ru from cross(all-possible-rfe-ru)
+let rf-ru = rfe-ru | rfi-ru
+(* Final rf relation *)
+let rf = rf | rf-lf | rf-ru
 (* Generate all co relations, including LKW events but not UL *)
 let co0 = co0 | ([IW] ; loc ; [LKW]) |
......
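
The `with rfe-ru from cross(all-possible-rfe-ru)` step in lock.cat can be read roughly as follows: each RU event gets a set of candidate writes (an external UL or the initial write), and cross() enumerates every way of choosing one candidate per event. The sketch below imitates that enumeration for two RU events. It is only an illustration in shell, not cat code, and the event names (ru1, ru2, IW, UL_P0) are made up for the example.

```shell
#!/bin/sh
# Illustrative only: enumerate every way of choosing one candidate
# write per RU event.  All event names here are hypothetical.
enumerate_rfe_ru() {
	for w1 in IW UL_P0; do          # candidate writes for ru1
		for w2 in IW UL_P0; do      # candidate writes for ru2
			printf '%s->ru1 %s->ru2\n' "$w1" "$w2"
		done
	done
}

enumerate_rfe_ru
```

Each printed line corresponds to one candidate rfe-ru relation. As the comment in lock.cat notes, the rest of the memory model is relied upon to reject the candidates that make no sense, such as spin_is_locked() returning False inside a critical section of the same lock.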
#!/bin/sh
#
# Run herd tests on all .litmus files in the specified directory (which
# defaults to litmus-tests) and check each file's result against a "Result:"
# comment within that litmus test. If the verification result does not
# match that specified in the litmus test, this script prints an error
# message prefixed with "^^^". It also outputs verification results to
# a file whose name is that of the specified litmus test, but with ".out"
# appended.
#
# Usage:
# sh checkalllitmus.sh [ directory ]
#
# The LINUX_HERD_OPTIONS environment variable may be used to specify
# arguments to herd, whose default is defined by the checklitmus.sh script.
# Thus, one would normally run this in the directory containing the memory
# model, optionally specifying the directory of litmus tests to check.
#
# This script makes no attempt to run the litmus tests concurrently.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, you can access it online at
# http://www.gnu.org/licenses/gpl-2.0.html.
#
# Copyright IBM Corporation, 2018
#
# Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
litmusdir=${1-litmus-tests}
if test -d "$litmusdir" -a -r "$litmusdir" -a -x "$litmusdir"
then
	:
else
	echo ' --- ' error: "$litmusdir" is not an accessible directory
	exit 255
fi

# Find the checklitmus script.  If it is not where we expect it, then
# assume that the caller has the PATH environment variable set
# appropriately.
if test -x scripts/checklitmus.sh
then
	clscript=scripts/checklitmus.sh
else
	clscript=checklitmus.sh
fi

# Run the script on all the litmus tests in the specified directory
ret=0
for i in "$litmusdir"/*.litmus
do
	if ! $clscript "$i"
	then
		ret=1
	fi
done
if test "$ret" -ne 0
then
	echo " ^^^ VERIFICATION MISMATCHES"
else
	echo All litmus tests verified as was expected.
fi
exit $ret
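
The directory-walking pattern described in the header comment above can be sketched in isolation as follows. This is a simplified illustration rather than the script itself: check_one is a hypothetical stand-in for checklitmus.sh that "fails" for any file containing the word FAIL, so the loop can be exercised without herd7 installed.

```shell
#!/bin/sh
# Hypothetical stand-in for checklitmus.sh: fail if the file contains
# the word FAIL, succeed otherwise.
check_one() {
	! grep -q FAIL "$1"
}

# Run check_one on every .litmus file in directory $1.  Keep going
# after a failure, and return non-zero if any check failed.
check_dir() {
	dir=$1
	ret=0
	for f in "$dir"/*.litmus; do
		[ -e "$f" ] || continue      # the glob matched nothing
		check_one "$f" || ret=1
	done
	return $ret
}
```

Recording the failure in a variable instead of exiting immediately means every test still runs, which matches the behaviour the header comment describes: one summary status after all files have been checked.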
#!/bin/sh
#
# Run a herd test and check the result against a "Result:" comment within
# the litmus test. If the verification result does not match that specified
# in the litmus test, this script prints an error message prefixed with
# "^^^" and exits with a non-zero status. It also outputs verification
# results to a file whose name is that of the specified litmus test, but
# with ".out" appended.
#
# Usage:
# sh checklitmus.sh file.litmus
#
# The LINUX_HERD_OPTIONS environment variable may be used to specify
# arguments to herd, which default to "-conf linux-kernel.cfg". Thus,
# one would normally run this in the directory containing the memory model,
# specifying the pathname of the litmus test to check.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, you can access it online at
# http://www.gnu.org/licenses/gpl-2.0.html.
#
# Copyright IBM Corporation, 2018
#
# Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
litmus=$1
herdoptions=${LINUX_HERD_OPTIONS--conf linux-kernel.cfg}

if test -f "$litmus" -a -r "$litmus"
then
	:
else
	echo ' --- ' error: \""$litmus"\" is not a readable file
	exit 255
fi

# Extract the expected outcome from the litmus test's "Result:" comment.
if grep -q '^ \* Result: ' "$litmus"
then
	outcome=`grep -m 1 '^ \* Result: ' "$litmus" | awk '{ print $3 }'`
else
	outcome=specified
fi

# Run herd7, capturing its output (and timing) in $litmus.out.
echo Herd options: $herdoptions > "$litmus.out"
/usr/bin/time herd7 -o ~/tmp $herdoptions "$litmus" >> "$litmus.out" 2>&1
grep "Herd options:" "$litmus.out"
grep '^Observation' "$litmus.out"
if grep -q '^Observation' "$litmus.out"
then
	:
else
	cat "$litmus.out"
	echo ' ^^^ Verification error'
	echo ' ^^^ Verification error' >> "$litmus.out" 2>&1
	exit 255
fi

# Compare herd7's verdict against the expected outcome.
if test "$outcome" = DEADLOCK
then
	if grep '^Observation' "$litmus.out" | grep -q 'Never 0 0$'
	then
		ret=0
	else
		echo " ^^^ Unexpected non-$outcome verification"
		echo " ^^^ Unexpected non-$outcome verification" >> "$litmus.out" 2>&1
		ret=1
	fi
elif grep '^Observation' "$litmus.out" | grep -q "$outcome" || test "$outcome" = Maybe
then
	ret=0
else
	echo " ^^^ Unexpected non-$outcome verification"
	echo " ^^^ Unexpected non-$outcome verification" >> "$litmus.out" 2>&1
	ret=1
fi
tail -2 "$litmus.out" | head -1
exit $ret
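
The core comparison in the script above (read the expected verdict from the " * Result:" comment, then look for it on herd's "Observation" line) can be sketched on its own as follows. The function names are hypothetical, and the sketch omits the DEADLOCK and Maybe special cases that the script handles.

```shell
#!/bin/sh
# Print the expected outcome named in a " * Result: X" header comment.
expected_outcome() {
	grep -m 1 '^ \* Result: ' "$1" | awk '{ print $3 }'
}

# $1 = litmus file, $2 = file holding herd output.
# Succeed if the expected outcome appears on the Observation line.
outcome_matches() {
	want=$(expected_outcome "$1")
	[ -n "$want" ] || return 2       # no Result: comment found
	grep '^Observation' "$2" | grep -q "$want"
}
```

For instance, given a litmus file whose header contains " * Result: Sometimes" and an output file containing a line such as "Observation x Sometimes 1 3" (a made-up sample, not real herd7 output), outcome_matches would succeed. With "Observation x Never 0 4" it would fail.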