提交 00bcf5cd 编写于 作者: L Linus Torvalds

Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - rwsem micro-optimizations (Davidlohr Bueso)

   - Improve the implementation and optimize the performance of
     percpu-rwsems. (Peter Zijlstra.)

   - Convert all lglock users to better facilities such as percpu-rwsems
     or percpu-spinlocks and remove lglocks. (Peter Zijlstra)

   - Remove the ticket (spin)lock implementation. (Peter Zijlstra)

   - Korean translation of memory-barriers.txt and related fixes to the
     English document. (SeongJae Park)

   - misc fixes and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/cmpxchg, locking/atomics: Remove superfluous definitions
  x86, locking/spinlocks: Remove ticket (spin)lock implementation
  locking/lglock: Remove lglock implementation
  stop_machine: Remove stop_cpus_lock and lg_double_lock/unlock()
  fs/locks: Use percpu_down_read_preempt_disable()
  locking/percpu-rwsem: Add down_read_preempt_disable()
  fs/locks: Replace lg_local with a per-cpu spinlock
  fs/locks: Replace lg_global with a percpu-rwsem
  locking/percpu-rwsem: Add DEFINE_STATIC_PERCPU_RWSEMand percpu_rwsem_assert_held()
  locking/pv-qspinlock: Use cmpxchg_release() in __pv_queued_spin_unlock()
  locking/rwsem, x86: Drop a bogus cc clobber
  futex: Add some more function commentry
  locking/hung_task: Show all locks
  locking/rwsem: Scan the wait_list for readers only once
  locking/rwsem: Remove a few useless comments
  locking/rwsem: Return void in __rwsem_mark_wake()
  locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write()
  locking/Documentation: Add Korean translation
  locking/Documentation: Fix a typo of example result
  locking/Documentation: Fix wrong section reference
  ...
此差异已折叠。
lglock - local/global locks for mostly local access patterns
------------------------------------------------------------
Origin: Nick Piggin's VFS scalability series introduced during
2.6.35++ [1] [2]
Location: kernel/locking/lglock.c
include/linux/lglock.h
Users: currently only the VFS and stop_machine related code
Design Goal:
------------
Improve scalability of globally used large data sets that are
distributed over all CPUs as per_cpu elements.
To manage global data structures that are partitioned over all CPUs
as per_cpu elements but can be mostly handled by CPU local actions
lglock will be used where the majority of accesses are cpu local
reading and occasional cpu local writing with very infrequent
global write access.
* deal with things locally whenever possible
- very fast access to the local per_cpu data
- reasonably fast access to specific per_cpu data on a different
CPU
* while making global action possible when needed
- by expensive access to all CPUs locks - effectively
resulting in a globally visible critical section.
Design:
-------
Basically it is an array of per_cpu spinlocks with the
lg_local_lock/unlock accessing the local CPUs lock object and the
lg_local_lock_cpu/unlock_cpu accessing a remote CPUs lock object
the lg_local_lock has to disable preemption as migration protection so
that the reference to the local CPUs lock does not go out of scope.
Due to the lg_local_lock/unlock only touching cpu-local resources it
is fast. Taking the local lock on a different CPU will be more
expensive but still relatively cheap.
One can relax the migration constraints by acquiring the current
CPUs lock with lg_local_lock_cpu, remember the cpu, and release that
lock at the end of the critical section even if migrated. This should
give most of the performance benefits without inhibiting migration
though needs careful considerations for nesting of lglocks and
consideration of deadlocks with lg_global_lock.
The lg_global_lock/unlock locks all underlying spinlocks of all
possible CPUs (including those off-line). The preemption disable/enable
are needed in the non-RT kernels to prevent deadlocks like:
on cpu 1
task A task B
lg_global_lock
got cpu 0 lock
<<<< preempt <<<<
lg_local_lock_cpu for cpu 0
spin on cpu 0 lock
On -RT this deadlock scenario is resolved by the arch_spin_locks in the
lglocks being replaced by rt_mutexes which resolve the above deadlock
by boosting the lock-holder.
Implementation:
---------------
The initial lglock implementation from Nick Piggin used some complex
macros to generate the lglock/brlock in lglock.h - they were later
turned into a set of functions by Andi Kleen [7]. The change to functions
was motivated by the presence of multiple lock users and also by them
being easier to maintain than the generating macros. This change to
functions is also the basis to eliminated the restriction of not
being initializeable in kernel modules (the remaining problem is that
locks are not explicitly initialized - see lockdep-design.txt)
Declaration and initialization:
-------------------------------
#include <linux/lglock.h>
DEFINE_LGLOCK(name)
or:
DEFINE_STATIC_LGLOCK(name);
lg_lock_init(&name, "lockdep_name_string");
on UP this is mapped to DEFINE_SPINLOCK(name) in both cases, note
also that as of 3.18-rc6 all declaration in use are of the _STATIC_
variant (and it seems that the non-static was never in use).
lg_lock_init is initializing the lockdep map only.
Usage:
------
From the locking semantics it is a spinlock. It could be called a
locality aware spinlock. lg_local_* behaves like a per_cpu
spinlock and lg_global_* like a global spinlock.
No surprises in the API.
lg_local_lock(*lglock);
access to protected per_cpu object on this CPU
lg_local_unlock(*lglock);
lg_local_lock_cpu(*lglock, cpu);
access to protected per_cpu object on other CPU cpu
lg_local_unlock_cpu(*lglock, cpu);
lg_global_lock(*lglock);
access all protected per_cpu objects on all CPUs
lg_global_unlock(*lglock);
There are no _trylock variants of the lglocks.
Note that the lg_global_lock/unlock has to iterate over all possible
CPUs rather than the actually present CPUs or a CPU could go off-line
with a held lock [4] and that makes it very expensive. A discussion on
these issues can be found at [5]
Constraints:
------------
* currently the declaration of lglocks in kernel modules is not
possible, though this should be doable with little change.
* lglocks are not recursive.
* suitable for code that can do most operations on the CPU local
data and will very rarely need the global lock
* lg_global_lock/unlock is *very* expensive and does not scale
* on UP systems all lg_* primitives are simply spinlocks
* in PREEMPT_RT the spinlock becomes an rt-mutex and can sleep but
does not change the tasks state while sleeping [6].
* in PREEMPT_RT the preempt_disable/enable in lg_local_lock/unlock
is downgraded to a migrate_disable/enable, the other
preempt_disable/enable are downgraded to barriers [6].
The deadlock noted for non-RT above is resolved due to rt_mutexes
boosting the lock-holder in this case which arch_spin_locks do
not do.
lglocks were designed for very specific problems in the VFS and probably
only are the right answer in these corner cases. Any new user that looks
at lglocks probably wants to look at the seqlock and RCU alternatives as
her first choice. There are also efforts to resolve the RCU issues that
currently prevent using RCU in place of view remaining lglocks.
Note on brlock history:
-----------------------
The 'Big Reader' read-write spinlocks were originally introduced by
Ingo Molnar in 2000 (2.4/2.5 kernel series) and removed in 2003. They
later were introduced by the VFS scalability patch set in 2.6 series
again as the "big reader lock" brlock [2] variant of lglock which has
been replaced by seqlock primitives or by RCU based primitives in the
3.13 kernel series as was suggested in [3] in 2003. The brlock was
entirely removed in the 3.13 kernel series.
Link: 1 http://lkml.org/lkml/2010/8/2/81
Link: 2 http://lwn.net/Articles/401738/
Link: 3 http://lkml.org/lkml/2003/3/9/205
Link: 4 https://lkml.org/lkml/2011/8/24/185
Link: 5 http://lkml.org/lkml/2011/12/18/189
Link: 6 https://www.kernel.org/pub/linux/kernel/projects/rt/
patch series - lglocks-rt.patch.patch
Link: 7 http://lkml.org/lkml/2012/3/5/26
......@@ -609,7 +609,7 @@ A data-dependency barrier must also order against dependent writes:
The data-dependency barrier must order the read into Q with the store
into *Q. This prohibits this outcome:
(Q == B) && (B == 4)
(Q == &B) && (B == 4)
Please note that this pattern should be rare. After all, the whole point
of dependency ordering is to -prevent- writes to the data structure, along
......@@ -1928,6 +1928,7 @@ There are some more advanced barrier functions:
See Documentation/DMA-API.txt for more information on consistent memory.
MMIO WRITE BARRIER
------------------
......@@ -2075,7 +2076,7 @@ systems, and so cannot be counted on in such a situation to actually achieve
anything at all - especially with respect to I/O accesses - unless combined
with interrupt disabling operations.
See also the section on "Inter-CPU locking barrier effects".
See also the section on "Inter-CPU acquiring barrier effects".
As an example, consider the following:
......
......@@ -705,7 +705,6 @@ config PARAVIRT_DEBUG
config PARAVIRT_SPINLOCKS
bool "Paravirtualization layer for spinlocks"
depends on PARAVIRT && SMP
select UNINLINE_SPIN_UNLOCK if !QUEUED_SPINLOCKS
---help---
Paravirtualized spinlocks allow a pvops backend to replace the
spinlock implementation with something virtualization-friendly
......@@ -718,7 +717,7 @@ config PARAVIRT_SPINLOCKS
config QUEUED_LOCK_STAT
bool "Paravirt queued spinlock statistics"
depends on PARAVIRT_SPINLOCKS && DEBUG_FS && QUEUED_SPINLOCKS
depends on PARAVIRT_SPINLOCKS && DEBUG_FS
---help---
Enable the collection of statistical data on the slowpath
behavior of paravirtualized queued spinlocks and report
......
......@@ -158,53 +158,9 @@ extern void __add_wrong_size(void)
* value of "*ptr".
*
* xadd() is locked when multiple CPUs are online
* xadd_sync() is always locked
* xadd_local() is never locked
*/
#define __xadd(ptr, inc, lock) __xchg_op((ptr), (inc), xadd, lock)
#define xadd(ptr, inc) __xadd((ptr), (inc), LOCK_PREFIX)
#define xadd_sync(ptr, inc) __xadd((ptr), (inc), "lock; ")
#define xadd_local(ptr, inc) __xadd((ptr), (inc), "")
#define __add(ptr, inc, lock) \
({ \
__typeof__ (*(ptr)) __ret = (inc); \
switch (sizeof(*(ptr))) { \
case __X86_CASE_B: \
asm volatile (lock "addb %b1, %0\n" \
: "+m" (*(ptr)) : "qi" (inc) \
: "memory", "cc"); \
break; \
case __X86_CASE_W: \
asm volatile (lock "addw %w1, %0\n" \
: "+m" (*(ptr)) : "ri" (inc) \
: "memory", "cc"); \
break; \
case __X86_CASE_L: \
asm volatile (lock "addl %1, %0\n" \
: "+m" (*(ptr)) : "ri" (inc) \
: "memory", "cc"); \
break; \
case __X86_CASE_Q: \
asm volatile (lock "addq %1, %0\n" \
: "+m" (*(ptr)) : "ri" (inc) \
: "memory", "cc"); \
break; \
default: \
__add_wrong_size(); \
} \
__ret; \
})
/*
* add_*() adds "inc" to "*ptr"
*
* __add() takes a lock prefix
* add_smp() is locked when multiple CPUs are online
* add_sync() is always locked
*/
#define add_smp(ptr, inc) __add((ptr), (inc), LOCK_PREFIX)
#define add_sync(ptr, inc) __add((ptr), (inc), "lock; ")
#define __cmpxchg_double(pfx, p1, p2, o1, o2, n1, n2) \
({ \
......
......@@ -661,8 +661,6 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
#if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
#ifdef CONFIG_QUEUED_SPINLOCKS
static __always_inline void pv_queued_spin_lock_slowpath(struct qspinlock *lock,
u32 val)
{
......@@ -684,22 +682,6 @@ static __always_inline void pv_kick(int cpu)
PVOP_VCALL1(pv_lock_ops.kick, cpu);
}
#else /* !CONFIG_QUEUED_SPINLOCKS */
static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
__ticket_t ticket)
{
PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
}
static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
__ticket_t ticket)
{
PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
}
#endif /* CONFIG_QUEUED_SPINLOCKS */
#endif /* SMP && PARAVIRT_SPINLOCKS */
#ifdef CONFIG_X86_32
......
......@@ -301,23 +301,16 @@ struct pv_mmu_ops {
struct arch_spinlock;
#ifdef CONFIG_SMP
#include <asm/spinlock_types.h>
#else
typedef u16 __ticket_t;
#endif
struct qspinlock;
struct pv_lock_ops {
#ifdef CONFIG_QUEUED_SPINLOCKS
void (*queued_spin_lock_slowpath)(struct qspinlock *lock, u32 val);
struct paravirt_callee_save queued_spin_unlock;
void (*wait)(u8 *ptr, u8 val);
void (*kick)(int cpu);
#else /* !CONFIG_QUEUED_SPINLOCKS */
struct paravirt_callee_save lock_spinning;
void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
#endif /* !CONFIG_QUEUED_SPINLOCKS */
};
/* This contains all the paravirt structures: we get a convenient
......
......@@ -154,7 +154,7 @@ static inline bool __down_write_trylock(struct rw_semaphore *sem)
: "+m" (sem->count), "=&a" (tmp0), "=&r" (tmp1),
CC_OUT(e) (result)
: "er" (RWSEM_ACTIVE_WRITE_BIAS)
: "memory", "cc");
: "memory");
return result;
}
......
......@@ -20,187 +20,13 @@
* (the type definitions are in asm/spinlock_types.h)
*/
#ifdef CONFIG_X86_32
# define LOCK_PTR_REG "a"
#else
# define LOCK_PTR_REG "D"
#endif
#if defined(CONFIG_X86_32) && (defined(CONFIG_X86_PPRO_FENCE))
/*
* On PPro SMP, we use a locked operation to unlock
* (PPro errata 66, 92)
*/
# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
#else
# define UNLOCK_LOCK_PREFIX
#endif
/* How long a lock should spin before we consider blocking */
#define SPIN_THRESHOLD (1 << 15)
extern struct static_key paravirt_ticketlocks_enabled;
static __always_inline bool static_key_false(struct static_key *key);
#ifdef CONFIG_QUEUED_SPINLOCKS
#include <asm/qspinlock.h>
#else
#ifdef CONFIG_PARAVIRT_SPINLOCKS
static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
{
set_bit(0, (volatile unsigned long *)&lock->tickets.head);
}
#else /* !CONFIG_PARAVIRT_SPINLOCKS */
static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock,
__ticket_t ticket)
{
}
static inline void __ticket_unlock_kick(arch_spinlock_t *lock,
__ticket_t ticket)
{
}
#endif /* CONFIG_PARAVIRT_SPINLOCKS */
static inline int __tickets_equal(__ticket_t one, __ticket_t two)
{
return !((one ^ two) & ~TICKET_SLOWPATH_FLAG);
}
static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock,
__ticket_t head)
{
if (head & TICKET_SLOWPATH_FLAG) {
arch_spinlock_t old, new;
old.tickets.head = head;
new.tickets.head = head & ~TICKET_SLOWPATH_FLAG;
old.tickets.tail = new.tickets.head + TICKET_LOCK_INC;
new.tickets.tail = old.tickets.tail;
/* try to clear slowpath flag when there are no contenders */
cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
}
}
static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
{
return __tickets_equal(lock.tickets.head, lock.tickets.tail);
}
/*
* Ticket locks are conceptually two parts, one indicating the current head of
* the queue, and the other indicating the current tail. The lock is acquired
* by atomically noting the tail and incrementing it by one (thus adding
* ourself to the queue and noting our position), then waiting until the head
* becomes equal to the the initial value of the tail.
*
* We use an xadd covering *both* parts of the lock, to increment the tail and
* also load the position of the head, which takes care of memory ordering
* issues and should be optimal for the uncontended case. Note the tail must be
* in the high part, because a wide xadd increment of the low part would carry
* up and contaminate the high part.
*/
static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
{
register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
inc = xadd(&lock->tickets, inc);
if (likely(inc.head == inc.tail))
goto out;
for (;;) {
unsigned count = SPIN_THRESHOLD;
do {
inc.head = READ_ONCE(lock->tickets.head);
if (__tickets_equal(inc.head, inc.tail))
goto clear_slowpath;
cpu_relax();
} while (--count);
__ticket_lock_spinning(lock, inc.tail);
}
clear_slowpath:
__ticket_check_and_clear_slowpath(lock, inc.head);
out:
barrier(); /* make sure nothing creeps before the lock is taken */
}
static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
{
arch_spinlock_t old, new;
old.tickets = READ_ONCE(lock->tickets);
if (!__tickets_equal(old.tickets.head, old.tickets.tail))
return 0;
new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
new.head_tail &= ~TICKET_SLOWPATH_FLAG;
/* cmpxchg is a full barrier, so nothing can move before it */
return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
}
static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
{
if (TICKET_SLOWPATH_FLAG &&
static_key_false(&paravirt_ticketlocks_enabled)) {
__ticket_t head;
BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
head = xadd(&lock->tickets.head, TICKET_LOCK_INC);
if (unlikely(head & TICKET_SLOWPATH_FLAG)) {
head &= ~TICKET_SLOWPATH_FLAG;
__ticket_unlock_kick(lock, (head + TICKET_LOCK_INC));
}
} else
__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
}
static inline int arch_spin_is_locked(arch_spinlock_t *lock)
{
struct __raw_tickets tmp = READ_ONCE(lock->tickets);
return !__tickets_equal(tmp.tail, tmp.head);
}
static inline int arch_spin_is_contended(arch_spinlock_t *lock)
{
struct __raw_tickets tmp = READ_ONCE(lock->tickets);
tmp.head &= ~TICKET_SLOWPATH_FLAG;
return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
}
#define arch_spin_is_contended arch_spin_is_contended
static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
unsigned long flags)
{
arch_spin_lock(lock);
}
static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
__ticket_t head = READ_ONCE(lock->tickets.head);
for (;;) {
struct __raw_tickets tmp = READ_ONCE(lock->tickets);
/*
* We need to check "unlocked" in a loop, tmp.head == head
* can be false positive because of overflow.
*/
if (__tickets_equal(tmp.head, tmp.tail) ||
!__tickets_equal(tmp.head, head))
break;
cpu_relax();
}
}
#endif /* CONFIG_QUEUED_SPINLOCKS */
/*
* Read-write spinlocks, allowing multiple readers
......
......@@ -23,20 +23,7 @@ typedef u32 __ticketpair_t;
#define TICKET_SHIFT (sizeof(__ticket_t) * 8)
#ifdef CONFIG_QUEUED_SPINLOCKS
#include <asm-generic/qspinlock_types.h>
#else
typedef struct arch_spinlock {
union {
__ticketpair_t head_tail;
struct __raw_tickets {
__ticket_t head, tail;
} tickets;
};
} arch_spinlock_t;
#define __ARCH_SPIN_LOCK_UNLOCKED { { 0 } }
#endif /* CONFIG_QUEUED_SPINLOCKS */
#include <asm-generic/qrwlock_types.h>
......
......@@ -575,9 +575,6 @@ static void kvm_kick_cpu(int cpu)
kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
}
#ifdef CONFIG_QUEUED_SPINLOCKS
#include <asm/qspinlock.h>
static void kvm_wait(u8 *ptr, u8 val)
......@@ -606,243 +603,6 @@ static void kvm_wait(u8 *ptr, u8 val)
local_irq_restore(flags);
}
#else /* !CONFIG_QUEUED_SPINLOCKS */
enum kvm_contention_stat {
TAKEN_SLOW,
TAKEN_SLOW_PICKUP,
RELEASED_SLOW,
RELEASED_SLOW_KICKED,
NR_CONTENTION_STATS
};
#ifdef CONFIG_KVM_DEBUG_FS
#define HISTO_BUCKETS 30
static struct kvm_spinlock_stats
{
u32 contention_stats[NR_CONTENTION_STATS];
u32 histo_spin_blocked[HISTO_BUCKETS+1];
u64 time_blocked;
} spinlock_stats;
static u8 zero_stats;
static inline void check_zero(void)
{
u8 ret;
u8 old;
old = READ_ONCE(zero_stats);
if (unlikely(old)) {
ret = cmpxchg(&zero_stats, old, 0);
/* This ensures only one fellow resets the stat */
if (ret == old)
memset(&spinlock_stats, 0, sizeof(spinlock_stats));
}
}
static inline void add_stats(enum kvm_contention_stat var, u32 val)
{
check_zero();
spinlock_stats.contention_stats[var] += val;
}
static inline u64 spin_time_start(void)
{
return sched_clock();
}
static void __spin_time_accum(u64 delta, u32 *array)
{
unsigned index;
index = ilog2(delta);
check_zero();
if (index < HISTO_BUCKETS)
array[index]++;
else
array[HISTO_BUCKETS]++;
}
static inline void spin_time_accum_blocked(u64 start)
{
u32 delta;
delta = sched_clock() - start;
__spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
spinlock_stats.time_blocked += delta;
}
static struct dentry *d_spin_debug;
static struct dentry *d_kvm_debug;
static struct dentry *kvm_init_debugfs(void)
{
d_kvm_debug = debugfs_create_dir("kvm-guest", NULL);
if (!d_kvm_debug)
printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n");
return d_kvm_debug;
}
static int __init kvm_spinlock_debugfs(void)
{
struct dentry *d_kvm;
d_kvm = kvm_init_debugfs();
if (d_kvm == NULL)
return -ENOMEM;
d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
debugfs_create_u32("taken_slow", 0444, d_spin_debug,
&spinlock_stats.contention_stats[TAKEN_SLOW]);
debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
&spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
debugfs_create_u32("released_slow", 0444, d_spin_debug,
&spinlock_stats.contention_stats[RELEASED_SLOW]);
debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
&spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
debugfs_create_u64("time_blocked", 0444, d_spin_debug,
&spinlock_stats.time_blocked);
debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
return 0;
}
fs_initcall(kvm_spinlock_debugfs);
#else /* !CONFIG_KVM_DEBUG_FS */
static inline void add_stats(enum kvm_contention_stat var, u32 val)
{
}
static inline u64 spin_time_start(void)
{
return 0;
}
static inline void spin_time_accum_blocked(u64 start)
{
}
#endif /* CONFIG_KVM_DEBUG_FS */
struct kvm_lock_waiting {
struct arch_spinlock *lock;
__ticket_t want;
};
/* cpus 'waiting' on a spinlock to become available */
static cpumask_t waiting_cpus;
/* Track spinlock on which a cpu is waiting */
static DEFINE_PER_CPU(struct kvm_lock_waiting, klock_waiting);
__visible void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
{
struct kvm_lock_waiting *w;
int cpu;
u64 start;
unsigned long flags;
__ticket_t head;
if (in_nmi())
return;
w = this_cpu_ptr(&klock_waiting);
cpu = smp_processor_id();
start = spin_time_start();
/*
* Make sure an interrupt handler can't upset things in a
* partially setup state.
*/
local_irq_save(flags);
/*
* The ordering protocol on this is that the "lock" pointer
* may only be set non-NULL if the "want" ticket is correct.
* If we're updating "want", we must first clear "lock".
*/
w->lock = NULL;
smp_wmb();
w->want = want;
smp_wmb();
w->lock = lock;
add_stats(TAKEN_SLOW, 1);
/*
* This uses set_bit, which is atomic but we should not rely on its
* reordering gurantees. So barrier is needed after this call.
*/
cpumask_set_cpu(cpu, &waiting_cpus);
barrier();
/*
* Mark entry to slowpath before doing the pickup test to make
* sure we don't deadlock with an unlocker.
*/
__ticket_enter_slowpath(lock);
/* make sure enter_slowpath, which is atomic does not cross the read */
smp_mb__after_atomic();
/*
* check again make sure it didn't become free while
* we weren't looking.
*/
head = READ_ONCE(lock->tickets.head);
if (__tickets_equal(head, want)) {
add_stats(TAKEN_SLOW_PICKUP, 1);
goto out;
}
/*
* halt until it's our turn and kicked. Note that we do safe halt
* for irq enabled case to avoid hang when lock info is overwritten
* in irq spinlock slowpath and no spurious interrupt occur to save us.
*/
if (arch_irqs_disabled_flags(flags))
halt();
else
safe_halt();
out:
cpumask_clear_cpu(cpu, &waiting_cpus);
w->lock = NULL;
local_irq_restore(flags);
spin_time_accum_blocked(start);
}
PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
/* Kick vcpu waiting on @lock->head to reach value @ticket */
static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
{
int cpu;
add_stats(RELEASED_SLOW, 1);
for_each_cpu(cpu, &waiting_cpus) {
const struct kvm_lock_waiting *w = &per_cpu(klock_waiting, cpu);
if (READ_ONCE(w->lock) == lock &&
READ_ONCE(w->want) == ticket) {
add_stats(RELEASED_SLOW_KICKED, 1);
kvm_kick_cpu(cpu);
break;
}
}
}
#endif /* !CONFIG_QUEUED_SPINLOCKS */
/*
* Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
*/
......@@ -854,16 +614,11 @@ void __init kvm_spinlock_init(void)
if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
return;
#ifdef CONFIG_QUEUED_SPINLOCKS
__pv_init_lock_hash();
pv_lock_ops.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
pv_lock_ops.queued_spin_unlock = PV_CALLEE_SAVE(__pv_queued_spin_unlock);
pv_lock_ops.wait = kvm_wait;
pv_lock_ops.kick = kvm_kick_cpu;
#else /* !CONFIG_QUEUED_SPINLOCKS */
pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
pv_lock_ops.unlock_kick = kvm_unlock_kick;
#endif
}
static __init int kvm_spinlock_init_jump(void)
......
......@@ -8,7 +8,6 @@
#include <asm/paravirt.h>
#ifdef CONFIG_QUEUED_SPINLOCKS
__visible void __native_queued_spin_unlock(struct qspinlock *lock)
{
native_queued_spin_unlock(lock);
......@@ -21,19 +20,13 @@ bool pv_is_native_spin_unlock(void)
return pv_lock_ops.queued_spin_unlock.func ==
__raw_callee_save___native_queued_spin_unlock;
}
#endif
struct pv_lock_ops pv_lock_ops = {
#ifdef CONFIG_SMP
#ifdef CONFIG_QUEUED_SPINLOCKS
.queued_spin_lock_slowpath = native_queued_spin_lock_slowpath,
.queued_spin_unlock = PV_CALLEE_SAVE(__native_queued_spin_unlock),
.wait = paravirt_nop,
.kick = paravirt_nop,
#else /* !CONFIG_QUEUED_SPINLOCKS */
.lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
.unlock_kick = paravirt_nop,
#endif /* !CONFIG_QUEUED_SPINLOCKS */
#endif /* SMP */
};
EXPORT_SYMBOL(pv_lock_ops);
......
......@@ -10,7 +10,7 @@ DEF_NATIVE(pv_mmu_ops, write_cr3, "mov %eax, %cr3");
DEF_NATIVE(pv_mmu_ops, read_cr3, "mov %cr3, %eax");
DEF_NATIVE(pv_cpu_ops, clts, "clts");
#if defined(CONFIG_PARAVIRT_SPINLOCKS) && defined(CONFIG_QUEUED_SPINLOCKS)
#if defined(CONFIG_PARAVIRT_SPINLOCKS)
DEF_NATIVE(pv_lock_ops, queued_spin_unlock, "movb $0, (%eax)");
#endif
......@@ -49,7 +49,7 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
PATCH_SITE(pv_mmu_ops, read_cr3);
PATCH_SITE(pv_mmu_ops, write_cr3);
PATCH_SITE(pv_cpu_ops, clts);
#if defined(CONFIG_PARAVIRT_SPINLOCKS) && defined(CONFIG_QUEUED_SPINLOCKS)
#if defined(CONFIG_PARAVIRT_SPINLOCKS)
case PARAVIRT_PATCH(pv_lock_ops.queued_spin_unlock):
if (pv_is_native_spin_unlock()) {
start = start_pv_lock_ops_queued_spin_unlock;
......
......@@ -19,7 +19,7 @@ DEF_NATIVE(pv_cpu_ops, swapgs, "swapgs");
DEF_NATIVE(, mov32, "mov %edi, %eax");
DEF_NATIVE(, mov64, "mov %rdi, %rax");
#if defined(CONFIG_PARAVIRT_SPINLOCKS) && defined(CONFIG_QUEUED_SPINLOCKS)
#if defined(CONFIG_PARAVIRT_SPINLOCKS)
DEF_NATIVE(pv_lock_ops, queued_spin_unlock, "movb $0, (%rdi)");
#endif
......@@ -61,7 +61,7 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
PATCH_SITE(pv_cpu_ops, clts);
PATCH_SITE(pv_mmu_ops, flush_tlb_single);
PATCH_SITE(pv_cpu_ops, wbinvd);
#if defined(CONFIG_PARAVIRT_SPINLOCKS) && defined(CONFIG_QUEUED_SPINLOCKS)
#if defined(CONFIG_PARAVIRT_SPINLOCKS)
case PARAVIRT_PATCH(pv_lock_ops.queued_spin_unlock):
if (pv_is_native_spin_unlock()) {
start = start_pv_lock_ops_queued_spin_unlock;
......
......@@ -21,8 +21,6 @@ static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
static DEFINE_PER_CPU(char *, irq_name);
static bool xen_pvspin = true;
#ifdef CONFIG_QUEUED_SPINLOCKS
#include <asm/qspinlock.h>
static void xen_qlock_kick(int cpu)
......@@ -71,207 +69,6 @@ static void xen_qlock_wait(u8 *byte, u8 val)
xen_poll_irq(irq);
}
#else /* CONFIG_QUEUED_SPINLOCKS */
enum xen_contention_stat {
TAKEN_SLOW,
TAKEN_SLOW_PICKUP,
TAKEN_SLOW_SPURIOUS,
RELEASED_SLOW,
RELEASED_SLOW_KICKED,
NR_CONTENTION_STATS
};
#ifdef CONFIG_XEN_DEBUG_FS
#define HISTO_BUCKETS 30
static struct xen_spinlock_stats
{
u32 contention_stats[NR_CONTENTION_STATS];
u32 histo_spin_blocked[HISTO_BUCKETS+1];
u64 time_blocked;
} spinlock_stats;
static u8 zero_stats;
static inline void check_zero(void)
{
u8 ret;
u8 old = READ_ONCE(zero_stats);
if (unlikely(old)) {
ret = cmpxchg(&zero_stats, old, 0);
/* This ensures only one fellow resets the stat */
if (ret == old)
memset(&spinlock_stats, 0, sizeof(spinlock_stats));
}
}
static inline void add_stats(enum xen_contention_stat var, u32 val)
{
check_zero();
spinlock_stats.contention_stats[var] += val;
}
static inline u64 spin_time_start(void)
{
return xen_clocksource_read();
}
static void __spin_time_accum(u64 delta, u32 *array)
{
unsigned index = ilog2(delta);
check_zero();
if (index < HISTO_BUCKETS)
array[index]++;
else
array[HISTO_BUCKETS]++;
}
static inline void spin_time_accum_blocked(u64 start)
{
u32 delta = xen_clocksource_read() - start;
__spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
spinlock_stats.time_blocked += delta;
}
#else /* !CONFIG_XEN_DEBUG_FS */
static inline void add_stats(enum xen_contention_stat var, u32 val)
{
}
static inline u64 spin_time_start(void)
{
return 0;
}
static inline void spin_time_accum_blocked(u64 start)
{
}
#endif /* CONFIG_XEN_DEBUG_FS */
struct xen_lock_waiting {
struct arch_spinlock *lock;
__ticket_t want;
};
static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
static cpumask_t waiting_cpus;
__visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
{
int irq = __this_cpu_read(lock_kicker_irq);
struct xen_lock_waiting *w = this_cpu_ptr(&lock_waiting);
int cpu = smp_processor_id();
u64 start;
__ticket_t head;
unsigned long flags;
/* If kicker interrupts not initialized yet, just spin */
if (irq == -1)
return;
start = spin_time_start();
/*
* Make sure an interrupt handler can't upset things in a
* partially setup state.
*/
local_irq_save(flags);
/*
* We don't really care if we're overwriting some other
* (lock,want) pair, as that would mean that we're currently
* in an interrupt context, and the outer context had
* interrupts enabled. That has already kicked the VCPU out
* of xen_poll_irq(), so it will just return spuriously and
* retry with newly setup (lock,want).
*
* The ordering protocol on this is that the "lock" pointer
* may only be set non-NULL if the "want" ticket is correct.
* If we're updating "want", we must first clear "lock".
*/
w->lock = NULL;
smp_wmb();
w->want = want;
smp_wmb();
w->lock = lock;
/* This uses set_bit, which atomic and therefore a barrier */
cpumask_set_cpu(cpu, &waiting_cpus);
add_stats(TAKEN_SLOW, 1);
/* clear pending */
xen_clear_irq_pending(irq);
/* Only check lock once pending cleared */
barrier();
/*
* Mark entry to slowpath before doing the pickup test to make
* sure we don't deadlock with an unlocker.
*/
__ticket_enter_slowpath(lock);
/* make sure enter_slowpath, which is atomic does not cross the read */
smp_mb__after_atomic();
/*
* check again make sure it didn't become free while
* we weren't looking
*/
head = READ_ONCE(lock->tickets.head);
if (__tickets_equal(head, want)) {
add_stats(TAKEN_SLOW_PICKUP, 1);
goto out;
}
/* Allow interrupts while blocked */
local_irq_restore(flags);
/*
* If an interrupt happens here, it will leave the wakeup irq
* pending, which will cause xen_poll_irq() to return
* immediately.
*/
/* Block until irq becomes pending (or perhaps a spurious wakeup) */
xen_poll_irq(irq);
add_stats(TAKEN_SLOW_SPURIOUS, !xen_test_irq_pending(irq));
local_irq_save(flags);
kstat_incr_irq_this_cpu(irq);
out:
cpumask_clear_cpu(cpu, &waiting_cpus);
w->lock = NULL;
local_irq_restore(flags);
spin_time_accum_blocked(start);
}
PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
{
int cpu;
add_stats(RELEASED_SLOW, 1);
for_each_cpu(cpu, &waiting_cpus) {
const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);
/* Make sure we read lock before want */
if (READ_ONCE(w->lock) == lock &&
READ_ONCE(w->want) == next) {
add_stats(RELEASED_SLOW_KICKED, 1);
xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
break;
}
}
}
#endif /* CONFIG_QUEUED_SPINLOCKS */
static irqreturn_t dummy_handler(int irq, void *dev_id)
{
BUG();
......@@ -334,16 +131,12 @@ void __init xen_init_spinlocks(void)
return;
}
printk(KERN_DEBUG "xen: PV spinlocks enabled\n");
#ifdef CONFIG_QUEUED_SPINLOCKS
__pv_init_lock_hash();
pv_lock_ops.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
pv_lock_ops.queued_spin_unlock = PV_CALLEE_SAVE(__pv_queued_spin_unlock);
pv_lock_ops.wait = xen_qlock_wait;
pv_lock_ops.kick = xen_qlock_kick;
#else
pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
#endif
}
/*
......@@ -372,44 +165,3 @@ static __init int xen_parse_nopvspin(char *arg)
}
early_param("xen_nopvspin", xen_parse_nopvspin);
#if defined(CONFIG_XEN_DEBUG_FS) && !defined(CONFIG_QUEUED_SPINLOCKS)
static struct dentry *d_spin_debug;
static int __init xen_spinlock_debugfs(void)
{
struct dentry *d_xen = xen_init_debugfs();
if (d_xen == NULL)
return -ENOMEM;
if (!xen_pvspin)
return 0;
d_spin_debug = debugfs_create_dir("spinlocks", d_xen);
debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
debugfs_create_u32("taken_slow", 0444, d_spin_debug,
&spinlock_stats.contention_stats[TAKEN_SLOW]);
debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
&spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
debugfs_create_u32("taken_slow_spurious", 0444, d_spin_debug,
&spinlock_stats.contention_stats[TAKEN_SLOW_SPURIOUS]);
debugfs_create_u32("released_slow", 0444, d_spin_debug,
&spinlock_stats.contention_stats[RELEASED_SLOW]);
debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
&spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
debugfs_create_u64("time_blocked", 0444, d_spin_debug,
&spinlock_stats.time_blocked);
debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
return 0;
}
fs_initcall(xen_spinlock_debugfs);
#endif /* CONFIG_XEN_DEBUG_FS */
......@@ -79,6 +79,7 @@ config EXPORTFS_BLOCK_OPS
config FILE_LOCKING
bool "Enable POSIX file locking API" if EXPERT
default y
select PERCPU_RWSEM
help
This option enables standard file locking support, required
for filesystems like NFS and for the flock() system
......
......@@ -127,7 +127,6 @@
#include <linux/pid_namespace.h>
#include <linux/hashtable.h>
#include <linux/percpu.h>
#include <linux/lglock.h>
#define CREATE_TRACE_POINTS
#include <trace/events/filelock.h>
......@@ -158,12 +157,18 @@ int lease_break_time = 45;
/*
* The global file_lock_list is only used for displaying /proc/locks, so we
* keep a list on each CPU, with each list protected by its own spinlock via
* the file_lock_lglock. Note that alterations to the list also require that
* the relevant flc_lock is held.
* keep a list on each CPU, with each list protected by its own spinlock.
* Global serialization is done using file_rwsem.
*
* Note that alterations to the list also require that the relevant flc_lock is
* held.
*/
DEFINE_STATIC_LGLOCK(file_lock_lglock);
static DEFINE_PER_CPU(struct hlist_head, file_lock_list);
struct file_lock_list_struct {
spinlock_t lock;
struct hlist_head hlist;
};
static DEFINE_PER_CPU(struct file_lock_list_struct, file_lock_list);
DEFINE_STATIC_PERCPU_RWSEM(file_rwsem);
/*
* The blocked_hash is used to find POSIX lock loops for deadlock detection.
......@@ -587,15 +592,23 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
/* Must be called with the flc_lock held! */
static void locks_insert_global_locks(struct file_lock *fl)
{
lg_local_lock(&file_lock_lglock);
struct file_lock_list_struct *fll = this_cpu_ptr(&file_lock_list);
percpu_rwsem_assert_held(&file_rwsem);
spin_lock(&fll->lock);
fl->fl_link_cpu = smp_processor_id();
hlist_add_head(&fl->fl_link, this_cpu_ptr(&file_lock_list));
lg_local_unlock(&file_lock_lglock);
hlist_add_head(&fl->fl_link, &fll->hlist);
spin_unlock(&fll->lock);
}
/* Must be called with the flc_lock held! */
static void locks_delete_global_locks(struct file_lock *fl)
{
struct file_lock_list_struct *fll;
percpu_rwsem_assert_held(&file_rwsem);
/*
* Avoid taking lock if already unhashed. This is safe since this check
* is done while holding the flc_lock, and new insertions into the list
......@@ -603,9 +616,11 @@ static void locks_delete_global_locks(struct file_lock *fl)
*/
if (hlist_unhashed(&fl->fl_link))
return;
lg_local_lock_cpu(&file_lock_lglock, fl->fl_link_cpu);
fll = per_cpu_ptr(&file_lock_list, fl->fl_link_cpu);
spin_lock(&fll->lock);
hlist_del_init(&fl->fl_link);
lg_local_unlock_cpu(&file_lock_lglock, fl->fl_link_cpu);
spin_unlock(&fll->lock);
}
static unsigned long
......@@ -915,6 +930,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
return -ENOMEM;
}
percpu_down_read_preempt_disable(&file_rwsem);
spin_lock(&ctx->flc_lock);
if (request->fl_flags & FL_ACCESS)
goto find_conflict;
......@@ -955,6 +971,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
out:
spin_unlock(&ctx->flc_lock);
percpu_up_read_preempt_enable(&file_rwsem);
if (new_fl)
locks_free_lock(new_fl);
locks_dispose_list(&dispose);
......@@ -991,6 +1008,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
new_fl2 = locks_alloc_lock();
}
percpu_down_read_preempt_disable(&file_rwsem);
spin_lock(&ctx->flc_lock);
/*
* New lock request. Walk all POSIX locks and look for conflicts. If
......@@ -1162,6 +1180,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
}
out:
spin_unlock(&ctx->flc_lock);
percpu_up_read_preempt_enable(&file_rwsem);
/*
* Free any unused locks.
*/
......@@ -1436,6 +1455,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
return error;
}
percpu_down_read_preempt_disable(&file_rwsem);
spin_lock(&ctx->flc_lock);
time_out_leases(inode, &dispose);
......@@ -1487,9 +1507,13 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
locks_insert_block(fl, new_fl);
trace_break_lease_block(inode, new_fl);
spin_unlock(&ctx->flc_lock);
percpu_up_read_preempt_enable(&file_rwsem);
locks_dispose_list(&dispose);
error = wait_event_interruptible_timeout(new_fl->fl_wait,
!new_fl->fl_next, break_time);
percpu_down_read_preempt_disable(&file_rwsem);
spin_lock(&ctx->flc_lock);
trace_break_lease_unblock(inode, new_fl);
locks_delete_block(new_fl);
......@@ -1506,6 +1530,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
}
out:
spin_unlock(&ctx->flc_lock);
percpu_up_read_preempt_enable(&file_rwsem);
locks_dispose_list(&dispose);
locks_free_lock(new_fl);
return error;
......@@ -1660,6 +1685,7 @@ generic_add_lease(struct file *filp, long arg, struct file_lock **flp, void **pr
return -EINVAL;
}
percpu_down_read_preempt_disable(&file_rwsem);
spin_lock(&ctx->flc_lock);
time_out_leases(inode, &dispose);
error = check_conflicting_open(dentry, arg, lease->fl_flags);
......@@ -1730,6 +1756,7 @@ generic_add_lease(struct file *filp, long arg, struct file_lock **flp, void **pr
lease->fl_lmops->lm_setup(lease, priv);
out:
spin_unlock(&ctx->flc_lock);
percpu_up_read_preempt_enable(&file_rwsem);
locks_dispose_list(&dispose);
if (is_deleg)
inode_unlock(inode);
......@@ -1752,6 +1779,7 @@ static int generic_delete_lease(struct file *filp, void *owner)
return error;
}
percpu_down_read_preempt_disable(&file_rwsem);
spin_lock(&ctx->flc_lock);
list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
if (fl->fl_file == filp &&
......@@ -1764,6 +1792,7 @@ static int generic_delete_lease(struct file *filp, void *owner)
if (victim)
error = fl->fl_lmops->lm_change(victim, F_UNLCK, &dispose);
spin_unlock(&ctx->flc_lock);
percpu_up_read_preempt_enable(&file_rwsem);
locks_dispose_list(&dispose);
return error;
}
......@@ -2703,9 +2732,9 @@ static void *locks_start(struct seq_file *f, loff_t *pos)
struct locks_iterator *iter = f->private;
iter->li_pos = *pos + 1;
lg_global_lock(&file_lock_lglock);
percpu_down_write(&file_rwsem);
spin_lock(&blocked_lock_lock);
return seq_hlist_start_percpu(&file_lock_list, &iter->li_cpu, *pos);
return seq_hlist_start_percpu(&file_lock_list.hlist, &iter->li_cpu, *pos);
}
static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
......@@ -2713,14 +2742,14 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
struct locks_iterator *iter = f->private;
++iter->li_pos;
return seq_hlist_next_percpu(v, &file_lock_list, &iter->li_cpu, pos);
return seq_hlist_next_percpu(v, &file_lock_list.hlist, &iter->li_cpu, pos);
}
static void locks_stop(struct seq_file *f, void *v)
__releases(&blocked_lock_lock)
{
spin_unlock(&blocked_lock_lock);
lg_global_unlock(&file_lock_lglock);
percpu_up_write(&file_rwsem);
}
static const struct seq_operations locks_seq_operations = {
......@@ -2761,10 +2790,13 @@ static int __init filelock_init(void)
filelock_cache = kmem_cache_create("file_lock_cache",
sizeof(struct file_lock), 0, SLAB_PANIC, NULL);
lg_lock_init(&file_lock_lglock, "file_lock_lglock");
for_each_possible_cpu(i)
INIT_HLIST_HEAD(per_cpu_ptr(&file_lock_list, i));
for_each_possible_cpu(i) {
struct file_lock_list_struct *fll = per_cpu_ptr(&file_lock_list, i);
spin_lock_init(&fll->lock);
INIT_HLIST_HEAD(&fll->hlist);
}
return 0;
}
......
/*
* Specialised local-global spinlock. Can only be declared as global variables
* to avoid overhead and keep things simple (and we don't want to start using
* these inside dynamically allocated structures).
*
* "local/global locks" (lglocks) can be used to:
*
* - Provide fast exclusive access to per-CPU data, with exclusive access to
* another CPU's data allowed but possibly subject to contention, and to
* provide very slow exclusive access to all per-CPU data.
* - Or to provide very fast and scalable read serialisation, and to provide
* very slow exclusive serialisation of data (not necessarily per-CPU data).
*
* Brlocks are also implemented as a short-hand notation for the latter use
* case.
*
* Copyright 2009, 2010, Nick Piggin, Novell Inc.
*/
#ifndef __LINUX_LGLOCK_H
#define __LINUX_LGLOCK_H
#include <linux/spinlock.h>
#include <linux/lockdep.h>
#include <linux/percpu.h>
#include <linux/cpu.h>
#include <linux/notifier.h>
#ifdef CONFIG_SMP
#ifdef CONFIG_DEBUG_LOCK_ALLOC
#define LOCKDEP_INIT_MAP lockdep_init_map
#else
#define LOCKDEP_INIT_MAP(a, b, c, d)
#endif
struct lglock {
arch_spinlock_t __percpu *lock;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lock_class_key lock_key;
struct lockdep_map lock_dep_map;
#endif
};
#define DEFINE_LGLOCK(name) \
static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock) \
= __ARCH_SPIN_LOCK_UNLOCKED; \
struct lglock name = { .lock = &name ## _lock }
#define DEFINE_STATIC_LGLOCK(name) \
static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock) \
= __ARCH_SPIN_LOCK_UNLOCKED; \
static struct lglock name = { .lock = &name ## _lock }
void lg_lock_init(struct lglock *lg, char *name);
void lg_local_lock(struct lglock *lg);
void lg_local_unlock(struct lglock *lg);
void lg_local_lock_cpu(struct lglock *lg, int cpu);
void lg_local_unlock_cpu(struct lglock *lg, int cpu);
void lg_double_lock(struct lglock *lg, int cpu1, int cpu2);
void lg_double_unlock(struct lglock *lg, int cpu1, int cpu2);
void lg_global_lock(struct lglock *lg);
void lg_global_unlock(struct lglock *lg);
#else
/* When !CONFIG_SMP, map lglock to spinlock */
#define lglock spinlock
#define DEFINE_LGLOCK(name) DEFINE_SPINLOCK(name)
#define DEFINE_STATIC_LGLOCK(name) static DEFINE_SPINLOCK(name)
#define lg_lock_init(lg, name) spin_lock_init(lg)
#define lg_local_lock spin_lock
#define lg_local_unlock spin_unlock
#define lg_local_lock_cpu(lg, cpu) spin_lock(lg)
#define lg_local_unlock_cpu(lg, cpu) spin_unlock(lg)
#define lg_global_lock spin_lock
#define lg_global_unlock spin_unlock
#endif
#endif
......@@ -10,32 +10,122 @@
struct percpu_rw_semaphore {
struct rcu_sync rss;
unsigned int __percpu *fast_read_ctr;
unsigned int __percpu *read_count;
struct rw_semaphore rw_sem;
atomic_t slow_read_ctr;
wait_queue_head_t write_waitq;
wait_queue_head_t writer;
int readers_block;
};
extern void percpu_down_read(struct percpu_rw_semaphore *);
extern int percpu_down_read_trylock(struct percpu_rw_semaphore *);
extern void percpu_up_read(struct percpu_rw_semaphore *);
#define DEFINE_STATIC_PERCPU_RWSEM(name) \
static DEFINE_PER_CPU(unsigned int, __percpu_rwsem_rc_##name); \
static struct percpu_rw_semaphore name = { \
.rss = __RCU_SYNC_INITIALIZER(name.rss, RCU_SCHED_SYNC), \
.read_count = &__percpu_rwsem_rc_##name, \
.rw_sem = __RWSEM_INITIALIZER(name.rw_sem), \
.writer = __WAIT_QUEUE_HEAD_INITIALIZER(name.writer), \
}
extern int __percpu_down_read(struct percpu_rw_semaphore *, int);
extern void __percpu_up_read(struct percpu_rw_semaphore *);
static inline void percpu_down_read_preempt_disable(struct percpu_rw_semaphore *sem)
{
might_sleep();
rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 0, _RET_IP_);
preempt_disable();
/*
* We are in an RCU-sched read-side critical section, so the writer
* cannot both change sem->state from readers_fast and start checking
* counters while we are here. So if we see !sem->state, we know that
* the writer won't be checking until we're past the preempt_enable()
* and that one the synchronize_sched() is done, the writer will see
* anything we did within this RCU-sched read-size critical section.
*/
__this_cpu_inc(*sem->read_count);
if (unlikely(!rcu_sync_is_idle(&sem->rss)))
__percpu_down_read(sem, false); /* Unconditional memory barrier */
barrier();
/*
* The barrier() prevents the compiler from
* bleeding the critical section out.
*/
}
static inline void percpu_down_read(struct percpu_rw_semaphore *sem)
{
percpu_down_read_preempt_disable(sem);
preempt_enable();
}
static inline int percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
{
int ret = 1;
preempt_disable();
/*
* Same as in percpu_down_read().
*/
__this_cpu_inc(*sem->read_count);
if (unlikely(!rcu_sync_is_idle(&sem->rss)))
ret = __percpu_down_read(sem, true); /* Unconditional memory barrier */
preempt_enable();
/*
* The barrier() from preempt_enable() prevents the compiler from
* bleeding the critical section out.
*/
if (ret)
rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 1, _RET_IP_);
return ret;
}
static inline void percpu_up_read_preempt_enable(struct percpu_rw_semaphore *sem)
{
/*
* The barrier() prevents the compiler from
* bleeding the critical section out.
*/
barrier();
/*
* Same as in percpu_down_read().
*/
if (likely(rcu_sync_is_idle(&sem->rss)))
__this_cpu_dec(*sem->read_count);
else
__percpu_up_read(sem); /* Unconditional memory barrier */
preempt_enable();
rwsem_release(&sem->rw_sem.dep_map, 1, _RET_IP_);
}
static inline void percpu_up_read(struct percpu_rw_semaphore *sem)
{
preempt_disable();
percpu_up_read_preempt_enable(sem);
}
extern void percpu_down_write(struct percpu_rw_semaphore *);
extern void percpu_up_write(struct percpu_rw_semaphore *);
extern int __percpu_init_rwsem(struct percpu_rw_semaphore *,
const char *, struct lock_class_key *);
extern void percpu_free_rwsem(struct percpu_rw_semaphore *);
#define percpu_init_rwsem(brw) \
#define percpu_init_rwsem(sem) \
({ \
static struct lock_class_key rwsem_key; \
__percpu_init_rwsem(brw, #brw, &rwsem_key); \
__percpu_init_rwsem(sem, #sem, &rwsem_key); \
})
#define percpu_rwsem_is_held(sem) lockdep_is_held(&(sem)->rw_sem)
#define percpu_rwsem_assert_held(sem) \
lockdep_assert_held(&(sem)->rw_sem)
static inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem,
bool read, unsigned long ip)
{
......
......@@ -59,6 +59,7 @@ static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
}
extern void rcu_sync_init(struct rcu_sync *, enum rcu_sync_type);
extern void rcu_sync_enter_start(struct rcu_sync *);
extern void rcu_sync_enter(struct rcu_sync *);
extern void rcu_sync_exit(struct rcu_sync *);
extern void rcu_sync_dtor(struct rcu_sync *);
......
......@@ -5627,6 +5627,12 @@ int __init cgroup_init(void)
BUG_ON(cgroup_init_cftypes(NULL, cgroup_dfl_base_files));
BUG_ON(cgroup_init_cftypes(NULL, cgroup_legacy_base_files));
/*
* The latency of the synchronize_sched() is too high for cgroups,
* avoid it at the cost of forcing all readers into the slow path.
*/
rcu_sync_enter_start(&cgroup_threadgroup_rwsem.rss);
get_user_ns(init_cgroup_ns.user_ns);
mutex_lock(&cgroup_mutex);
......
......@@ -381,8 +381,12 @@ static inline int hb_waiters_pending(struct futex_hash_bucket *hb)
#endif
}
/*
* We hash on the keys returned from get_futex_key (see below).
/**
* hash_futex - Return the hash bucket in the global hash
* @key: Pointer to the futex key for which the hash is calculated
*
* We hash on the keys returned from get_futex_key (see below) and return the
* corresponding hash bucket in the global hash.
*/
static struct futex_hash_bucket *hash_futex(union futex_key *key)
{
......@@ -392,7 +396,12 @@ static struct futex_hash_bucket *hash_futex(union futex_key *key)
return &futex_queues[hash & (futex_hashsize - 1)];
}
/*
/**
* match_futex - Check whether two futex keys are equal
* @key1: Pointer to key1
* @key2: Pointer to key2
*
* Return 1 if two futex_keys are equal, 0 otherwise.
*/
static inline int match_futex(union futex_key *key1, union futex_key *key2)
......
......@@ -117,7 +117,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
" disables this message.\n");
sched_show_task(t);
debug_show_held_locks(t);
debug_show_all_locks();
touch_nmi_watchdog();
......
......@@ -18,7 +18,6 @@ obj-$(CONFIG_LOCKDEP) += lockdep_proc.o
endif
obj-$(CONFIG_SMP) += spinlock.o
obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
obj-$(CONFIG_SMP) += lglock.o
obj-$(CONFIG_PROVE_LOCKING) += spinlock.o
obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o
obj-$(CONFIG_RT_MUTEXES) += rtmutex.o
......
/* See include/linux/lglock.h for description */
#include <linux/module.h>
#include <linux/lglock.h>
#include <linux/cpu.h>
#include <linux/string.h>
/*
* Note there is no uninit, so lglocks cannot be defined in
* modules (but it's fine to use them from there)
* Could be added though, just undo lg_lock_init
*/
void lg_lock_init(struct lglock *lg, char *name)
{
LOCKDEP_INIT_MAP(&lg->lock_dep_map, name, &lg->lock_key, 0);
}
EXPORT_SYMBOL(lg_lock_init);
void lg_local_lock(struct lglock *lg)
{
arch_spinlock_t *lock;
preempt_disable();
lock_acquire_shared(&lg->lock_dep_map, 0, 0, NULL, _RET_IP_);
lock = this_cpu_ptr(lg->lock);
arch_spin_lock(lock);
}
EXPORT_SYMBOL(lg_local_lock);
void lg_local_unlock(struct lglock *lg)
{
arch_spinlock_t *lock;
lock_release(&lg->lock_dep_map, 1, _RET_IP_);
lock = this_cpu_ptr(lg->lock);
arch_spin_unlock(lock);
preempt_enable();
}
EXPORT_SYMBOL(lg_local_unlock);
void lg_local_lock_cpu(struct lglock *lg, int cpu)
{
arch_spinlock_t *lock;
preempt_disable();
lock_acquire_shared(&lg->lock_dep_map, 0, 0, NULL, _RET_IP_);
lock = per_cpu_ptr(lg->lock, cpu);
arch_spin_lock(lock);
}
EXPORT_SYMBOL(lg_local_lock_cpu);
void lg_local_unlock_cpu(struct lglock *lg, int cpu)
{
arch_spinlock_t *lock;
lock_release(&lg->lock_dep_map, 1, _RET_IP_);
lock = per_cpu_ptr(lg->lock, cpu);
arch_spin_unlock(lock);
preempt_enable();
}
EXPORT_SYMBOL(lg_local_unlock_cpu);
void lg_double_lock(struct lglock *lg, int cpu1, int cpu2)
{
BUG_ON(cpu1 == cpu2);
/* lock in cpu order, just like lg_global_lock */
if (cpu2 < cpu1)
swap(cpu1, cpu2);
preempt_disable();
lock_acquire_shared(&lg->lock_dep_map, 0, 0, NULL, _RET_IP_);
arch_spin_lock(per_cpu_ptr(lg->lock, cpu1));
arch_spin_lock(per_cpu_ptr(lg->lock, cpu2));
}
void lg_double_unlock(struct lglock *lg, int cpu1, int cpu2)
{
lock_release(&lg->lock_dep_map, 1, _RET_IP_);
arch_spin_unlock(per_cpu_ptr(lg->lock, cpu1));
arch_spin_unlock(per_cpu_ptr(lg->lock, cpu2));
preempt_enable();
}
void lg_global_lock(struct lglock *lg)
{
int i;
preempt_disable();
lock_acquire_exclusive(&lg->lock_dep_map, 0, 0, NULL, _RET_IP_);
for_each_possible_cpu(i) {
arch_spinlock_t *lock;
lock = per_cpu_ptr(lg->lock, i);
arch_spin_lock(lock);
}
}
EXPORT_SYMBOL(lg_global_lock);
void lg_global_unlock(struct lglock *lg)
{
int i;
lock_release(&lg->lock_dep_map, 1, _RET_IP_);
for_each_possible_cpu(i) {
arch_spinlock_t *lock;
lock = per_cpu_ptr(lg->lock, i);
arch_spin_unlock(lock);
}
preempt_enable();
}
EXPORT_SYMBOL(lg_global_unlock);
此差异已折叠。
......@@ -70,11 +70,14 @@ struct pv_node {
static inline bool pv_queued_spin_steal_lock(struct qspinlock *lock)
{
struct __qspinlock *l = (void *)lock;
int ret = !(atomic_read(&lock->val) & _Q_LOCKED_PENDING_MASK) &&
(cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0);
qstat_inc(qstat_pv_lock_stealing, ret);
return ret;
if (!(atomic_read(&lock->val) & _Q_LOCKED_PENDING_MASK) &&
(cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
qstat_inc(qstat_pv_lock_stealing, true);
return true;
}
return false;
}
/*
......@@ -257,7 +260,6 @@ static struct pv_node *pv_unhash(struct qspinlock *lock)
static inline bool
pv_wait_early(struct pv_node *prev, int loop)
{
if ((loop & PV_PREV_CHECK_MASK) != 0)
return false;
......@@ -286,12 +288,10 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
{
struct pv_node *pn = (struct pv_node *)node;
struct pv_node *pp = (struct pv_node *)prev;
int waitcnt = 0;
int loop;
bool wait_early;
/* waitcnt processing will be compiled out if !QUEUED_LOCK_STAT */
for (;; waitcnt++) {
for (;;) {
for (wait_early = false, loop = SPIN_THRESHOLD; loop; loop--) {
if (READ_ONCE(node->locked))
return;
......@@ -315,7 +315,6 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
if (!READ_ONCE(node->locked)) {
qstat_inc(qstat_pv_wait_node, true);
qstat_inc(qstat_pv_wait_again, waitcnt);
qstat_inc(qstat_pv_wait_early, wait_early);
pv_wait(&pn->state, vcpu_halted);
}
......@@ -456,12 +455,9 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
pv_wait(&l->locked, _Q_SLOW_VAL);
/*
* The unlocker should have freed the lock before kicking the
* CPU. So if the lock is still not free, it is a spurious
* wakeup or another vCPU has stolen the lock. The current
* vCPU should spin again.
* Because of lock stealing, the queue head vCPU may not be
* able to acquire the lock before it has to wait again.
*/
qstat_inc(qstat_pv_spurious_wakeup, READ_ONCE(l->locked));
}
/*
......@@ -544,7 +540,7 @@ __visible void __pv_queued_spin_unlock(struct qspinlock *lock)
* unhash. Otherwise it would be possible to have multiple @lock
* entries, which would be BAD.
*/
locked = cmpxchg(&l->locked, _Q_LOCKED_VAL, 0);
locked = cmpxchg_release(&l->locked, _Q_LOCKED_VAL, 0);
if (likely(locked == _Q_LOCKED_VAL))
return;
......
......@@ -24,8 +24,8 @@
* pv_latency_wake - average latency (ns) from vCPU kick to wakeup
* pv_lock_slowpath - # of locking operations via the slowpath
* pv_lock_stealing - # of lock stealing operations
* pv_spurious_wakeup - # of spurious wakeups
* pv_wait_again - # of vCPU wait's that happened after a vCPU kick
* pv_spurious_wakeup - # of spurious wakeups in non-head vCPUs
* pv_wait_again - # of wait's after a queue head vCPU kick
* pv_wait_early - # of early vCPU wait's
* pv_wait_head - # of vCPU wait's at the queue head
* pv_wait_node - # of vCPU wait's at a non-head queue node
......
此差异已折叠。
......@@ -68,6 +68,8 @@ void rcu_sync_lockdep_assert(struct rcu_sync *rsp)
RCU_LOCKDEP_WARN(!gp_ops[rsp->gp_type].held(),
"suspicious rcu_sync_is_idle() usage");
}
EXPORT_SYMBOL_GPL(rcu_sync_lockdep_assert);
#endif
/**
......@@ -82,6 +84,18 @@ void rcu_sync_init(struct rcu_sync *rsp, enum rcu_sync_type type)
rsp->gp_type = type;
}
/**
* Must be called after rcu_sync_init() and before first use.
*
* Ensures rcu_sync_is_idle() returns false and rcu_sync_{enter,exit}()
* pairs turn into NO-OPs.
*/
void rcu_sync_enter_start(struct rcu_sync *rsp)
{
rsp->gp_count++;
rsp->gp_state = GP_PASSED;
}
/**
* rcu_sync_enter() - Force readers onto slowpath
* @rsp: Pointer to rcu_sync structure to use for synchronization
......
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册