提交 · da40d036fd716f0efb2917076220814b1e927ae1 · openanolis / cloud-kernel

07 1月, 2011 9 次提交

由 Nick Piggin 提交于 1月 07, 2011

The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
which often go to the same mount point.

The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. To do that requires communication with all other CPUs
that may have taken a reference count.

We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.

- check the global sum once every interval (this will delay zero detection
  for some interval, so it's probably a showstopper for vfsmounts).

- keep a local count and only taking the global sum when local reaches 0 (this
  is difficult for vfsmounts, because we can't hold preempt off for the life of
  a reference, so a counter would need to be per-thread or tied strongly to a
  particular CPU which requires more locking).

- keep a local difference of increments and decrements, which allows us to sum
  the total difference and hence find the refcount when summing all CPUs. Then,
  keep a single integer "long" refcount for slow and long lasting references,
  and only take the global sum of local counters when the long refcount is 0.

This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.

This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.

This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However code is otherwise bigger
and heavier, so single threaded performance is basically a wash.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

b3e19d92

fs: dcache reduce branches in lookup path · fb045adb

由 Nick Piggin 提交于 1月 07, 2011

Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/$[^\t ]*$->d_op = $.*$;/d_set_d_op(\1, \2);/' -e 's/$[^\t ]*$\.d_op = $.*$;/d_set_d_op(\&\1, \2);/' -i
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

fb045adb

fs: icache RCU free inodes · fa0d7e3d

由 Nick Piggin 提交于 1月 07, 2011

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

fa0d7e3d

fs: dcache rationalise dget variants · dc0474be

由 Nick Piggin 提交于 1月 07, 2011

dget_locked was a shortcut to avoid the lazy lru manipulation when we already
held dcache_lock (lru manipulation was relatively cheap at that point).
However, how that the lru lock is an innermost one, we never hold it at any
caller, so the lock cost can now be avoided. We already have well working lazy
dcache LRU, so it should be fine to defer LRU manipulations to scan time.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

dc0474be

fs: dcache remove dcache_lock · b5c84bf6

由 Nick Piggin 提交于 1月 07, 2011

dcache_lock no longer protects anything. remove it.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

b5c84bf6

fs: dcache scale d_unhashed · da502956

由 Nick Piggin 提交于 1月 07, 2011

Protect d_unhashed(dentry) condition with d_lock. This means keeping
DCACHE_UNHASHED bit in synch with hash manipulations.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

da502956

fs: dcache scale dentry refcount · b7ab39f6

由 Nick Piggin 提交于 1月 07, 2011

Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

b7ab39f6

fs: change d_delete semantics · fe15ce44

由 Nick Piggin 提交于 1月 07, 2011

Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

fe15ce44

ARM: DMA: add support for DMA debugging · 24056f52

由 Russell King 提交于 1月 03, 2011

Add ARM support for the DMA debug infrastructure, which allows the
DMA API usage to be debugged.
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>

24056f52

06 1月, 2011 1 次提交

sh: include Migo-R TS driver in Migo-R defconfig · 6f09e41d

由 Magnus Damm 提交于 1月 06, 2011

This patch enables the Migo-R specific touch screen
driver in the Migo-R defconfig.
Signed-off-by: NMagnus Damm <damm@opensource.se>
Signed-off-by: NPaul Mundt <lethal@linux-sh.org>

6f09e41d

05 1月, 2011 23 次提交

x86, NMI: Add touch_nmi_watchdog to io_check_error delay · 74d91e3c

由 Huang Ying 提交于 1月 04, 2011

Prevent the long delay in io_check_error making NMI watchdog
timeout.
Signed-off-by: NHuang Ying <ying.huang@intel.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
LKML-Reference: <1294198689-15447-3-git-send-email-dzickus@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

74d91e3c

x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time · 554ec063

由 Dongdong Deng 提交于 1月 04, 2011

The spin_lock_debug/rcu_cpu_stall detector uses
trigger_all_cpu_backtrace() to dump cpu backtrace.
Therefore it is possible that trigger_all_cpu_backtrace()
could be called at the same time on different CPUs, which
triggers and 'unknown reason NMI' warning. The following case
illustrates the problem:

      CPU1                    CPU2                     ...   CPU N
                       trigger_all_cpu_backtrace()
                       set "backtrace_mask" to cpu mask
                               |
generate NMI interrupts  generate NMI interrupts       ...
    \                          |                               /
     \                         |                              /

The "backtrace_mask" will be cleaned by the first NMI interrupt
at nmi_watchdog_tick(), then the following NMI interrupts
generated by other cpus's arch_trigger_all_cpu_backtrace() will
be taken as unknown reason NMI interrupts.

This patch uses a test_and_set to avoid the problem, and stop
the arch_trigger_all_cpu_backtrace() from calling to avoid
dumping a double cpu backtrace info when there is already a
trigger_all_cpu_backtrace() in progress.
Signed-off-by: NDongdong Deng <dongdong.deng@windriver.com>
Reviewed-by: NBruce Ashfield <bruce.ashfield@windriver.com>
Cc: fweisbec@gmail.com
LKML-Reference: <1294198689-15447-2-git-send-email-dzickus@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NDon Zickus <dzickus@redhat.com>

554ec063

x86: Only call smp_processor_id in non-preempt cases · 9ab181fa

由 Don Zickus 提交于 1月 04, 2011

There are some paths that walk the die_chain with preemption on.
Make sure we are in an NMI call before we start doing anything.

This was triggered by do_general_protection calling notify_die
with DIE_GPF.
Reported-by: NJan Kiszka <jan.kiszka@web.de>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
LKML-Reference: <1294198689-15447-1-git-send-email-dzickus@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9ab181fa

x86: Fix APIC ID sizing bug on larger systems, clean up MAX_APICS confusion · cb2ded37

由 Yinghai Lu 提交于 1月 04, 2011

Found one x2apic pre-enabled system, x2apic_mode suddenly get
corrupted after register some cpus, when compiled
CONFIG_NR_CPUS=255 instead of 512.

It turns out that generic_processor_info() ==> phyid_set(apicid,
phys_cpu_present_map) causes the problem.

phys_cpu_present_map is sized by MAX_APICS bits, and pre-enabled
system some cpus have an apic id > 255.

The variable after phys_cpu_present_map may get corrupted
silently:

 ffffffff828e8420 B phys_cpu_present_map
 ffffffff828e8440 B apic_verbosity
 ffffffff828e8444 B local_apic_timer_c2_ok
 ffffffff828e8448 B disable_apic
 ffffffff828e844c B x2apic_mode
 ffffffff828e8450 B x2apic_disabled
 ffffffff828e8454 B num_processors
 ...

Actually phys_cpu_present_map is referenced via apic id, instead
index. We should use MAX_LOCAL_APIC instead MAX_APICS.

For 64-bit it will be 32768 in all cases. BSS will increase by 4k bytes
on 64-bit:

	text		data		bss		dec		filename
	21696943	4193748		12787712	38678403	vmlinux.before
	21696943	4193748		12791808	38682499	vmlinux.after

No change on 32bit.

Finally we can remove MAX_APCIS that was rather confusing.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <4D23BD9C.3070102@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

cb2ded37

ARM: vexpress: add sched_clock() for Versatile Express · 0af85dda

由 Russell King 提交于 12月 15, 2010

Add a sched_clock() implementation to Versatile Express using the new
sched_clock() infrastructure for extending 32bit counters to full
64-bit nanoseconds.
Tested-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>

0af85dda

ARM: mach-shmobile: specify sh7372 MIPI DSI register layout · 6fd46595

由 Guennadi Liakhovetski 提交于 12月 29, 2010

Prepare the ap4evb board for the MIPI DSI driver transition to support

different register layouts.
Signed-off-by: NGuennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: NPaul Mundt <lethal@linux-sh.org>

6fd46595

ARM: mach-shmobile: improve MIPI DSI clock configuration · 0851d50d

由 Guennadi Liakhovetski 提交于 12月 27, 2010

Now, that the MIPI DSI driver implements runtime PM, we don't need anymore to
configure clocks statically in the platform code. This patch also adds a DSITX1
clock definition for sh7372 and attaches PHY clocks to respective devices.
Signed-off-by: NGuennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: NPaul Mundt <lethal@linux-sh.org>

0851d50d

sh: correct definitions to access stack pointers · eac676e5

由 Roel Kluin 提交于 1月 01, 2011

A definition like:

#define regs_return_value(regs)	((regs)->regs[0])

called with regs_return_value(foo) will be preprocessed to:

((foo)->foo[0])
        ^^^
So to fix this to ensure the preprocessor compiles such calls correctly.
Signed-off-by: NRoel Kluin <roel.kluin@gmail.com>
Signed-off-by: NPaul Mundt <lethal@linux-sh.org>

eac676e5

sparc: update copyright in piggyback.c · 05085588