提交 · 5ef3041e4a7cd817bc5ebbb0e5e956a2bdd32c38 · openeuler / Kernel

16 1月, 2009 2 次提交

lib/idr.c: use kmem_cache_zalloc() for the idr_layer cache · 5b019e99

由 Andrew Morton 提交于 16年前

David points out that the idr_remove_all() function returns unused slabs
to the kmem cache, but needs to zero them first or else they will be
uninitialized upon next use.  This causes crashes which have been observed
in the firewire subsystem.

He fixed this by zeroing the object before freeing it in idr_remove_all().

But we agree that simply removing the constructor and zeroing the object
at allocation time is simpler than relying upon slab constructor machinery
and might even be faster.

This problem was introduced by "idr: make idr_remove rcu-safe" (commit
cf481c20), which was first released in
2.6.27.

There are no known codesites which trigger this bug in 2.6.27 or 2.6.28.
The post-2.6.28 firewire changes are the only known triggerer.

There might of course be not-yet-discovered triggerers in 2.6.27 and
2.6.28, and there might be out-of-tree triggerers which are added to those
kernel versions.  I'll let the -stable guys decide whether they want to
backport this fix.
Reported-by: NDavid Moore <dcm@acm.org>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Kristian Hgsberg <krh@redhat.com>
Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5b019e99

idr: fix wrong kernel-doc · b098161b

由 Li Zefan 提交于 16年前

idr_get_new_above() and ida_get_new_above() return an id in the range of
@staring_id ... 0x7fffffff, not 0 ... 0x7fffffff.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b098161b

10 1月, 2009 1 次提交

rbtree: add const qualifier to some functions · f4b477c4

由 Artem Bityutskiy 提交于 16年前

The 'rb_first()', 'rb_last()', 'rb_next()' and 'rb_prev()' calls
take a pointer to an RB node or RB root. They do not change the
pointed objects, so add a 'const' qualifier in order to make life
of the users of these functions easier.

Indeed, if I have my own constant pointer &const struct my_type *p,
and I call 'rb_next(&p->rb)', I get a GCC warning:

warning: passing argument 1 of ‘rb_next’ discards qualifiers from pointer target type
Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f4b477c4

09 1月, 2009 1 次提交

generic swap(): lib/sort.c: rename swap to swap_func · b53907c0

由 Wu Fengguang 提交于 16年前

This is to avoid name clashes for the introduction of a global swap()
macro.
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b53907c0

08 1月, 2009 1 次提交

NOMMU: Make VMAs per MM as for MMU-mode linux · 8feae131

由 David Howells 提交于 16年前

Make VMAs per mm_struct as for MMU-mode linux.  This solves two problems:

 (1) In SYSV SHM where nattch for a segment does not reflect the number of
     shmat's (and forks) done.

 (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
     exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
     that a VMA might be shared and already have its vm_mm assigned to another
     process or a dead process.

A new struct (vm_region) is introduced to track a mapped region and to remember
the circumstances under which it may be shared and the vm_list_struct structure
is discarded as it's no longer required.

This patch makes the following additional changes:

 (1) Regions are now allocated with alloc_pages() rather than kmalloc() and
     with no recourse to __GFP_COMP, so the pages are not composite.  Instead,
     each page has a reference on it held by the region.  Anything else that is
     interested in such a page will have to get a reference on it to retain it.
     When the pages are released due to unmapping, each page is passed to
     put_page() and will be freed when the page usage count reaches zero.

 (2) Excess pages are trimmed after an allocation as the allocation must be
     made as a power-of-2 quantity of pages.

 (3) VMAs are added to the parent MM's R/B tree and mmap lists.  As an MM may
     end up with overlapping VMAs within the tree, the VMA struct address is
     appended to the sort key.

 (4) Non-anonymous VMAs are now added to the backing inode's prio list.

 (5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
     the backing region.  The VMA and region structs will be split if
     necessary.

 (6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
     segment instead of all the attachments at that addresss.  Multiple
     shmat()'s return the same address under NOMMU-mode instead of different
     virtual addresses as under MMU-mode.

 (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.

 (8) /proc/maps is now the global list of mapped regions, and may list bits
     that aren't actually mapped anywhere.

 (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
     of RAM currently allocated by mmap to hold mappable regions that can't be
     mapped directly.  These are copies of the backing device or file if not
     anonymous.

These changes make NOMMU mode more similar to MMU mode.  The downside is that
NOMMU mode requires some extra memory to track things over NOMMU without this
patch (VMAs are no longer shared, and there are now region structs).
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Tested-by: NMike Frysinger <vapier.adi@gmail.com>
Acked-by: NPaul Mundt <lethal@linux-sh.org>

8feae131

07 1月, 2009 13 次提交

x86: offer frame pointers in all build modes · da4276b8

由 Ingo Molnar 提交于 16年前

CONFIG_FRAME_POINTERS=y results in much better debug info for the
kernel (clear and precise backtraces), with the only drawback being
a ~1% increase in kernel size.

So offer it unconditionally and enable it by default.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

da4276b8

percpu_counter: FBC_BATCH should be a variable · 179f7ebf

由 Eric Dumazet 提交于 16年前

For NR_CPUS >= 16 values, FBC_BATCH is 2*NR_CPUS

Considering more and more distros are using high NR_CPUS values, it makes
sense to use a more sensible value for FBC_BATCH, and get rid of NR_CPUS.

A sensible value is 2*num_online_cpus(), with a minimum value of 32 (This
minimum value helps branch prediction in __percpu_counter_add())

We already have a hotcpu notifier, so we can adjust FBC_BATCH dynamically.

We rename FBC_BATCH to percpu_counter_batch since its not a constant
anymore.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

179f7ebf

strict_strto* is not strict enough · e899aa82

由 Pavel Machek 提交于 16年前

It decodes "\n" as 0, which is bad, because stray echo into backlight
will turn your backlight off, etc...
Signed-off-by: NPavel Machek <pavel@suse.cz>
Cc: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e899aa82

lib: proportions.c trivial sparse lock annotation · 30079677

由 Harvey Harrison 提交于 16年前

Suppresses sparse warning:
lib/proportions.c:159:16: warning: context imbalance in 'prop_get_global': wrong count at exit
lib/proportions.c:159:16: context 'RCU': wanted 0, got 1
lib/proportions.c:164:2: warning: context imbalance in 'prop_put_global': unexpected unlock
lib/proportions.c:164:2: context 'RCU': wanted 0, got -1
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

30079677

lib: radix_tree.c make percpu variable static · 8cef7d57

由 Harvey Harrison 提交于 16年前

radix_tree_preloads is unused outside of this file, make it static.

Noticed by sparse:
lib/radix-tree.c:84:1: warning: symbol 'per_cpu__radix_tree_preloads' was not declared. Should it be static?
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8cef7d57

lib: fix sparse shadowed variable warning · 40bc1f2d

由 Harvey Harrison 提交于 16年前

pos is always set before being used, no need to declare a
second one inside the if() block.

lib/prio_heap.c:34:7: warning: symbol 'pos' shadows an earlier one
lib/prio_heap.c:30:6: originally declared here
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

40bc1f2d

Remove remaining unwinder code · f1883f86

由 Alexey Dobriyan 提交于 16年前

Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: Gabor Gombas <gombasg@sztaki.hu>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>,
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f1883f86

oops handling: ensure that any oops is flushed to the mtdoops console · b61312d3

由 Viktor Rosendahl 提交于 16年前

This used to work unpatched with older kernels, during the development
phase of mtdoops.  Before commit e3e8a75d
a space was printed with console_loglevel set to 15, which probably
flushed the oops message as a side effect.

This is another patch from the Nokia N810 kernel.
Signed-off-by: NViktor Rosendahl <viktor.rosendahl@nokia.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b61312d3

K
swiotlb: struct device - replace bus_id with dev_name(), dev_set_name() · 94b32486
由 Kay Sievers 提交于 16年前
```
Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
```
94b32486

dynamic_printk: reduce one level of indentation · 2e5ba26a

由 Wu Fengguang 提交于 16年前

Cleanup pr_debug_write() to reduce one level of indentation.

Cc: Marcel Holtmann <marcel@holtmann.org>
Acked-by: NJason Baron <jbaron@redhat.com>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

2e5ba26a

kobject: return the result of uevent sending by netlink · e0d7bf5d

由 Ming Lei 提交于 16年前

We need to return the result of uevent sending by netlink
to caller, when uevent_helper is disabled and CONFIG_NET
is defined.
Signed-off-by: NMing Lei <tom.leiming@gmail.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

e0d7bf5d

uevent: don't pass envp_ext[] as format string in kobject_uevent_env() · c65b9145

由 Tejun Heo 提交于 16年前

kobject_uevent_env() uses envp_ext[] as verbatim format string which
can cause problems ranging from unexpectedly mangled string to oops if
a string in envp_ext[] contains substring which can be interpreted as
format.  Fix it.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

c65b9145

driver core: Remove completion from struct klist_node · 210272a2

由 Matthew Wilcox 提交于 16年前

Removing the completion from klist_node reduces its size from 64 bytes
to 28 on x86-64. To maintain the semantics of klist_remove(), we add
a single list of klist nodes which are pending deletion and scan them.
Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

210272a2

06 1月, 2009 1 次提交

trivial: radix-tree: document wrap-around issue of radix_tree_next_hole() · 8e6bdb7f

由 Wu Fengguang 提交于 16年前

And some 80-line cleanups.
Signed-off-by: NWu Fengguang <wfg@linux.intel.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

8e6bdb7f

04 1月, 2009 2 次提交

swiotlb: Don't include linux/swiotlb.h twice in lib/swiotlb.c · 52942b6b

由 Jesper Juhl 提交于 16年前

There's no point in including the linux/swiotlb.h header twice in
lib/swiotlb.c - this patch gets rid of the unneeded include.
Signed-off-by: NJesper Juhl <jj@chaosbits.net>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

52942b6b

Make %p print '(null)' for NULL pointers · d97106ab

由 Linus Torvalds 提交于 16年前

Before, when we only ever printed out the pointer value itself, a NULL
pointer would never cause issues and might as well be printed out as
just its numeric value.

However, with the extended %p formats, especially %pR, we might validly
want to print out resources for debugging.  And sometimes they don't
even exist, and the resource pointer is just NULL.  Print it out as
such, rather than oopsing.

This is a more generic version of a patch done by Trent Piepho (catching
all %p cases rather than just %pR, and using "(null)" instead of
"[NULL]" to match glibc).
Requested-by: NTrent Piepho <xyzzy@speakeasy.org>
Acked-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d97106ab

03 1月, 2009 1 次提交

swiotlb: add missing __init annotations · 79ff56eb

由 Roland Dreier 提交于 16年前

Impact: cleanup, reduce kernel size a bit

The current kernel build warns:

    WARNING: vmlinux.o(.text+0x11458): Section mismatch in reference from the function swiotlb_alloc_boot() to the function .init.text:__alloc_bootmem_low()
    The function swiotlb_alloc_boot() references
    the function __init __alloc_bootmem_low().
    This is often because swiotlb_alloc_boot lacks a __init
    annotation or the annotation of __alloc_bootmem_low is wrong.

    WARNING: vmlinux.o(.text+0x1011f2): Section mismatch in reference from the function swiotlb_late_init_with_default_size() to the function .init.text:__alloc_bootmem_low()
    The function swiotlb_late_init_with_default_size() references
    the function __init __alloc_bootmem_low().
    This is often because swiotlb_late_init_with_default_size lacks a __init
    annotation or the annotation of __alloc_bootmem_low is wrong.

and indeed the functions calling __alloc_bootmem_low() can be marked
__init as well.
Signed-off-by: NRoland Dreier <rolandd@cisco.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

79ff56eb

01 1月, 2009 4 次提交

cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS · 8c384cde

由 Rusty Russell 提交于 16年前

Impact: new debug CONFIG options

This helps find unconverted code.  It currently breaks compile horribly,
but we never wanted a flag day so that's expected.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

8c384cde

cpumask: zero extra bits in alloc_cpumask_var_node · 2a530080

由 Rusty Russell 提交于 16年前

Impact: extra safety checks during transition

When CONFIG_CPUMASKS_OFFSTACK is set, the new cpumask_ operators only
use bits up to nr_cpu_ids, not NR_CPUS.  Using the old cpus_ operators
on these masks can mean accessing undefined bits.

After some discussion, Mike and I decided to err on the side of caution;
we zero the "undefined" bits in alloc_cpumask_var_node() until all the
old cpumask functions are removed.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

2a530080

bitmap: find_last_bit() · ab53d472

由 Rusty Russell 提交于 16年前

Impact: New API

As the name suggests.  For the moment everyone uses the generic one.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

ab53d472

cpumask: fix bogus kernel-doc · e9690a6e

由 Li Zefan 提交于 16年前

Impact: fix kernel-doc

alloc_bootmem_cpumask_var() returns avoid.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

e9690a6e

29 12月, 2008 2 次提交

locking, percpu counters: introduce separate lock classes · ea319518

由 Peter Zijlstra 提交于 16年前

Impact: fix lockdep false positives

Classify percpu_counter instances similar to regular lock objects --
that is, per instantiation site.

The networking code has increased its use of percpu_counters, which
leads to false positives if they are treated as a single class.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

ea319518

SLUB: failslab support · 773ff60e

由 Akinobu Mita 提交于 16年前

Currently fault-injection capability for SLAB allocator is only
available to SLAB. This patch makes it available to SLUB, too.

[penberg@cs.helsinki.fi: unify slab and slub implementations]
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

773ff60e

28 12月, 2008 5 次提交

swiotlb: clean up EXPORT_SYMBOL usage · 874d6a95

由 FUJITA Tomonori 提交于 16年前

Impact: cleanup

swiotlb uses EXPORT_SYMBOL in an inconsistent way. Some functions use
EXPORT_SYMBOL at the end of functions. Some use it at the end of
swiotlb.c.

This cleans up swiotlb to use EXPORT_SYMBOL in a consistent way (at
the end of functions).
Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

874d6a95

swiotlb: remove unnecessary declaration · ac86ccc6

由 FUJITA Tomonori 提交于 16年前

Impact: cleanup
Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

ac86ccc6

swiotlb: add support for systems with highmem · fb05a379

由 Becky Bruce 提交于 16年前

Impact: extend code for highmem - existing users unaffected

On highmem systems, the original dma buffer might not
have a virtual mapping - we need to kmap it in to perform
the bounce.  Extract the code that does the actual
copy into a function that does the kmap if highmem
is enabled, and default to the normal swiotlb memcpy
if not.

[ ported by Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> ]
Signed-off-by: NBecky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

fb05a379

swiotlb: store phys address in io_tlb_orig_addr array · bc40ac66

由 Becky Bruce 提交于 16年前

Impact: refactor code, cleanup

When we enable swiotlb for platforms that support HIGHMEM, we
can no longer store the virtual address of the original dma
buffer, because that buffer might not have a permament mapping.

Change the swiotlb code to instead store the physical address of
the original buffer.
Signed-off-by: NBecky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

bc40ac66

swiotlb: add hwdev to swiotlb_phys_to_bus() / swiotlb_sg_to_bus() · 70a7d3cc

由 Jeremy Fitzhardinge 提交于 16年前

Impact: extend functions with a (yet unused) parameter, update callsites

Some architectures need it - in preparation for highmem swiotlb.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

70a7d3cc

25 12月, 2008 3 次提交

libcrc32c: Select CRYPTO in Kconfig · 93027354

由 Herbert Xu 提交于 16年前

Selecting CRYPTO_CRC32C is not enough as CRYPTO which CRYPTO_CRC32C
depends on may be disabled. This patch adds the select on CRYPTO.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

93027354

libcrc32c: Fix "crc32c undefined" compilation error · 53b146ae

由 Adrian-Ken Rueegsegger 提交于 16年前

The latest shash changes leave crc32c undefined:

[...]
Building modules, stage 2.
  MODPOST 1381 modules
  ERROR: "crc32c" [net/sctp/sctp.ko] undefined!
  ERROR: "crc32c" [net/ipv4/netfilter/nf_nat_proto_sctp.ko] undefined!

Adding EXPORT_SYMBOL(crc32c) to lib/libcrc32c.c fixes the compile error.
This patch has been compile-tested only.
Signed-off-by: NAdrian-Ken Rueegsegger <rueegsegger@swiss-it.ch>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

53b146ae

libcrc32c: Move implementation to crypto crc32c · 69c35efc

由 Herbert Xu 提交于 16年前

This patch swaps the role of libcrc32c and crc32c.  Previously
the implementation was in libcrc32c and crc32c was a wrapper.
Now the code is in crc32c and libcrc32c just calls the crypto
layer.

The reason for the change is to tap into the algorithm selection
capability of the crypto API so that optimised implementations
such as the one utilising Intel's CRC32C instruction can be
used where available.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

69c35efc

19 12月, 2008 3 次提交

cpumask: documentation for cpumask_var_t · ec26b805

由 Mike Travis 提交于 16年前

Impact: New kerneldoc comments

Additional documentation added to all the alloc_cpumask and free_cpumask
functions.
Signed-off-by: NMike Travis <travis@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor additions)

ec26b805

cpumask: Add alloc_cpumask_var_node() · 7b4967c5

由 Mike Travis 提交于 16年前

Impact: New API

This will be needed in x86 code to allocate the domain and old_domain
cpumasks on the same node as where the containing irq_cfg struct is
allocated.

(Also fixes double-dump_stack on rare CONFIG_DEBUG_PER_CPU_MAPS case)
Signed-off-by: NMike Travis <travis@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (re-impl alloc_cpumask_var)

7b4967c5

"Tree RCU": scalable classic RCU implementation · 64db4cff

由 Paul E. McKenney 提交于 16年前

This patch fixes a long-standing performance bug in classic RCU that
results in massive internal-to-RCU lock contention on systems with
more than a few hundred CPUs. Although this patch creates a separate
flavor of RCU for ease of review and patch maintenance, it is intended
to replace classic RCU.

This patch still handles stress better than does mainline, so I am still
calling it ready for inclusion. This patch is against the -tip tree.
Nevertheless, experience on an actual 1000+ CPU machine would still be
most welcome.

Most of the changes noted below were found while creating an rcutiny
(which should permit ejecting the current rcuclassic) and while doing
detailed line-by-line documentation.

Updates from v9 (http://lkml.org/lkml/2008/12/2/334):

o Fixes from remainder of line-by-line code walkthrough,
including comment spelling, initialization, undesirable
narrowing due to type conversion, removing redundant memory
barriers, removing redundant local-variable initialization,
and removing redundant local variables.

I do not believe that any of these fixes address the CPU-hotplug
issues that Andi Kleen was seeing, but please do give it a whirl
in case the machine is smarter than I am.

A writeup from the walkthrough may be found at the following
URL, in case you are suffering from terminal insomnia or
masochism:

http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf

o Made rcutree tracing use seq_file, as suggested some time
ago by Lai Jiangshan.

o Added a .csv variant of the rcudata debugfs trace file, to allow
people having thousands of CPUs to drop the data into
a spreadsheet. Tested with oocalc and gnumeric. Updated
documentation to suit.

Updates from v8 (http://lkml.org/lkml/2008/11/15/139):

o Fix a theoretical race between grace-period initialization and
force_quiescent_state() that could occur if more than three
jiffies were required to carry out the grace-period
initialization. Which it might, if you had enough CPUs.

o Apply Ingo's printk-standardization patch.

o Substitute local variables for repeated accesses to global
variables.

o Fix comment misspellings and redundant (but harmless) increments
of ->n_rcu_pending (this latter after having explicitly added it).

o Apply checkpatch fixes.

Updates from v7 (http://lkml.org/lkml/2008/10/10/291):

o Fixed a number of problems noted by Gautham Shenoy, including
the cpu-stall-detection bug that he was having difficulty
convincing me was real. ;-)

o Changed cpu-stall detection to wait for ten seconds rather than
three in order to reduce false positive, as suggested by Ingo
Molnar.

o Produced a design document (http://lwn.net/Articles/305782/).
The act of writing this document uncovered a number of both
theoretical and "here and now" bugs as noted below.

o Fix dynticks_nesting accounting confusion, simplify WARN_ON()
condition, fix kerneldoc comments, and add memory barriers
in dynticks interface functions.

o Add more data to tracing.

o Remove unused "rcu_barrier" field from rcu_data structure.

o Count calls to rcu_pending() from scheduling-clock interrupt
to use as a surrogate timebase should jiffies stop counting.

o Fix a theoretical race between force_quiescent_state() and
grace-period initialization. Yes, initialization does have to
go on for some jiffies for this race to occur, but given enough
CPUs...

Updates from v6 (http://lkml.org/lkml/2008/9/23/448):

o Fix a number of checkpatch.pl complaints.

o Apply review comments from Ingo Molnar and Lai Jiangshan
on the stall-detection code.

o Fix several bugs in !CONFIG_SMP builds.

o Fix a misspelled config-parameter name so that RCU now announces
at boot time if stall detection is configured.

o Run tests on numerous combinations of configurations parameters,
which after the fixes above, now build and run correctly.

Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):

o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
changeset some time ago, and finally got around to retesting
this option).

o Fix some tracing bugs in rcupreempt that caused incorrect
totals to be printed.

o I now test with a more brutal random-selection online/offline
script (attached). Probably more brutal than it needs to be
on the people reading it as well, but so it goes.

o A number of optimizations and usability improvements:

o Make rcu_pending() ignore the grace-period timeout when
there is no grace period in progress.

o Make force_quiescent_state() avoid going for a global
lock in the case where there is no grace period in
progress.

o Rearrange struct fields to improve struct layout.

o Make call_rcu() initiate a grace period if RCU was
idle, rather than waiting for the next scheduling
clock interrupt.

o Invoke rcu_irq_enter() and rcu_irq_exit() only when
idle, as suggested by Andi Kleen. I still don't
completely trust this change, and might back it out.

o Make CONFIG_RCU_TRACE be the single config variable
manipulated for all forms of RCU, instead of the prior
confusion.

o Document tracing files and formats for both rcupreempt
and rcutree.

Updates from v4 for those missing v5 given its bad subject line:

o Separated dynticks interface so that NMIs and irqs call separate
functions, greatly simplifying it. In particular, this code
no longer requires a proof of correctness. ;-)

o Separated dynticks state out into its own per-CPU structure,
avoiding the duplicated accounting.

o The case where a dynticks-idle CPU runs an irq handler that
invokes call_rcu() is now correctly handled, forcing that CPU
out of dynticks-idle mode.

o Review comments have been applied (thank you all!!!).
For but one example, fixed the dynticks-ordering issue that
Manfred pointed out, saving me much debugging. ;-)

o Adjusted rcuclassic and rcupreempt to handle dynticks changes.

Attached is an updated patch to Classic RCU that applies a hierarchy,
greatly reducing the contention on the top-level lock for large machines.
This passes 10-hour concurrent rcutorture and online-offline testing on
128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
bugs in presence of dynticks (exciting working on a system where
"sleep 1" hangs until interrupted...), which were fixed in the
2.6.27 kernel. It is getting more reliable than mainline by some
measures, so the next version will be against -tip for inclusion.
See also Manfred Spraul's recent patches (or his earlier work from
2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
We will converge onto a common patch in the fullness of time, but are
currently exploring different regions of the design space. That said,
I have already gratefully stolen quite a few of Manfred's ideas.

This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on
64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
there is no hierarchy. By default, the RCU initialization code will
adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
this balancing, allowing the hierarchy to be exactly aligned to the
underlying hardware. Up to two levels of hierarchy are permitted
(in addition to the root node), allowing up to 16,384 CPUs on 32-bit
systems and up to 262,144 CPUs on 64-bit systems. I just know that I
am going to regret saying this, but this seems more than sufficient
for the foreseeable future. (Some architectures might wish to set
CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
If this becomes a real problem, additional levels can be added, but I
doubt that it will make a significant difference on real hardware.)

In the common case, a given CPU will manipulate its private rcu_data
structure and the rcu_node structure that it shares with its immediate
neighbors. This can reduce both lock and memory contention by multiple
orders of magnitude, which should eliminate the need for the strange
manipulations that are reported to be required when running Linux on
very large systems.

Some shortcomings:

o More bugs will probably surface as a result of an ongoing
line-by-line code inspection.

Patches will be provided as required.

o There are probably hangs, rcutorture failures, &c. Seems
quite stable on a 128-CPU machine, but that is kind of small
compared to 4096 CPUs. However, seems to do better than
mainline.

Patches will be provided as required.

o The memory footprint of this version is several KB larger
than rcuclassic.

A separate UP-only rcutiny patch will be provided, which will
reduce the memory footprint significantly, even compared
to the old rcuclassic. One such patch passes light testing,
and has a memory footprint smaller even than rcuclassic.
Initial reaction from various embedded guys was "it is not
worth it", so am putting it aside.

Credits:

o Manfred Spraul for ideas, review comments, and bugs spotted,
as well as some good friendly competition. ;-)

o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
for reviews and comments.

o Thomas Gleixner for much-needed help with some timer issues
(see patches below).

o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
alive despite my heavy abuse^Wtesting.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

64db4cff

18 12月, 2008 1 次提交

driver core: add newlines to debugging enabled/disabled messages · aa6f3c64

由 Marcel Holtmann 提交于 16年前

Both messages are missing the newline and thus dmesg output gets
scrambled.
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

aa6f3c64

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功