提交 · 31d33baf36bda7a2fea800648d87c9fe6155e7ca · openeuler / raspberrypi-kernel

27 4月, 2008 12 次提交

slub: Simplify any_slab_object checks · 31d33baf

由 Christoph Lameter 提交于 4月 14, 2008

Since we now have total_objects counter per node use that to
check for the presence of any objects. The loop over all cpu slabs
is not that useful since any cpu slab would require an object allocation
first. So drop that.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

31d33baf

slub: Make the order configurable for each slab cache · 06b285dc

由 Christoph Lameter 提交于 4月 14, 2008

Makes /sys/kernel/slab/<slabname>/order writable. The allocation
order of a slab cache can then be changed dynamically during runtime.
This can be used to override the objects per slabs value establisheed
with the slub_min_objects setting that was manually specified or
calculated on bootup.

The changes of the slab order can occur while allocate_slab() runs.
Allocate slab needs the order and the number of slab objects that
are both changed by the change of order. Both are put into
a single word (struct kmem_cache_order_objects). They can then
be atomically updated and retrieved.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

06b285dc

slub: Drop fallback to page allocator method · 319d1e24

由 Christoph Lameter 提交于 4月 14, 2008

There is now a generic method of falling back to a slab page of minimal
order. No need anymore for the fallback to kmalloc_large().
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

319d1e24

slub: Fallback to minimal order during slab page allocation · 65c3376a

由 Christoph Lameter 提交于 4月 14, 2008

If any higher order allocation fails then fall back the smallest order
necessary to contain at least one object. This enables fallback for all
allocations to order 0 pages. The fallback will waste more memory (objects
will not fit neatly) and the fallback slabs will be not as efficient as larger
slabs since they contain less objects.

Note that SLAB also depends on order 1 allocations for some slabs that waste
too much memory if forced into PAGE_SIZE'd page. SLUB now can now deal with
failing order 1 allocs which SLAB cannot do.

Add a new field min that will contain the objects for the smallest possible order
for a slab cache.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

65c3376a

slub: Update statistics handling for variable order slabs · 205ab99d

由 Christoph Lameter 提交于 4月 14, 2008

Change the statistics to consider that slabs of the same slabcache
can have different number of objects in them since they may be of
different order.

Provide a new sysfs field

	total_objects

which shows the total objects that the allocated slabs of a slabcache
could hold.

Add a max field that holds the largest slab order that was ever used
for a slab cache.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

205ab99d

slub: Add kmem_cache_order_objects struct · 834f3d11

由 Christoph Lameter 提交于 4月 14, 2008

Pack the order and the number of objects into a single word.
This saves some memory in the kmem_cache_structure and more importantly
allows us to fetch both values atomically.

Later the slab orders become runtime configurable and we need to fetch these
two items together in order to properly allocate a slab and initialize its
objects.

Fix the race by fetching the order and the number of objects in one word.

[penberg@cs.helsinki.fi: fix memset() page order in new_slab()]
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

834f3d11

slub: for_each_object must be passed the number of objects in a slab · 224a88be

由 Christoph Lameter 提交于 4月 14, 2008

Pass the number of objects to the for_each_object macro. Most of these are
debug related.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

224a88be

slub: Store max number of objects in the page struct. · 39b26464

由 Christoph Lameter 提交于 4月 14, 2008

Split the inuse field up to be able to store the number of objects in this
page in the page struct as well. Necessary if we want to have pages of
various orders for a slab. Also avoids touching struct kmem_cache cachelines in
__slab_alloc().

Update diagnostic code to check the number of objects and make sure that
the number of objects always stays within the bounds of a 16 bit unsigned
integer.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

39b26464

slub: Dump list of objects not freed on kmem_cache_close() · 33b12c38

由 Christoph Lameter 提交于 4月 25, 2008

Dump a list of unfreed objects if a slab cache is closed but
objects still remain.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

33b12c38

slub: free_list() cleanup · 599870b1

由 Christoph Lameter 提交于 4月 23, 2008

free_list looked a bit screwy so here is an attempt to clean it up.

free_list is is only used for freeing partial lists. We do not need to return a
parameter if we decrement nr_partial within the function which allows a
simplification of the whole thing.

The current version modifies nr_partial outside of the list_lock which is
technically not correct. It was only ok because we should be the only user of
this slab cache at this point.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

599870b1

slub: improve kmem_cache_destroy() error message · d629d819

由 Pekka Enberg 提交于 4月 23, 2008

As pointed out by Ingo, the SLUB warning of calling kmem_cache_destroy()
with cache that still has objects triggers in practice. So turn this
WARN_ON() into a nice SLUB specific error message to avoid people
confusing it to a SLUB bug.

Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

d629d819

slob: fix bug - when slob allocates "struct kmem_cache", it does not force alignment. · 0701a9e6

由 Yi Li 提交于 4月 25, 2008

This may trigger misaligned memory access exception.
Acked-by: NMatt Mackall <mpm@selenic.com>
Signed-off-by: NYi Li <yi.li@analog.com>
Signed-off-by: NBryan Wu <cooloney@kernel.org>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

0701a9e6

24 4月, 2008 1 次提交

slab_err: Pass parameters correctly to slab_bug · 3dc50637

由 Christoph Lameter 提交于 4月 23, 2008

Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3dc50637

22 4月, 2008 1 次提交

trivial: small cleanups · f5264481

由 Pavel Machek 提交于 4月 21, 2008

These are small cleanups all over the tree.

Trivial style and comment changes to
  fs/select.c, kernel/signal.c, kernel/stop_machine.c & mm/pdflush.c
Signed-off-by: NPavel Machek <pavel@suse.cz>
Signed-off-by: NJesper Juhl <jesper.juhl@gmail.com>

f5264481

20 4月, 2008 4 次提交

driver core: memory: semaphore to mutex · da19cbcf

由 Daniel Walker 提交于 2月 04, 2008

Signed-off-by: NDaniel Walker <dwalker@mvista.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

da19cbcf

nodemask: use new node_to_cpumask_ptr function · c5f59f08

由 Mike Travis 提交于 4月 04, 2008

  * Use new node_to_cpumask_ptr.  This creates a pointer to the
    cpumask for a given node.  This definition is in mm patch:

	asm-generic-add-node_to_cpumask_ptr-macro.patch

  * Use new set_cpus_allowed_ptr function.

Depends on:
	[mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
	[sched-devel]: sched: add new set_cpus_allowed_ptr function
	[x86/latest]: x86: add cpus_scnprintf function

Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Greg Banks <gnb@melbourne.sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: NMike Travis <travis@sgi.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c5f59f08

cpuset: modify cpuset_set_cpus_allowed to use cpumask pointer · f9a86fcb

由 Mike Travis 提交于 4月 04, 2008

  * Modify cpuset_cpus_allowed to return the currently allowed cpuset
    via a pointer argument instead of as the function return value.

  * Use new set_cpus_allowed_ptr function.

  * Cleanup CPU_MASK_ALL and NODE_MASK_ALL uses.

Depends on:
	[sched-devel]: sched: add new set_cpus_allowed_ptr function
Signed-off-by: NMike Travis <travis@sgi.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f9a86fcb

cpumask: Cleanup more uses of CPU_MASK and NODE_MASK · d366f8cb

由 Mike Travis 提交于 4月 04, 2008

 *  Replace usages of CPU_MASK_NONE, CPU_MASK_ALL, NODE_MASK_NONE,
    NODE_MASK_ALL to reduce stack requirements for large NR_CPUS
    and MAXNODES counts.

 *  In some cases, the cpumask variable was initialized but then overwritten
    with another value.  This is the case for changes like this:

    -       cpumask_t oldmask = CPU_MASK_ALL;
    +       cpumask_t oldmask;
Signed-off-by: NMike Travis <travis@sgi.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d366f8cb

18 4月, 2008 2 次提交

kgdb: fix optional arch functions and probe_kernel_* · b4b8ac52

由 Jason Wessel 提交于 2月 20, 2008

Fix two regressions dealing with the kgdb core.

1) kgdb_skipexception and kgdb_post_primary_code are optional
functions that are only required on archs that need special exception
fixups.

2) The kernel address space scope must be set on any probe_kernel_*
function or archs such as ARCH=arm will not allow access to the kernel
memory space.  As an example, it is required to allow the full kernel
address space is when you the kernel debugger to inspect a system
call.
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b4b8ac52

uaccess: add probe_kernel_write() · c33fa9f5

由 Ingo Molnar 提交于 4月 17, 2008

add probe_kernel_read() and probe_kernel_write().

Uninlined and restricted to kernel range memory only, as suggested
by Linus.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

c33fa9f5

16 4月, 2008 3 次提交

add "Isolate" migratetype name to /proc/pagetypeinfo · 91446b06

由 KOSAKI Motohiro 提交于 4月 15, 2008

In a5d76b54 (memory unplug: page isolation by
KAMEZAWA Hiroyuki), "isolate" migratetype added.  but unfortunately, it
doesn't treat /proc/pagetypeinfo display logic.

this patch add "Isolate" to pagetype name field.

/proc/pagetype
before:
------------------------------------------------------------------------------------------------------------------------
Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      1      2      2      2      1      2      2      1      1      0      0
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Movable      2      3      3      1      3      3      2      0      0      0      0
Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
Node    0, zone      DMA, type       <NULL>      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type    Unmovable      1      9      7      4      1      1      1      1      0      0      0
Node    0, zone   Normal, type  Reclaimable      5      2      0      0      1      1      0      0      0      1      0
Node    0, zone   Normal, type      Movable      0      1      1      0      0      0      1      0      0      1     60
Node    0, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
Node    0, zone   Normal, type       <NULL>      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone  HighMem, type    Unmovable      0      0      1      1      1      0      1      1      2      2      0
Node    0, zone  HighMem, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone  HighMem, type      Movable    236     62      6      2      2      1      1      0      1      1     16
Node    0, zone  HighMem, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
Node    0, zone  HighMem, type       <NULL>      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve       <NULL>
Node 0, zone      DMA            1            0            2       1            0
Node 0, zone   Normal           10           40          169       1            0
Node 0, zone  HighMem            2            0          283       1            0

after:
------------------------------------------------------------------------------------------------------------------------
Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      1      2      2      2      1      2      2      1      1      0      0
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Movable      2      3      3      1      3      3      2      0      0      0      0
Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type    Unmovable      0      2      1      1      0      1      0      0      0      0      0
Node    0, zone   Normal, type  Reclaimable      1      1      1      1      1      0      1      1      1      0      0
Node    0, zone   Normal, type      Movable      0      1      1      1      0      1      0      1      0      0    196
Node    0, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone  HighMem, type    Unmovable      0      1      0      0      0      1      1      1      2      2      0
Node    0, zone  HighMem, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone  HighMem, type      Movable      1      0      1      1      0      0      0      0      1      0    200
Node    0, zone  HighMem, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
Node    0, zone  HighMem, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate
Node 0, zone      DMA            1            0            2       1            0
Node 0, zone   Normal            8            4          207       1            0
Node 0, zone  HighMem            2            0          283       1            0
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NMel Gorman <mel@csn.ul.ie>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

91446b06

memcg: fix oops in oom handling · e115f2d8

由 Li Zefan 提交于 4月 15, 2008

When I used a test program to fork mass processes and immediately move them to
a cgroup where the memory limit is low enough to trigger oom kill, I got oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000808
IP: [<ffffffff8045c47f>] _spin_lock_irqsave+0x8/0x18
PGD 4c95f067 PUD 4406c067 PMD 0
Oops: 0002 [1] SMP
CPU 2
Modules linked in:

Pid: 11973, comm: a.out Not tainted 2.6.25-rc7 #5
RIP: 0010:[<ffffffff8045c47f>]  [<ffffffff8045c47f>] _spin_lock_irqsave+0x8/0x18
RSP: 0018:ffff8100448c7c30  EFLAGS: 00010002
RAX: 0000000000000202 RBX: 0000000000000009 RCX: 000000000001c9f3
RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000808
RBP: ffff81007e444080 R08: 0000000000000000 R09: ffff8100448c7900
R10: ffff81000105f480 R11: 00000100ffffffff R12: ffff810067c84140
R13: 0000000000000001 R14: ffff8100441d0018 R15: ffff81007da56200
FS:  00007f70eb1856f0(0000) GS:ffff81007fbad3c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000808 CR3: 000000004498a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process a.out (pid: 11973, threadinfo ffff8100448c6000, task ffff81007da533e0)
Stack:  ffffffff8023ef5a 00000000000000d0 ffffffff80548dc0 00000000000000d0
 ffff810067c84140 ffff81007e444080 ffffffff8026cef9 00000000000000d0
 ffff8100441d0000 00000000000000d0 ffff8100441d0000 ffff8100505445c0
Call Trace:
 [<ffffffff8023ef5a>] ? force_sig_info+0x25/0xb9
 [<ffffffff8026cef9>] ? oom_kill_task+0x77/0xe2
 [<ffffffff8026d696>] ? mem_cgroup_out_of_memory+0x55/0x67
 [<ffffffff802910ad>] ? mem_cgroup_charge_common+0xec/0x202
 [<ffffffff8027997b>] ? handle_mm_fault+0x24e/0x77f
 [<ffffffff8022c4af>] ? default_wake_function+0x0/0xe
 [<ffffffff8027a17a>] ? get_user_pages+0x2ce/0x3af
 [<ffffffff80290fee>] ? mem_cgroup_charge_common+0x2d/0x202
 [<ffffffff8027a441>] ? make_pages_present+0x8e/0xa4
 [<ffffffff8027d1ab>] ? mmap_region+0x373/0x429
 [<ffffffff8027d7eb>] ? do_mmap_pgoff+0x2ff/0x364
 [<ffffffff80210471>] ? sys_mmap+0xe5/0x111
 [<ffffffff8020bfc9>] ? tracesys+0xdc/0xe1

Code: 00 00 01 48 8b 3c 24 e9 46 d4 dd ff f0 ff 07 48 8b 3c 24 e9 3a d4 dd ff fe 07 48 8b 3c 24 e9 2f d4 dd ff 9c 58 fa ba 00 01 00 00 <f0> 66 0f c1 17 38 f2 74 06 f3 90 8a 17 eb f6 c3 fa b8 00 01 00
RIP  [<ffffffff8045c47f>] _spin_lock_irqsave+0x8/0x18
 RSP <ffff8100448c7c30>
CR2: 0000000000000808
---[ end trace c3702fa668021ea4 ]---

It's reproducable in a x86_64 box, but doesn't happen in x86_32.

This is because tsk->sighand is not guarded by RCU, so we have to
hold tasklist_lock, just as what out_of_memory() does.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: David Rientjes <rientjes@cs.washington.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e115f2d8

mm: sparsemem memory_present() fix · bead9a3a

由 Ingo Molnar 提交于 4月 16, 2008

Fix memory corruption and crash on 32-bit x86 systems.

If a !PAE x86 kernel is booted on a 32-bit system with more than 4GB of
RAM, then we call memory_present() with a start/end that goes outside
the scope of MAX_PHYSMEM_BITS.

That causes this loop to happily walk over the limit of the sparse
memory section map:

    for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
                unsigned long section = pfn_to_section_nr(pfn);
                struct mem_section *ms;

                sparse_index_init(section, nid);
                set_section_nid(section, nid);

                ms = __nr_to_section(section);
                if (!ms->section_mem_map)
                        ms->section_mem_map = sparse_encode_early_nid(nid) |
			                                SECTION_MARKED_PRESENT;

'ms' will be out of bounds and we'll corrupt a small amount of memory by
encoding the node ID and writing SECTION_MARKED_PRESENT (==0x1) over it.

The corruption might happen when encoding a non-zero node ID, or due to
the SECTION_MARKED_PRESENT which is 0x1:

	mmzone.h:#define	SECTION_MARKED_PRESENT	(1UL<<0)

The fix is to sanity check anything the architecture passes to
sparsemem.

This bug seems to be rather old (as old as sparsemem support itself),
but the exact incarnation depended on random details like configs, which
made this bug more prominent in v2.6.25-to-be.

An additional enhancement might be to print a warning about ignored or
trimmed memory ranges.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Tested-by: NChristoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Yinghai Lu <Yinghai.Lu@sun.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bead9a3a

14 4月, 2008 6 次提交

slub: No need for per node slab counters if !SLUB_DEBUG · 0f389ec6

由 Christoph Lameter 提交于 4月 14, 2008

The per node counters are used mainly for showing data through the sysfs API.
If that API is not compiled in then there is no point in keeping track of this
data. Disable counters for the number of slabs and the number of total slabs
if !SLUB_DEBUG. Incrementing the per node counters is also accessing a
potentially contended cacheline so this could actually be a performance
benefit to embedded systems.

SLABINFO support is also affected. It now must depends on SLUB_DEBUG (which
is on by default).

Patch also avoids a check for a NULL kmem_cache_node pointer in new_slab()
if the system is not compiled with NUMA support.

[penberg@cs.helsinki.fi: fix oops and move ->nr_slabs into CONFIG_SLUB_DEBUG]
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

0f389ec6

slub: Move map/flag clearing to __free_slab · 49bd5221

由 Christoph Lameter 提交于 4月 14, 2008

__free_slab does some diagnostics. The resetting of mapcount etc
in discard_slab() can interfere with debug processing. So move
the reset immediately before the page is freed.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

49bd5221

slub: Fixes to per cpu stat output in sysfs · 50ef37b9

由 Christoph Lameter 提交于 4月 14, 2008

Only output per cpu stats if the kernel is build for SMP.

Use a capital "C" as a leading character for the processor number
(same as the numa statistics that also use a capital letter "N").
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

50ef37b9

slub: Deal with config variable dependencies · 5b06c853

由 Christoph Lameter 提交于 4月 14, 2008

count_partial() is used by both slabinfo and the sysfs proc support. Move
the function directly before the beginning of the sysfs code so that it can
be easily found. Rework the preprocessor conditional to take into account
that slub sysfs support depends on CONFIG_SYSFS *and* CONFIG_SLUB_DEBUG.

Make CONFIG_SLUB_STATS depend on CONFIG_SLUB_DEBUG and CONFIG_SYSFS. There
is no point of keeping statistics if no one can restrive them.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

5b06c853

slub: Reduce #ifdef ZONE_DMA by moving kmalloc_caches_dma near dma logic · 4097d601

由 Christoph Lameter 提交于 4月 14, 2008

Move the definition of kmalloc_caches_dma() into a later #ifdef CONFIG_ZONE_DMA.
This saves one #ifdef and leaves us with a total of two #ifdefs for dma slab support.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

4097d601

slub: Initialize per-cpu stats · 62f75532

由 Pekka Enberg 提交于 4月 14, 2008

As spotted by kmemcheck, we need to initialize the per-CPU ->stat array before
using it.

[kmem_cache_cpu structures are usually allocated from arrays defined via
DEFINE_PER_CPU that are zeroed so we have not noticed this so far --cl].
Reported-by: NVegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

62f75532

09 4月, 2008 1 次提交

memcg: fix node_state handling · 41e3355d

由 KAMEZAWA Hiroyuki 提交于 4月 08, 2008

This should be N_NORMAL_MEMORY.

N_NORMAL_MEMORY is "true" if a node has memory for the kernel.  N_HIGH_MEMORY
is "true" if a node has memory for HIGHMEM.  (If CONFIG_HIGHMEM=n, always
"true")

This check is used for testing whether we can use kmalloc_node() on a node.
Then, if there is a node which only contains HIGHMEM, the system will call
kmalloc_node() which doesn't contain memory for the kernel.  If it happens
under SLUB, the kernel will panic.  I think this only happens on x86_32-numa.
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

41e3355d

05 4月, 2008 1 次提交

memory controller: make memory resource control aware of boot options · 4077960e

由 Balbir Singh 提交于 4月 04, 2008

A boot option for the memory controller was discussed on lkml.  It is a good
idea to add it, since it saves memory for people who want to turn off the
memory controller.

By default the option is on for the following two reasons:

1. It provides compatibility with the current scheme where the memory
   controller turns on if the config option is enabled
2. It allows for wider testing of the memory controller, once the config
   option is enabled

We still allow the create, destroy callbacks to succeed, since they are not
aware of boot options.  We do not populate the directory will memory resource
controller specific files.
Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4077960e

02 4月, 2008 1 次提交

Fix undefined count_partial if !CONFIG_SLABINFO · 00460dd5

由 Christoph Lameter 提交于 4月 01, 2008

Small typo in the patch recently merged to avoid the unused symbol
message for count_partial(). Discussion thread with confirmation of fix at
http://marc.info/?t=120696854400001&r=1&w=2

Typo in the check if we need the count_partial function that was
introduced by 53625b42Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

00460dd5

31 3月, 2008 1 次提交

NULL noise: fs/*, mm/*, kernel/* · 9dce07f1

由 Al Viro 提交于 3月 29, 2008

Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9dce07f1

28 3月, 2008 1 次提交

Revert "SLUB: remove useless masking of GFP_ZERO" · e72e9c23

由 Linus Torvalds 提交于 3月 27, 2008

This reverts commit 3811dbf6.

The masking was not at all useless, and it was sensible.  We handle
GFP_ZERO in the caller, and passing it down to any page allocator logic
is buggy and wrong.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e72e9c23

27 3月, 2008 4 次提交

hugetlb: fix potential livelock in return_unused_surplus_hugepages() · 11320d17

由 Nishanth Aravamudan 提交于 3月 26, 2008

Running the counters testcase from libhugetlbfs results in on 2.6.25-rc5
and 2.6.25-rc5-mm1:

    BUG: soft lockup - CPU#3 stuck for 61s! [counters:10531]
    NIP: c0000000000d1f3c LR: c0000000000d1f2c CTR: c0000000001b5088
    REGS: c000005db12cb360 TRAP: 0901   Not tainted  (2.6.25-rc5-autokern1)
    MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 48008448  XER: 20000000
    TASK = c000005dbf3d6000[10531] 'counters' THREAD: c000005db12c8000 CPU: 3
    GPR00: 0000000000000004 c000005db12cb5e0 c000000000879228 0000000000000004
    GPR04: 0000000000000010 0000000000000000 0000000000200200 0000000000100100
    GPR08: c0000000008aba10 000000000000ffff 0000000000000004 0000000000000000
    GPR12: 0000000028000442 c000000000770080
    NIP [c0000000000d1f3c] .return_unused_surplus_pages+0x84/0x18c
    LR [c0000000000d1f2c] .return_unused_surplus_pages+0x74/0x18c
    Call Trace:
    [c000005db12cb5e0] [c000005db12cb670] 0xc000005db12cb670 (unreliable)
    [c000005db12cb670] [c0000000000d24c4] .hugetlb_acct_memory+0x2e0/0x354
    [c000005db12cb740] [c0000000001b5048] .truncate_hugepages+0x1d4/0x214
    [c000005db12cb890] [c0000000001b50a4] .hugetlbfs_delete_inode+0x1c/0x3c
    [c000005db12cb920] [c000000000103fd8] .generic_delete_inode+0xf8/0x1c0
    [c000005db12cb9b0] [c0000000001b5100] .hugetlbfs_drop_inode+0x3c/0x24c
    [c000005db12cba50] [c00000000010287c] .iput+0xdc/0xf8
    [c000005db12cbad0] [c0000000000fee54] .dentry_iput+0x12c/0x194
    [c000005db12cbb60] [c0000000000ff050] .d_kill+0x6c/0xa4
    [c000005db12cbbf0] [c0000000000ffb74] .dput+0x18c/0x1b0
    [c000005db12cbc70] [c0000000000e9e98] .__fput+0x1a4/0x1e8
    [c000005db12cbd10] [c0000000000e61ec] .filp_close+0xb8/0xe0
    [c000005db12cbda0] [c0000000000e62d0] .sys_close+0xbc/0x134
    [c000005db12cbe30] [c00000000000872c] syscall_exit+0x0/0x40
    Instruction dump:
    ebbe8038 38800010 e8bf0002 3bbd0008 7fa3eb78 38a50001 7ca507b4 4818df25
    60000000 38800010 38a00000 7c601b78 <7fa3eb78> 2f800010 409d0008 38000010

This was tracked down to a potential livelock in
return_unused_surplus_hugepages().  In the case where we have surplus
pages on some node, but no free pages on the same node, we may never
break out of the loop. To avoid this livelock, terminate the search if
we iterate a number of times equal to the number of online nodes without
freeing a page.

Thanks to Andy Whitcroft and Adam Litke for helping with debugging and
the patch.
Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

11320d17

hugetlb: indicate surplus huge page counts in per-node meminfo · a1de0919

由 Nishanth Aravamudan 提交于 3月 26, 2008

Currently we show the surplus hugetlb pool state in /proc/meminfo, but
not in the per-node meminfo files, even though we track the information
on a per-node basis. Printing it there can help track down dynamic pool
bugs including the one in the follow-on patch.
Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a1de0919

slab: fix cache_cache bootstrap in kmem_cache_init() · ec1f5eee

由 Daniel Yeisley 提交于 3月 25, 2008

Commit 556a169d ("slab: fix bootstrap on
memoryless node") introduced bootstrap-time cache_cache list3s for all nodes
but forgot that initkmem_list3 needs to be accessed by [somevalue + node]. This
patch fixes list_add() corruption in mm/slab.c seen on the ES7000.

Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Olaf Hering <olaf@aepfle.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: NDan Yeisley <dan.yeisley@unisys.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: NChristoph Lameter <clameter@sgi.com>

ec1f5eee

count_partial() is not used if !SLUB_DEBUG and !CONFIG_SLABINFO · 53625b42

由 Christoph Lameter 提交于 3月 19, 2008

Avoid warnings about unused functions if neither SLUB_DEBUG nor CONFIG_SLABINFO
is defined. This patch will be reversed when slab defrag is merged since slab
defrag requires count_partial() to determine the fragmentation status of
slab caches.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>

53625b42

25 3月, 2008 2 次提交

revert "kswapd should only wait on IO if there is IO" · 4dd4b920

由 Andrew Morton 提交于 3月 24, 2008

Revert commit f1a9ee75:

  Author: Rik van Riel <riel@redhat.com>
  Date:   Thu Feb 7 00:14:08 2008 -0800

    kswapd should only wait on IO if there is IO

    The current kswapd (and try_to_free_pages) code has an oddity where the
    code will wait on IO, even if there is no IO in flight.  This problem is
    notable especially when the system scans through many unfreeable pages,
    causing unnecessary stalls in the VM.

    Additionally, tasks without __GFP_FS or __GFP_IO in the direct reclaim path
    will sleep if a significant number of pages are encountered that should be
    written out.  This gives kswapd a chance to write out those pages, while
    the direct reclaim task sleeps.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

Because of large latencies and interactivity problems reported by Carlos,
here: http://lkml.org/lkml/2008/3/22/211

Cc: Rik van Riel <riel@redhat.com>
Cc: "Carlos R.  Mafra" <crmafra2@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4dd4b920

mm: fix boundary checking in free_bootmem_core · 5a982cbc

由 Yinghai Lu 提交于 3月 24, 2008

With numa enabled, some callers could have a range of memory on one node
but try to free that on other node.  This can cause some pages to be
freed wrongly.

For example: when we try to allocate 128g boot ram early for
gart/swiotlb, and free that range later so gart/swiotlb can get some
range afterwards.

With this patch, we don't need to care which node holds the range, just
loop to call free_bootmem_node for all online nodes.

This patch makes free_bootmem_core() more robust by trimming the sidx
and eidx according the ram range that the node has.

And make the free_bootmem_core handle this out of range case.  We could
use bdata_list to make sure the range can be freed for sure.  So next
time, we don't need to loop online nodes and could use free_bootmem
directly.
Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NIngo Molnar <mingo@elte.hu>
Tested-by: NIngo Molnar <mingo@elte.hu>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5a982cbc