提交 · 543585cc5b07fa99a2dc897159fbf48c1eb73058 · openeuler / raspberrypi-kernel

11 11月, 2011 1 次提交

slab: rename slab_break_gfp_order to slab_max_order · 543585cc

由 David Rientjes 提交于 10月 18, 2011

slab_break_gfp_order is more appropriately named slab_max_order since it
enforces the maximum order size of slabs as long as a single object will
still fit.

Also rename BREAK_GFP_ORDER_{LO,HI} accordingly.
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

543585cc

28 9月, 2011 1 次提交

mm: restrict access to slab files under procfs and sysfs · ab067e99

由 Vasiliy Kulikov 提交于 9月 27, 2011

Historically /proc/slabinfo and files under /sys/kernel/slab/* have
world read permissions and are accessible to the world.  slabinfo
contains rather private information related both to the kernel and
userspace tasks.  Depending on the situation, it might reveal either
private information per se or information useful to make another
targeted attack.  Some examples of what can be learned by
reading/watching for /proc/slabinfo entries:

1) dentry (and different *inode*) number might reveal other processes fs
activity.  The number of dentry "active objects" doesn't strictly show
file count opened/touched by a process, however, there is a good
correlation between them.  The patch "proc: force dcache drop on
unauthorized access" relies on the privacy of dentry count.

2) different inode entries might reveal the same information as (1), but
these are more fine granted counters.  If a filesystem is mounted in a
private mount point (or even a private namespace) and fs type differs from
other mounted fs types, fs activity in this mount point/namespace is
revealed.  If there is a single ecryptfs mount point, the whole fs
activity of a single user is revealed.  Number of files in ecryptfs
mount point is a private information per se.

3) fuse_* reveals number of files / fs activity of a user in a user
private mount point.  It is approx. the same severity as ecryptfs
infoleak in (2).

4) sysfs_dir_cache similar to (2) reveals devices' addition/removal,
which can be otherwise hidden by "chmod 0700 /sys/".  With 0444 slabinfo
the precise number of sysfs files is known to the world.

5) buffer_head might reveal some kernel activity.  With other
information leaks an attacker might identify what specific kernel
routines generate buffer_head activity.

6) *kmalloc* infoleaks are very situational.  Attacker should watch for
the specific kmalloc size entry and filter the noise related to the unrelated
kernel activity.  If an attacker has relatively silent victim system, he
might get rather precise counters.

Additional information sources might significantly increase the slabinfo
infoleak benefits.  E.g. if an attacker knows that the processes
activity on the system is very low (only core daemons like syslog and
cron), he may run setxid binaries / trigger local daemon activity /
trigger network services activity / await sporadic cron jobs activity
/ etc. and get rather precise counters for fs and network activity of
these privileged tasks, which is unknown otherwise.

Also hiding slabinfo and /sys/kernel/slab/* is a one step to complicate
exploitation of kernel heap overflows (and possibly, other bugs).  The
related discussion:

http://thread.gmane.org/gmane.linux.kernel/1108378

To keep compatibility with old permission model where non-root
monitoring daemon could watch for kernel memleaks though slabinfo one
should do:

    groupadd slabinfo
    usermod -a -G slabinfo $MONITOR_USER

And add the following commands to init scripts (to mountall.conf in
Ubuntu's upstart case):

    chmod g+r /proc/slabinfo /sys/kernel/slab/*/*
    chgrp slabinfo /proc/slabinfo /sys/kernel/slab/*/*
Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
Reviewed-by: NKees Cook <kees@ubuntu.com>
Reviewed-by: NDave Hansen <dave@linux.vnet.ibm.com>
Acked-by: NChristoph Lameter <cl@gentwo.org>
Acked-by: NDavid Rientjes <rientjes@google.com>
CC: Valdis.Kletnieks@vt.edu
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Alan Cox <alan@linux.intel.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

ab067e99

04 8月, 2011 2 次提交

slab, lockdep: Annotate the locks before using them · 30765b92

由 Peter Zijlstra 提交于 7月 28, 2011

Fernando found we hit the regular OFF_SLAB 'recursion' before we
annotate the locks, cure this.

The relevant portion of the stack-trace:

> [    0.000000]  [<c085e24f>] rt_spin_lock+0x50/0x56
> [    0.000000]  [<c04fb406>] __cache_free+0x43/0xc3
> [    0.000000]  [<c04fb23f>] kmem_cache_free+0x6c/0xdc
> [    0.000000]  [<c04fb2fe>] slab_destroy+0x4f/0x53
> [    0.000000]  [<c04fb396>] free_block+0x94/0xc1
> [    0.000000]  [<c04fc551>] do_tune_cpucache+0x10b/0x2bb
> [    0.000000]  [<c04fc8dc>] enable_cpucache+0x7b/0xa7
> [    0.000000]  [<c0bd9d3c>] kmem_cache_init_late+0x1f/0x61
> [    0.000000]  [<c0bba687>] start_kernel+0x24c/0x363
> [    0.000000]  [<c0bba0ba>] i386_start_kernel+0xa9/0xaf
Reported-by: NFernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Acked-by: NPekka Enberg <penberg@kernel.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1311888176.2617.379.camel@laptopSigned-off-by: NIngo Molnar <mingo@elte.hu>

30765b92

slab, lockdep: Annotate slab -> rcu -> debug_object -> slab · 83835b3d

由 Peter Zijlstra 提交于 7月 22, 2011

Lockdep thinks there's lock recursion through:

	kmem_cache_free()
	  cache_flusharray()
	    spin_lock(&l3->list_lock)  <----------------.
	    free_block()                                |
	      slab_destroy()                            |
		call_rcu()                              |
		  debug_object_activate()               |
		    debug_object_init()                 |
		      __debug_object_init()             |
			kmem_cache_alloc()              |
			  cache_alloc_refill()          |
			    spin_lock(&l3->list_lock) --'

Now debug objects doesn't use SLAB_DESTROY_BY_RCU and hence there is no
actual possibility of recursing. Luckily debug objects marks it slab
with SLAB_DEBUG_OBJECTS so we can identify the thing.

Mark all SLAB_DEBUG_OBJECTS (all one!) slab caches with a special
lockdep key so that lockdep sees its a different cachep.

Also add a WARN on trying to create a SLAB_DESTROY_BY_RCU |
SLAB_DEBUG_OBJECTS cache, to avoid possible future trouble.
Reported-and-tested-by: NSebastian Siewior <sebastian@breakpoint.cc>
[ fixes to the initial patch ]
Reported-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPekka Enberg <penberg@kernel.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1311341165.27400.58.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>

83835b3d

01 8月, 2011 1 次提交

slab: use print_hex_dump · fdde6abb

由 Sebastian Andrzej Siewior 提交于 7月 29, 2011

Less code and the advantage of ascii dump.

before:
| Slab corruption: names_cache start=c5788000, len=4096
| 000: 6b 6b 01 00 00 00 56 00 00 00 24 00 00 00 2a 00
| 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
| 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff
| 030: ff ff ff ff e2 b4 17 18 c7 e4 08 06 00 01 08 00
| 040: 06 04 00 01 e2 b4 17 18 c7 e4 0a 00 00 01 00 00
| 050: 00 00 00 00 0a 00 00 02 6b 6b 6b 6b 6b 6b 6b 6b

after:
| Slab corruption: size-4096 start=c38a9000, len=4096
| 000: 6b 6b 01 00 00 00 56 00 00 00 24 00 00 00 2a 00  kk....V...$...*.
| 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
| 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff  ................
| 030: ff ff ff ff d2 56 5f aa db 9c 08 06 00 01 08 00  .....V_.........
| 040: 06 04 00 01 d2 56 5f aa db 9c 0a 00 00 01 00 00  .....V_.........
| 050: 00 00 00 00 0a 00 00 02 6b 6b 6b 6b 6b 6b 6b 6b  ........kkkkkkkk
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

fdde6abb

31 7月, 2011 1 次提交

slab: use NUMA_NO_NODE · eacbbae3

由 Andrew Morton 提交于 7月 28, 2011

Use the nice enumerated constant.

Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

eacbbae3

28 7月, 2011 1 次提交

slab: remove one NR_CPUS dependency · acfe7d74

由 Eric Dumazet 提交于 7月 25, 2011

Reduce high order allocations in do_tune_cpucache() for some setups.
(NR_CPUS=4096 -> we need 64KB)
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

acfe7d74

22 7月, 2011 1 次提交

slab: fix DEBUG_SLAB warning · 7ea466f2

由 Tetsuo Handa 提交于 7月 21, 2011

In commit c225150b "slab: fix DEBUG_SLAB build",
"if ((unsigned long)objp & (ARCH_SLAB_MINALIGN-1))" is always true if
ARCH_SLAB_MINALIGN == 0. Do not print warning if ARCH_SLAB_MINALIGN == 0.
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

7ea466f2

21 7月, 2011 1 次提交

slab: shrink sizeof(struct kmem_cache) · b56efcf0

由 Eric Dumazet 提交于 7月 20, 2011

Reduce high order allocations for some setups.
(NR_CPUS=4096 -> we need 64KB per kmem_cache struct)

We now allocate exact needed size (using nr_cpu_ids and nr_node_ids)

This also makes code a bit smaller on x86_64, since some field offsets
are less than the 127 limit :

Before patch :
# size mm/slab.o
   text    data     bss     dec     hex filename
  22605  361665      32  384302   5dd2e mm/slab.o

After patch :
# size mm/slab.o
   text    data     bss     dec     hex filename
  22349	 353473	   8224	 384046	  5dc2e	mm/slab.o

CC: Andrew Morton <akpm@linux-foundation.org>
Reported-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

b56efcf0

18 7月, 2011 1 次提交

slab: fix DEBUG_SLAB build · c225150b

由 Hugh Dickins 提交于 7月 11, 2011

Fix CONFIG_SLAB=y CONFIG_DEBUG_SLAB=y build error and warnings.

Now that ARCH_SLAB_MINALIGN defaults to __alignof__(unsigned long long),
it is always defined (when slab.h included), but cannot be used in #if:
mm/slab.c: In function `cache_alloc_debugcheck_after':
mm/slab.c:3156:5: warning: "__alignof__" is not defined
mm/slab.c:3156:5: error: missing binary operator before token "("
make[1]: *** [mm/slab.o] Error 1

So just remove the #if and #endif lines, but then 64-bit build warns:
mm/slab.c: In function `cache_alloc_debugcheck_after':
mm/slab.c:3156:6: warning: cast from pointer to integer of different size
mm/slab.c:3158:10: warning: format `%d' expects type `int', but argument
                            3 has type `long unsigned int'
Fix those with casts, whatever the actual type of ARCH_SLAB_MINALIGN.
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

c225150b

04 6月, 2011 1 次提交

SLAB: Record actual last user of freed objects. · a947eb95

由 Suleiman Souhlal 提交于 6月 02, 2011

Currently, when using CONFIG_DEBUG_SLAB, we put in kfree() or
kmem_cache_free() as the last user of free objects, which is not
very useful, so change it to the caller of those functions instead.
Acked-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NSuleiman Souhlal <suleiman@google.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

a947eb95

21 5月, 2011 1 次提交

sanitize <linux/prefetch.h> usage · 268bb0ce

由 Linus Torvalds 提交于 5月 20, 2011

Commit e66eed65 ("list: remove prefetching from regular list
iterators") removed the include of prefetch.h from list.h, which
uncovered several cases that had apparently relied on that rather
obscure header file dependency.

So this fixes things up a bit, using

   grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
   grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')

to guide us in finding files that either need <linux/prefetch.h>
inclusion, or have it despite not needing it.

There are more of them around (mostly network drivers), but this gets
many core ones.
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

268bb0ce

31 3月, 2011 1 次提交

Fix common misspellings · 25985edc

由 Lucas De Marchi 提交于 3月 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>

25985edc

23 3月, 2011 1 次提交

mm: notifier_from_errno() cleanup · 5fda1bd5

由 Prarit Bhargava 提交于 3月 22, 2011

While looking at some other notifier callbacks I noticed this code could
use a simple cleanup.

notifier_from_errno() no longer needs the if (ret)/else conditional.  That
same conditional is now done in notifier_from_errno().
Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5fda1bd5

12 3月, 2011 1 次提交

slab,rcu: don't assume the size of struct rcu_head · 5bfe53a7

由 Lai Jiangshan 提交于 3月 10, 2011

The size of struct rcu_head may be changed. When it becomes larger,
it may pollute the data after struct slab.
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

5bfe53a7

14 2月, 2011 1 次提交

Revert "slab: Fix missing DEBUG_SLAB last user" · 3ff84a7f

由 Pekka Enberg 提交于 2月 14, 2011

This reverts commit 5c5e3b33.

The commit breaks ARM thusly:

| Mount-cache hash table entries: 512
| slab error in verify_redzone_free(): cache `idr_layer_cache': memory outside object was overwritten
| Backtrace:
| [<c0227088>] (dump_backtrace+0x0/0x110) from [<c0431afc>] (dump_stack+0x18/0x1c)
| [<c0431ae4>] (dump_stack+0x0/0x1c) from [<c0293304>] (__slab_error+0x28/0x30)
| [<c02932dc>] (__slab_error+0x0/0x30) from [<c0293a74>] (cache_free_debugcheck+0x1c0/0x2b8)
| [<c02938b4>] (cache_free_debugcheck+0x0/0x2b8) from [<c0293f78>] (kmem_cache_free+0x3c/0xc0)
| [<c0293f3c>] (kmem_cache_free+0x0/0xc0) from [<c032b1c8>] (ida_get_new_above+0x19c/0x1c0)
| [<c032b02c>] (ida_get_new_above+0x0/0x1c0) from [<c02af7ec>] (alloc_vfsmnt+0x54/0x144)
| [<c02af798>] (alloc_vfsmnt+0x0/0x144) from [<c0299830>] (vfs_kern_mount+0x30/0xec)
| [<c0299800>] (vfs_kern_mount+0x0/0xec) from [<c0299908>] (kern_mount_data+0x1c/0x20)
| [<c02998ec>] (kern_mount_data+0x0/0x20) from [<c02146c4>] (sysfs_init+0x68/0xc8)
| [<c021465c>] (sysfs_init+0x0/0xc8) from [<c02137d4>] (mnt_init+0x90/0x1b0)
| [<c0213744>] (mnt_init+0x0/0x1b0) from [<c0213388>] (vfs_caches_init+0x100/0x140)
| [<c0213288>] (vfs_caches_init+0x0/0x140) from [<c0208c0c>] (start_kernel+0x2e8/0x368)
| [<c0208924>] (start_kernel+0x0/0x368) from [<c0208034>] (__enable_mmu+0x0/0x2c)
| c0113268: redzone 1:0xd84156c5c032b3ac, redzone 2:0xd84156c5635688c0.
| slab error in cache_alloc_debugcheck_after(): cache `idr_layer_cache': double free, or memory outside object was overwritten
| ...
| c011307c: redzone 1:0x9f91102ffffffff, redzone 2:0x9f911029d74e35b
| slab: Internal list corruption detected in cache 'idr_layer_cache'(24), slabp c0113000(16). Hexdump:
|
| 000: 20 4f 10 c0 20 4f 10 c0 7c 00 00 00 7c 30 11 c0
| 010: 10 00 00 00 10 00 00 00 00 00 c9 17 fe ff ff ff
| 020: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
| 030: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
| 040: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
| 050: fe ff ff ff fe ff ff ff fe ff ff ff 11 00 00 00
| 060: 12 00 00 00 13 00 00 00 14 00 00 00 15 00 00 00
| 070: 16 00 00 00 17 00 00 00 c0 88 56 63
| kernel BUG at /home/rmk/git/linux-2.6-rmk/mm/slab.c:2928!

Reference: https://lkml.org/lkml/2011/2/7/238
Cc: <stable@kernel.org> # 2.6.35.y and later
Reported-and-analyzed-by: NRussell King <rmk@arm.linux.org.uk>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

3ff84a7f

24 1月, 2011 1 次提交

mm: Remove support for kmem_cache_name() · 63310467

由 Christoph Lameter 提交于 1月 20, 2011

The last user was ext4 and Eric Sandeen removed the call in a recent patch. See
the following URL for the discussion:

http://marc.info/?l=linux-ext4&m=129546975702198&w=2Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

63310467

15 1月, 2011 1 次提交

mm/slab.c: make local symbols static · 68a1b195

由 H Hartley Sweeten 提交于 1月 11, 2011

Local symbols should be static.
Signed-off-by: NH Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

68a1b195

07 1月, 2011 1 次提交

kernel: kmem_ptr_validate considered harmful · ccd35fb9

由 Nick Piggin 提交于 1月 07, 2011

This is a nasty and error prone API. It is no longer used, remove it.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

ccd35fb9

17 12月, 2010 1 次提交

core: Replace __get_cpu_var with __this_cpu_read if not used for an address. · 909ea964

由 Christoph Lameter 提交于 12月 08, 2010

__get_cpu_var() can be replaced with this_cpu_read and will then use a
single read instruction with implied address calculation to access the
correct per cpu instance.

However, the address of a per cpu variable passed to __this_cpu_read()
cannot be determined (since it's an implied address conversion through
segment prefixes).  Therefore apply this only to uses of __get_cpu_var
where the address of the variable is not used.

Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Hugh Dickins <hughd@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

909ea964

15 12月, 2010 1 次提交

workqueue: convert cancel_rearming_delayed_work[queue]() users to cancel_delayed_work_sync() · afe2c511

由 Tejun Heo 提交于 12月 14, 2010

cancel_rearming_delayed_work[queue]() has been superceded by
cancel_delayed_work_sync() quite some time ago.  Convert all the
in-kernel users.  The conversions are completely equivalent and
trivial.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: N"David S. Miller" <davem@davemloft.net>
Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
Acked-by: NEvgeniy Polyakov <zbr@ioremap.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: netdev@vger.kernel.org
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alex Elder <aelder@sgi.com>
Cc: xfs-masters@oss.sgi.com
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: netfilter-devel@vger.kernel.org
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org

afe2c511

29 11月, 2010 1 次提交

tracing/slab: Move kmalloc tracepoint out of inline code · 85beb586

由 Steven Rostedt 提交于 11月 24, 2010

The tracepoint for kmalloc is in the slab inlined code which causes
every instance of kmalloc to have the tracepoint.

This patch moves the tracepoint out of the inline code to the
slab C file, which removes a large number of inlined trace
points.

  objdump -dr vmlinux.slab| grep 'jmpq.*<trace_kmalloc' |wc -l
213
  objdump -dr vmlinux.slab.patched| grep 'jmpq.*<trace_kmalloc' |wc -l
1

This also has a nice impact on size.

   text	   data	    bss	    dec	    hex	filename
7023060	2121564	2482432	11627056	 b16a30	vmlinux.slab
6970579	2109772	2482432	11562783	 b06f1f	vmlinux.slab.patched
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

85beb586

27 10月, 2010 1 次提交

replace nested max/min macros with {max,min}3 macro · 732eacc0

由 Hagen Paul Pfeifer 提交于 10月 26, 2010

Use the new {max,min}3 macros to save some cycles and bytes on the stack.
This patch substitutes trivial nested macros with their counterpart.
Signed-off-by: NHagen Paul Pfeifer <hagen@jauu.net>
Cc: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

732eacc0

10 8月, 2010 1 次提交

gcc-4.6: mm: fix unused but set warnings · 4e60c86b

由 Andi Kleen 提交于 8月 09, 2010

No real bugs, just some dead code and some fixups.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4e60c86b

09 8月, 2010 1 次提交

slab: fix object alignment · 1ab335d8

由 Carsten Otte 提交于 8月 06, 2010

This patch fixes alignment of slab objects in case CONFIG_DEBUG_PAGEALLOC is
active.
Before this spot in kmem_cache_create, we have this situation:
- align contains the required alignment of the object
- cachep->obj_offset is 0 or equals align in case of CONFIG_DEBUG_SLAB
- size equals the size of the object, or object plus trailing redzone in case
  of CONFIG_DEBUG_SLAB

This spot tries to fill one page per object if the object is in certain size
limits, however setting obj_offset to PAGE_SIZE - size does break the object
alignment since size may not be aligned with the required alignment.
This patch simply adds an ALIGN(size, align) to the equation and fixes the
object size detection accordingly.

This code in drivers/s390/cio/qdio_setup_init has lead to incorrectly aligned
slab objects (sizeof(struct qdio_q) equals 1792):
	qdio_q_cache = kmem_cache_create("qdio_q", sizeof(struct qdio_q),
					 256, 0, NULL);
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NCarsten Otte <cotte@de.ibm.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

1ab335d8

20 7月, 2010 1 次提交

slab: use deferable timers for its periodic housekeeping · 78b43536

由 Arjan van de Ven 提交于 7月 19, 2010

slab has a "once every 2 second" timer for its housekeeping.
As the number of logical processors is growing, its more and more
common that this 2 second timer becomes the primary wakeup source.

This patch turns this housekeeping timer into a deferable timer,
which means that the timer does not interrupt idle, but just runs
at the next event that wakes the cpu up.

The impact is that the timer likely runs a bit later, but during the
delay no code is running so there's not all that much reason for
a difference in housekeeping to occur because of this delay.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

78b43536

09 6月, 2010 1 次提交

tracing: Remove kmemtrace ftrace plugin · 039ca4e7

由 Li Zefan 提交于 5月 26, 2010

We have been resisting new ftrace plugins and removing existing
ones, and kmemtrace has been superseded by kmem trace events
and perf-kmem, so we remove it.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
Acked-by: NEduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
[ remove kmemtrace from the makefile, handle slob too ]
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

039ca4e7

28 5月, 2010 3 次提交

numa: slab: use numa_mem_id() for slab local memory node · 7d6e6d09

由 Lee Schermerhorn 提交于 5月 26, 2010

Example usage of generic "numa_mem_id()":

The mainline slab code, since ~ 2.6.19, does not handle memoryless nodes
well.  Specifically, the "fast path"--____cache_alloc()--will never
succeed as slab doesn't cache offnode object on the per cpu queues, and
for memoryless nodes, all memory will be "off node" relative to
numa_node_id().  This adds significant overhead to all kmem cache
allocations, incurring a significant regression relative to earlier
kernels [from before slab.c was reorganized].

This patch uses the generic topology function "numa_mem_id()" to return
the "effective local memory node" for the calling context.  This is the
first node in the local node's generic fallback zonelist-- the same node
that "local" mempolicy-based allocations would use.  This lets slab cache
these "local" allocations and avoid fallback/refill on every allocation.

N.B.: Slab will need to handle node and memory hotplug events that could
change the value returned by numa_mem_id() for any given node if recent
changes to address memory hotplug don't already address this.  E.g., flush
all per cpu slab queues before rebuilding the zonelists while the
"machine" is held in the stopped state.

Performance impact on "hackbench 400 process 200"

2.6.34-rc3-mmotm-100405-1609		no-patch	this-patch
ia64 no memoryless nodes [avg of 10]:     11.713       11.637  ~0.65 diff
ia64 cpus all on memless nodes  [10]:    228.259       26.484  ~8.6x speedup

The slowdown of the patched kernel from ~12 sec to ~28 seconds when
configured with memoryless nodes is the result of all cpus allocating from
a single node's mm pagepool.  The cache lines of the single node are
distributed/interleaved over the memory of the real physical nodes, but
the zone lock, list heads, ...  of the single node with memory still each
live in a single cache line that is accessed from all processors.

x86_64 [8x6 AMD] [avg of 40]:		2.883	   2.845
Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7d6e6d09

slab: convert cpu notifier to return encapsulate errno value · eac40680

由 Akinobu Mita 提交于 5月 26, 2010

By the previous modification, the cpu notifier can return encapsulate
errno value.  This converts the cpu notifiers for slab.
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

eac40680

cpusets: new round-robin rotor for SLAB allocations · 6adef3eb

由 Jack Steiner 提交于 5月 26, 2010

We have observed several workloads running on multi-node systems where
memory is assigned unevenly across the nodes in the system.  There are
numerous reasons for this but one is the round-robin rotor in
cpuset_mem_spread_node().

For example, a simple test that writes a multi-page file will allocate
pages on nodes 0 2 4 6 ...  Odd nodes are skipped.  (Sometimes it
allocates on odd nodes & skips even nodes).

An example is shown below.  The program "lfile" writes a file consisting
of 10 pages.  The program then mmaps the file & uses get_mempolicy(...,
MPOL_F_NODE) to determine the nodes where the file pages were allocated.
The output is shown below:

	# ./lfile
	 allocated on nodes: 2 4 6 0 1 2 6 0 2

There is a single rotor that is used for allocating both file pages & slab
pages.  Writing the file allocates both a data page & a slab page
(buffer_head).  This advances the RR rotor 2 nodes for each page
allocated.

A quick confirmation seems to confirm this is the cause of the uneven
allocation:

	# echo 0 >/dev/cpuset/memory_spread_slab
	# ./lfile
	 allocated on nodes: 6 7 8 9 0 1 2 3 4 5

This patch introduces a second rotor that is used for slab allocations.
Signed-off-by: NJack Steiner <steiner@sgi.com>
Acked-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Menage <menage@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6adef3eb

25 5月, 2010 1 次提交

cpuset,mm: fix no node to alloc memory when changing cpuset's mems · c0ff7453

由 Miao Xie 提交于 5月 24, 2010

Before applying this patch, cpuset updates task->mems_allowed and
mempolicy by setting all new bits in the nodemask first, and clearing all
old unallowed bits later.  But in the way, the allocator may find that
there is no node to alloc memory.

The reason is that cpuset rebinds the task's mempolicy, it cleans the
nodes which the allocater can alloc pages on, for example:

(mpol: mempolicy)
	task1			task1's mpol	task2
	alloc page		1
	  alloc on node0? NO	1
				1		change mems from 1 to 0
				1		rebind task1's mpol
				0-1		  set new bits
				0	  	  clear disallowed bits
	  alloc on node1? NO	0
	  ...
	can't alloc page
	  goto oom

This patch fixes this problem by expanding the nodes range first(set newly
allowed bits) and shrink it lazily(clear newly disallowed bits).  So we
use a variable to tell the write-side task that read-side task is reading
nodemask, and the write-side task clears newly disallowed nodes after
read-side task ends the current memory allocation.

[akpm@linux-foundation.org: fix spello]
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Paul Menage <menage@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c0ff7453

20 5月, 2010 1 次提交

mm: Move ARCH_SLAB_MINALIGN and ARCH_KMALLOC_MINALIGN to <linux/slab_def.h> · 1f0ce8b3

由 David Woodhouse 提交于 5月 19, 2010

Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

1f0ce8b3

15 4月, 2010 1 次提交

slab: Fix missing DEBUG_SLAB last user · 5c5e3b33

由 Shiyong Li 提交于 4月 12, 2010

Even with SLAB_RED_ZONE and SLAB_STORE_USER enabled, kernel would NOT store
redzone and last user data around allocated memory space if "arch cache line >
sizeof(unsigned long long)". As a result, last user information is unexpectedly
MISSED while dumping slab corruption log.

This fix makes sure that redzone and last user tags get stored unless the
required alignment breaks redzone's.
Signed-off-by: NShiyong Li <shi-yong.li@motorola.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

5c5e3b33

10 4月, 2010 1 次提交

slab: Generify kernel pointer validation · fc1c1833

由 Pekka Enberg 提交于 4月 07, 2010

As suggested by Linus, introduce a kern_ptr_validate() helper that does some
sanity checks to make sure a pointer is a valid kernel pointer.  This is a
preparational step for fixing SLUB kmem_ptr_validate().

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fc1c1833

08 4月, 2010 1 次提交

slab: add memory hotplug support · 8f9f8d9e

由 David Rientjes 提交于 3月 27, 2010

Slab lacks any memory hotplug support for nodes that are hotplugged
without cpus being hotplugged.  This is possible at least on x86
CONFIG_MEMORY_HOTPLUG_SPARSE kernels where SRAT entries are marked
ACPI_SRAT_MEM_HOT_PLUGGABLE and the regions of RAM represent a seperate
node.  It can also be done manually by writing the start address to
/sys/devices/system/memory/probe for kernels that have
CONFIG_ARCH_MEMORY_PROBE set, which is how this patch was tested, and
then onlining the new memory region.

When a node is hotadded, a nodelist for that node is allocated and
initialized for each slab cache.  If this isn't completed due to a lack
of memory, the hotadd is aborted: we have a reasonable expectation that
kmalloc_node(nid) will work for all caches if nid is online and memory is
available.

Since nodelists must be allocated and initialized prior to the new node's
memory actually being online, the struct kmem_list3 is allocated off-node
due to kmalloc_node()'s fallback.

When an entire node would be offlined, its nodelists are subsequently
drained.  If slab objects still exist and cannot be freed, the offline is
aborted.  It is possible that objects will be allocated between this
drain and page isolation, so it's still possible that the offline will
still fail, however.
Acked-by: NChristoph Lameter <cl@linux-foundation.org>
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

8f9f8d9e

29 3月, 2010 1 次提交

slab: Fix continuation lines · e92dd4fd

由 Joe Perches 提交于 3月 26, 2010

Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

e92dd4fd

27 2月, 2010 1 次提交

failslab: add ability to filter slab caches · 4c13dd3b

由 Dmitry Monakhov 提交于 2月 26, 2010

This patch allow to inject faults only for specific slabs.
In order to preserve default behavior cache filter is off by
default (all caches are faulty).

One may define specific set of slabs like this:
# mark skbuff_head_cache as faulty
echo 1 > /sys/kernel/slab/skbuff_head_cache/failslab
# Turn on cache filter (off by default)
echo 1 > /sys/kernel/debug/failslab/cache-filter
# Turn on fault injection
echo 1 > /sys/kernel/debug/failslab/times
echo 1 > /sys/kernel/debug/failslab/probability
Acked-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NAkinobu Mita <akinobu.mita@gmail.com>
Acked-by: NChristoph Lameter <cl@linux-foundation.org>
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

4c13dd3b

30 1月, 2010 1 次提交

slab: fix regression in touched logic · 44b57f1c

由 Nick Piggin 提交于 1月 27, 2010

When factoring common code into transfer_objects in commit 3ded175a ("slab: add
transfer_objects() function"), the 'touched' logic got a bit broken. When
refilling from the shared array (taking objects from the shared array), we are
making use of the shared array so it should be marked as touched.

Subsequently pulling an element from the cpu array and allocating it should
also touch the cpu array, but that is taken care of after the alloc_done label.
(So yes, the cpu array was getting touched = 1 twice).

So revert this logic to how it worked in earlier kernels.

This also affects the behaviour in __drain_alien_cache, which would previously
'touch' the shared array and now does not. I think it is more logical not to
touch there, because we are pushing objects into the shared array rather than
pulling them off. So there is no good reason to postpone reaping them -- if the
shared array is getting utilized, then it will get 'touched' in the alloc path
(where this patch now restores the touch).
Acked-by: NChristoph Lameter <cl@linux-foundation.org>
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

44b57f1c

12 1月, 2010 1 次提交

slab: initialize unused alien cache entry as NULL at alloc_alien_cache(). · f3186a9c

由 Haicheng Li 提交于 1月 06, 2010

Comparing with existing code, it's a simpler way to use kzalloc_node()
to ensure that each unused alien cache entry is NULL.

CC: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NChristoph Lameter <cl@linux-foundation.org>
Acked-by: NMatt Mackall <mpm@selenic.com>
Signed-off-by: NHaicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

f3186a9c

29 12月, 2009 1 次提交

SLAB: Fix lockdep annotation breakage · 00afa758

由 Pekka Enberg 提交于 12月 27, 2009

Commit ce79ddc8 ("SLAB: Fix lockdep annotations
for CPU hotplug") broke init_node_lock_keys() off-slab logic which causes
lockdep false positives.

Fix that up by reverting the logic back to original while keeping CPU hotplug
fixes intact.
Reported-and-tested-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Reported-and-tested-by: NAndi Kleen <andi@firstfloor.org>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

00afa758