提交 · 73736e0387ba0e6d2b703407b4d26168d31516a7 · openeuler / Kernel

14 12月, 2011 1 次提交

slub: fix a possible memleak in __slab_alloc() · 73736e03

由 Eric Dumazet 提交于 12月 13, 2011

Zhihua Che reported a possible memleak in slub allocator on
CONFIG_PREEMPT=y builds.

It is possible current thread migrates right before disabling irqs in
__slab_alloc(). We must check again c->freelist, and perform a normal
allocation instead of scratching c->freelist.

Many thanks to Zhihua Che for spotting this bug, introduced in 2.6.39

V2: Its also possible an IRQ freed one (or several) object(s) and
populated c->freelist, so its not a CONFIG_PREEMPT only problem.

Cc: <stable@vger.kernel.org>        [2.6.39+]
Reported-by: NZhihua Che <zhihua.che@gmail.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

73736e03

28 11月, 2011 1 次提交

slub: add missed accounting · 4c493a5a

由 Shaohua Li 提交于 11月 11, 2011

With per-cpu partial list, slab is added to partial list first and then moved
to node list. The __slab_free() code path for add/remove_partial is almost
deprecated(except for slub debug). But we forget to account add/remove_partial
when move per-cpu partial pages to node list, so the statistics for such events
are always 0. Add corresponding accounting.

This is against the patch "slub: use correct parameter to add a page to
partial list tail"
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

4c493a5a

24 11月, 2011 2 次提交

slub: avoid potential NULL dereference or corruption · bc6697d8

由 Eric Dumazet 提交于 11月 22, 2011

show_slab_objects() can trigger NULL dereferences or memory corruption.

Another cpu can change its c->page to NULL or c->node to NUMA_NO_NODE
while we use them.

Use ACCESS_ONCE(c->page) and ACCESS_ONCE(c->node) to make sure this
cannot happen.
Acked-by: NChristoph Lameter <cl@linux.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

bc6697d8

slub: use irqsafe_cpu_cmpxchg for put_cpu_partial · 42d623a8

由 Christoph Lameter 提交于 11月 23, 2011

The cmpxchg must be irq safe. The fallback for this_cpu_cmpxchg only
disables preemption which results in per cpu partial page operation
potentially failing on non x86 platforms.

This patch fixes the following problem reported by Christian Kujau:

  I seem to hit it with heavy disk & cpu IO is in progress on this
  PowerBook
  G4. Full dmesg & .config: http://nerdbynature.de/bits/3.2.0-rc1/oops/

  I've enabled some debug options and now it really points to slub.c:2166

    http://nerdbynature.de/bits/3.2.0-rc1/oops/oops4m.jpg

  With debug options enabled I'm currently in the xmon debugger, not sure
  what to make of it yet, I'll try to get something useful out of it :)
Reported-by: NChristian Kujau <lists@nerdbynature.de>
Tested-by: NChristian Kujau <lists@nerdbynature.de>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

42d623a8

17 11月, 2011 1 次提交

slub: add taint flag outputting to debug paths · 265d47e7

由 Dave Jones 提交于 11月 15, 2011

When we get corruption reports, it's useful to see if the kernel was
tainted, to rule out problems we can't do anything about.
Signed-off-by: NDave Jones <davej@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

265d47e7

16 11月, 2011 2 次提交

slub: move discard_slab out of node lock · 9ada1934

由 Shaohua Li 提交于 11月 14, 2011

Lockdep reports there is potential deadlock for slub node list_lock.
discard_slab() is called with the lock hold in unfreeze_partials(),
which could trigger a slab allocation, which could hold the lock again.

discard_slab() doesn't need hold the lock actually, if the slab is
already removed from partial list.
Acked-by: NChristoph Lameter <cl@linux.com>
Reported-and-tested-by: NYong Zhang <yong.zhang0@gmail.com>
Reported-and-tested-by: NJulie Sullivan <kernelmail.jms@gmail.com>
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

9ada1934

slub: use correct parameter to add a page to partial list tail · f64ae042

由 Shaohua Li 提交于 11月 11, 2011

unfreeze_partials() needs add the page to partial list tail, since such page
hasn't too many free objects. We now explictly use DEACTIVATE_TO_TAIL for this,
while DEACTIVATE_TO_TAIL != 1. This will cause performance regression (eg, more
lock contention in node->list_lock) without below fix.
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

f64ae042

01 11月, 2011 1 次提交

lib/string.c: introduce memchr_inv() · 79824820

由 Akinobu Mita 提交于 10月 31, 2011

memchr_inv() is mainly used to check whether the whole buffer is filled
with just a specified byte.

The function name and prototype are stolen from logfs and the
implementation is from SLUB.
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Acked-by: NChristoph Lameter <cl@linux-foundation.org>
Acked-by: NPekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Acked-by: NJoern Engel <joern@logfs.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

79824820

28 9月, 2011 3 次提交

slub: Discard slab page when node partial > minimum partial number · dcc3be6a

由 Alex Shi 提交于 9月 06, 2011

Discarding slab should be done when node partial > min_partial.  Otherwise,
node partial slab may eat up all memory.
Signed-off-by: NAlex Shi <alex.shi@intel.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

dcc3be6a

slub: correct comments error for per cpu partial · 9f264904

由 Alex Shi 提交于 9月 01, 2011

Correct comment errors, that mistake cpu partial objects number as pages
number, may make reader misunderstand.
Signed-off-by: NAlex Shi <alex.shi@intel.com>
Reviewed-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

9f264904

mm: restrict access to slab files under procfs and sysfs · ab067e99

由 Vasiliy Kulikov 提交于 9月 27, 2011

Historically /proc/slabinfo and files under /sys/kernel/slab/* have
world read permissions and are accessible to the world.  slabinfo
contains rather private information related both to the kernel and
userspace tasks.  Depending on the situation, it might reveal either
private information per se or information useful to make another
targeted attack.  Some examples of what can be learned by
reading/watching for /proc/slabinfo entries:

1) dentry (and different *inode*) number might reveal other processes fs
activity.  The number of dentry "active objects" doesn't strictly show
file count opened/touched by a process, however, there is a good
correlation between them.  The patch "proc: force dcache drop on
unauthorized access" relies on the privacy of dentry count.

2) different inode entries might reveal the same information as (1), but
these are more fine granted counters.  If a filesystem is mounted in a
private mount point (or even a private namespace) and fs type differs from
other mounted fs types, fs activity in this mount point/namespace is
revealed.  If there is a single ecryptfs mount point, the whole fs
activity of a single user is revealed.  Number of files in ecryptfs
mount point is a private information per se.

3) fuse_* reveals number of files / fs activity of a user in a user
private mount point.  It is approx. the same severity as ecryptfs
infoleak in (2).

4) sysfs_dir_cache similar to (2) reveals devices' addition/removal,
which can be otherwise hidden by "chmod 0700 /sys/".  With 0444 slabinfo
the precise number of sysfs files is known to the world.

5) buffer_head might reveal some kernel activity.  With other
information leaks an attacker might identify what specific kernel
routines generate buffer_head activity.

6) *kmalloc* infoleaks are very situational.  Attacker should watch for
the specific kmalloc size entry and filter the noise related to the unrelated
kernel activity.  If an attacker has relatively silent victim system, he
might get rather precise counters.

Additional information sources might significantly increase the slabinfo
infoleak benefits.  E.g. if an attacker knows that the processes
activity on the system is very low (only core daemons like syslog and
cron), he may run setxid binaries / trigger local daemon activity /
trigger network services activity / await sporadic cron jobs activity
/ etc. and get rather precise counters for fs and network activity of
these privileged tasks, which is unknown otherwise.

Also hiding slabinfo and /sys/kernel/slab/* is a one step to complicate
exploitation of kernel heap overflows (and possibly, other bugs).  The
related discussion:

http://thread.gmane.org/gmane.linux.kernel/1108378

To keep compatibility with old permission model where non-root
monitoring daemon could watch for kernel memleaks though slabinfo one
should do:

    groupadd slabinfo
    usermod -a -G slabinfo $MONITOR_USER

And add the following commands to init scripts (to mountall.conf in
Ubuntu's upstart case):

    chmod g+r /proc/slabinfo /sys/kernel/slab/*/*
    chgrp slabinfo /proc/slabinfo /sys/kernel/slab/*/*
Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
Reviewed-by: NKees Cook <kees@ubuntu.com>
Reviewed-by: NDave Hansen <dave@linux.vnet.ibm.com>
Acked-by: NChristoph Lameter <cl@gentwo.org>
Acked-by: NDavid Rientjes <rientjes@google.com>
CC: Valdis.Kletnieks@vt.edu
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Alan Cox <alan@linux.intel.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

ab067e99

14 9月, 2011 1 次提交

slub: Code optimization in get_partial_node() · 12d79634

由 Alex,Shi 提交于 9月 07, 2011

I find a way to reduce a variable in get_partial_node(). That is also helpful
for code understanding.
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NAlex Shi <alex.shi@intel.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

12d79634

27 8月, 2011 2 次提交

slub: explicitly document position of inserting slab to partial list · 136333d1

由 Shaohua Li 提交于 8月 24, 2011

Adding slab to partial list head/tail is sensitive to performance.
So explicitly uses DEACTIVATE_TO_TAIL/DEACTIVATE_TO_HEAD to document
it to avoid we get it wrong.
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NShaohua Li <shli@kernel.org>
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

136333d1

slub: add slab with one free object to partial list tail · 130655ef

由 Shaohua Li 提交于 8月 23, 2011

The slab has just one free object, adding it to partial list head doesn't make
sense. And it can cause lock contentation. For example,
1. CPU takes the slab from partial list
2. fetch an object
3. switch to another slab
4. free an object, then the slab is added to partial list again
In this way n->list_lock will be heavily contended.
In fact, Alex had a hackbench regression. 3.1-rc1 performance drops about 70%
against 3.0. This patch fixes it.
Acked-by: NChristoph Lameter <cl@linux.com>
Reported-by: NAlex Shi <alex.shi@intel.com>
Signed-off-by: NShaohua Li <shli@kernel.org>
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

130655ef

20 8月, 2011 6 次提交

slub: per cpu cache for partial pages · 49e22585

由 Christoph Lameter 提交于 8月 09, 2011

Allow filling out the rest of the kmem_cache_cpu cacheline with pointers to
partial pages. The partial page list is used in slab_free() to avoid
per node lock taking.

In __slab_alloc() we can then take multiple partial pages off the per
node partial list in one go reducing node lock pressure.

We can also use the per cpu partial list in slab_alloc() to avoid scanning
partial lists for pages with free objects.

The main effect of a per cpu partial list is that the per node list_lock
is taken for batches of partial pages instead of individual ones.

Potential future enhancements:

1. The pickup from the partial list could be perhaps be done without disabling
   interrupts with some work. The free path already puts the page into the
   per cpu partial list without disabling interrupts.

2. __slab_free() may have some code paths that could use optimization.

Performance:

				Before		After
./hackbench 100 process 200000
				Time: 1953.047	1564.614
./hackbench 100 process 20000
				Time: 207.176   156.940
./hackbench 100 process 20000
				Time: 204.468	156.940
./hackbench 100 process 20000
				Time: 204.879	158.772
./hackbench 10 process 20000
				Time: 20.153	15.853
./hackbench 10 process 20000
				Time: 20.153	15.986
./hackbench 10 process 20000
				Time: 19.363	16.111
./hackbench 1 process 20000
				Time: 2.518	2.307
./hackbench 1 process 20000
				Time: 2.258	2.339
./hackbench 1 process 20000
				Time: 2.864	2.163
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

49e22585

slub: return object pointer from get_partial() / new_slab(). · 497b66f2

由 Christoph Lameter 提交于 8月 09, 2011

There is no need anymore to return the pointer to a slab page from get_partial()
since the page reference can be stored in the kmem_cache_cpu structures "page" field.

Return an object pointer instead.

That in turn allows a simplification of the spaghetti code in __slab_alloc().
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

497b66f2

slub: pass kmem_cache_cpu pointer to get_partial() · acd19fd1

由 Christoph Lameter 提交于 8月 09, 2011

Pass the kmem_cache_cpu pointer to get_partial(). That way
we can avoid the this_cpu_write() statements.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

acd19fd1

slub: Prepare inuse field in new_slab() · e6e82ea1

由 Christoph Lameter 提交于 8月 09, 2011

inuse will always be set to page->objects. There is no point in
initializing the field to zero in new_slab() and then overwriting
the value in __slab_alloc().
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

e6e82ea1

slub: Remove useless statements in __slab_alloc · 7db0d705

由 Christoph Lameter 提交于 8月 09, 2011

Two statements in __slab_alloc() do not have any effect.

1. c->page is already set to NULL by deactivate_slab() called right before.

2. gfpflags are masked in new_slab() before being passed to the page
   allocator. There is no need to mask gfpflags in __slab_alloc in particular
   since most frequent processing in __slab_alloc does not require the use of a
   gfpmask.

Cc: torvalds@linux-foundation.org
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

7db0d705

slub: free slabs without holding locks · 69cb8e6b

由 Christoph Lameter 提交于 8月 09, 2011

There are two situations in which slub holds a lock while releasing
pages:

	A. During kmem_cache_shrink()
	B. During kmem_cache_close()

For A build a list while holding the lock and then release the pages
later. In case of B we are the last remaining user of the slab so
there is no need to take the listlock.

After this patch all calls to the page allocator to free pages are
done without holding any spinlocks. kmem_cache_destroy() will still
hold the slub_lock semaphore.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

69cb8e6b

10 8月, 2011 1 次提交

slub: Fix partial count comparison confusion · 81107188

由 Christoph Lameter 提交于 8月 09, 2011

deactivate_slab() has the comparison if more than the minimum number of
partial pages are in the partial list wrong. An effect of this may be that
empty pages are not freed from deactivate_slab(). The result could be an
OOM due to growth of the partial slabs per node. Frees mostly occur from
__slab_free which is okay so this would only affect use cases where a lot
of switching around of per cpu slabs occur.

Switching per cpu slabs occurs with high frequency if debugging options are
enabled.
Reported-and-tested-by: NXiaotian Feng <xtfeng@gmail.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

81107188

09 8月, 2011 2 次提交

slub: fix check_bytes() for slub debugging · ef62fb32

由 Akinobu Mita 提交于 8月 07, 2011

The check_bytes() function is used by slub debugging.  It returns a pointer
to the first unmatching byte for a character in the given memory area.

If the character for matching byte is greater than 0x80, check_bytes()
doesn't work.  Becuase 64-bit pattern is generated as below.

	value64 = value | value << 8 | value << 16 | value << 24;
	value64 = value64 | value64 << 32;

The integer promotions are performed and sign-extended as the type of value
is u8.  The upper 32 bits of value64 is 0xffffffff in the first line, and
the second line has no effect.

This fixes the 64-bit pattern generation.
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>
Reviewed-by: NMarcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

ef62fb32

slub: Fix full list corruption if debugging is on · 6fbabb20

由 Christoph Lameter 提交于 8月 08, 2011

When a slab is freed by __slab_free() and the slab can only contain a
single object ever then it was full (and therefore not on the partial
lists but on the full list in the debug case) before we reached
slab_empty.

This caused the following full list corruption when SLUB debugging was enabled:

  [ 5913.233035] ------------[ cut here ]------------
  [ 5913.233097] WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
  [ 5913.233101] Hardware name: Adamo 13
  [ 5913.233105] list_del corruption. prev->next should be ffffea000434fd20, but was ffffea0004199520
  [ 5913.233108] Modules linked in: nfs fscache fuse ebtable_nat ebtables ppdev parport_pc lp parport ipt_MASQUERADE iptable_nat nf_nat nfsd lockd nfs_acl auth_rpcgss xt_CHECKSUM sunrpc iptable_mangle bridge stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables rfcomm bnep arc4 iwlagn snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_intel btusb mac80211 snd_hda_codec bluetooth snd_hwdep snd_seq snd_seq_device snd_pcm usb_debug dell_wmi sparse_keymap cdc_ether usbnet cdc_acm uvcvideo cdc_wdm mii cfg80211 snd_timer dell_laptop videodev dcdbas snd microcode v4l2_compat_ioctl32 soundcore joydev tg3 pcspkr snd_page_alloc iTCO_wdt i2c_i801 rfkill iTCO_vendor_support wmi virtio_net kvm_intel kvm ipv6 xts gf128mul dm_crypt i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
  [ 5913.233213] Pid: 0, comm: swapper Not tainted 3.0.0+ #127
  [ 5913.233213] Call Trace:
  [ 5913.233213]  <IRQ>  [<ffffffff8105df18>] warn_slowpath_common+0x83/0x9b
  [ 5913.233213]  [<ffffffff8105dfd3>] warn_slowpath_fmt+0x46/0x48
  [ 5913.233213]  [<ffffffff8127e7c1>] __list_del_entry+0x8d/0x98
  [ 5913.233213]  [<ffffffff8127e7da>] list_del+0xe/0x2d
  [ 5913.233213]  [<ffffffff814e0430>] __slab_free+0x1db/0x235
  [ 5913.233213]  [<ffffffff811706ab>] ? bvec_free_bs+0x35/0x37
  [ 5913.233213]  [<ffffffff811706ab>] ? bvec_free_bs+0x35/0x37
  [ 5913.233213]  [<ffffffff811706ab>] ? bvec_free_bs+0x35/0x37
  [ 5913.233213]  [<ffffffff81133085>] kmem_cache_free+0x88/0x102
  [ 5913.233213]  [<ffffffff811706ab>] bvec_free_bs+0x35/0x37
  [ 5913.233213]  [<ffffffff811706e1>] bio_free+0x34/0x64
  [ 5913.233213]  [<ffffffff813dc390>] dm_bio_destructor+0x12/0x14
  [ 5913.233213]  [<ffffffff8116fef6>] bio_put+0x2b/0x2d
  [ 5913.233213]  [<ffffffff813dccab>] clone_endio+0x9e/0xb4
  [ 5913.233213]  [<ffffffff8116f7dd>] bio_endio+0x2d/0x2f
  [ 5913.233213]  [<ffffffffa00148da>] crypt_dec_pending+0x5c/0x8b [dm_crypt]
  [ 5913.233213]  [<ffffffffa00150a9>] crypt_endio+0x78/0x81 [dm_crypt]

[ Full discussion here: https://lkml.org/lkml/2011/8/4/375 ]

Make sure that we remove such a slab also from the full lists.
Reported-and-tested-by: NDave Jones <davej@redhat.com>
Reported-and-tested-by: NXiaotian Feng <xtfeng@gmail.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

6fbabb20

01 8月, 2011 1 次提交

slub: use print_hex_dump · ffc79d28

由 Sebastian Andrzej Siewior 提交于 7月 29, 2011

Less code and same functionality. The output would be:

| Object c7428000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
| Object c7428010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
| Object c7428020: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
| Object c7428030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5              kkkkkkkkkkk.
| Redzone c742803c: bb bb bb bb                                      ....
| Padding c7428064: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
| Padding c7428074: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a              ZZZZZZZZZZZZ
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

ffc79d28

26 7月, 2011 1 次提交

slub: When allocating a new slab also prep the first object · 9e577e8b

由 Christoph Lameter 提交于 7月 22, 2011

We need to branch to the debug code for the first object if we allocate
a new slab otherwise the first object will be marked wrongly as inactive.
Tested-by: NRabin Vincent <rabin@rab.in>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

9e577e8b

21 7月, 2011 1 次提交

treewide: fix potentially dangerous trailing ';' in #defined values/expressions · 497888cf

由 Phil Carmody 提交于 7月 14, 2011

All these are instances of
  #define NAME value;
or
  #define NAME(params_opt) value;

These of course fail to build when used in contexts like
  if(foo $OP NAME)
  while(bar $OP NAME)
and may silently generate the wrong code in contexts such as
  foo = NAME + 1;    /* foo = value; + 1; */
  bar = NAME - 1;    /* bar = value; - 1; */
  baz = NAME & quux; /* baz = value; & quux; */

Reported on comp.lang.c,
Message-ID: <ab0d55fe-25e5-482b-811e-c475aa6065c3@c29g2000yqd.googlegroups.com>
Initial analysis of the dangers provided by Keith Thompson in that thread.

There are many more instances of more complicated macros having unnecessary
trailing semicolons, but this pile seems to be all of the cases of simple
values suffering from the problem. (Thus things that are likely to be found
in one of the contexts above, more complicated ones aren't.)
Signed-off-by: NPhil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

497888cf

18 7月, 2011 1 次提交

slub: disable interrupts in cmpxchg_double_slab when falling back to pagelock · 1d07171c

由 Christoph Lameter 提交于 7月 14, 2011

Split cmpxchg_double_slab into two functions. One for the case where we know that
interrupts are disabled (and therefore the fallback does not need to disable
interrupts) and one for the other cases where fallback will also disable interrupts.

This fixes the issue that __slab_free called cmpxchg_double_slab in some scenarios
without disabling interrupts.
Tested-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

1d07171c

08 7月, 2011 4 次提交

SLUB: Fix missing <linux/stacktrace.h> include · bfa71457

由 Pekka Enberg 提交于 7月 07, 2011

This fixes the following build breakage commit d6543e39 ("slub: Enable backtrace
for create/delete points"):

  CC      mm/slub.o
mm/slub.c: In function ‘set_track’:
mm/slub.c:428: error: storage size of ‘trace’ isn’t known
mm/slub.c:435: error: implicit declaration of function ‘save_stack_trace’
mm/slub.c:428: warning: unused variable ‘trace’
make[1]: *** [mm/slub.o] Error 1
make: *** [mm/slub.o] Error 2
Signed-off-by: NPekka Enberg <penberg@kernel.org>

bfa71457

slub: reduce overhead of slub_debug · c4089f98

由 Marcin Slusarz 提交于 6月 26, 2011

slub checks for poison one byte by one, which is highly inefficient
and shows up frequently as a highest cpu-eater in perf top.

Joining reads gives nice speedup:

(Compiling some project with different options)
                                 make -j12    make clean
slub_debug disabled:             1m 27s       1.2 s
slub_debug enabled:              1m 46s       7.6 s
slub_debug enabled + this patch: 1m 33s       3.2 s

check_bytes still shows up high, but not always at the top.
Signed-off-by: NMarcin Slusarz <marcin.slusarz@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: linux-mm@kvack.org
Signed-off-by: NPekka Enberg <penberg@kernel.org>

c4089f98

slub: Add method to verify memory is not freed · d18a90dd

由 Ben Greear 提交于 7月 07, 2011

This is for tracking down suspect memory usage.
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NBen Greear <greearb@candelatech.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

d18a90dd

slub: Enable backtrace for create/delete points · d6543e39

由 Ben Greear 提交于 7月 07, 2011

This patch attempts to grab a backtrace for the creation
and deletion points of the slub object.  When a fault is
detected, we can then get a better idea of where the item
was deleted.

Example output from debugging some funky nfs/rpc behaviour:

=============================================================================
BUG kmalloc-64: Object is on free-list
-----------------------------------------------------------------------------

INFO: Allocated in rpcb_getport_async+0x39c/0x5a5 [sunrpc] age=381 cpu=3 pid=3750
       __slab_alloc+0x348/0x3ba
       kmem_cache_alloc_trace+0x67/0xe7
       rpcb_getport_async+0x39c/0x5a5 [sunrpc]
       call_bind+0x70/0x75 [sunrpc]
       __rpc_execute+0x78/0x24b [sunrpc]
       rpc_execute+0x3d/0x42 [sunrpc]
       rpc_run_task+0x79/0x81 [sunrpc]
       rpc_call_sync+0x3f/0x60 [sunrpc]
       rpc_ping+0x42/0x58 [sunrpc]
       rpc_create+0x4aa/0x527 [sunrpc]
       nfs_create_rpc_client+0xb1/0xf6 [nfs]
       nfs_init_client+0x3b/0x7d [nfs]
       nfs_get_client+0x453/0x5ab [nfs]
       nfs_create_server+0x10b/0x437 [nfs]
       nfs_fs_mount+0x4ca/0x708 [nfs]
       mount_fs+0x6b/0x152
INFO: Freed in rpcb_map_release+0x3f/0x44 [sunrpc] age=30 cpu=2 pid=29049
       __slab_free+0x57/0x150
       kfree+0x107/0x13a
       rpcb_map_release+0x3f/0x44 [sunrpc]
       rpc_release_calldata+0x12/0x14 [sunrpc]
       rpc_free_task+0x59/0x61 [sunrpc]
       rpc_final_put_task+0x82/0x8a [sunrpc]
       __rpc_execute+0x23c/0x24b [sunrpc]
       rpc_async_schedule+0x10/0x12 [sunrpc]
       process_one_work+0x230/0x41d
       worker_thread+0x133/0x217
       kthread+0x7d/0x85
       kernel_thread_helper+0x4/0x10
INFO: Slab 0xffffea00029aa470 objects=20 used=9 fp=0xffff8800be7830d8 flags=0x20000000004081
INFO: Object 0xffff8800be7830d8 @offset=4312 fp=0xffff8800be7827a8

Bytes b4 0xffff8800be7830c8:  87 a8 96 00 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a .�......ZZZZZZZZ
 Object 0xffff8800be7830d8:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
 Object 0xffff8800be7830e8:  6b 6b 6b 6b 01 08 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkk..kkkkkkkkkk
 Object 0xffff8800be7830f8:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
 Object 0xffff8800be783108:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk�
 Redzone 0xffff8800be783118:  bb bb bb bb bb bb bb bb                         �������������
 Padding 0xffff8800be783258:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ
Pid: 29049, comm: kworker/2:2 Not tainted 3.0.0-rc4+ #8
Call Trace:
 [<ffffffff811055c3>] print_trailer+0x131/0x13a
 [<ffffffff81105601>] object_err+0x35/0x3e
 [<ffffffff8110746f>] verify_mem_not_deleted+0x7a/0xb7
 [<ffffffffa02851b5>] rpcb_getport_done+0x23/0x126 [sunrpc]
 [<ffffffffa027d0ba>] rpc_exit_task+0x3f/0x6d [sunrpc]
 [<ffffffffa027d4ab>] __rpc_execute+0x78/0x24b [sunrpc]
 [<ffffffffa027d6c0>] ? rpc_execute+0x42/0x42 [sunrpc]
 [<ffffffffa027d6d0>] rpc_async_schedule+0x10/0x12 [sunrpc]
 [<ffffffff810611b7>] process_one_work+0x230/0x41d
 [<ffffffff81061102>] ? process_one_work+0x17b/0x41d
 [<ffffffff81063613>] worker_thread+0x133/0x217
 [<ffffffff810634e0>] ? manage_workers+0x191/0x191
 [<ffffffff81066e10>] kthread+0x7d/0x85
 [<ffffffff81485924>] kernel_thread_helper+0x4/0x10
 [<ffffffff8147eb18>] ? retint_restore_args+0x13/0x13
 [<ffffffff81066d93>] ? __init_kthread_worker+0x56/0x56
 [<ffffffff81485920>] ? gs_change+0x13/0x13
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NBen Greear <greearb@candelatech.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

d6543e39

02 7月, 2011 9 次提交

slub: Not necessary to check for empty slab on load_freelist · 4eade540

由 Christoph Lameter 提交于 6月 01, 2011

load_freelist is now only branched to only if there are objects available.
So no need to check the object variable for NULL.
Signed-off-by: NPekka Enberg <penberg@kernel.org>

4eade540

slub: fast release on full slab · 03e404af

由 Christoph Lameter 提交于 6月 01, 2011

Make deactivation occur implicitly while checking out the current freelist.

This avoids one cmpxchg operation on a slab that is now fully in use.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

03e404af

slub: Add statistics for the case that the current slab does not match the node · e36a2652

由 Christoph Lameter 提交于 6月 01, 2011

Slub reloads the per cpu slab if the page does not satisfy the NUMA condition. Track
those reloads since doing so has a performance impact.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

e36a2652

slub: Get rid of the another_slab label · fc59c053

由 Christoph Lameter 提交于 6月 01, 2011

We can avoid deactivate slab in special cases if we do the
deactivation of slabs in each code flow that leads to new_slab.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

fc59c053

slub: Avoid disabling interrupts in free slowpath · 80f08c19

由 Christoph Lameter 提交于 6月 01, 2011

Disabling interrupts can be avoided now. However, list operation still require
disabling interrupts since allocations can occur from interrupt
contexts and there is no way to perform atomic list operations.

The acquition of the list_lock therefore has to disable interrupts as well.

Dropping interrupt handling significantly simplifies the slowpath.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

80f08c19

slub: Disable interrupts in free_debug processing · 5c2e4bbb

由 Christoph Lameter 提交于 6月 01, 2011

We will be calling free_debug_processing with interrupts disabled
in some case when the later patches are applied. Some of the
functions called by free_debug_processing expect interrupts to be
off.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

5c2e4bbb

slub: Invert locking and avoid slab lock · 881db7fb

由 Christoph Lameter 提交于 6月 01, 2011

Locking slabs is no longer necesary if the arch supports cmpxchg operations
and if no debuggin features are used on a slab. If the arch does not support
cmpxchg then we fallback to use the slab lock to do a cmpxchg like operation.

The patch also changes the lock order. Slab locks are subsumed to the node lock
now. With that approach slab_trylocking is no longer necessary.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

881db7fb

slub: Rework allocator fastpaths · 2cfb7455

由 Christoph Lameter 提交于 6月 01, 2011

Rework the allocation paths so that updates of the page freelist, frozen state
and number of objects use cmpxchg_double_slab().
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

2cfb7455

slub: Pass kmem_cache struct to lock and freeze slab · 61728d1e

由 Christoph Lameter 提交于 6月 01, 2011

We need more information about the slab for the cmpxchg implementation.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

61728d1e

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功