提交 · bcee6e2a13d580f6c21d748fcd7239ccc66cb4b8 · openeuler / Kernel

24 10月, 2012 2 次提交

mm/sl[au]b: Move print_slabinfo_header to slab_common.c · bcee6e2a

由 Glauber Costa 提交于 10月 19, 2012

The header format is highly similar between slab and slub. The main
difference lays in the fact that slab may optionally have statistics
added here in case of CONFIG_SLAB_DEBUG, while the slub will stick them
somewhere else.

By making sure that information conditionally lives inside a
globally-visible CONFIG_DEBUG_SLAB switch, we can move the header
printing to a common location.
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Acked-by: NChristoph Lameter <cl@linux.com>
CC: David Rientjes <rientjes@google.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

bcee6e2a

mm/sl[au]b: Move slabinfo processing to slab_common.c · b7454ad3

由 Glauber Costa 提交于 10月 19, 2012

This patch moves all the common machinery to slabinfo processing
to slab_common.c. We can do better by noticing that the output is
heavily common, and having the allocators to just provide finished
information about this. But after this first step, this can be done
easier.
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Acked-by: NChristoph Lameter <cl@linux.com>
CC: David Rientjes <rientjes@google.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

b7454ad3

03 10月, 2012 1 次提交

slub: init_kmem_cache_cpus() and put_cpu_partial() can be static · 788e1aad

由 Fengguang Wu 提交于 9月 28, 2012

Acked-by: NGlauber Costa <glommer@parallels.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

788e1aad

25 9月, 2012 1 次提交

mm, slub: Rename slab_alloc() -> slab_alloc_node() to match SLAB · 2b847c3c

由 Ezequiel Garcia 提交于 9月 08, 2012

This patch does not fix anything, and its only goal is to enable us
to obtain some common code between SLAB and SLUB.
Neither behavior nor produced code is affected.

Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: NEzequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

2b847c3c

19 9月, 2012 1 次提交

mm, sl[au]b: Taint kernel when we detect a corrupted slab · 645df230

由 Dave Jones 提交于 9月 18, 2012

It doesn't seem worth adding a new taint flag for this, so just re-use
the one from 'bad page'

Acked-by: Christoph Lameter <cl@linux.com> # SLUB
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NDave Jones <davej@redhat.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

645df230

18 9月, 2012 1 次提交

slub: consider pfmemalloc_match() in get_partial_node() · 8ba00bb6

由 Joonsoo Kim 提交于 9月 17, 2012

get_partial() is currently not checking pfmemalloc_match() meaning that
it is possible for pfmemalloc pages to leak to non-pfmemalloc users.
This is a problem in the following situation.  Assume that there is a
request from normal allocation and there are no objects in the per-cpu
cache and no node-partial slab.

In this case, slab_alloc enters the slow path and new_slab_objects() is
called which may return a PFMEMALLOC page.  As the current user is not
allowed to access PFMEMALLOC page, deactivate_slab() is called
([5091b74a: mm: slub: optimise the SLUB fast path to avoid pfmemalloc
checks]) and returns an object from PFMEMALLOC page.

Next time, when we get another request from normal allocation,
slab_alloc() enters the slow-path and calls new_slab_objects().  In
new_slab_objects(), we call get_partial() and get a partial slab which
was just deactivated but is a pfmemalloc page.  We extract one object
from it and re-deactivate.

  "deactivate -> re-get in get_partial -> re-deactivate" occures repeatedly.

As a result, access to PFMEMALLOC page is not properly restricted and it
can cause a performance degradation due to frequent deactivation.
deactivation frequently.

This patch changes get_partial_node() to take pfmemalloc_match() into
account and prevents the "deactivate -> re-get in get_partial()
scenario.  Instead, new_slab() is called.
Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8ba00bb6

10 9月, 2012 1 次提交

slub: Zero initial memory segment for kmem_cache and kmem_cache_node · 9df53b15

由 Christoph Lameter 提交于 9月 08, 2012

Tony Luck reported the following problem on IA-64:

  Worked fine yesterday on next-20120905, crashes today. First sign of
  trouble was an unaligned access, then a NULL dereference. SL*B related
  bits of my config:

  CONFIG_SLUB_DEBUG=y
  # CONFIG_SLAB is not set
  CONFIG_SLUB=y
  CONFIG_SLABINFO=y
  # CONFIG_SLUB_DEBUG_ON is not set
  # CONFIG_SLUB_STATS is not set

  And he console log.

  PID hash table entries: 4096 (order: 1, 32768 bytes)
  Dentry cache hash table entries: 262144 (order: 7, 2097152 bytes)
  Inode-cache hash table entries: 131072 (order: 6, 1048576 bytes)
  Memory: 2047920k/2086064k available (13992k code, 38144k reserved,
  6012k data, 880k init)
  kernel unaligned access to 0xca2ffc55fb373e95, ip=0xa0000001001be550
  swapper[0]: error during unaligned kernel access
   -1 [1]
  Modules linked in:

  Pid: 0, CPU 0, comm:              swapper
  psr : 00001010084a2018 ifs : 800000000000060f ip  :
  [<a0000001001be550>]    Not tainted (3.6.0-rc4-zx1-smp-next-20120906)
  ip is at new_slab+0x90/0x680
  unat: 0000000000000000 pfs : 000000000000060f rsc : 0000000000000003
  rnat: 9666960159966a59 bsps: a0000001001441c0 pr  : 9666960159965a59
  ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
  csd : 0000000000000000 ssd : 0000000000000000
  b0  : a0000001001be500 b6  : a00000010112cb20 b7  : a0000001011660a0
  f6  : 0fff7f0f0f0f0e54f0000 f7  : 0ffe8c5c1000000000000
  f8  : 1000d8000000000000000 f9  : 100068800000000000000
  f10 : 10005f0f0f0f0e54f0000 f11 : 1003e0000000000000078
  r1  : a00000010155eef0 r2  : 0000000000000000 r3  : fffffffffffc1638
  r8  : e0000040600081b8 r9  : ca2ffc55fb373e95 r10 : 0000000000000000
  r11 : e000004040001646 r12 : a000000101287e20 r13 : a000000101280000
  r14 : 0000000000004000 r15 : 0000000000000078 r16 : ca2ffc55fb373e75
  r17 : e000004040040000 r18 : fffffffffffc1646 r19 : e000004040001646
  r20 : fffffffffffc15f8 r21 : 000000000000004d r22 : a00000010132fa68
  r23 : 00000000000000ed r24 : 0000000000000000 r25 : 0000000000000000
  r26 : 0000000000000001 r27 : a0000001012b8500 r28 : a00000010135f4a0
  r29 : 0000000000000000 r30 : 0000000000000000 r31 : 0000000000000001
  Unable to handle kernel NULL pointer dereference (address
  0000000000000018)
  swapper[0]: Oops 11003706212352 [2]
  Modules linked in:

  Pid: 0, CPU 0, comm:              swapper
  psr : 0000121008022018 ifs : 800000000000cc18 ip  :
  [<a0000001004dc8f1>]    Not tainted (3.6.0-rc4-zx1-smp-next-20120906)
  ip is at __copy_user+0x891/0x960
  unat: 0000000000000000 pfs : 0000000000000813 rsc : 0000000000000003
  rnat: 0000000000000000 bsps: 0000000000000000 pr  : 9666960159961765
  ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
  csd : 0000000000000000 ssd : 0000000000000000
  b0  : a00000010004b550 b6  : a00000010004b740 b7  : a00000010000c750
  f6  : 000000000000000000000 f7  : 1003e9e3779b97f4a7c16
  f8  : 1003e0a00000010001550 f9  : 100068800000000000000
  f10 : 10005f0f0f0f0e54f0000 f11 : 1003e0000000000000078
  r1  : a00000010155eef0 r2  : a0000001012870b0 r3  : a0000001012870b8
  r8  : 0000000000000298 r9  : 0000000000000013 r10 : 0000000000000000
  r11 : 9666960159961a65 r12 : a000000101287010 r13 : a000000101280000
  r14 : a000000101287068 r15 : a000000101287080 r16 : 0000000000000298
  r17 : 0000000000000010 r18 : 0000000000000018 r19 : a000000101287310
  r20 : 0000000000000290 r21 : 0000000000000000 r22 : 0000000000000000
  r23 : a000000101386f58 r24 : 0000000000000000 r25 : 000000007fffffff
  r26 : a000000101287078 r27 : a0000001013c69b0 r28 : 0000000000000000
  r29 : 0000000000000014 r30 : 0000000000000000 r31 : 0000000000000813

Sedat Dilek and Hugh Dickins reported similar problems as well.

Earlier patches in the common set moved the zeroing of the kmem_cache
structure into common code. See "Move allocation of kmem_cache into
common code".

The allocation for the two special structures is still done from SLUB
specific code but no zeroing is done since the cache creation functions
used to zero. This now needs to be updated so that the structures are
zeroed during allocation in kmem_cache_init().  Otherwise random pointer
values may be followed.
Reported-by: NTony Luck <tony.luck@intel.com>
Reported-by: NSedat Dilek <sedat.dilek@gmail.com>
Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
Reported-by: NHugh Dickins <hughd@google.com>
Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

9df53b15

05 9月, 2012 14 次提交

Revert "mm/sl[aou]b: Move sysfs_slab_add to common" · aac3a166

由 Pekka Enberg 提交于 9月 05, 2012

This reverts commit 96d17b7b which
caused the following errors at boot:

  [    1.114885] kobject (ffff88001a802578): tried to init an initialized object, something is seriously wrong.
  [    1.114885] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ #6
  [    1.114885] Call Trace:
  [    1.114885]  [<ffffffff81273f37>] kobject_init+0x87/0xa0
  [    1.115555]  [<ffffffff8127426a>] kobject_init_and_add+0x2a/0x90
  [    1.115555]  [<ffffffff8127c870>] ? sprintf+0x40/0x50
  [    1.115555]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
  [    1.115555]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
  [    1.115555]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
  [    1.115555]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
  [    1.115555]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
  [    1.115836]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
  [    1.116834]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
  [    1.117835]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
  [    1.117835]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
  [    1.118401]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
  [    1.119832]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
  [    1.120325] ------------[ cut here ]------------
  [    1.120835] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xc1/0xf0()
  [    1.121437] sysfs: cannot create duplicate filename '/kernel/slab/:t-0000016'
  [    1.121831] Modules linked in:
  [    1.122138] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ #6
  [    1.122831] Call Trace:
  [    1.123074]  [<ffffffff81195ce1>] ? sysfs_add_one+0xc1/0xf0
  [    1.123833]  [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
  [    1.124405]  [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
  [    1.124832]  [<ffffffff81195ce1>] sysfs_add_one+0xc1/0xf0
  [    1.125337]  [<ffffffff81195eb3>] create_dir+0x73/0xd0
  [    1.125832]  [<ffffffff81196221>] sysfs_create_dir+0x81/0xe0
  [    1.126363]  [<ffffffff81273d3d>] kobject_add_internal+0x9d/0x210
  [    1.126832]  [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
  [    1.127406]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
  [    1.127832]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
  [    1.128384]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
  [    1.128833]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
  [    1.129831]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
  [    1.130305]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
  [    1.130831]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
  [    1.131351]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
  [    1.131830]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
  [    1.132392]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
  [    1.132830]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
  [    1.133315] ---[ end trace 2703540871c8fab7 ]---
  [    1.133830] ------------[ cut here ]------------
  [    1.134274] WARNING: at lib/kobject.c:196 kobject_add_internal+0x1f5/0x210()
  [    1.134829] kobject_add_internal failed for :t-0000016 with -EEXIST, don't try to register things with the same name in the same directory.
  [    1.135829] Modules linked in:
  [    1.136135] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ #6
  [    1.136828] Call Trace:
  [    1.137071]  [<ffffffff81273e95>] ? kobject_add_internal+0x1f5/0x210
  [    1.137830]  [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
  [    1.138402]  [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
  [    1.138830]  [<ffffffff811955a3>] ? release_sysfs_dirent+0x73/0xf0
  [    1.139419]  [<ffffffff81273e95>] kobject_add_internal+0x1f5/0x210
  [    1.139830]  [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
  [    1.140429]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
  [    1.140830]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
  [    1.141829]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
  [    1.142307]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
  [    1.142829]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
  [    1.143307]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
  [    1.143829]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
  [    1.144352]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
  [    1.144829]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
  [    1.145405]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
  [    1.145828]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
  [    1.146313] ---[ end trace 2703540871c8fab8 ]---

Conflicts:

	mm/slub.c
Signed-off-by: NPekka Enberg <penberg@kernel.org>

aac3a166

mm/sl[aou]b: Move kmem_cache refcounting to common code · cce89f4f

由 Christoph Lameter 提交于 9月 04, 2012

Get rid of the refcount stuff in the allocators and do that part of
kmem_cache management in the common code.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

cce89f4f

mm/sl[aou]b: Shrink __kmem_cache_create() parameter lists · 8a13a4cc

由 Christoph Lameter 提交于 9月 04, 2012

Do the initial settings of the fields in common code. This will allow us
to push more processing into common code later and improve readability.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

8a13a4cc

mm/sl[aou]b: Move kmem_cache allocations into common code · 278b1bb1

由 Christoph Lameter 提交于 9月 05, 2012

Shift the allocations to common code. That way the allocation and
freeing of the kmem_cache structures is handled by common code.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

278b1bb1

mm/sl[aou]b: Move sysfs_slab_add to common · 96d17b7b

由 Christoph Lameter 提交于 9月 05, 2012

Simplify locking by moving the slab_add_sysfs after all locks have been
dropped. Eases the upcoming move to provide sysfs support for all
allocators.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

96d17b7b

mm/sl[aou]b: Do slab aliasing call from common code · cbb79694

由 Christoph Lameter 提交于 9月 05, 2012

The slab aliasing logic causes some strange contortions in slub. So add
a call to deal with aliases to slab_common.c but disable it for other
slab allocators by providng stubs that fail to create aliases.

Full general support for aliases will require additional cleanup passes
and more standardization of fields in kmem_cache.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

cbb79694

mm/sl[aou]b: Move duping of slab name to slab_common.c · db265eca

由 Christoph Lameter 提交于 9月 04, 2012

Duping of the slabname has to be done by each slab. Moving this code to
slab_common avoids duplicate implementations.

With this patch we have common string handling for all slab allocators.
Strings passed to kmem_cache_create() are copied internally. Subsystems
can create temporary strings to create slab caches.

Slabs allocated in early states of bootstrap will never be freed (and
those can never be freed since they are essential to slab allocator
operations).  During bootstrap we therefore do not have to worry about
duping names.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

db265eca

mm/sl[aou]b: Get rid of __kmem_cache_destroy · 12c3667f

由 Christoph Lameter 提交于 9月 04, 2012

What is done there can be done in __kmem_cache_shutdown.

This affects RCU handling somewhat. On rcu free all slab allocators do
not refer to other management structures than the kmem_cache structure.
Therefore these other structures can be freed before the rcu deferred
free to the page allocator occurs.
Reviewed-by: NJoonsoo Kim <js1304@gmail.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

12c3667f

mm/sl[aou]b: Move freeing of kmem_cache structure to common code · 8f4c765c

由 Christoph Lameter 提交于 9月 05, 2012

The freeing action is basically the same in all slab allocators.
Move to the common kmem_cache_destroy() function.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Reviewed-by: NJoonsoo Kim <js1304@gmail.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

8f4c765c

mm/sl[aou]b: Use "kmem_cache" name for slab cache with kmem_cache struct · 9b030cb8

由 Christoph Lameter 提交于 9月 05, 2012

Make all allocators use the "kmem_cache" slabname for the "kmem_cache"
structure.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Reviewed-by: NJoonsoo Kim <js1304@gmail.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

9b030cb8

mm/sl[aou]b: Extract a common function for kmem_cache_destroy · 945cf2b6

由 Christoph Lameter 提交于 9月 04, 2012

kmem_cache_destroy does basically the same in all allocators.

Extract common code which is easy since we already have common mutex
handling.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

945cf2b6

mm/sl[aou]b: Move list_add() to slab_common.c · 7c9adf5a

由 Christoph Lameter 提交于 9月 04, 2012

Move the code to append the new kmem_cache to the list of slab caches to
the kmem_cache_create code in the shared code.

This is possible now since the acquisition of the mutex was moved into
kmem_cache_create().
Acked-by: NDavid Rientjes <rientjes@google.com>
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Reviewed-by: NJoonsoo Kim <js1304@gmail.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

7c9adf5a

mm/slub: Use kmem_cache for the kmem_cache structure · 208c4358

由 Christoph Lameter 提交于 9月 04, 2012

Do not use kmalloc() but kmem_cache_alloc() for the allocation
of the kmem_cache structures in slub.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

208c4358

mm/slub: Add debugging to verify correct cache use on kmem_cache_free() · 79576102

由 Christoph Lameter 提交于 9月 04, 2012

Add additional debugging to check that the objects is actually from the cache
the caller claims. Doing so currently trips up some other debugging code. It
takes a lot to infer from that what was happening.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
[ penberg@kernel.org: Use pr_err() ]
Signed-off-by: NPekka Enberg <penberg@kernel.org>

79576102

16 8月, 2012 3 次提交

slub: reduce failure of this_cpu_cmpxchg in put_cpu_partial() after unfreezing · e24fc410

由 Joonsoo Kim 提交于 6月 23, 2012

In current implementation, after unfreezing, we doesn't touch oldpage,
so it remain 'NOT NULL'. When we call this_cpu_cmpxchg()
with this old oldpage, this_cpu_cmpxchg() is mostly be failed.

We can change value of oldpage to NULL after unfreezing,
because unfreeze_partial() ensure that all the cpu partial slabs is removed
from cpu partial list. In this time, we could expect that
this_cpu_cmpxchg is mostly succeed.
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

e24fc410

slub: Take node lock during object free checks · 19c7ff9e

由 Christoph Lameter 提交于 5月 30, 2012

Only applies to scenarios where debugging is on:

Validation of slabs can currently occur while debugging
information is updated from the fast paths of the allocator.
This results in various races where we get false reports about
slab metadata not being in order.

This patch makes the fast paths take the node lock so that
serialization with slab validation will occur. Causes additional
slowdown in debug scenarios.
Reported-by: NWaiman Long <Waiman.Long@hp.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

19c7ff9e

slub: use free_page instead of put_page for freeing kmalloc allocation · d9b7f226

由 Glauber Costa 提交于 8月 03, 2012

When freeing objects, the slub allocator will most of the time free
empty pages by calling __free_pages(). But high-order kmalloc will be
diposed by means of put_page() instead. It makes no sense to call
put_page() in kernel pages that are provided by the object allocators,
so we shouldn't be doing this ourselves. Aside from the consistency
change, we don't change the flow too much. put_page()'s would call its
dtor function, which is __free_pages. We also already do all of the
Compound page tests ourselves, and the Mlock test we lose don't really
matter.
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Acked-by: NChristoph Lameter <cl@linux.com>
CC: David Rientjes <rientjes@google.com>
CC: Pekka Enberg <penberg@kernel.org>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

d9b7f226

01 8月, 2012 2 次提交

mm: slub: optimise the SLUB fast path to avoid pfmemalloc checks · 5091b74a

由 Christoph Lameter 提交于 7月 31, 2012

This patch removes the check for pfmemalloc from the alloc hotpath and
puts the logic after the election of a new per cpu slab.  For a pfmemalloc
page we do not use the fast path but force the use of the slow path which
is also used for the debug case.

This has the side-effect of weakening pfmemalloc processing in the
following way;

1. A process that is allocating for network swap calls __slab_alloc.
   pfmemalloc_match is true so the freelist is loaded and c->freelist is
   now pointing to a pfmemalloc page.

2. A process that is attempting normal allocations calls slab_alloc,
   finds the pfmemalloc page on the freelist and uses it because it did
   not check pfmemalloc_match()

The patch allows non-pfmemalloc allocations to use pfmemalloc pages with
the kmalloc slabs being the most vunerable caches on the grounds they
are most likely to have a mix of pfmemalloc and !pfmemalloc requests. A
later patch will still protect the system as processes will get throttled
if the pfmemalloc reserves get depleted but performance will not degrade
as smoothly.

[mgorman@suse.de: Expanded changelog]
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5091b74a

mm: sl[au]b: add knowledge of PFMEMALLOC reserve pages · 072bb0aa

由 Mel Gorman 提交于 7月 31, 2012

When a user or administrator requires swap for their application, they
create a swap partition and file, format it with mkswap and activate it
with swapon.  Swap over the network is considered as an option in diskless
systems.  The two likely scenarios are when blade servers are used as part
of a cluster where the form factor or maintenance costs do not allow the
use of disks and thin clients.

The Linux Terminal Server Project recommends the use of the Network Block
Device (NBD) for swap according to the manual at
https://sourceforge.net/projects/ltsp/files/Docs-Admin-Guide/LTSPManual.pdf/download
There is also documentation and tutorials on how to setup swap over NBD at
places like https://help.ubuntu.com/community/UbuntuLTSP/EnableNBDSWAP The
nbd-client also documents the use of NBD as swap.  Despite this, the fact
is that a machine using NBD for swap can deadlock within minutes if swap
is used intensively.  This patch series addresses the problem.

The core issue is that network block devices do not use mempools like
normal block devices do.  As the host cannot control where they receive
packets from, they cannot reliably work out in advance how much memory
they might need.  Some years ago, Peter Zijlstra developed a series of
patches that supported swap over an NFS that at least one distribution is
carrying within their kernels.  This patch series borrows very heavily
from Peter's work to support swapping over NBD as a pre-requisite to
supporting swap-over-NFS.  The bulk of the complexity is concerned with
preserving memory that is allocated from the PFMEMALLOC reserves for use
by the network layer which is needed for both NBD and NFS.

Patch 1 adds knowledge of the PFMEMALLOC reserves to SLAB and SLUB to
	preserve access to pages allocated under low memory situations
	to callers that are freeing memory.

Patch 2 optimises the SLUB fast path to avoid pfmemalloc checks

Patch 3 introduces __GFP_MEMALLOC to allow access to the PFMEMALLOC
	reserves without setting PFMEMALLOC.

Patch 4 opens the possibility for softirqs to use PFMEMALLOC reserves
	for later use by network packet processing.

Patch 5 only sets page->pfmemalloc when ALLOC_NO_WATERMARKS was required

Patch 6 ignores memory policies when ALLOC_NO_WATERMARKS is set.

Patches 7-12 allows network processing to use PFMEMALLOC reserves when
	the socket has been marked as being used by the VM to clean pages. If
	packets are received and stored in pages that were allocated under
	low-memory situations and are unrelated to the VM, the packets
	are dropped.

	Patch 11 reintroduces __skb_alloc_page which the networking
	folk may object to but is needed in some cases to propogate
	pfmemalloc from a newly allocated page to an skb. If there is a
	strong objection, this patch can be dropped with the impact being
	that swap-over-network will be slower in some cases but it should
	not fail.

Patch 13 is a micro-optimisation to avoid a function call in the
	common case.

Patch 14 tags NBD sockets as being SOCK_MEMALLOC so they can use
	PFMEMALLOC if necessary.

Patch 15 notes that it is still possible for the PFMEMALLOC reserve
	to be depleted. To prevent this, direct reclaimers get throttled on
	a waitqueue if 50% of the PFMEMALLOC reserves are depleted.  It is
	expected that kswapd and the direct reclaimers already running
	will clean enough pages for the low watermark to be reached and
	the throttled processes are woken up.

Patch 16 adds a statistic to track how often processes get throttled

Some basic performance testing was run using kernel builds, netperf on
loopback for UDP and TCP, hackbench (pipes and sockets), iozone and
sysbench.  Each of them were expected to use the sl*b allocators
reasonably heavily but there did not appear to be significant performance
variances.

For testing swap-over-NBD, a machine was booted with 2G of RAM with a
swapfile backed by NBD.  8*NUM_CPU processes were started that create
anonymous memory mappings and read them linearly in a loop.  The total
size of the mappings were 4*PHYSICAL_MEMORY to use swap heavily under
memory pressure.

Without the patches and using SLUB, the machine locks up within minutes
and runs to completion with them applied.  With SLAB, the story is
different as an unpatched kernel run to completion.  However, the patched
kernel completed the test 45% faster.

MICRO
                                         3.5.0-rc2 3.5.0-rc2
					 vanilla     swapnbd
Unrecognised test vmscan-anon-mmap-write
MMTests Statistics: duration
Sys Time Running Test (seconds)             197.80    173.07
User+Sys Time Running Test (seconds)        206.96    182.03
Total Elapsed Time (seconds)               3240.70   1762.09

This patch: mm: sl[au]b: add knowledge of PFMEMALLOC reserve pages

Allocations of pages below the min watermark run a risk of the machine
hanging due to a lack of memory.  To prevent this, only callers who have
PF_MEMALLOC or TIF_MEMDIE set and are not processing an interrupt are
allowed to allocate with ALLOC_NO_WATERMARKS.  Once they are allocated to
a slab though, nothing prevents other callers consuming free objects
within those slabs.  This patch limits access to slab pages that were
alloced from the PFMEMALLOC reserves.

When this patch is applied, pages allocated from below the low watermark
are returned with page->pfmemalloc set and it is up to the caller to
determine how the page should be protected.  SLAB restricts access to any
page with page->pfmemalloc set to callers which are known to able to
access the PFMEMALLOC reserve.  If one is not available, an attempt is
made to allocate a new page rather than use a reserve.  SLUB is a bit more
relaxed in that it only records if the current per-CPU page was allocated
from PFMEMALLOC reserve and uses another partial slab if the caller does
not have the necessary GFP or process flags.  This was found to be
sufficient in tests to avoid hangs due to SLUB generally maintaining
smaller lists than SLAB.

In low-memory conditions it does mean that !PFMEMALLOC allocators can fail
a slab allocation even though free objects are available because they are
being preserved for callers that are freeing pages.

[a.p.zijlstra@chello.nl: Original implementation]
[sebastian@breakpoint.cc: Correct order of page flag clearing]
Signed-off-by: NMel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

072bb0aa

11 7月, 2012 1 次提交

mm, slub: ensure irqs are enabled for kmemcheck · 737b719e

由 David Rientjes 提交于 7月 09, 2012

kmemcheck_alloc_shadow() requires irqs to be enabled, so wait to disable
them until after its called for __GFP_WAIT allocations.

This fixes a warning for such allocations:

	WARNING: at kernel/lockdep.c:2739 lockdep_trace_alloc+0x14e/0x1c0()
Acked-by: NFengguang Wu <fengguang.wu@intel.com>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Tested-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

737b719e

09 7月, 2012 5 次提交

mm, sl[aou]b: Move kmem_cache_create mutex handling to common code · 20cea968

由 Christoph Lameter 提交于 7月 06, 2012

Move the mutex handling into the common kmem_cache_create()
function.

Then we can also move more checks out of SLAB's kmem_cache_create()
into the common code.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

20cea968

mm, sl[aou]b: Use a common mutex definition · 18004c5d

由 Christoph Lameter 提交于 7月 06, 2012

Use the mutex definition from SLAB and make it the common way to take a sleeping lock.

This has the effect of using a mutex instead of a rw semaphore for SLUB.

SLOB gains the use of a mutex for kmem_cache_create serialization.
Not needed now but SLOB may acquire some more features later (like slabinfo
/ sysfs support) through the expansion of the common code that will
need this.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Reviewed-by: NJoonsoo Kim <js1304@gmail.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

18004c5d

mm, sl[aou]b: Common definition for boot state of the slab allocators · 97d06609

由 Christoph Lameter 提交于 7月 06, 2012

All allocators have some sort of support for the bootstrap status.

Setup a common definition for the boot states and make all slab
allocators use that definition.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Reviewed-by: NJoonsoo Kim <js1304@gmail.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

97d06609

mm, sl[aou]b: Extract common code for kmem_cache_create() · 039363f3

由 Christoph Lameter 提交于 7月 06, 2012

Kmem_cache_create() does a variety of sanity checks but those
vary depending on the allocator. Use the strictest tests and put them into
a slab_common file. Make the tests conditional on CONFIG_DEBUG_VM.

This patch has the effect of adding sanity checks for SLUB and SLOB
under CONFIG_DEBUG_VM and removes the checks in SLAB for !CONFIG_DEBUG_VM.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

039363f3

slub: remove invalid reference to list iterator variable · 068ce415

由 Julia Lawall 提交于 7月 08, 2012

If list_for_each_entry, etc complete a traversal of the list, the iterator
variable ends up pointing to an address at an offset from the list head,
and not a meaningful structure.  Thus this value should not be used after
the end of the iterator.  The patch replaces s->name by al->name, which is
referenced nearby.

This problem was found using Coccinelle (http://coccinelle.lip6.fr/).
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

068ce415

20 6月, 2012 3 次提交

slub: refactoring unfreeze_partials() · 43d77867

由 Joonsoo Kim 提交于 6月 09, 2012

Current implementation of unfreeze_partials() is so complicated,
but benefit from it is insignificant. In addition many code in
do {} while loop have a bad influence to a fail rate of cmpxchg_double_slab.
Under current implementation which test status of cpu partial slab
and acquire list_lock in do {} while loop,
we don't need to acquire a list_lock and gain a little benefit
when front of the cpu partial slab is to be discarded, but this is a rare case.
In case that add_partial is performed and cmpxchg_double_slab is failed,
remove_partial should be called case by case.

I think that these are disadvantages of current implementation,
so I do refactoring unfreeze_partials().

Minimizing code in do {} while loop introduce a reduced fail rate
of cmpxchg_double_slab. Below is output of 'slabinfo -r kmalloc-256'
when './perf stat -r 33 hackbench 50 process 4000 > /dev/null' is done.

** before **
Cmpxchg_double Looping
------------------------
Locked Cmpxchg Double redos   182685
Unlocked Cmpxchg Double redos 0

** after **
Cmpxchg_double Looping
------------------------
Locked Cmpxchg Double redos   177995
Unlocked Cmpxchg Double redos 1

We can see cmpxchg_double_slab fail rate is improved slightly.

Bolow is output of './perf stat -r 30 hackbench 50 process 4000 > /dev/null'.

** before **
 Performance counter stats for './hackbench 50 process 4000' (30 runs):

     108517.190463 task-clock                #    7.926 CPUs utilized            ( +-  0.24% )
         2,919,550 context-switches          #    0.027 M/sec                    ( +-  3.07% )
           100,774 CPU-migrations            #    0.929 K/sec                    ( +-  4.72% )
           124,201 page-faults               #    0.001 M/sec                    ( +-  0.15% )
   401,500,234,387 cycles                    #    3.700 GHz                      ( +-  0.24% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
   250,576,913,354 instructions              #    0.62  insns per cycle          ( +-  0.13% )
    45,934,956,860 branches                  #  423.297 M/sec                    ( +-  0.14% )
       188,219,787 branch-misses             #    0.41% of all branches          ( +-  0.56% )

      13.691837307 seconds time elapsed                                          ( +-  0.24% )

** after **
 Performance counter stats for './hackbench 50 process 4000' (30 runs):

     107784.479767 task-clock                #    7.928 CPUs utilized            ( +-  0.22% )
         2,834,781 context-switches          #    0.026 M/sec                    ( +-  2.33% )
            93,083 CPU-migrations            #    0.864 K/sec                    ( +-  3.45% )
           123,967 page-faults               #    0.001 M/sec                    ( +-  0.15% )
   398,781,421,836 cycles                    #    3.700 GHz                      ( +-  0.22% )
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
   250,189,160,419 instructions              #    0.63  insns per cycle          ( +-  0.09% )
    45,855,370,128 branches                  #  425.436 M/sec                    ( +-  0.10% )
       169,881,248 branch-misses             #    0.37% of all branches          ( +-  0.43% )

      13.596272341 seconds time elapsed                                          ( +-  0.22% )

No regression is found, but rather we can see slightly better result.
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

43d77867

slub: use __cmpxchg_double_slab() at interrupt disabled place · d24ac77f

由 Joonsoo Kim 提交于 5月 18, 2012

get_freelist(), unfreeze_partials() are only called with interrupt disabled,
so __cmpxchg_double_slab() is suitable.
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

d24ac77f

slab/mempolicy: always use local policy from interrupt context · e7b691b0

由 Andi Kleen 提交于 6月 09, 2012

slab_node() could access current->mempolicy from interrupt context.
However there's a race condition during exit where the mempolicy
is first freed and then the pointer zeroed.

Using this from interrupts seems bogus anyways. The interrupt
will interrupt a random process and therefore get a random
mempolicy. Many times, this will be idle's, which noone can change.

Just disable this here and always use local for slab
from interrupts. I also cleaned up the callers of slab_node a bit
which always passed the same argument.

I believe the original mempolicy code did that in fact,
so it's likely a regression.

v2: send version with correct logic
v3: simplify. fix typo.
Reported-by: NArun Sharma <asharma@fb.com>
Cc: penberg@kernel.org
Cc: cl@linux.com
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
[tdmackey@twitter.com: Rework control flow based on feedback from
cl@linux.com, fix logic, and cleanup current task_struct reference]
Acked-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NDavid Mackey <tdmackey@twitter.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

e7b691b0

14 6月, 2012 1 次提交

mm, sl[aou]b: Extract common fields from struct kmem_cache · 3b0efdfa

由 Christoph Lameter 提交于 6月 13, 2012

Define a struct that describes common fields used in all slab allocators.
A slab allocator either uses the common definition (like SLOB) or is
required to provide members of kmem_cache with the definition given.

After that it will be possible to share code that
only operates on those fields of kmem_cache.

The patch basically takes the slob definition of kmem cache and
uses the field namees for the other allocators.

It also standardizes the names used for basic object lengths in
allocators:

object_size	Struct size specified at kmem_cache_create. Basically
		the payload expected to be used by the subsystem.

size		The size of memory allocator for each object. This size
		is larger than object_size and includes padding, alignment
		and extra metadata for each object (f.e. for debugging
		and rcu).
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

3b0efdfa

01 6月, 2012 4 次提交

slub: pass page to node_match() instead of kmem_cache_cpu structure · 57d437d2

由 Christoph Lameter 提交于 5月 09, 2012

Avoid passing the kmem_cache_cpu pointer to node_match. This makes the
node_match function more generic and easier to understand.
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

57d437d2

slub: Use page variable instead of c->page. · f6e7def7

由 Christoph Lameter 提交于 5月 09, 2012

Store the value of c->page to avoid additional fetches
from per cpu data.
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

f6e7def7

slub: Separate out kmem_cache_cpu processing from deactivate_slab · c17dda40

由 Christoph Lameter 提交于 5月 09, 2012

Processing on fields of kmem_cache_cpu is cleaner if code working on fields
of this struct is taken out of deactivate_slab().
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

c17dda40

slub: Get rid of the node field · ec3ab083

由 Christoph Lameter 提交于 5月 09, 2012

The node field is always page_to_nid(c->page). So its rather easy to
replace. Note that there maybe slightly more overhead in various hot paths
due to the need to shift the bits from page->flags. However, that is mostly
compensated for by a smaller footprint of the kmem_cache_cpu structure (this
patch reduces that to 3 words per cache) which allows better caching.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

ec3ab083

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功