提交 · 107dab5c92d5f9c3afe962036e47c207363255c7 · openeuler / raspberrypi-kernel

19 12月, 2012 5 次提交

slub: slub-specific propagation changes · 107dab5c

由 Glauber Costa 提交于 12月 18, 2012

SLUB allows us to tune a particular cache behavior with sysfs-based
tunables.  When creating a new memcg cache copy, we'd like to preserve any
tunables the parent cache already had.

This can be done by tapping into the store attribute function provided by
the allocator.  We of course don't need to mess with read-only fields.
Since the attributes can have multiple types and are stored internally by
sysfs, the best strategy is to issue a ->show() in the root cache, and
then ->store() in the memcg cache.

The drawback of that, is that sysfs can allocate up to a page in buffering
for show(), that we are likely not to need, but also can't guarantee.  To
avoid always allocating a page for that, we can update the caches at store
time with the maximum attribute size ever stored to the root cache.  We
will then get a buffer big enough to hold it.  The corolary to this, is
that if no stores happened, nothing will be propagated.

It can also happen that a root cache has its tunables updated during
normal system operation.  In this case, we will propagate the change to
all caches that are already active.

[akpm@linux-foundation.org: tweak code to avoid __maybe_unused]
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

107dab5c

memcg: destroy memcg caches · 1f458cbf

由 Glauber Costa 提交于 12月 18, 2012

Implement destruction of memcg caches.  Right now, only caches where our
reference counter is the last remaining are deleted.  If there are any
other reference counters around, we just leave the caches lying around
until they go away.

When that happens, a destruction function is called from the cache code.
Caches are only destroyed in process context, so we queue them up for
later processing in the general case.
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1f458cbf

sl[au]b: allocate objects from memcg cache · d79923fa

由 Glauber Costa 提交于 12月 18, 2012

We are able to match a cache allocation to a particular memcg.  If the
task doesn't change groups during the allocation itself - a rare event,
this will give us a good picture about who is the first group to touch a
cache page.

This patch uses the now available infrastructure by calling
memcg_kmem_get_cache() before all the cache allocations.
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d79923fa

sl[au]b: always get the cache from its page in kmem_cache_free() · b9ce5ef4

由 Glauber Costa 提交于 12月 18, 2012

struct page already has this information.  If we start chaining caches,
this information will always be more trustworthy than whatever is passed
into the function.
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b9ce5ef4

slab/slub: consider a memcg parameter in kmem_create_cache · 2633d7a0

由 Glauber Costa 提交于 12月 18, 2012

Allow a memcg parameter to be passed during cache creation.  When the slub
allocator is being used, it will only merge caches that belong to the same
memcg.  We'll do this by scanning the global list, and then translating
the cache to a memcg-specific cache

Default function is created as a wrapper, passing NULL to the memcg
version.  We only merge caches that belong to the same memcg.

A helper is provided, memcg_css_id: because slub needs a unique cache name
for sysfs.  Since this is visible, but not the canonical location for slab
data, the cache name is not used, the css_id should suffice.
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2633d7a0

12 12月, 2012 1 次提交

slub, hotplug: ignore unrelated node's hot-adding and hot-removing · b9d5ab25

由 Lai Jiangshan 提交于 12月 11, 2012

SLUB only focuses on the nodes which have normal memory and it ignores the
other node's hot-adding and hot-removing.

Aka: if some memory of a node which has no onlined memory is online, but
this new memory onlined is not normal memory (for example, highmem), we
should not allocate kmem_cache_node for SLUB.

And if the last normal memory is offlined, but the node still has memory,
we should remove kmem_cache_node for that node.  (The current code delays
it when all of the memory is offlined)

So we only do something when marg->status_change_nid_normal > 0.
marg->status_change_nid is not suitable here.

The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
for every node even the node don't have normal memory, SLAB tolerates
kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Rob Landley <rob@landley.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b9d5ab25

11 12月, 2012 4 次提交

mm/sl[aou]b: Common alignment code · 45906855

由 Christoph Lameter 提交于 11月 28, 2012

Extract the code to do object alignment from the allocators.
Do the alignment calculations in slab_common so that the
__kmem_cache_create functions of the allocators do not have
to deal with alignment.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

45906855

slub: Use statically allocated kmem_cache boot structure for bootstrap · dffb4d60

由 Christoph Lameter 提交于 11月 28, 2012

Simplify bootstrap by statically allocated two kmem_cache structures. These are
freed after bootup is complete. Allows us to no longer worry about calculations
of sizes of kmem_cache structures during bootstrap.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

dffb4d60

mm, sl[au]b: create common functions for boot slab creation · 45530c44

由 Christoph Lameter 提交于 11月 28, 2012

Use a special function to create kmalloc caches and use that function in
SLAB and SLUB.
Acked-by: NJoonsoo Kim <js1304@gmail.com>
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

45530c44

slub: Use correct cpu_slab on dead cpu · 59a09917

由 Christoph Lameter 提交于 11月 28, 2012

Pass a kmem_cache_cpu pointer into unfreeze partials so that a different
kmem_cache_cpu structure than the local one can be specified.
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

59a09917

31 10月, 2012 2 次提交

slab: Ignore internal flags in cache creation · d8843922

由 Glauber Costa 提交于 10月 17, 2012

Some flags are used internally by the allocators for management
purposes. One example of that is the CFLGS_OFF_SLAB flag that slab uses
to mark that the metadata for that cache is stored outside of the slab.

No cache should ever pass those as a creation flags. We can just ignore
this bit if it happens to be passed (such as when duplicating a cache in
the kmem memcg patches).

Because such flags can vary from allocator to allocator, we allow them
to make their own decisions on that, defining SLAB_AVAILABLE_FLAGS with
all flags that are valid at creation time.  Allocators that doesn't have
any specific flag requirement should define that to mean all flags.

Common code will mask out all flags not belonging to that set.
Acked-by: NChristoph Lameter <cl@linux.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

d8843922

mm/sl[aou]b: Move common kmem_cache_size() to slab.h · 242860a4

由 Ezequiel Garcia 提交于 10月 19, 2012

This function is identically defined in all three allocators
and it's trivial to move it to slab.h

Since now it's static, inline, header-defined function
this patch also drops the EXPORT_SYMBOL tag.

Cc: Pekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NEzequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

242860a4

24 10月, 2012 4 次提交

slub: Commonize slab_cache field in struct page · 1b4f59e3

由 Glauber Costa 提交于 10月 22, 2012

Right now, slab and slub have fields in struct page to derive which
cache a page belongs to, but they do it slightly differently.

slab uses a field called slab_cache, that lives in the third double
word. slub, uses a field called "slab", living outside of the
doublewords area.

Ideally, we could use the same field for this. Since slub heavily makes
use of the doubleword region, there isn't really much room to move
slub's slab_cache field around. Since slab does not have such strict
placement restrictions, we can move it outside the doubleword area.

The naming used by slab, "slab_cache", is less confusing, and it is
preferred over slub's generic "slab".
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Acked-by: NChristoph Lameter <cl@linux.com>
CC: David Rientjes <rientjes@google.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

1b4f59e3

sl[au]b: Process slabinfo_show in common code · 0d7561c6

由 Glauber Costa 提交于 10月 19, 2012

With all the infrastructure in place, we can now have slabinfo_show
done from slab_common.c. A cache-specific function is called to grab
information about the cache itself, since that is still heavily
dependent on the implementation. But with the values produced by it, all
the printing and handling is done from common code.
Signed-off-by: NGlauber Costa <glommer@parallels.com>
CC: Christoph Lameter <cl@linux.com>
CC: David Rientjes <rientjes@google.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

0d7561c6

mm/sl[au]b: Move print_slabinfo_header to slab_common.c · bcee6e2a

由 Glauber Costa 提交于 10月 19, 2012

The header format is highly similar between slab and slub. The main
difference lays in the fact that slab may optionally have statistics
added here in case of CONFIG_SLAB_DEBUG, while the slub will stick them
somewhere else.

By making sure that information conditionally lives inside a
globally-visible CONFIG_DEBUG_SLAB switch, we can move the header
printing to a common location.
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Acked-by: NChristoph Lameter <cl@linux.com>
CC: David Rientjes <rientjes@google.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

bcee6e2a

mm/sl[au]b: Move slabinfo processing to slab_common.c · b7454ad3

由 Glauber Costa 提交于 10月 19, 2012

This patch moves all the common machinery to slabinfo processing
to slab_common.c. We can do better by noticing that the output is
heavily common, and having the allocators to just provide finished
information about this. But after this first step, this can be done
easier.
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Acked-by: NChristoph Lameter <cl@linux.com>
CC: David Rientjes <rientjes@google.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

b7454ad3

19 10月, 2012 1 次提交

slub: remove one code path and reduce lock contention in __slab_free() · 837d678d

由 Joonsoo Kim 提交于 8月 16, 2012

When we try to free object, there is some of case that we need
to take a node lock. This is the necessary step for preventing a race.
After taking a lock, then we try to cmpxchg_double_slab().
But, there is a possible scenario that cmpxchg_double_slab() is failed
with taking a lock. Following example explains it.

CPU A               CPU B
need lock
...                 need lock
...                 lock!!
lock..but spin      free success
spin...             unlock
lock!!
free fail

In this case, retry with taking a lock is occured in CPU A.
I think that in this case for CPU A,
"release a lock first, and re-take a lock if necessary" is preferable way.

There are two reasons for this.

First, this makes __slab_free()'s logic somehow simple.
With this patch, 'was_frozen = 1' is "always" handled without taking a lock.
So we can remove one code path.

Second, it may reduce lock contention.
When we do retrying, status of slab is already changed,
so we don't need a lock anymore in almost every case.
"release a lock first, and re-take a lock if necessary" policy is
helpful to this.
Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

837d678d

03 10月, 2012 1 次提交

slub: init_kmem_cache_cpus() and put_cpu_partial() can be static · 788e1aad

由 Fengguang Wu 提交于 9月 28, 2012

Acked-by: NGlauber Costa <glommer@parallels.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

788e1aad

25 9月, 2012 1 次提交

mm, slub: Rename slab_alloc() -> slab_alloc_node() to match SLAB · 2b847c3c

由 Ezequiel Garcia 提交于 9月 08, 2012

This patch does not fix anything, and its only goal is to enable us
to obtain some common code between SLAB and SLUB.
Neither behavior nor produced code is affected.

Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: NEzequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

2b847c3c

19 9月, 2012 1 次提交

mm, sl[au]b: Taint kernel when we detect a corrupted slab · 645df230

由 Dave Jones 提交于 9月 18, 2012

It doesn't seem worth adding a new taint flag for this, so just re-use
the one from 'bad page'

Acked-by: Christoph Lameter <cl@linux.com> # SLUB
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NDave Jones <davej@redhat.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

645df230

18 9月, 2012 1 次提交

slub: consider pfmemalloc_match() in get_partial_node() · 8ba00bb6

由 Joonsoo Kim 提交于 9月 17, 2012

get_partial() is currently not checking pfmemalloc_match() meaning that
it is possible for pfmemalloc pages to leak to non-pfmemalloc users.
This is a problem in the following situation.  Assume that there is a
request from normal allocation and there are no objects in the per-cpu
cache and no node-partial slab.

In this case, slab_alloc enters the slow path and new_slab_objects() is
called which may return a PFMEMALLOC page.  As the current user is not
allowed to access PFMEMALLOC page, deactivate_slab() is called
([5091b74a: mm: slub: optimise the SLUB fast path to avoid pfmemalloc
checks]) and returns an object from PFMEMALLOC page.

Next time, when we get another request from normal allocation,
slab_alloc() enters the slow-path and calls new_slab_objects().  In
new_slab_objects(), we call get_partial() and get a partial slab which
was just deactivated but is a pfmemalloc page.  We extract one object
from it and re-deactivate.

  "deactivate -> re-get in get_partial -> re-deactivate" occures repeatedly.

As a result, access to PFMEMALLOC page is not properly restricted and it
can cause a performance degradation due to frequent deactivation.
deactivation frequently.

This patch changes get_partial_node() to take pfmemalloc_match() into
account and prevents the "deactivate -> re-get in get_partial()
scenario.  Instead, new_slab() is called.
Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8ba00bb6

10 9月, 2012 1 次提交

slub: Zero initial memory segment for kmem_cache and kmem_cache_node · 9df53b15

由 Christoph Lameter 提交于 9月 08, 2012

Tony Luck reported the following problem on IA-64:

  Worked fine yesterday on next-20120905, crashes today. First sign of
  trouble was an unaligned access, then a NULL dereference. SL*B related
  bits of my config:

  CONFIG_SLUB_DEBUG=y
  # CONFIG_SLAB is not set
  CONFIG_SLUB=y
  CONFIG_SLABINFO=y
  # CONFIG_SLUB_DEBUG_ON is not set
  # CONFIG_SLUB_STATS is not set

  And he console log.

  PID hash table entries: 4096 (order: 1, 32768 bytes)
  Dentry cache hash table entries: 262144 (order: 7, 2097152 bytes)
  Inode-cache hash table entries: 131072 (order: 6, 1048576 bytes)
  Memory: 2047920k/2086064k available (13992k code, 38144k reserved,
  6012k data, 880k init)
  kernel unaligned access to 0xca2ffc55fb373e95, ip=0xa0000001001be550
  swapper[0]: error during unaligned kernel access
   -1 [1]
  Modules linked in:

  Pid: 0, CPU 0, comm:              swapper
  psr : 00001010084a2018 ifs : 800000000000060f ip  :
  [<a0000001001be550>]    Not tainted (3.6.0-rc4-zx1-smp-next-20120906)
  ip is at new_slab+0x90/0x680
  unat: 0000000000000000 pfs : 000000000000060f rsc : 0000000000000003
  rnat: 9666960159966a59 bsps: a0000001001441c0 pr  : 9666960159965a59
  ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
  csd : 0000000000000000 ssd : 0000000000000000
  b0  : a0000001001be500 b6  : a00000010112cb20 b7  : a0000001011660a0
  f6  : 0fff7f0f0f0f0e54f0000 f7  : 0ffe8c5c1000000000000
  f8  : 1000d8000000000000000 f9  : 100068800000000000000
  f10 : 10005f0f0f0f0e54f0000 f11 : 1003e0000000000000078
  r1  : a00000010155eef0 r2  : 0000000000000000 r3  : fffffffffffc1638
  r8  : e0000040600081b8 r9  : ca2ffc55fb373e95 r10 : 0000000000000000
  r11 : e000004040001646 r12 : a000000101287e20 r13 : a000000101280000
  r14 : 0000000000004000 r15 : 0000000000000078 r16 : ca2ffc55fb373e75
  r17 : e000004040040000 r18 : fffffffffffc1646 r19 : e000004040001646
  r20 : fffffffffffc15f8 r21 : 000000000000004d r22 : a00000010132fa68
  r23 : 00000000000000ed r24 : 0000000000000000 r25 : 0000000000000000
  r26 : 0000000000000001 r27 : a0000001012b8500 r28 : a00000010135f4a0
  r29 : 0000000000000000 r30 : 0000000000000000 r31 : 0000000000000001
  Unable to handle kernel NULL pointer dereference (address
  0000000000000018)
  swapper[0]: Oops 11003706212352 [2]
  Modules linked in:

  Pid: 0, CPU 0, comm:              swapper
  psr : 0000121008022018 ifs : 800000000000cc18 ip  :
  [<a0000001004dc8f1>]    Not tainted (3.6.0-rc4-zx1-smp-next-20120906)
  ip is at __copy_user+0x891/0x960
  unat: 0000000000000000 pfs : 0000000000000813 rsc : 0000000000000003
  rnat: 0000000000000000 bsps: 0000000000000000 pr  : 9666960159961765
  ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
  csd : 0000000000000000 ssd : 0000000000000000
  b0  : a00000010004b550 b6  : a00000010004b740 b7  : a00000010000c750
  f6  : 000000000000000000000 f7  : 1003e9e3779b97f4a7c16
  f8  : 1003e0a00000010001550 f9  : 100068800000000000000
  f10 : 10005f0f0f0f0e54f0000 f11 : 1003e0000000000000078
  r1  : a00000010155eef0 r2  : a0000001012870b0 r3  : a0000001012870b8
  r8  : 0000000000000298 r9  : 0000000000000013 r10 : 0000000000000000
  r11 : 9666960159961a65 r12 : a000000101287010 r13 : a000000101280000
  r14 : a000000101287068 r15 : a000000101287080 r16 : 0000000000000298
  r17 : 0000000000000010 r18 : 0000000000000018 r19 : a000000101287310
  r20 : 0000000000000290 r21 : 0000000000000000 r22 : 0000000000000000
  r23 : a000000101386f58 r24 : 0000000000000000 r25 : 000000007fffffff
  r26 : a000000101287078 r27 : a0000001013c69b0 r28 : 0000000000000000
  r29 : 0000000000000014 r30 : 0000000000000000 r31 : 0000000000000813

Sedat Dilek and Hugh Dickins reported similar problems as well.

Earlier patches in the common set moved the zeroing of the kmem_cache
structure into common code. See "Move allocation of kmem_cache into
common code".

The allocation for the two special structures is still done from SLUB
specific code but no zeroing is done since the cache creation functions
used to zero. This now needs to be updated so that the structures are
zeroed during allocation in kmem_cache_init().  Otherwise random pointer
values may be followed.
Reported-by: NTony Luck <tony.luck@intel.com>
Reported-by: NSedat Dilek <sedat.dilek@gmail.com>
Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
Reported-by: NHugh Dickins <hughd@google.com>
Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

9df53b15

05 9月, 2012 14 次提交

Revert "mm/sl[aou]b: Move sysfs_slab_add to common" · aac3a166

由 Pekka Enberg 提交于 9月 05, 2012

This reverts commit 96d17b7b which
caused the following errors at boot:

  [    1.114885] kobject (ffff88001a802578): tried to init an initialized object, something is seriously wrong.
  [    1.114885] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ #6
  [    1.114885] Call Trace:
  [    1.114885]  [<ffffffff81273f37>] kobject_init+0x87/0xa0
  [    1.115555]  [<ffffffff8127426a>] kobject_init_and_add+0x2a/0x90
  [    1.115555]  [<ffffffff8127c870>] ? sprintf+0x40/0x50
  [    1.115555]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
  [    1.115555]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
  [    1.115555]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
  [    1.115555]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
  [    1.115555]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
  [    1.115836]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
  [    1.116834]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
  [    1.117835]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
  [    1.117835]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
  [    1.118401]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
  [    1.119832]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
  [    1.120325] ------------[ cut here ]------------
  [    1.120835] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xc1/0xf0()
  [    1.121437] sysfs: cannot create duplicate filename '/kernel/slab/:t-0000016'
  [    1.121831] Modules linked in:
  [    1.122138] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ #6
  [    1.122831] Call Trace:
  [    1.123074]  [<ffffffff81195ce1>] ? sysfs_add_one+0xc1/0xf0
  [    1.123833]  [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
  [    1.124405]  [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
  [    1.124832]  [<ffffffff81195ce1>] sysfs_add_one+0xc1/0xf0
  [    1.125337]  [<ffffffff81195eb3>] create_dir+0x73/0xd0
  [    1.125832]  [<ffffffff81196221>] sysfs_create_dir+0x81/0xe0
  [    1.126363]  [<ffffffff81273d3d>] kobject_add_internal+0x9d/0x210
  [    1.126832]  [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
  [    1.127406]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
  [    1.127832]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
  [    1.128384]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
  [    1.128833]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
  [    1.129831]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
  [    1.130305]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
  [    1.130831]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
  [    1.131351]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
  [    1.131830]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
  [    1.132392]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
  [    1.132830]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
  [    1.133315] ---[ end trace 2703540871c8fab7 ]---
  [    1.133830] ------------[ cut here ]------------
  [    1.134274] WARNING: at lib/kobject.c:196 kobject_add_internal+0x1f5/0x210()
  [    1.134829] kobject_add_internal failed for :t-0000016 with -EEXIST, don't try to register things with the same name in the same directory.
  [    1.135829] Modules linked in:
  [    1.136135] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ #6
  [    1.136828] Call Trace:
  [    1.137071]  [<ffffffff81273e95>] ? kobject_add_internal+0x1f5/0x210
  [    1.137830]  [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
  [    1.138402]  [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
  [    1.138830]  [<ffffffff811955a3>] ? release_sysfs_dirent+0x73/0xf0
  [    1.139419]  [<ffffffff81273e95>] kobject_add_internal+0x1f5/0x210
  [    1.139830]  [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
  [    1.140429]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
  [    1.140830]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
  [    1.141829]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
  [    1.142307]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
  [    1.142829]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
  [    1.143307]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
  [    1.143829]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
  [    1.144352]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
  [    1.144829]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
  [    1.145405]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
  [    1.145828]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
  [    1.146313] ---[ end trace 2703540871c8fab8 ]---

Conflicts:

	mm/slub.c
Signed-off-by: NPekka Enberg <penberg@kernel.org>

aac3a166

mm/sl[aou]b: Move kmem_cache refcounting to common code · cce89f4f

由 Christoph Lameter 提交于 9月 04, 2012

Get rid of the refcount stuff in the allocators and do that part of
kmem_cache management in the common code.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

cce89f4f

mm/sl[aou]b: Shrink __kmem_cache_create() parameter lists · 8a13a4cc

由 Christoph Lameter 提交于 9月 04, 2012

Do the initial settings of the fields in common code. This will allow us
to push more processing into common code later and improve readability.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

8a13a4cc

mm/sl[aou]b: Move kmem_cache allocations into common code · 278b1bb1

由 Christoph Lameter 提交于 9月 05, 2012

Shift the allocations to common code. That way the allocation and
freeing of the kmem_cache structures is handled by common code.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

278b1bb1

mm/sl[aou]b: Move sysfs_slab_add to common · 96d17b7b

由 Christoph Lameter 提交于 9月 05, 2012

Simplify locking by moving the slab_add_sysfs after all locks have been
dropped. Eases the upcoming move to provide sysfs support for all
allocators.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

96d17b7b

mm/sl[aou]b: Do slab aliasing call from common code · cbb79694

由 Christoph Lameter 提交于 9月 05, 2012

The slab aliasing logic causes some strange contortions in slub. So add
a call to deal with aliases to slab_common.c but disable it for other
slab allocators by providng stubs that fail to create aliases.

Full general support for aliases will require additional cleanup passes
and more standardization of fields in kmem_cache.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

cbb79694

mm/sl[aou]b: Move duping of slab name to slab_common.c · db265eca

由 Christoph Lameter 提交于 9月 04, 2012

Duping of the slabname has to be done by each slab. Moving this code to
slab_common avoids duplicate implementations.

With this patch we have common string handling for all slab allocators.
Strings passed to kmem_cache_create() are copied internally. Subsystems
can create temporary strings to create slab caches.

Slabs allocated in early states of bootstrap will never be freed (and
those can never be freed since they are essential to slab allocator
operations).  During bootstrap we therefore do not have to worry about
duping names.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

db265eca

mm/sl[aou]b: Get rid of __kmem_cache_destroy · 12c3667f

由 Christoph Lameter 提交于 9月 04, 2012

What is done there can be done in __kmem_cache_shutdown.

This affects RCU handling somewhat. On rcu free all slab allocators do
not refer to other management structures than the kmem_cache structure.
Therefore these other structures can be freed before the rcu deferred
free to the page allocator occurs.
Reviewed-by: NJoonsoo Kim <js1304@gmail.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

12c3667f

mm/sl[aou]b: Move freeing of kmem_cache structure to common code · 8f4c765c

由 Christoph Lameter 提交于 9月 05, 2012

The freeing action is basically the same in all slab allocators.
Move to the common kmem_cache_destroy() function.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Reviewed-by: NJoonsoo Kim <js1304@gmail.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

8f4c765c

mm/sl[aou]b: Use "kmem_cache" name for slab cache with kmem_cache struct · 9b030cb8

由 Christoph Lameter 提交于 9月 05, 2012

Make all allocators use the "kmem_cache" slabname for the "kmem_cache"
structure.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Reviewed-by: NJoonsoo Kim <js1304@gmail.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

9b030cb8

mm/sl[aou]b: Extract a common function for kmem_cache_destroy · 945cf2b6

由 Christoph Lameter 提交于 9月 04, 2012

kmem_cache_destroy does basically the same in all allocators.

Extract common code which is easy since we already have common mutex
handling.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

945cf2b6

mm/sl[aou]b: Move list_add() to slab_common.c · 7c9adf5a

由 Christoph Lameter 提交于 9月 04, 2012

Move the code to append the new kmem_cache to the list of slab caches to
the kmem_cache_create code in the shared code.

This is possible now since the acquisition of the mutex was moved into
kmem_cache_create().
Acked-by: NDavid Rientjes <rientjes@google.com>
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Reviewed-by: NJoonsoo Kim <js1304@gmail.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

7c9adf5a

mm/slub: Use kmem_cache for the kmem_cache structure · 208c4358

由 Christoph Lameter 提交于 9月 04, 2012

Do not use kmalloc() but kmem_cache_alloc() for the allocation
of the kmem_cache structures in slub.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

208c4358

mm/slub: Add debugging to verify correct cache use on kmem_cache_free() · 79576102

由 Christoph Lameter 提交于 9月 04, 2012

Add additional debugging to check that the objects is actually from the cache
the caller claims. Doing so currently trips up some other debugging code. It
takes a lot to infer from that what was happening.
Reviewed-by: NGlauber Costa <glommer@parallels.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
[ penberg@kernel.org: Use pr_err() ]
Signed-off-by: NPekka Enberg <penberg@kernel.org>

79576102

16 8月, 2012 3 次提交

slub: reduce failure of this_cpu_cmpxchg in put_cpu_partial() after unfreezing · e24fc410

由 Joonsoo Kim 提交于 6月 23, 2012

In current implementation, after unfreezing, we doesn't touch oldpage,
so it remain 'NOT NULL'. When we call this_cpu_cmpxchg()
with this old oldpage, this_cpu_cmpxchg() is mostly be failed.

We can change value of oldpage to NULL after unfreezing,
because unfreeze_partial() ensure that all the cpu partial slabs is removed
from cpu partial list. In this time, we could expect that
this_cpu_cmpxchg is mostly succeed.
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

e24fc410

slub: Take node lock during object free checks · 19c7ff9e

由 Christoph Lameter 提交于 5月 30, 2012

Only applies to scenarios where debugging is on:

Validation of slabs can currently occur while debugging
information is updated from the fast paths of the allocator.
This results in various races where we get false reports about
slab metadata not being in order.

This patch makes the fast paths take the node lock so that
serialization with slab validation will occur. Causes additional
slowdown in debug scenarios.
Reported-by: NWaiman Long <Waiman.Long@hp.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

19c7ff9e

slub: use free_page instead of put_page for freeing kmalloc allocation · d9b7f226

由 Glauber Costa 提交于 8月 03, 2012

When freeing objects, the slub allocator will most of the time free
empty pages by calling __free_pages(). But high-order kmalloc will be
diposed by means of put_page() instead. It makes no sense to call
put_page() in kernel pages that are provided by the object allocators,
so we shouldn't be doing this ourselves. Aside from the consistency
change, we don't change the flow too much. put_page()'s would call its
dtor function, which is __free_pages. We also already do all of the
Compound page tests ourselves, and the Mlock test we lose don't really
matter.
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Acked-by: NChristoph Lameter <cl@linux.com>
CC: David Rientjes <rientjes@google.com>
CC: Pekka Enberg <penberg@kernel.org>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

d9b7f226

01 8月, 2012 1 次提交

mm: slub: optimise the SLUB fast path to avoid pfmemalloc checks · 5091b74a

由 Christoph Lameter 提交于 7月 31, 2012

This patch removes the check for pfmemalloc from the alloc hotpath and
puts the logic after the election of a new per cpu slab.  For a pfmemalloc
page we do not use the fast path but force the use of the slow path which
is also used for the debug case.

This has the side-effect of weakening pfmemalloc processing in the
following way;

1. A process that is allocating for network swap calls __slab_alloc.
   pfmemalloc_match is true so the freelist is loaded and c->freelist is
   now pointing to a pfmemalloc page.

2. A process that is attempting normal allocations calls slab_alloc,
   finds the pfmemalloc page on the freelist and uses it because it did
   not check pfmemalloc_match()

The patch allows non-pfmemalloc allocations to use pfmemalloc pages with
the kmalloc slabs being the most vunerable caches on the grounds they
are most likely to have a mix of pfmemalloc and !pfmemalloc requests. A
later patch will still protect the system as processes will get throttled
if the pfmemalloc reserves get depleted but performance will not degrade
as smoothly.

[mgorman@suse.de: Expanded changelog]
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5091b74a