1. 22 Sep, 2009 1 commit
  2. 16 Sep, 2009 1 commit
  3. 14 Sep, 2009 1 commit
  4. 04 Sep, 2009 2 commits
  5. 30 Aug, 2009 1 commit
  6. 20 Aug, 2009 1 commit
  7. 19 Aug, 2009 1 commit
  8. 01 Aug, 2009 1 commit
  9. 28 Jul, 2009 1 commit
    • D
      slub: use size and objsize orders to disable debug flags · 3de47213
      David Rientjes committed
      This patch moves the masking of debugging flags which increase a cache's
      min order due to metadata when `slub_debug=O' is used from
      kmem_cache_flags() to kmem_cache_open().
      
      Instead of defining the maximum metadata size increase in a preprocessor
      macro, this approach uses the cache's ->size and ->objsize members to
      determine if the min order increased due to debugging options.  If so,
      the flags specified in the more appropriately named DEBUG_METADATA_FLAGS
      are masked off.
      
      This approach was suggested by Christoph Lameter
      <cl@linux-foundation.org>.
      
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      3de47213
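The order comparison described in this commit can be sketched in user-space C. This is a minimal illustration, not the kernel code: `struct cache_sketch`, `order_for()`, `mask_debug_if_order_grew()`, the 4K page size, and the flag bit values are all assumptions made for the example. Only the names `size`, `objsize`, and `DEBUG_METADATA_FLAGS` come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only: a 4K page and made-up flag bits. */
#define SKETCH_PAGE_SIZE  4096UL
#define SLAB_RED_ZONE     0x1UL
#define SLAB_POISON       0x2UL
#define SLAB_STORE_USER   0x4UL
#define DEBUG_METADATA_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)

/* Hypothetical stand-in for the cache members the commit message names. */
struct cache_sketch {
    unsigned long flags;
    size_t size;     /* object size including debug metadata */
    size_t objsize;  /* object size without metadata */
};

/* Smallest order such that (page size << order) holds one object. */
static int order_for(size_t size)
{
    int order = 0;

    assert(size > 0);
    while ((SKETCH_PAGE_SIZE << order) < size)
        order++;
    return order;
}

/*
 * If the metadata pushed the object into a higher order, drop the
 * metadata flags. Returns 1 when flags were masked, 0 otherwise.
 */
static int mask_debug_if_order_grew(struct cache_sketch *s)
{
    if (order_for(s->size) > order_for(s->objsize)) {
        s->flags &= ~DEBUG_METADATA_FLAGS;
        /* Simplification: the sketch just resets size to objsize. */
        s->size = s->objsize;
        return 1;
    }
    return 0;
}
```

Comparing the two orders directly means no constant for the maximum metadata size is needed, which is the point the commit message makes about dropping the preprocessor macro.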
  10. 10 Jul, 2009 1 commit
    • D
      slub: add option to disable higher order debugging slabs · fa5ec8a1
      David Rientjes committed
      When debugging is enabled, slub requires that additional metadata be
      stored in slabs for certain options: SLAB_RED_ZONE, SLAB_POISON, and
      SLAB_STORE_USER.
      
      Consequently, it may require that the minimum possible slab order needed
      to allocate a single object be greater when using these options.  The
      most notable example is for objects that are PAGE_SIZE bytes in size.
      
      Higher minimum slab orders may cause page allocation failures when the
      system is out of memory or under heavy fragmentation.
      
      This patch adds a new slub_debug option, which disables debugging by
      default for caches that would have resulted in higher minimum orders:
      
      	slub_debug=O
      
      When this option is used on systems with 4K pages, kmalloc-4096, for
      example, will not have debugging enabled by default even if
      CONFIG_SLUB_DEBUG_ON is defined because it would have resulted in an
      order-1 minimum slab order.
      
      Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
      Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      fa5ec8a1
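The kmalloc-4096 case above is simple arithmetic, sketched here in user-space C. The helper names `sketch_min_order()` and `debug_raises_min_order()`, the 4K page size, and the 64-byte metadata figure used in the usage note are assumptions for illustration; the real metadata size depends on which debug options are active.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed 4K pages */

/* Smallest order such that one object of `size` bytes fits in the slab. */
static int sketch_min_order(size_t size)
{
    int order = 0;

    assert(size > 0);
    while ((SKETCH_PAGE_SIZE << order) < size)
        order++;
    return order;
}

/* Nonzero if adding `metadata` bytes per object raises the minimum order. */
static int debug_raises_min_order(size_t objsize, size_t metadata)
{
    return sketch_min_order(objsize + metadata) > sketch_min_order(objsize);
}
```

With these assumptions, `debug_raises_min_order(4096, 64)` is nonzero because a 4160-byte object no longer fits in one 4K page, while `debug_raises_min_order(2048, 64)` is zero because the object still fits in an order-0 slab. Under `slub_debug=O`, only caches of the first kind would have debugging disabled by default.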
  11. 08 Jul, 2009 1 commit
  12. 26 Jun, 2009 1 commit
  13. 25 Jun, 2009 1 commit
  14. 24 Jun, 2009 1 commit
    • T
      percpu: cleanup percpu array definitions · 204fba4a
      Tejun Heo committed
      Currently, the following three different ways to define percpu arrays
      are in use.
      
      1. DEFINE_PER_CPU(elem_type[array_len], array_name);
      2. DEFINE_PER_CPU(elem_type, array_name[array_len]);
      3. DEFINE_PER_CPU(elem_type, array_name)[array_len];
      
      Unify to #1 which correctly separates the roles of the two parameters
      and thus allows more flexibility in the way percpu variables are
      defined.
      
      [ Impact: cleanup ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: linux-mm@kvack.org
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: David S. Miller <davem@davemloft.net>
      204fba4a
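A simplified mock shows why form 1 keeps the two macro parameters cleanly separated. The `DEFINE_PER_CPU` below is an assumption made for this sketch (it only pastes a prefix onto the name and relies on the GCC/Clang `__typeof__` extension); the real kernel macro also places the variable in a per-CPU section. `PER_CPU_NELEMS` and `sketch_counters` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified mock: first argument is a complete type, second a bare name. */
#define DEFINE_PER_CPU(type, name) __typeof__(type) per_cpu__##name

/* Form 1: the whole array type goes in the type parameter. */
DEFINE_PER_CPU(int[4], sketch_counters);

/*
 * Because the name parameter is a plain identifier, other macros can
 * derive identifiers from it without tripping over a trailing [len].
 */
#define PER_CPU_NELEMS(name) \
    (sizeof(per_cpu__##name) / sizeof(per_cpu__##name[0]))

/* Store a value in the last element and read it back. */
static int sketch_touch_last(int value)
{
    size_t last = PER_CPU_NELEMS(sketch_counters) - 1;

    assert(last < 4);
    per_cpu__sketch_counters[last] = value;
    return per_cpu__sketch_counters[last];
}
```

In forms 2 and 3 the array length rides along with the name or sits outside the macro call, so any macro that needs to attach something after the declarator, or reuse the bare identifier, has to work around it.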
  15. 19 Jun, 2009 1 commit
  16. 17 Jun, 2009 1 commit
  17. 15 Jun, 2009 3 commits
  18. 14 Jun, 2009 2 commits
  19. 12 Jun, 2009 3 commits
    • P
      slab,slub: don't enable interrupts during early boot · 7e85ee0c
      Pekka Enberg committed
      As explained by Benjamin Herrenschmidt:
      
        Oh and btw, your patch alone doesn't fix powerpc, because it's missing
        a whole bunch of GFP_KERNEL's in the arch code... You would have to
        grep the entire kernel for things that check slab_is_available() and
        even then you'll be missing some.
      
        For example, slab_is_available() didn't always exist, and so in the
        early days on powerpc, we used a mem_init_done global that is set from
        mem_init() (not perfect but works in practice). And we still have code
        using that to do the test.
      
      Therefore, mask out __GFP_WAIT, __GFP_IO, and __GFP_FS in the slab allocators
      in early boot code to avoid enabling interrupts.
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      7e85ee0c
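The masking idea in this commit can be sketched in user-space C. Everything named here (`gfp_sketch_t`, the `SK_GFP_*` constants and their bit values, `allowed_mask`, `effective_flags()`, `boot_finished()`) is an assumption for illustration and not the kernel's actual definitions; only the three flags being masked come from the commit message.

```c
#include <assert.h>

typedef unsigned int gfp_sketch_t;

/* Illustrative bit values; the real ones live in the kernel's gfp.h. */
#define SK_GFP_WAIT   0x10u  /* allocation may sleep */
#define SK_GFP_HIGH   0x20u  /* unrelated flag, kept to show it survives */
#define SK_GFP_IO     0x40u  /* allocation may start I/O */
#define SK_GFP_FS     0x80u  /* allocation may call into the filesystem */
#define SK_GFP_KERNEL (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)

/* During early boot, the three bits above are not allowed. */
#define SK_GFP_BOOT_MASK (~(SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS))

static gfp_sketch_t allowed_mask = SK_GFP_BOOT_MASK;

/* Every allocation request is filtered through the current mask. */
static gfp_sketch_t effective_flags(gfp_sketch_t requested)
{
    return requested & allowed_mask;
}

/* Once interrupts are safe to enable, lift the restriction. */
static void boot_finished(void)
{
    allowed_mask = ~0u;
}

/* Usage: the same request behaves differently before and after boot. */
static void sketch_demo(void)
{
    assert(effective_flags(SK_GFP_KERNEL | SK_GFP_HIGH) == SK_GFP_HIGH);
    boot_finished();
    assert(effective_flags(SK_GFP_KERNEL | SK_GFP_HIGH) ==
           (SK_GFP_KERNEL | SK_GFP_HIGH));
}
```

Filtering inside the allocator means callers that pass GFP_KERNEL-style flags during early boot do not need to be found and patched one by one, which is the problem Benjamin Herrenschmidt describes above.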
    • P
      slab: setup allocators earlier in the boot sequence · 83b519e8
      Pekka Enberg committed
      This patch makes kmalloc() available earlier in the boot sequence so we can get
      rid of some bootmem allocations. The bulk of the changes are due to
      kmem_cache_init() being called with interrupts disabled, which requires some
      changes to the allocator bootstrap code.
      
      Note: 32-bit x86 does the WP protect test in mem_init(), so we must set up traps
      before we call mem_init() during boot, as reported by Ingo Molnar:
      
        We have a hard crash in the WP-protect code:
      
        [    0.000000] Checking if this processor honours the WP bit even in supervisor mode...BUG: Int 14: CR2 ffcff000
        [    0.000000]      EDI 00000188  ESI 00000ac7  EBP c17eaf9c  ESP c17eaf8c
        [    0.000000]      EBX 000014e0  EDX 0000000e  ECX 01856067  EAX 00000001
        [    0.000000]      err 00000003  EIP c10135b1   CS 00000060  flg 00010002
        [    0.000000] Stack: c17eafa8 c17fd410 c16747bc c17eafc4 c17fd7e5 000011fd f8616000 c18237cc
        [    0.000000]        00099800 c17bb000 c17eafec c17f1668 000001c5 c17f1322 c166e039 c1822bf0
        [    0.000000]        c166e033 c153a014 c18237cc 00020800 c17eaff8 c17f106a 00020800 01ba5003
        [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip-02161-g7a74539-dirty #52203
        [    0.000000] Call Trace:
        [    0.000000]  [<c15357c2>] ? printk+0x14/0x16
        [    0.000000]  [<c10135b1>] ? do_test_wp_bit+0x19/0x23
        [    0.000000]  [<c17fd410>] ? test_wp_bit+0x26/0x64
        [    0.000000]  [<c17fd7e5>] ? mem_init+0x1ba/0x1d8
        [    0.000000]  [<c17f1668>] ? start_kernel+0x164/0x2f7
        [    0.000000]  [<c17f1322>] ? unknown_bootoption+0x0/0x19c
        [    0.000000]  [<c17f106a>] ? __init_begin+0x6a/0x6f
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      83b519e8
    • C
      kmemleak: Add the slub memory allocation/freeing hooks · 06f22f13
      Catalin Marinas committed
      This patch adds the callbacks to kmemleak_(alloc|free) functions from the
      slub allocator.
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
      06f22f13
  20. 11 Jun, 2009 1 commit
  21. 06 May, 2009 1 commit
  22. 23 Apr, 2009 1 commit
  23. 12 Apr, 2009 1 commit
  24. 03 Apr, 2009 2 commits
  25. 23 Mar, 2009 1 commit
  26. 25 Feb, 2009 1 commit
  27. 23 Feb, 2009 2 commits
    • D
      slub: add min_partial sysfs tunable · 73d342b1
      David Rientjes committed
      Now that a cache's min_partial has been moved to struct kmem_cache, it's
      possible to easily tune it from userspace by adding a sysfs attribute.
      
      It may not be desirable to keep a large number of partial slabs around
      if a cache is used infrequently and memory, especially when constrained
      by a cgroup, is scarce.  It's better to allow userspace to set the
      minimum policy per cache instead of relying explicitly on
      kmem_cache_shrink().
      
      The memory savings from simply moving min_partial from struct
      kmem_cache_node to struct kmem_cache are obviously not significant
      (unless maybe you're from SGI or something); at most they amount to
      
      	# allocated caches * (MAX_NUMNODES - 1) * sizeof(unsigned long)
      
      The true savings occur when userspace reduces the number of partial
      slabs that would otherwise be wasted, especially on machines with a
      large number of nodes (ia64 with CONFIG_NODES_SHIFT at 10 by default?).
      Although the kernel estimates ideal values for n->min_partial and
      ensures they are within a sane range, userspace has no input other
      than writing to /sys/kernel/slab/cache/shrink.
      
      There simply isn't any better heuristic to add when calculating the
      partial values for a better estimate that works for all possible caches.
      And since it's currently a static value, the user really has no way of
      reclaiming that wasted space, which can be significant when constrained
      by a cgroup (either cpusets or, later, memory controller slab limits)
      without shrinking it entirely.
      
      This also allows the user to specify that increased fragmentation and
      more partial slabs are actually desired to avoid the cost of allocating
      new slabs at runtime for specific caches.
      
      There's also no reason why this should be a per-struct kmem_cache_node
      value in the first place.  You could argue that a machine would have
      such node size asymmetries that it should be specified on a per-node
      basis, but we know nobody is doing that right now since it's a purely
      static value at the moment and there's no convenient way to tune that
      via slub's sysfs interface.
      
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      73d342b1
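The upper bound quoted in this commit message can be worked through in C. The function name `min_partial_bytes_saved()` and the cache count used in the usage note are assumptions for illustration; the sketch takes MAX_NUMNODES to be `1 << CONFIG_NODES_SHIFT`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Upper bound from the commit message:
 *   allocated caches * (MAX_NUMNODES - 1) * sizeof(unsigned long)
 * with MAX_NUMNODES taken as 1 << nodes_shift.
 */
static size_t min_partial_bytes_saved(size_t nr_caches, unsigned int nodes_shift)
{
    size_t max_numnodes;

    assert(nodes_shift < 16);  /* keep the shift well inside size_t */
    max_numnodes = (size_t)1 << nodes_shift;
    return nr_caches * (max_numnodes - 1) * sizeof(unsigned long);
}
```

For a hypothetical system with 100 caches and a node shift of 10, this gives 100 * 1023 * sizeof(unsigned long), or 818,400 bytes where unsigned long is 8 bytes. That is under 1 MB even in this large-node case, which matches the commit's remark that the structural saving itself is small.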
    • D
      slub: move min_partial to struct kmem_cache · 3b89d7d8
      David Rientjes committed
      Although it allows for better cacheline use, it is unnecessary to save a
      copy of the cache's min_partial value in each kmem_cache_node.
      
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      3b89d7d8
  28. 20 Feb, 2009 3 commits
  29. 15 Feb, 2009 1 commit
    • N
      lockdep: annotate reclaim context (__GFP_NOFS) · cf40bd16
      Nick Piggin committed
      Here is another version, with the incremental patch rolled up, and
      added reclaim context annotation to kswapd, and allocation tracing
      to slab allocators (which may only ever reach the page allocator
      in rare cases, so it is good to put annotations here too).
      
      Haven't tested this version as such, but it should be getting closer
      to merge worthy ;)
      
      --
      After noticing some code in mm/filemap.c accidentally performing a __GFP_FS
      allocation when it should not have, I thought it might be a good idea to
      try to catch this kind of thing with lockdep.
      
      I coded up a little idea that seems to work. Unfortunately the system has to
      actually be in __GFP_FS page reclaim, then take the lock, before it will mark
      it. But at least that might still be some orders of magnitude more common
      (and more debuggable) than an actual deadlock condition, so we have some
      improvement I hope (the concept is no less complete than discovery of a lock's
      interrupt contexts).
      
      I guess we could even do the same thing with __GFP_IO (normal reclaim), and
      even GFP_NOIO locks too... but filesystems will have the most locks and fiddly
      code paths, so let's start there and see how it goes.
      
      It *seems* to work. I did a quick test.
      
      =================================
      [ INFO: inconsistent lock state ]
      2.6.28-rc6-00007-ged313489-dirty #26
      ---------------------------------
      inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage.
      modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
      {in-reclaim-W} state was registered at:
        [<ffffffff80267bdb>] __lock_acquire+0x75b/0x1a60
        [<ffffffff80268f71>] lock_acquire+0x91/0xc0
        [<ffffffff8070f0e1>] mutex_lock_nested+0xb1/0x310
        [<ffffffffa002002b>] brd_init+0x2b/0x216 [brd]
        [<ffffffff8020903b>] _stext+0x3b/0x170
        [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
        [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
        [<ffffffffffffffff>] 0xffffffffffffffff
      irq event stamp: 3929
      hardirqs last  enabled at (3929): [<ffffffff8070f2b5>] mutex_lock_nested+0x285/0x310
      hardirqs last disabled at (3928): [<ffffffff8070f089>] mutex_lock_nested+0x59/0x310
      softirqs last  enabled at (3732): [<ffffffff8061f623>] sk_filter+0x83/0xe0
      softirqs last disabled at (3730): [<ffffffff8061f5b6>] sk_filter+0x16/0xe0
      
      other info that might help us debug this:
      1 lock held by modprobe/8526:
       #0:  (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
      
      stack backtrace:
      Pid: 8526, comm: modprobe Not tainted 2.6.28-rc6-00007-ged313489-dirty #26
      Call Trace:
       [<ffffffff80265483>] print_usage_bug+0x193/0x1d0
       [<ffffffff80266530>] mark_lock+0xaf0/0xca0
       [<ffffffff80266735>] mark_held_locks+0x55/0xc0
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffff802667ca>] trace_reclaim_fs+0x2a/0x60
       [<ffffffff80285005>] __alloc_pages_internal+0x475/0x580
       [<ffffffff8070f29e>] ? mutex_lock_nested+0x26e/0x310
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffffa002006a>] brd_init+0x6a/0x216 [brd]
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffff8020903b>] _stext+0x3b/0x170
       [<ffffffff8070f8b9>] ? mutex_unlock+0x9/0x10
       [<ffffffff8070f83d>] ? __mutex_unlock_slowpath+0x10d/0x180
       [<ffffffff802669ec>] ? trace_hardirqs_on_caller+0x12c/0x190
       [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
       [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cf40bd16
  30. 12 Feb, 2009 1 commit