1. 28 Jul 2009, 1 commit
    • slub: use size and objsize orders to disable debug flags · 3de47213
      David Rientjes committed
      When `slub_debug=O' is used, debugging flags whose metadata would
      increase a cache's minimum order are masked off.  This patch moves that
      masking from kmem_cache_flags() to kmem_cache_open().
      
      Instead of defining the maximum metadata size increase in a preprocessor
      macro, this approach uses the cache's ->size and ->objsize members to
      determine if the min order increased due to debugging options.  If so,
      the flags specified in the more appropriately named DEBUG_METADATA_FLAGS
      are masked off.
      
      This approach was suggested by Christoph Lameter
      <cl@linux-foundation.org>.
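
      A minimal sketch of the resulting check in kmem_cache_open(), using the
      helpers named in the patch (calculate_sizes(), DEBUG_METADATA_FLAGS and
      the disable_higher_order_debug toggle); details may differ from the
      actual diff:

        if (disable_higher_order_debug) {
            /*
             * If the metadata added for debugging pushed the cache into
             * a higher page order than the bare object would need, strip
             * the metadata-carrying debug flags and recompute the layout.
             */
            if (get_order(s->size) > get_order(s->objsize)) {
                s->flags &= ~DEBUG_METADATA_FLAGS;
                s->offset = 0;
                if (!calculate_sizes(s, -1))
                    goto error;
            }
        }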
      
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
  2. 10 Jul 2009, 1 commit
    • slub: add option to disable higher order debugging slabs · fa5ec8a1
      David Rientjes committed
      When debugging is enabled, slub requires that additional metadata be
      stored in slabs for certain options: SLAB_RED_ZONE, SLAB_POISON, and
      SLAB_STORE_USER.
      
      Consequently, it may require that the minimum possible slab order needed
      to allocate a single object be greater when using these options.  The
      most notable example is for objects that are PAGE_SIZE bytes in size.
      
      Higher minimum slab orders may cause page allocation failures when the
      system is out of memory or under heavy fragmentation.
      
      This patch adds a new slub_debug option, which disables debugging by
      default for caches that would have resulted in higher minimum orders:
      
      	slub_debug=O
      
      When this option is used on systems with 4K pages, kmalloc-4096, for
      example, will not have debugging enabled by default even if
      CONFIG_SLUB_DEBUG_ON is defined, because it would have resulted in an
      order-1 minimum slab order.
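
      The arithmetic for the kmalloc-4096 case is simple (metadata sizes are
      illustrative and depend on the enabled options):

        4096 (object) + red zone padding + 2 * sizeof(struct track)
            > 4096  =>  minimum slab order becomes 1 on 4K pages

      Any per-object debug metadata at all pushes a PAGE_SIZE object past a
      single page, hence the higher minimum order.
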
      Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
      Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
  3. 26 Jun 2009, 1 commit
  4. 25 Jun 2009, 1 commit
  5. 19 Jun 2009, 1 commit
  6. 17 Jun 2009, 1 commit
  7. 15 Jun 2009, 3 commits
  8. 14 Jun 2009, 2 commits
  9. 12 Jun 2009, 3 commits
    • slab,slub: don't enable interrupts during early boot · 7e85ee0c
      Pekka Enberg committed
      As explained by Benjamin Herrenschmidt:
      
        Oh and btw, your patch alone doesn't fix powerpc, because it's missing
        a whole bunch of GFP_KERNEL's in the arch code... You would have to
        grep the entire kernel for things that check slab_is_available() and
        even then you'll be missing some.
      
        For example, slab_is_available() didn't always exist, and so in the
        early days on powerpc, we used a mem_init_done global that is set from
        mem_init() (not perfect but works in practice). And we still have code
        using that to do the test.
      
      Therefore, mask out __GFP_WAIT, __GFP_IO, and __GFP_FS in the slab allocators
      in early boot code to avoid enabling interrupts.
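
      A minimal sketch of the mechanism, assuming a boot-time mask named
      slab_gfp_mask that a late init hook lifts once interrupts may safely
      be enabled:

        /* GFP bits that are safe to honour before irqs may be enabled */
        #define SLAB_GFP_BOOT_MASK \
            (__GFP_BITS_MASK & ~(__GFP_WAIT | __GFP_IO | __GFP_FS))

        static gfp_t slab_gfp_mask __read_mostly = SLAB_GFP_BOOT_MASK;

        /* applied in the allocation paths */
        flags &= slab_gfp_mask;

        /* called from init code once it is safe to enable interrupts */
        void __init kmem_cache_init_late(void)
        {
            slab_gfp_mask = __GFP_BITS_MASK;
        }
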
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
    • slab: setup allocators earlier in the boot sequence · 83b519e8
      Pekka Enberg committed
      This patch makes kmalloc() available earlier in the boot sequence so we
      can get rid of some bootmem allocations.  The bulk of the changes are
      due to kmem_cache_init() being called with interrupts disabled, which
      requires some changes to the allocator bootstrap code.
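
      Schematically, the init ordering this requires looks roughly like the
      following (illustrative, not the literal start_kernel() diff):

        asmlinkage void __init start_kernel(void)
        {
            /* ... */
            trap_init();        /* the #PF handler must exist before the
                                 * x86-32 WP test in mem_init() faults */
            mem_init();
            kmem_cache_init();  /* early: interrupts are still disabled */
            /* ... */
        }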
      
      Note: 32-bit x86 performs a WP-protect test in mem_init(), so we must
      set up traps before calling mem_init() during boot, as reported by Ingo
      Molnar:
      
        We have a hard crash in the WP-protect code:
      
        [    0.000000] Checking if this processor honours the WP bit even in supervisor mode...BUG: Int 14: CR2 ffcff000
        [    0.000000]      EDI 00000188  ESI 00000ac7  EBP c17eaf9c  ESP c17eaf8c
        [    0.000000]      EBX 000014e0  EDX 0000000e  ECX 01856067  EAX 00000001
        [    0.000000]      err 00000003  EIP c10135b1   CS 00000060  flg 00010002
        [    0.000000] Stack: c17eafa8 c17fd410 c16747bc c17eafc4 c17fd7e5 000011fd f8616000 c18237cc
        [    0.000000]        00099800 c17bb000 c17eafec c17f1668 000001c5 c17f1322 c166e039 c1822bf0
        [    0.000000]        c166e033 c153a014 c18237cc 00020800 c17eaff8 c17f106a 00020800 01ba5003
        [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip-02161-g7a74539-dirty #52203
        [    0.000000] Call Trace:
        [    0.000000]  [<c15357c2>] ? printk+0x14/0x16
        [    0.000000]  [<c10135b1>] ? do_test_wp_bit+0x19/0x23
        [    0.000000]  [<c17fd410>] ? test_wp_bit+0x26/0x64
        [    0.000000]  [<c17fd7e5>] ? mem_init+0x1ba/0x1d8
        [    0.000000]  [<c17f1668>] ? start_kernel+0x164/0x2f7
        [    0.000000]  [<c17f1322>] ? unknown_bootoption+0x0/0x19c
        [    0.000000]  [<c17f106a>] ? __init_begin+0x6a/0x6f
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
    • kmemleak: Add the slub memory allocation/freeing hooks · 06f22f13
      Catalin Marinas committed
      This patch adds the kmemleak_(alloc|free) callback hooks to the slub
      allocator.
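
      In practice this means calls of the following shape in slub's hot
      paths (a sketch; the *_recursive variants keep kmemleak's own
      allocations from recursing):

        /* after a successful allocation, in slab_alloc() */
        kmemleak_alloc_recursive(object, s->objsize, 1, s->flags, gfpflags);

        /* when the object is released, in slab_free() */
        kmemleak_free_recursive(x, s->flags);
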
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
  10. 11 Jun 2009, 1 commit
  11. 06 May 2009, 1 commit
  12. 23 Apr 2009, 1 commit
  13. 12 Apr 2009, 1 commit
  14. 03 Apr 2009, 2 commits
  15. 23 Mar 2009, 1 commit
  16. 25 Feb 2009, 1 commit
  17. 23 Feb 2009, 2 commits
    • slub: add min_partial sysfs tunable · 73d342b1
      David Rientjes committed
      Now that a cache's min_partial has been moved to struct kmem_cache, it's
      possible to easily tune it from userspace by adding a sysfs attribute.
      
      It may not be desirable to keep a large number of partial slabs around
      if a cache is used infrequently and memory, especially when constrained
      by a cgroup, is scarce.  It's better to allow userspace to set the
      minimum policy per cache instead of relying explicitly on
      kmem_cache_shrink().
      
      The memory savings from simply moving min_partial from struct
      kmem_cache_node to struct kmem_cache are obviously not significant
      (unless maybe you're from SGI or something); at the largest it's
      
      	# allocated caches * (MAX_NUMNODES - 1) * sizeof(unsigned long)
      
      The true savings occur when userspace reduces the number of partial
      slabs that would otherwise be wasted, especially on machines with a
      large number of nodes (ia64 with CONFIG_NODES_SHIFT at 10 by default?).
      While the kernel estimates an ideal value for n->min_partial and
      ensures it's within a sane range, userspace has no input other than
      writing to /sys/kernel/slab/cache/shrink.
      
      There simply isn't a better heuristic to add when calculating the
      partial values, one that gives a better estimate for all possible
      caches.  And since min_partial is currently a static value, the user
      has no way of reclaiming that wasted space, which can be significant
      when constrained by a cgroup (either cpusets or, later, memory
      controller slab limits), short of shrinking the cache entirely.
      
      This also allows the user to specify that increased fragmentation and
      more partial slabs are actually desired to avoid the cost of allocating
      new slabs at runtime for specific caches.
      
      There's also no reason why this should be a per-struct kmem_cache_node
      value in the first place.  You could argue that a machine would have
      such node size asymmetries that it should be specified on a per-node
      basis, but we know nobody is doing that right now since it's a purely
      static value at the moment and there's no convenient way to tune that
      via slub's sysfs interface.
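
      A sketch of the attribute along the lines of slub's existing SLAB_ATTR
      plumbing, assuming a set_min_partial() helper that clamps the value to
      a sane range:

        static ssize_t min_partial_show(struct kmem_cache *s, char *buf)
        {
            return sprintf(buf, "%lu\n", s->min_partial);
        }

        static ssize_t min_partial_store(struct kmem_cache *s,
                         const char *buf, size_t length)
        {
            unsigned long min;
            int err;

            err = strict_strtoul(buf, 10, &min);
            if (err)
                return err;

            set_min_partial(s, min);
            return length;
        }
        SLAB_ATTR(min_partial);

      Tuning would then be e.g. `echo 2 > /sys/kernel/slab/<cache>/min_partial'.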
      
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
    • slub: move min_partial to struct kmem_cache · 3b89d7d8
      David Rientjes committed
      Although it allows for better cacheline use, it is unnecessary to save a
      copy of the cache's min_partial value in each kmem_cache_node.
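
      I.e., roughly:

        struct kmem_cache {
            /* ... */
            unsigned long min_partial;  /* one copy per cache; previously
                                         * one per struct kmem_cache_node */
            /* ... */
        };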
      
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
  18. 20 Feb 2009, 3 commits
  19. 15 Feb 2009, 1 commit
    • lockdep: annotate reclaim context (__GFP_NOFS) · cf40bd16
      Nick Piggin committed
      Here is another version, with the incremental patch rolled up, reclaim
      context annotation added to kswapd, and allocation tracing added to the
      slab allocators (which may only ever reach the page allocator in rare
      cases, so it is good to put annotations there too).

      Haven't tested this version as such, but it should be getting closer
      to merge-worthy ;)
      
      --
      After noticing some code in mm/filemap.c accidentally performing a
      __GFP_FS allocation when it should not have been, I thought it might be
      a good idea to try to catch this kind of thing with lockdep.
      
      I coded up a little idea that seems to work. Unfortunately the system has to
      actually be in __GFP_FS page reclaim, then take the lock, before it will mark
      it. But at least that might still be some orders of magnitude more common
      (and more debuggable) than an actual deadlock condition, so we have some
      improvement I hope (the concept is no less complete than discovery of a lock's
      interrupt contexts).
      
      I guess we could even do the same thing with __GFP_IO (normal reclaim), and
      even GFP_NOIO locks too... but filesystems will have the most locks and fiddly
      code paths, so let's start there and see how it goes.
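
      The annotation boils down to bracketing reclaim and tracing
      allocations, roughly as follows (a sketch using the helpers this
      patch introduces):

        /* around direct reclaim, and for the lifetime of kswapd */
        lockdep_set_current_reclaim_state(gfp_mask);
        progress = try_to_free_pages(zonelist, order, gfp_mask);
        lockdep_clear_current_reclaim_state();

        /* in the allocator entry points, so locks held by the caller can
         * be checked against the reclaim context */
        lockdep_trace_alloc(gfp_mask);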
      
      It *seems* to work. I did a quick test.
      
      =================================
      [ INFO: inconsistent lock state ]
      2.6.28-rc6-00007-ged313489-dirty #26
      ---------------------------------
      inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage.
      modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
      {in-reclaim-W} state was registered at:
        [<ffffffff80267bdb>] __lock_acquire+0x75b/0x1a60
        [<ffffffff80268f71>] lock_acquire+0x91/0xc0
        [<ffffffff8070f0e1>] mutex_lock_nested+0xb1/0x310
        [<ffffffffa002002b>] brd_init+0x2b/0x216 [brd]
        [<ffffffff8020903b>] _stext+0x3b/0x170
        [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
        [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
        [<ffffffffffffffff>] 0xffffffffffffffff
      irq event stamp: 3929
      hardirqs last  enabled at (3929): [<ffffffff8070f2b5>] mutex_lock_nested+0x285/0x310
      hardirqs last disabled at (3928): [<ffffffff8070f089>] mutex_lock_nested+0x59/0x310
      softirqs last  enabled at (3732): [<ffffffff8061f623>] sk_filter+0x83/0xe0
      softirqs last disabled at (3730): [<ffffffff8061f5b6>] sk_filter+0x16/0xe0
      
      other info that might help us debug this:
      1 lock held by modprobe/8526:
       #0:  (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
      
      stack backtrace:
      Pid: 8526, comm: modprobe Not tainted 2.6.28-rc6-00007-ged313489-dirty #26
      Call Trace:
       [<ffffffff80265483>] print_usage_bug+0x193/0x1d0
       [<ffffffff80266530>] mark_lock+0xaf0/0xca0
       [<ffffffff80266735>] mark_held_locks+0x55/0xc0
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffff802667ca>] trace_reclaim_fs+0x2a/0x60
       [<ffffffff80285005>] __alloc_pages_internal+0x475/0x580
       [<ffffffff8070f29e>] ? mutex_lock_nested+0x26e/0x310
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffffa002006a>] brd_init+0x6a/0x216 [brd]
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffff8020903b>] _stext+0x3b/0x170
       [<ffffffff8070f8b9>] ? mutex_unlock+0x9/0x10
       [<ffffffff8070f83d>] ? __mutex_unlock_slowpath+0x10d/0x180
       [<ffffffff802669ec>] ? trace_hardirqs_on_caller+0x12c/0x190
       [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
       [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  20. 12 Feb 2009, 1 commit
  21. 28 Jan 2009, 1 commit
  22. 14 Jan 2009, 1 commit
  23. 06 Jan 2009, 1 commit
  24. 01 Jan 2009, 1 commit
  25. 30 Dec 2008, 2 commits
    • tracing/kmemtrace: normalize the raw tracer event to the unified tracing API · 36994e58
      Frederic Weisbecker committed
      Impact: new tracer plugin
      
      This patch adapts kmemtrace raw event tracing to the unified tracing API.
      
      To enable and use this tracer, just do the following:
      
       echo kmemtrace > /debugfs/tracing/current_tracer
       cat /debugfs/tracing/trace
      
      You will have the following output:
      
       # tracer: kmemtrace
       #
       #
       # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
       # FREE   |      |     |       |              |   |            |        |
       # |
      
      type_id 1 call_site 18446744071565527833 ptr 18446612134395152256
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      
      That was to stay backward compatible with the output format produced
      in linux/tracepoint.h.

      This is the default output, but note that I tried something else.
      
      If you change an option:
      
      echo kmem_minimalistic > /debugfs/trace_options
      
      and then cat /debugfs/trace, you will have the following output:
      
       # tracer: kmemtrace
       #
       #
       # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
       # FREE   |      |     |       |              |   |            |        |
       # |
      
         -      C                            0xffff88007c088780          file_free_rcu
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc780     -1   d_alloc
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc870     -1   d_alloc
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc960     -1   d_alloc
         +      K   1304   1312   000000d0   0xffff8800791d7340     -1   reiserfs_alloc_inode
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         -      C                            0xffff88007cad6000          putname
         +      K    992   1000   000000d0   0xffff880079045b58     -1   alloc_inode
         +      K    768   1024   000080d0   0xffff88007c096400     -1   alloc_pipe_info
         +      K    240    240   000000d0   0xffff8800790dca50     -1   d_alloc
         +      K    272    320   000080d0   0xffff88007c088780     -1   get_empty_filp
         +      K    272    320   000080d0   0xffff88007c088000     -1   get_empty_filp
      
      Yeah I shall confess kmem_minimalistic should be: kmem_alternative.
      
      Whatever, I find it more readable, but this is a personal opinion of
      course.  We can drop it if you want.
      
      On the ALLOC/FREE column, + means an allocation and - a free.
      
      In the TYPE column, K = kmalloc, C = cache, P = page.
      
      I would like the flags to be GFP_* strings, but it would not be easy
      to keep the columns aligned with strings....
      
      About the node... it seems to always be -1.  I don't know why, but
      that shouldn't be difficult to find out.
      
      I moved linux/tracepoint.h to trace/tracepoint.h as well.  I think it
      would be easier to find the tracer headers if they were all in a
      common directory.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • kmemtrace: move #include lines · 2a38b1c4
      Ingo Molnar committed
      Impact: avoid conflicts with kmemcheck
      
      kmemcheck modifies the same area of slab.c and slub.c - move the
      include lines up a bit.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
  26. 29 Dec 2008, 5 commits