1. 03 Apr, 2009 (2 commits)
  2. 23 Mar, 2009 (1 commit)
  3. 25 Feb, 2009 (1 commit)
  4. 23 Feb, 2009 (2 commits)
    • slub: add min_partial sysfs tunable · 73d342b1
      Committed by David Rientjes
      Now that a cache's min_partial has been moved to struct kmem_cache, it's
      possible to easily tune it from userspace by adding a sysfs attribute.
      
      It may not be desirable to keep a large number of partial slabs around
      if a cache is used infrequently and memory, especially when constrained
      by a cgroup, is scarce.  It's better to allow userspace to set the
      minimum policy per cache instead of relying explicitly on
      kmem_cache_shrink().
      
      The memory savings from simply moving min_partial from struct
      kmem_cache_node to struct kmem_cache is obviously not significant
      (unless maybe you're from SGI or something); at most it is
      
      	# allocated caches * (MAX_NUMNODES - 1) * sizeof(unsigned long)
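
      For scale, a quick hedged calculation (150 allocated caches,
      MAX_NUMNODES = 1024 as with ia64's default CONFIG_NODES_SHIFT of 10,
      and an 8-byte unsigned long are all illustrative numbers, not
      measurements):

```shell
# Worst-case bytes saved by the move, per the formula above.
# 150 caches, MAX_NUMNODES=1024, sizeof(unsigned long)=8 (illustrative).
echo $(( 150 * (1024 - 1) * 8 ))   # bytes; prints 1227600 (~1.2 MB)
```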
      
      The true savings occur when userspace reduces the number of partial
      slabs that would otherwise be wasted, especially on machines with a
      large number of nodes (ia64 with CONFIG_NODES_SHIFT at 10 by default?).
      Although the kernel estimates an ideal value for n->min_partial and
      ensures it stays within a sane range, userspace currently has no input
      other than writing to /sys/kernel/slab/cache/shrink.
      
      There simply isn't a better heuristic for calculating the partial
      values, no single estimate that works for all possible caches.  And
      since it's currently a static value, the user has no way of reclaiming
      that wasted space, which can be significant when constrained by a
      cgroup (either cpusets or, later, memory controller slab limits),
      short of shrinking the cache entirely.
      
      This also allows the user to specify that increased fragmentation and
      more partial slabs are actually desired to avoid the cost of allocating
      new slabs at runtime for specific caches.
      
      There's also no reason why this should be a per-struct kmem_cache_node
      value in the first place.  You could argue that a machine would have
      such node size asymmetries that it should be specified on a per-node
      basis, but we know nobody is doing that right now since it's a purely
      static value at the moment and there's no convenient way to tune that
      via slub's sysfs interface.
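
      With the attribute in place, tuning is a plain sysfs read/write.  A
      minimal sketch (the "dentry" cache name and the value 2 are
      illustrative, and the paths assume SLUB's sysfs tree at its usual
      mount point):

```shell
# Inspect the current per-cache minimum number of partial slabs.
cat /sys/kernel/slab/dentry/min_partial

# Keep fewer partial slabs around, e.g. when memory is cgroup-constrained.
echo 2 > /sys/kernel/slab/dentry/min_partial
```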
      
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
    • slub: move min_partial to struct kmem_cache · 3b89d7d8
      Committed by David Rientjes
      Although it allows for better cacheline use, it is unnecessary to save a
      copy of the cache's min_partial value in each kmem_cache_node.
      
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
  5. 20 Feb, 2009 (3 commits)
  6. 15 Feb, 2009 (1 commit)
    • lockdep: annotate reclaim context (__GFP_NOFS) · cf40bd16
      Committed by Nick Piggin
      Here is another version, with the incremental patch rolled up, and
      added reclaim context annotation to kswapd, and allocation tracing
      to slab allocators (which may only ever reach the page allocator
      in rare cases, so it is good to put annotations here too).
      
      Haven't tested this version as such, but it should be getting closer
      to merge worthy ;)
      
      --
      After noticing that some code in mm/filemap.c accidentally performed a
      __GFP_FS allocation when it should not have, I thought it might be a
      good idea to try to catch this kind of thing with lockdep.
      
      I coded up a little idea that seems to work. Unfortunately the system has to
      actually be in __GFP_FS page reclaim, then take the lock, before it will mark
      it. But at least that might still be some orders of magnitude more common
      (and more debuggable) than an actual deadlock condition, so we have some
      improvement I hope (the concept is no less complete than discovery of a lock's
      interrupt contexts).
      
      I guess we could even do the same thing with __GFP_IO (normal reclaim), and
      even GFP_NOIO locks too... but filesystems will have the most locks and fiddly
      code paths, so let's start there and see how it goes.
      
      It *seems* to work. I did a quick test.
      
      =================================
      [ INFO: inconsistent lock state ]
      2.6.28-rc6-00007-ged313489-dirty #26
      ---------------------------------
      inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage.
      modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
      {in-reclaim-W} state was registered at:
        [<ffffffff80267bdb>] __lock_acquire+0x75b/0x1a60
        [<ffffffff80268f71>] lock_acquire+0x91/0xc0
        [<ffffffff8070f0e1>] mutex_lock_nested+0xb1/0x310
        [<ffffffffa002002b>] brd_init+0x2b/0x216 [brd]
        [<ffffffff8020903b>] _stext+0x3b/0x170
        [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
        [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
        [<ffffffffffffffff>] 0xffffffffffffffff
      irq event stamp: 3929
      hardirqs last  enabled at (3929): [<ffffffff8070f2b5>] mutex_lock_nested+0x285/0x310
      hardirqs last disabled at (3928): [<ffffffff8070f089>] mutex_lock_nested+0x59/0x310
      softirqs last  enabled at (3732): [<ffffffff8061f623>] sk_filter+0x83/0xe0
      softirqs last disabled at (3730): [<ffffffff8061f5b6>] sk_filter+0x16/0xe0
      
      other info that might help us debug this:
      1 lock held by modprobe/8526:
       #0:  (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
      
      stack backtrace:
      Pid: 8526, comm: modprobe Not tainted 2.6.28-rc6-00007-ged313489-dirty #26
      Call Trace:
       [<ffffffff80265483>] print_usage_bug+0x193/0x1d0
       [<ffffffff80266530>] mark_lock+0xaf0/0xca0
       [<ffffffff80266735>] mark_held_locks+0x55/0xc0
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffff802667ca>] trace_reclaim_fs+0x2a/0x60
       [<ffffffff80285005>] __alloc_pages_internal+0x475/0x580
       [<ffffffff8070f29e>] ? mutex_lock_nested+0x26e/0x310
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffffa002006a>] brd_init+0x6a/0x216 [brd]
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffff8020903b>] _stext+0x3b/0x170
       [<ffffffff8070f8b9>] ? mutex_unlock+0x9/0x10
       [<ffffffff8070f83d>] ? __mutex_unlock_slowpath+0x10d/0x180
       [<ffffffff802669ec>] ? trace_hardirqs_on_caller+0x12c/0x190
       [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
       [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 12 Feb, 2009 (1 commit)
  8. 28 Jan, 2009 (1 commit)
  9. 14 Jan, 2009 (1 commit)
  10. 06 Jan, 2009 (1 commit)
  11. 01 Jan, 2009 (1 commit)
  12. 30 Dec, 2008 (2 commits)
    • tracing/kmemtrace: normalize the raw tracer event to the unified tracing API · 36994e58
      Committed by Frederic Weisbecker
      Impact: new tracer plugin
      
      This patch adapts kmemtrace raw events tracing to the unified tracing API.
      
      To enable and use this tracer, just do the following:
      
       echo kmemtrace > /debugfs/tracing/current_tracer
       cat /debugfs/tracing/trace
      
      You will have the following output:
      
       # tracer: kmemtrace
       #
       #
       # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
       # FREE   |      |     |       |              |   |            |        |
       # |
      
      type_id 1 call_site 18446744071565527833 ptr 18446612134395152256
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      
      That was to stay backward compatible with the output format produced
      by linux/tracepoint.h.
      
      This is the default output, but note that I tried something else.
      
      If you change an option:
      
      echo kmem_minimalistic > /debugfs/trace_options
      
      and then cat /debugfs/tracing/trace, you will have the following output:
      
       # tracer: kmemtrace
       #
       #
       # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
       # FREE   |      |     |       |              |   |            |        |
       # |
      
         -      C                            0xffff88007c088780          file_free_rcu
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc780     -1   d_alloc
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc870     -1   d_alloc
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc960     -1   d_alloc
         +      K   1304   1312   000000d0   0xffff8800791d7340     -1   reiserfs_alloc_inode
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         -      C                            0xffff88007cad6000          putname
         +      K    992   1000   000000d0   0xffff880079045b58     -1   alloc_inode
         +      K    768   1024   000080d0   0xffff88007c096400     -1   alloc_pipe_info
         +      K    240    240   000000d0   0xffff8800790dca50     -1   d_alloc
         +      K    272    320   000080d0   0xffff88007c088780     -1   get_empty_filp
         +      K    272    320   000080d0   0xffff88007c088000     -1   get_empty_filp
      
      Yeah I shall confess kmem_minimalistic should be: kmem_alternative.
      
      Whatever, I find it more readable, but this is a personal opinion of
      course.  We can drop it if you want.
      
      On the ALLOC/FREE column, + means an allocation and - a free.
      
      On the type column, you have K = kmalloc, C = cache, P = page
      
      I would like the flags to be GFP_* strings, but it would not be easy
      to keep the columns aligned with strings....
      
      About the node: it always seems to be -1.  I don't know why, but it
      shouldn't be difficult to find out.
      
      I moved linux/tracepoint.h to trace/tracepoint.h as well; I think it
      is easier to find the tracer headers if they are all in their common
      directory.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • kmemtrace: move #include lines · 2a38b1c4
      Committed by Ingo Molnar
      Impact: avoid conflicts with kmemcheck
      
      kmemcheck modifies the same area of slab.c and slub.c - move the
      include lines up a bit.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 29 Dec, 2008 (7 commits)
  14. 13 Dec, 2008 (1 commit)
    • cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and... · 29c0177e
      Committed by Rusty Russell
      cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers.
      
      Impact: change calling convention of existing cpumask APIs
      
      Most cpumask functions started with cpus_: these have been replaced by
      cpumask_ ones which take struct cpumask pointers as expected.
      
      These four functions don't have good replacement names; fortunately
      they're rarely used, so we just change them over.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: paulus@samba.org
      Cc: mingo@redhat.com
      Cc: tony.luck@intel.com
      Cc: ralf@linux-mips.org
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: cl@linux-foundation.org
      Cc: srostedt@redhat.com
  15. 11 Dec, 2008 (1 commit)
  16. 08 Dec, 2008 (1 commit)
  17. 02 Dec, 2008 (1 commit)
    • memcg: memory hotplug fix for notifier callback · dc19f9db
      Committed by KAMEZAWA Hiroyuki
      Fixes for memcg/memory hotplug.
      
      While memory hotplug allocates/frees the memmap, page_cgroup is not
      freed at OFFLINE when it was allocated via bootmem.  (Freeing bootmem
      requires special care.)
      
      Then, if page_cgroup was allocated from bootmem and the memmap is
      freed/allocated by memory hotplug, page_cgroup->page == page is no
      longer true.
      
      But the current MEM_ONLINE handler doesn't check this, and doesn't
      update page_cgroup->page when it is unnecessary to allocate a new
      page_cgroup.  (This was not found because the memmap is not freed
      when SPARSEMEM_VMEMMAP is y.)
      
      I also noticed that MEM_ONLINE can be called against part of a
      section.  So freeing page_cgroup at CANCEL_ONLINE will cause trouble
      (it would free page_cgroup still in use).  Don't roll back at CANCEL.
      
      One more thing: the current memory hotplug notifier is stopped by slub
      because it sets NOTIFY_STOP_MASK in its return value, so page_cgroup's
      callback is never called (it now has lower priority than slub).
      
      I think this slub behavior is unintentional (a bug), and this patch
      fixes it.
      
      Another approach considered for page_cgroup allocation:
        - free page_cgroup at OFFLINE even if it's from bootmem,
          and remove the special handler.  But that requires more changes.
      
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12041
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Tested-by: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 26 Nov, 2008 (4 commits)
  19. 23 Oct, 2008 (1 commit)
  20. 15 Sep, 2008 (1 commit)
  21. 21 Aug, 2008 (1 commit)
  22. 05 Aug, 2008 (1 commit)
    • SLUB: dynamic per-cache MIN_PARTIAL · 5595cffc
      Committed by Pekka Enberg
      This patch changes the static MIN_PARTIAL to a dynamic per-cache ->min_partial
      value that is calculated from object size. The bigger the object size, the more
      pages we keep on the partial list.
      
      I tested SLAB, SLUB, and SLUB with this patch on Jens Axboe's 'netio' example
      script of the fio benchmarking tool. The script stresses the networking
      subsystem which should also give a fairly good beating of kmalloc() et al.
      
      To run the test yourself, first clone the fio repository:
      
        git clone git://git.kernel.dk/fio.git
      
      and then run the following command n times on your machine:
      
        time ./fio examples/netio
      
      The results on my 2-way 64-bit x86 machine are as follows:
      
        [ the minimum, maximum, and average are captured from 50 individual runs ]
      
                       real time (seconds)
                       min      max      avg      sd
        SLAB           22.76    23.38    22.98    0.17
        SLUB           22.80    25.78    23.46    0.72
        SLUB (dynamic) 22.74    23.54    23.00    0.20
      
                       sys time (seconds)
                       min      max      avg      sd
        SLAB           6.90     8.28     7.70     0.28
        SLUB           7.42     16.95    8.89     2.28
        SLUB (dynamic) 7.17     8.64     7.73     0.29
      
                       user time (seconds)
                       min      max      avg      sd
        SLAB           36.89    38.11    37.50    0.29
        SLUB           30.85    37.99    37.06    1.67
        SLUB (dynamic) 36.75    38.07    37.59    0.32
      
      As you can see from the above numbers, this patch brings SLUB to the
      same level as SLAB for this particular workload, fixing a ~2%
      regression.  I'd expect this change to help similar workloads that
      allocate a lot of objects that are close to the size of a page.
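
      The min/max/avg/sd columns above can be reproduced with a small awk
      pipeline once the per-run timings are collected; this sketch uses
      three illustrative sample timings in place of the 50 real runs, and
      computes the population standard deviation:

```shell
# Aggregate per-run wall-clock timings (seconds, one per line) into the
# min/max/avg/sd columns used in the tables above. The sample values
# stand in for the 50 real runs.
printf '22.76\n23.38\n22.98\n' | awk '
  { v[NR] = $1; sum += $1 }
  END {
    n = NR; avg = sum / n; min = v[1]; max = v[1]
    for (i = 1; i <= n; i++) {
      if (v[i] < min) min = v[i]
      if (v[i] > max) max = v[i]
      sd += (v[i] - avg) ^ 2
    }
    printf "min %.2f max %.2f avg %.2f sd %.2f\n", min, max, avg, sqrt(sd / n)
  }'
```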
      
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
  23. 30 Jul, 2008 (1 commit)
  24. 27 Jul, 2008 (1 commit)
  25. 25 Jul, 2008 (1 commit)
  26. 19 Jul, 2008 (1 commit)