1. 03 April 2009, 2 commits
  2. 31 March 2009, 1 commit
    • lockdep: annotate reclaim context (__GFP_NOFS), fix SLOB · 19cefdff
      Committed by Ingo Molnar
      Impact: build fix
      
      fix typo in mm/slob.c:
      
       mm/slob.c:469: error: ‘flags’ undeclared (first use in this function)
       mm/slob.c:469: error: (Each undeclared identifier is reported only once
       mm/slob.c:469: error: for each function it appears in.)
      
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090128135457.350751756@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 23 March 2009, 1 commit
  4. 15 February 2009, 1 commit
    • lockdep: annotate reclaim context (__GFP_NOFS) · cf40bd16
      Committed by Nick Piggin
      Here is another version, with the incremental patch rolled up, reclaim
      context annotation added to kswapd, and allocation tracing added to the
      slab allocators (which may only reach the page allocator in rare cases,
      but it is good to put annotations there too).
      
      I haven't tested this version as such, but it should be getting closer
      to merge-worthy ;)
      
      --
      After noticing that some code in mm/filemap.c accidentally performed a
      __GFP_FS allocation when it should not have, I thought it might be a good
      idea to try to catch this kind of thing with lockdep.
      
      I coded up a little idea that seems to work. Unfortunately the system has to
      actually be in __GFP_FS page reclaim, then take the lock, before it will mark
      it. But at least that might still be some orders of magnitude more common
      (and more debuggable) than an actual deadlock condition, so we have some
      improvement I hope (the concept is no less complete than discovery of a lock's
      interrupt contexts).
      
      I guess we could even do the same thing with __GFP_IO (normal reclaim), and
      even GFP_NOIO locks too... but filesystems will have the most locks and fiddly
      code paths, so let's start there and see how it goes.
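
      As an illustration of the inversion this annotation is meant to catch,
      here is a minimal sketch (hypothetical lock and function names, not code
      from this patch): a lock taken from the __GFP_FS reclaim path must not
      also be held around an allocation that is itself allowed to enter
      __GFP_FS reclaim.
      
       #include <linux/mm.h>
       #include <linux/mutex.h>
       #include <linux/slab.h>
       #include <linux/writeback.h>
       
       static DEFINE_MUTEX(example_lock);      /* hypothetical filesystem lock */
       
       /* Reclaim side: reached via ->writepage during __GFP_FS page reclaim. */
       static int example_writepage(struct page *page, struct writeback_control *wbc)
       {
               mutex_lock(&example_lock);
               /* ... write the page out ... */
               mutex_unlock(&example_lock);
               return 0;
       }
       
       /* Allocation side: holds the same lock across a reclaim-capable allocation. */
       static void *example_alloc(void)
       {
               void *p;
       
               mutex_lock(&example_lock);
               p = kmalloc(128, GFP_KERNEL);   /* should be GFP_NOFS under this lock */
               mutex_unlock(&example_lock);
               return p;
       }
      
      Once reclaim has taken example_lock via the writepage path, lockdep can
      report the GFP_KERNEL allocation under the same lock as an inconsistent
      lock state, of the same kind as the report shown below.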
      
      It *seems* to work. I did a quick test.
      
      =================================
      [ INFO: inconsistent lock state ]
      2.6.28-rc6-00007-ged313489-dirty #26
      ---------------------------------
      inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage.
      modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
      {in-reclaim-W} state was registered at:
        [<ffffffff80267bdb>] __lock_acquire+0x75b/0x1a60
        [<ffffffff80268f71>] lock_acquire+0x91/0xc0
        [<ffffffff8070f0e1>] mutex_lock_nested+0xb1/0x310
        [<ffffffffa002002b>] brd_init+0x2b/0x216 [brd]
        [<ffffffff8020903b>] _stext+0x3b/0x170
        [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
        [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
        [<ffffffffffffffff>] 0xffffffffffffffff
      irq event stamp: 3929
      hardirqs last  enabled at (3929): [<ffffffff8070f2b5>] mutex_lock_nested+0x285/0x310
      hardirqs last disabled at (3928): [<ffffffff8070f089>] mutex_lock_nested+0x59/0x310
      softirqs last  enabled at (3732): [<ffffffff8061f623>] sk_filter+0x83/0xe0
      softirqs last disabled at (3730): [<ffffffff8061f5b6>] sk_filter+0x16/0xe0
      
      other info that might help us debug this:
      1 lock held by modprobe/8526:
       #0:  (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
      
      stack backtrace:
      Pid: 8526, comm: modprobe Not tainted 2.6.28-rc6-00007-ged313489-dirty #26
      Call Trace:
       [<ffffffff80265483>] print_usage_bug+0x193/0x1d0
       [<ffffffff80266530>] mark_lock+0xaf0/0xca0
       [<ffffffff80266735>] mark_held_locks+0x55/0xc0
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffff802667ca>] trace_reclaim_fs+0x2a/0x60
       [<ffffffff80285005>] __alloc_pages_internal+0x475/0x580
       [<ffffffff8070f29e>] ? mutex_lock_nested+0x26e/0x310
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffffa002006a>] brd_init+0x6a/0x216 [brd]
       [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
       [<ffffffff8020903b>] _stext+0x3b/0x170
       [<ffffffff8070f8b9>] ? mutex_unlock+0x9/0x10
       [<ffffffff8070f83d>] ? __mutex_unlock_slowpath+0x10d/0x180
       [<ffffffff802669ec>] ? trace_hardirqs_on_caller+0x12c/0x190
       [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
       [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 12 February 2009, 1 commit
  6. 19 January 2009, 1 commit
  7. 30 December 2008, 1 commit
    • tracing/kmemtrace: normalize the raw tracer event to the unified tracing API · 36994e58
      Committed by Frederic Weisbecker
      Impact: new tracer plugin
      
      This patch adapts kmemtrace raw event tracing to the unified tracing API.
      
      To enable and use this tracer, just do the following:
      
       echo kmemtrace > /debugfs/tracing/current_tracer
       cat /debugfs/tracing/trace
      
      You will have the following output:
      
       # tracer: kmemtrace
       #
       #
       # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
       # FREE   |      |     |       |              |   |            |        |
       # |
      
      type_id 1 call_site 18446744071565527833 ptr 18446612134395152256
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      
      That was done to stay backward compatible with the output format
      produced in linux/tracepoint.h.
      
      This is the default output, but note that I tried something else.
      
      If you change an option:
      
      echo kmem_minimalistic > /debugfs/trace_options
      
      and then cat /debugfs/trace, you will have the following output:
      
       # tracer: kmemtrace
       #
       #
       # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
       # FREE   |      |     |       |              |   |            |        |
       # |
      
         -      C                            0xffff88007c088780          file_free_rcu
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc780     -1   d_alloc
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc870     -1   d_alloc
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc960     -1   d_alloc
         +      K   1304   1312   000000d0   0xffff8800791d7340     -1   reiserfs_alloc_inode
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         -      C                            0xffff88007cad6000          putname
         +      K    992   1000   000000d0   0xffff880079045b58     -1   alloc_inode
         +      K    768   1024   000080d0   0xffff88007c096400     -1   alloc_pipe_info
         +      K    240    240   000000d0   0xffff8800790dca50     -1   d_alloc
         +      K    272    320   000080d0   0xffff88007c088780     -1   get_empty_filp
         +      K    272    320   000080d0   0xffff88007c088000     -1   get_empty_filp
      
      I should confess that kmem_minimalistic would be better named
      kmem_alternative.
      
      Anyway, I find it more readable, but that is a personal opinion of
      course. We can drop it if you want.
      
      In the ALLOC/FREE column, + means an allocation and - means a free.
      
      In the TYPE column, K = kmalloc, C = cache, P = page.
      
      I would like the flags to be shown as GFP_* strings, but it would not be
      easy to do that without breaking the column layout.
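
      For reference, the hex gfp_flags column can be decoded by hand. A small
      illustrative decoder follows; the flag values are the 2.6.28-era ones and
      are version-specific, so treat the constants as assumptions rather than a
      stable ABI.
      
       #include <stdio.h>
       
       /* 2.6.28-era GFP bits (simplified subset, userspace toy). */
       #define EX_GFP_WAIT  0x10u
       #define EX_GFP_IO    0x40u
       #define EX_GFP_FS    0x80u
       #define EX_GFP_ZERO  0x8000u
       
       static void decode_gfp(unsigned int flags)
       {
               printf("%#x:%s%s%s%s\n", flags,
                      (flags & EX_GFP_WAIT) ? " __GFP_WAIT" : "",
                      (flags & EX_GFP_IO)   ? " __GFP_IO"   : "",
                      (flags & EX_GFP_FS)   ? " __GFP_FS"   : "",
                      (flags & EX_GFP_ZERO) ? " __GFP_ZERO" : "");
       }
       
       int main(void)
       {
               decode_gfp(0x000000d0);  /* __GFP_WAIT|__GFP_IO|__GFP_FS, i.e. GFP_KERNEL */
               decode_gfp(0x000080d0);  /* GFP_KERNEL | __GFP_ZERO */
               return 0;
       }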
      
      About the node: it always seems to be -1. I don't know why, but that
      shouldn't be difficult to track down.
      
      I also moved linux/tracepoint.h to trace/tracepoint.h. I think it is
      easier to find the tracer headers if they are all in a common
      directory.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  8. 29 December 2008, 1 commit
  9. 16 December 2008, 1 commit
  10. 10 October 2008, 1 commit
  11. 08 October 2008, 1 commit
  12. 30 July 2008, 1 commit
  13. 27 July 2008, 1 commit
  14. 25 July 2008, 1 commit
  15. 20 May 2008, 1 commit
  16. 27 April 2008, 1 commit
  17. 06 February 2008, 2 commits
  18. 10 December 2007, 1 commit
  19. 06 December 2007, 1 commit
  20. 16 November 2007, 1 commit
  21. 17 October 2007, 3 commits
  22. 22 July 2007, 1 commit
    • slob: reduce list scanning · d6269543
      Committed by Matt Mackall
      The version of SLOB in -mm always scans its free list from the beginning,
      which results in small allocations and free segments clustering at the
      beginning of the list over time.  This causes the average search to scan
      over a large stretch at the beginning on each allocation.
      
      By starting each page search where the last one left off, we evenly
      distribute the allocations and greatly shorten the average search.
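
      In free-list terms this is the classic first-fit versus next-fit
      trade-off. A self-contained toy sketch of the idea (all names are
      hypothetical; the real slob code differs):
      
       #include <stddef.h>
       
       /* Circular free list of pages; resume each search where the last one
        * stopped instead of always starting from the head.
        */
       struct ex_page {
               struct ex_page *next;           /* circular free list */
               size_t free_units;              /* free space on this page */
       };
       
       static struct ex_page *scan_start;      /* where the last search ended */
       
       static struct ex_page *next_fit_scan(size_t needed)
       {
               struct ex_page *p = scan_start;
       
               if (!p)
                       return NULL;
               do {
                       if (p->free_units >= needed) {
                               scan_start = p; /* next search resumes here */
                               return p;
                       }
                       p = p->next;
               } while (p != scan_start);
       
               return NULL;                    /* wrapped around without a fit */
       }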
      
      Without this patch, kernel compiles on a 1.5G machine take a large amount
      of system time for list scanning.  With this patch, compiles are within a
      few seconds of performance of a SLAB kernel with no notable change in
      system time.
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 20 July 2007, 1 commit
    • mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Committed by Paul Mundt
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
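
      In practice the callsite fix-up amounts to dropping the final argument.
      A hedged before/after sketch (cache name, object type and init code are
      hypothetical, and the constructor is left NULL because its prototype
      varies across kernel versions):
      
       #include <linux/errno.h>
       #include <linux/init.h>
       #include <linux/slab.h>
       
       struct example_obj { int x; };                  /* hypothetical cached object */
       static struct kmem_cache *example_cachep;
       
       static int __init example_init(void)
       {
               /* Before this patch the call carried a destructor as its final
                * argument, e.g.:
                *   kmem_cache_create("example_cache", sizeof(struct example_obj),
                *                     0, SLAB_HWCACHE_ALIGN, ctor, NULL);
                * After it, the dtor parameter is simply gone:
                */
               example_cachep = kmem_cache_create("example_cache",
                                                  sizeof(struct example_obj), 0,
                                                  SLAB_HWCACHE_ALIGN, NULL);
               return example_cachep ? 0 : -ENOMEM;
       }
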
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  24. 18 July 2007, 4 commits
  25. 17 July 2007, 5 commits
    • slob: sparsemem support · 84a01c2f
      Committed by Paul Mundt
      Currently slob is disabled if we're using sparsemem, due to an earlier
      patch from Goto-san.  Slob and static sparsemem work without any trouble as
      it is, and the only hiccup is a missing slab_is_available() in the case of
      sparsemem extreme.  With this, we're rid of the last set of restrictions
      for slob usage.
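
      The guard involved looks roughly like the following. This is a hedged
      sketch: the function name and exact call sites are assumptions, not a
      copy of the patch.
      
       #include <linux/bootmem.h>
       #include <linux/mmzone.h>
       #include <linux/slab.h>
       
       /* Early sparsemem-extreme setup can run before the slab allocator is
        * initialised, so fall back to bootmem until slab_is_available() says
        * kmalloc is safe to use.
        */
       static void *sparse_index_alloc_sketch(int nid, unsigned long size)
       {
               if (slab_is_available())
                       return kmalloc_node(size, GFP_KERNEL, nid);
               return alloc_bootmem_node(NODE_DATA(nid), size);
       }
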
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Matt Mackall <mpm@selenic.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slob: initial NUMA support · 6193a2ff
      Committed by Paul Mundt
      This adds preliminary NUMA support to SLOB, primarily aimed at systems with
      small nodes (tested all the way down to a 128kB SRAM block), whether
      asymmetric or otherwise.
      
      We follow the same conventions as SLAB/SLUB, preferring current node
      placement for new pages, or with explicit placement, if a node has been
      specified.  Presently on UP NUMA this has the side-effect of preferring
      node#0 allocations (since numa_node_id() == 0, though this could be
      reworked if we could hand off a pfn to determine node placement), so
      single-CPU NUMA systems will want to place smaller nodes further out in
      terms of node id.  Once a page has been bound to a node (via explicit node
      id typing), we only do block allocations from partial free pages that have
      a matching node id in the page flags.
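
      The node matching described above boils down to a filter like the one
      below during the partial-page scan; a sketch with simplified types (the
      real slob code wraps struct page in its own slob_page type).
      
       #include <linux/mm.h>
       #include <linux/types.h>
       
       /* node == -1 means "no preference"; otherwise only partial pages whose
        * node id (recorded in the page flags) matches may be used.
        */
       static bool slob_page_fits_node(struct page *sp, int node)
       {
       #ifdef CONFIG_NUMA
               if (node != -1 && page_to_nid(sp) != node)
                       return false;
       #endif
               return true;
       }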
      
      The current implementation does have some scalability problems, in that all
      partial free pages are tracked in the global freelist (with contention due
      to the single spinlock).  However, these are things that are being reworked
      for SMP scalability first, while things like per-node freelists can easily
      be built on top of this sort of functionality once it's been added.
      
      More background can be found in:
      
      	http://marc.info/?l=linux-mm&m=118117916022379&w=2
      	http://marc.info/?l=linux-mm&m=118170446306199&w=2
      	http://marc.info/?l=linux-mm&m=118187859420048&w=2
      
      and subsequent threads.
      Acked-by: Christoph Lameter <clameter@sgi.com>
      Acked-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slob: improved alignment handling · 55394849
      Committed by Nick Piggin
      Remove the core slob allocator's minimum alignment restrictions, and instead
      introduce the alignment restrictions at the slab API layer.  This lets us heed
      the ARCH_KMALLOC/SLAB_MINALIGN directives, and also use __alignof__ (unsigned
      long) for the default alignment (which should allow relaxed alignment
      architectures to take better advantage of SLOB's small minimum alignment).
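
      Concretely, the slab-API-layer defaults end up looking something like
      this (a sketch consistent with the description above; the exact macro
      spellings and their header location are assumptions):
      
       /* Architectures with stricter DMA or cache-line requirements override
        * these; otherwise kmalloc/slab alignment falls back to the alignment
        * of a long, which is all the core SLOB allocator now has to honour.
        */
       #ifndef ARCH_KMALLOC_MINALIGN
       #define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long)
       #endif
       
       #ifndef ARCH_SLAB_MINALIGN
       #define ARCH_SLAB_MINALIGN __alignof__(unsigned long)
       #endif
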
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slob: remove bigblock tracking · d87a133f
      Committed by Nick Piggin
      Remove the bigblock lists in favour of using compound pages and going directly
      to the page allocator.  Allocation size is stored in page->private, which also
      makes ksize more accurate than it previously was.
      
      Saves ~0.5K of code, and 12-24 bytes of overhead per >= PAGE_SIZE allocation.
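
      A minimal sketch of the large-allocation path this describes (the
      function name and error handling are hypothetical simplifications):
      
       #include <linux/gfp.h>
       #include <linux/mm.h>
       
       /* >= PAGE_SIZE requests skip slob entirely: take a compound page from
        * the page allocator and stash the requested size in page->private so
        * ksize() can report it exactly.
        */
       static void *slob_large_alloc_sketch(size_t size, gfp_t gfp)
       {
               unsigned int order = get_order(size);
               struct page *page = alloc_pages(gfp | __GFP_COMP, order);
       
               if (!page)
                       return NULL;
               page->private = size;
               return page_address(page);
       }
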
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slob: rework freelist handling · 95b35127
      Committed by Nick Piggin
      Improve slob by turning the freelist into a list of pages using struct page
      fields, then each page has a singly linked freelist of slob blocks via a
      pointer in the struct page.
      
      - The first benefit is that the slob freelists can be indexed by a smaller
        type (2 bytes, if the PAGE_SIZE is reasonable).
      
      - Next is that freeing is much quicker because it does not have to traverse
        the entire freelist. Allocation can be slightly faster too, because we can
        skip almost-full freelist pages completely.
      
      - Slob pages are then freed immediately when they become empty, rather than
        having a periodic timer try to free them. This improves both efficiency and
        memory consumption.
      
      Then, we don't encode separate size and next fields into each slob block;
      rather, we use the sign bit to distinguish between "size" and "next". Size-1
      blocks contain only a "next" offset, while larger blocks contain the "size" in
      the first unit and "next" in the second unit (sketched below).
      
      - This allows minimum slob allocation alignment to go from 8 bytes to 2
        bytes on 32-bit and from 12 bytes to 2 bytes on 64-bit. In practice, it is
        best to align them to word size, however some architectures (e.g. cris)
        could gain space savings from turning off this extra alignment.
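
      The sign-bit encoding just described can be sketched as follows; the
      types and the explicit base pointer are simplifications of the real
      code, which derives the page base from the block address.
      
       typedef short slobidx_t;                /* offsets and sizes in 2-byte units */
       typedef struct { slobidx_t units; } slob_t;
       
       /* Encode a free block: 1-unit blocks store only a negated "next"
        * offset; larger blocks store their size in the first unit and the
        * "next" offset in the second.
        */
       static void set_slob(slob_t *s, slobidx_t size, slob_t *next, slob_t *base)
       {
               slobidx_t offset = next - base;
       
               if (size > 1) {
                       s[0].units = size;
                       s[1].units = offset;
               } else {
                       s[0].units = -offset;
               }
       }
       
       /* Decode the size: a negative first unit marks a 1-unit block. */
       static slobidx_t slob_units(const slob_t *s)
       {
               return s->units > 0 ? s->units : 1;
       }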
      
      Then, make kmalloc use its own slob_block at the front of the allocation
      in order to encode allocation size, rather than rely on not overwriting
      slob's existing header block.
      
      - This reduces kmalloc allocation overhead similarly to alignment reductions.
      
      - Decouples kmalloc layer from the slob allocator.
      
      Then, add a page flag specific to slob pages.
      
      - This means kfree of a page aligned slob block doesn't have to traverse
        the bigblock list.
      
      I would get benchmarks, but my test box's network doesn't come up with
      slob before this patch. I think something is timing out. Anyway, things
      are faster after the patch.
      
      Code size goes up about 1K, however dynamic memory usage _should_ be
      lower even on relatively small memory systems.
      
      A future todo item is to restore the cyclic free-list search, rather than
      always beginning at the start.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  26. 17 May 2007, 3 commits
  27. 08 May 2007, 1 commit