1. 23 8月, 2010 2 次提交
    • D
      radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags · 144dcfc0
      Dave Chinner 提交于
      Commit ebf8aa44 ("radix-tree:
      omplement function radix_tree_range_tag_if_tagged") does not safely
      set tags on on intermediate tree nodes. The code walks down the tree
      setting tags before it has fully resolved the path to the leaf under
      the assumption there will be a leaf slot with the tag set in the
      range it is searching.
      
      Unfortunately, this is not a valid assumption - we can abort after
      setting a tag on an intermediate node if we overrun the number of
      tags we are allowed to set in a batch, or stop scanning because we
      we have passed the last scan index before we reach a leaf slot with
      the tag we are searching for set.
      
      As a result, we can leave the function with tags set on intemediate
      nodes which can be tripped over later by tag-based lookups. The
      result of these stale tags is that lookup may end prematurely or
      livelock because the lookup cannot make progress.
      
      The fix for the problem involves reocrding the traversal path we
      take to the leaf nodes, and only propagating the tags back up the
      tree once the tag is set in the leaf node slot. We are already
      recording the path for efficient traversal, so there is no
      additional overhead to do the intermediately node tag setting in
      this manner.
      
      This fixes a radix tree lookup livelock triggered by the new
      writeback sync livelock avoidance code introduced in commit
      f446daae ("mm: implement writeback
      livelock avoidance using page tagging").
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Acked-by: NJan Kara <jack@suse.cz>
      144dcfc0
    • D
      radix-tree: clear all tags in radix_tree_node_rcu_free · b6dd0865
      Dave Chinner 提交于
      Commit f446daae ("mm: implement
      writeback livelock avoidance using page tagging") introduced a new
      radix tree tag, increasing the number of tags in each node from 2 to
      3. It did not, however, fix up the code in
      radix_tree_node_rcu_free() that cleans up after radix_tree_shrink()
      and hence could leave stray tags set in the new tag array.
      
      The result is that the livelock avoidance code added in the the
      above commit would hit stale tags when doing tag based lookups,
      resulting in livelocks when trying to traverse the tree.
      
      Fix this problem in radix_tree_node_rcu_free() so it doesn't happen
      again in the future by using a loop to walk all the tags up to
      RADIX_TREE_MAX_TAGS to clear the stray tags radix_tree_shrink()
      leaves behind.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Acked-by: NNick Piggin <npiggin@kernel.dk>
      Acked-by: NJan Kara <jack@suse.cz>
      b6dd0865
  2. 21 8月, 2010 1 次提交
  3. 10 8月, 2010 1 次提交
  4. 28 5月, 2010 1 次提交
  5. 10 4月, 2010 1 次提交
    • D
      radix_tree_tag_get() is not as safe as the docs make out [ver #2] · ce82653d
      David Howells 提交于
      radix_tree_tag_get() is not safe to use concurrently with radix_tree_tag_set()
      or radix_tree_tag_clear().  The problem is that the double tag_get() in
      radix_tree_tag_get():
      
      		if (!tag_get(node, tag, offset))
      			saw_unset_tag = 1;
      		if (height == 1) {
      			int ret = tag_get(node, tag, offset);
      
      may see the value change due to the action of set/clear.  RCU is no protection
      against this as no pointers are being changed, no nodes are being replaced
      according to a COW protocol - set/clear alter the node directly.
      
      The documentation in linux/radix-tree.h, however, says that
      radix_tree_tag_get() is an exception to the rule that "any function modifying
      the tree or tags (...) must exclude other modifications, and exclude any
      functions reading the tree".
      
      The problem is that the next statement in radix_tree_tag_get() checks that the
      tag doesn't vary over time:
      
      			BUG_ON(ret && saw_unset_tag);
      
      This has been seen happening in FS-Cache:
      
      	https://www.redhat.com/archives/linux-cachefs/2010-April/msg00013.html
      
      To this end, remove the BUG_ON() from radix_tree_tag_get() and note in various
      comments that the value of the tag may change whilst the RCU read lock is held,
      and thus that the return value of radix_tree_tag_get() may not be relied upon
      unless radix_tree_tag_set/clear() and radix_tree_delete() are excluded from
      running concurrently with it.
      Reported-by: NRomain DEGEZ <romain.degez@smartjog.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ce82653d
  6. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  7. 25 2月, 2010 1 次提交
    • P
      radix-tree: Disable RCU lockdep checking in radix tree · 2676a58c
      Paul E. McKenney 提交于
      Because the radix tree is used with many different locking
      designs, we cannot do any effective checking without changing
      the radix-tree APIs. It might make sense to do this later, but
      only if the RCU lockdep checking proves itself sufficiently
      valuable.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-10-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2676a58c
  8. 20 11月, 2009 2 次提交
    • D
      FS-Cache: Don't delete pending pages from the page-store tracking tree · 285e728b
      David Howells 提交于
      Don't delete pending pages from the page-store tracking tree, but rather send
      them for another write as they've presumably been updated.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      285e728b
    • D
      FS-Cache: Use radix tree preload correctly in tracking of pages to be stored · b34df792
      David Howells 提交于
      __fscache_write_page() attempts to load the radix tree preallocation pool for
      the CPU it is on before calling radix_tree_insert(), as the insertion must be
      done inside a pair of spinlocks.
      
      Use of the preallocation pool, however, is contingent on the radix tree being
      initialised without __GFP_WAIT specified.  __fscache_acquire_cookie() was
      passing GFP_NOFS to INIT_RADIX_TREE() - but that includes __GFP_WAIT.
      
      The solution is to AND out __GFP_WAIT.
      
      Additionally, the banner comment to radix_tree_preload() is altered to make
      note of this prerequisite.  Possibly there should be a WARN_ON() too.
      
      Without this fix, I have seen the following recursive deadlock caused by
      radix_tree_insert() attempting to allocate memory inside the spinlocked
      region, which resulted in FS-Cache being called back into to release memory -
      which required the spinlock already held.
      
      =============================================
      [ INFO: possible recursive locking detected ]
      2.6.32-rc6-cachefs #24
      ---------------------------------------------
      nfsiod/7916 is trying to acquire lock:
       (&cookie->lock){+.+.-.}, at: [<ffffffffa0076872>] __fscache_uncache_page+0xdb/0x160 [fscache]
      
      but task is already holding lock:
       (&cookie->lock){+.+.-.}, at: [<ffffffffa0076acc>] __fscache_write_page+0x15c/0x3f3 [fscache]
      
      other info that might help us debug this:
      5 locks held by nfsiod/7916:
       #0:  (nfsiod){+.+.+.}, at: [<ffffffff81048290>] worker_thread+0x19a/0x2e2
       #1:  (&task->u.tk_work#2){+.+.+.}, at: [<ffffffff81048290>] worker_thread+0x19a/0x2e2
       #2:  (&cookie->lock){+.+.-.}, at: [<ffffffffa0076acc>] __fscache_write_page+0x15c/0x3f3 [fscache]
       #3:  (&object->lock#2){+.+.-.}, at: [<ffffffffa0076b07>] __fscache_write_page+0x197/0x3f3 [fscache]
       #4:  (&cookie->stores_lock){+.+...}, at: [<ffffffffa0076b0f>] __fscache_write_page+0x19f/0x3f3 [fscache]
      
      stack backtrace:
      Pid: 7916, comm: nfsiod Not tainted 2.6.32-rc6-cachefs #24
      Call Trace:
       [<ffffffff8105ac7f>] __lock_acquire+0x1649/0x16e3
       [<ffffffff81059ded>] ? __lock_acquire+0x7b7/0x16e3
       [<ffffffff8100e27d>] ? dump_trace+0x248/0x257
       [<ffffffff8105ad70>] lock_acquire+0x57/0x6d
       [<ffffffffa0076872>] ? __fscache_uncache_page+0xdb/0x160 [fscache]
       [<ffffffff8135467c>] _spin_lock+0x2c/0x3b
       [<ffffffffa0076872>] ? __fscache_uncache_page+0xdb/0x160 [fscache]
       [<ffffffffa0076872>] __fscache_uncache_page+0xdb/0x160 [fscache]
       [<ffffffffa0077eb7>] ? __fscache_check_page_write+0x0/0x71 [fscache]
       [<ffffffffa00b4755>] nfs_fscache_release_page+0x86/0xc4 [nfs]
       [<ffffffffa00907f0>] nfs_release_page+0x3c/0x41 [nfs]
       [<ffffffff81087ffb>] try_to_release_page+0x32/0x3b
       [<ffffffff81092c2b>] shrink_page_list+0x316/0x4ac
       [<ffffffff81058a9b>] ? mark_held_locks+0x52/0x70
       [<ffffffff8135451b>] ? _spin_unlock_irq+0x2b/0x31
       [<ffffffff81093153>] shrink_inactive_list+0x392/0x67c
       [<ffffffff81058a9b>] ? mark_held_locks+0x52/0x70
       [<ffffffff810934ca>] shrink_list+0x8d/0x8f
       [<ffffffff81093744>] shrink_zone+0x278/0x33c
       [<ffffffff81052c70>] ? ktime_get_ts+0xad/0xba
       [<ffffffff8109453b>] try_to_free_pages+0x22e/0x392
       [<ffffffff8109184c>] ? isolate_pages_global+0x0/0x212
       [<ffffffff8108e16b>] __alloc_pages_nodemask+0x3dc/0x5cf
       [<ffffffff810ae24a>] cache_alloc_refill+0x34d/0x6c1
       [<ffffffff811bcf74>] ? radix_tree_node_alloc+0x52/0x5c
       [<ffffffff810ae929>] kmem_cache_alloc+0xb2/0x118
       [<ffffffff811bcf74>] radix_tree_node_alloc+0x52/0x5c
       [<ffffffff811bcfd5>] radix_tree_insert+0x57/0x19c
       [<ffffffffa0076b53>] __fscache_write_page+0x1e3/0x3f3 [fscache]
       [<ffffffffa00b4248>] __nfs_readpage_to_fscache+0x58/0x11e [nfs]
       [<ffffffffa009bb77>] nfs_readpage_release+0x34/0x9b [nfs]
       [<ffffffffa009c0d9>] nfs_readpage_release_full+0x32/0x4b [nfs]
       [<ffffffffa0006cff>] rpc_release_calldata+0x12/0x14 [sunrpc]
       [<ffffffffa0006e2d>] rpc_free_task+0x59/0x61 [sunrpc]
       [<ffffffffa0006f03>] rpc_async_release+0x10/0x12 [sunrpc]
       [<ffffffff810482e5>] worker_thread+0x1ef/0x2e2
       [<ffffffff81048290>] ? worker_thread+0x19a/0x2e2
       [<ffffffff81352433>] ? thread_return+0x3e/0x101
       [<ffffffffa0006ef3>] ? rpc_async_release+0x0/0x12 [sunrpc]
       [<ffffffff8104bff5>] ? autoremove_wake_function+0x0/0x34
       [<ffffffff81058d25>] ? trace_hardirqs_on+0xd/0xf
       [<ffffffff810480f6>] ? worker_thread+0x0/0x2e2
       [<ffffffff8104bd21>] kthread+0x7a/0x82
       [<ffffffff8100beda>] child_rip+0xa/0x20
       [<ffffffff8100b87c>] ? restore_args+0x0/0x30
       [<ffffffff8104c2b9>] ? add_wait_queue+0x15/0x44
       [<ffffffff8104bca7>] ? kthread+0x0/0x82
       [<ffffffff8100bed0>] ? child_rip+0x0/0x20
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      b34df792
  9. 17 6月, 2009 2 次提交
  10. 07 1月, 2009 1 次提交
  11. 06 1月, 2009 1 次提交
  12. 27 7月, 2008 2 次提交
  13. 05 7月, 2008 1 次提交
  14. 13 6月, 2008 1 次提交
    • N
      radix-tree: fix small lockless radix-tree bug · 643b52b9
      Nick Piggin 提交于
      We shrink a radix tree when its root node has only one child, in the left
      most slot.  The child becomes the new root node.  To perform this
      operation in a manner compatible with concurrent lockless lookups, we
      atomically switch the root pointer from the parent to its child.
      
      However a concurrent lockless lookup may now have loaded a pointer to the
      parent (and is presently deciding what to do next).  For this reason, we
      also have to keep the parent node in a valid state after shrinking the
      tree, until the next RCU grace period -- otherwise this lookup with the
      parent pointer may not do the right thing.  Notably, we need to keep the
      child in the left most slot there in case that is requested by the lookup.
      
      This is all pretty standard RCU stuff.  It is worth repeating because in
      my eagerness to obey the radix tree node constructor scheme, I had broken
      it by zeroing the radix tree node before the grace period.
      
      What could happen is that a lookup can load the parent pointer, then
      decide it wants to follow the left most child slot, only to find the slot
      contained NULL due to the concurrent shrinker having zeroed the parent
      node before waiting for a grace period.  The lookup would return a false
      negative as a result.
      
      Fix it by doing that clearing in the RCU callback.  I would normally want
      to rip out the constructor entirely, but radix tree nodes are one of those
      places where they make sense (only few cachelines will be touched soon
      after allocation).
      
      This was never actually found in any lockless pagecache testing or by the
      test harness, but by seeing the odd problem with my scalable vmap rewrite.
       I have not tickled the test harness into reproducing it yet, but I'll
      keep working at it.
      
      Fortunately, it is not a problem anywhere lockless pagecache is used in
      mainline kernels (pagecache probe is not a guarantee, and brd does not
      have concurrent lookups and deletes).
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      643b52b9
  15. 28 4月, 2008 1 次提交
    • C
      Remove set_migrateflags() · 488514d1
      Christoph Lameter 提交于
      Migrate flags must be set on slab creation as agreed upon when the antifrag
      logic was reviewed.  Otherwise some slabs of a slabcache will end up in the
      unmovable and others in the reclaimable section depending on which flag was
      active when a new slab page was allocated.
      
      This likely slid in somehow when antifrag was merged. Remove it.
      
      The buffer_heads are always allocated with __GFP_RECLAIMABLE because the
      SLAB_RECLAIM_ACCOUNT option is set.  The set_migrateflags() never had any
      effect there.
      
      Radix tree allocations are not directly reclaimable but they are allocated
      with __GFP_RECLAIMABLE set on each allocation.  We now set
      SLAB_RECLAIM_ACCOUNT on radix tree slab creation making sure that radix
      tree slabs are consistently placed in the reclaimable section.  Radix tree
      slabs will also be accounted as such.
      
      There is then no user left of set_migratepages. So remove it.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      488514d1
  16. 06 2月, 2008 1 次提交
    • N
      radix-tree: avoid atomic allocations for preloaded insertions · e2848a0e
      Nick Piggin 提交于
      Most pagecache (and some other) radix tree insertions have the great
      opportunity to preallocate a few nodes with relaxed gfp flags.  But the
      preallocation is squandered when it comes time to allocate a node, we
      default to first attempting a GFP_ATOMIC allocation -- that doesn't
      normally fail, but it can eat into atomic memory reserves that we don't
      need to be using.
      
      Another upshot of this is that it removes the sometimes highly contended
      zone->lock from underneath tree_lock.  Pagecache insertions are always
      performed with a radix tree preload, and after this change, such a
      situation will never fall back to kmem_cache_alloc within
      radix_tree_node_alloc.
      
      David Miller reports seeing this allocation fail on a highly threaded
      sparc64 system:
      
      [527319.459981] dd: page allocation failure. order:0, mode:0x20
      [527319.460403] Call Trace:
      [527319.460568]  [00000000004b71e0] __slab_alloc+0x1b0/0x6a8
      [527319.460636]  [00000000004b7bbc] kmem_cache_alloc+0x4c/0xa8
      [527319.460698]  [000000000055309c] radix_tree_node_alloc+0x20/0x90
      [527319.460763]  [0000000000553238] radix_tree_insert+0x12c/0x260
      [527319.460830]  [0000000000495cd0] add_to_page_cache+0x38/0xb0
      [527319.460893]  [00000000004e4794] mpage_readpages+0x6c/0x134
      [527319.460955]  [000000000049c7fc] __do_page_cache_readahead+0x170/0x280
      [527319.461028]  [000000000049cc88] ondemand_readahead+0x208/0x214
      [527319.461094]  [0000000000496018] do_generic_mapping_read+0xe8/0x428
      [527319.461152]  [0000000000497948] generic_file_aio_read+0x108/0x170
      [527319.461217]  [00000000004badac] do_sync_read+0x88/0xd0
      [527319.461292]  [00000000004bb5cc] vfs_read+0x78/0x10c
      [527319.461361]  [00000000004bb920] sys_read+0x34/0x60
      [527319.461424]  [0000000000406294] linux_sparc_syscall32+0x3c/0x40
      
      The calltrace is significant: __do_page_cache_readahead allocates a number
      of pages with GFP_KERNEL, and hence it should have reclaimed sufficient
      memory to satisfy GFP_ATOMIC allocations.  However after the list of pages
      goes to mpage_readpages, there can be significant intervals (including disk
      IO) before all the pages are inserted into the radix-tree.  So the reserves
      can easily be depleted at that point.  The patch is confirmed to fix the
      problem.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e2848a0e
  17. 17 10月, 2007 6 次提交
    • P
      avoid negative (and full-width) shifts in radix-tree.c · 430d275a
      Peter Lund 提交于
      Negative shifts are not allowed in C (the result is undefined).  Same thing
      with full-width shifts.
      
      It works on most platforms but not on the VAX with gcc 4.0.1 (it results in an
      "operand reserved" fault).
      
      Shifting by more than the width of the value on the left is also not
      allowed.  I think the extra '>> 1' tacked on at the end in the original
      code was an attempt to work around that.  Getting rid of that is an extra
      feature of this patch.
      
      Here's the chapter and verse, taken from the final draft of the C99
      standard ("6.5.7 Bitwise shift operators", paragraph 3):
      
        "The integer promotions are performed on each of the operands. The
        type of the result is that of the promoted left operand. If the
        value of the right operand is negative or is greater than or equal
        to the width of the promoted left operand, the behavior is
        undefined."
      
      Thank you to Jan-Benedict Glaw, Christoph Hellwig, Maciej Rozycki, Pekka
      Enberg, Andreas Schwab, and Christoph Lameter for review.  Special thanks
      to Andreas for spotting that my fix only removed half the undefined
      behaviour.
      Signed-off-by: NPeter Lund <firefly@vax64.dk>
      Christoph Lameter <clameter@sgi.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Andreas Schwab <schwab@suse.de>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      430d275a
    • C
      Slab API: remove useless ctor parameter and reorder parameters · 4ba9b9d0
      Christoph Lameter 提交于
      Slab constructors currently have a flags parameter that is never used.  And
      the order of the arguments is opposite to other slab functions.  The object
      pointer is placed before the kmem_cache pointer.
      
      Convert
      
              ctor(void *object, struct kmem_cache *s, unsigned long flags)
      
      to
      
              ctor(struct kmem_cache *s, void *object)
      
      throughout the kernel
      
      [akpm@linux-foundation.org: coupla fixes]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4ba9b9d0
    • M
      Group short-lived and reclaimable kernel allocations · e12ba74d
      Mel Gorman 提交于
      This patch marks a number of allocations that are either short-lived such as
      network buffers or are reclaimable such as inode allocations.  When something
      like updatedb is called, long-lived and unmovable kernel allocations tend to
      be spread throughout the address space which increases fragmentation.
      
      This patch groups these allocations together as much as possible by adding a
      new MIGRATE_TYPE.  The MIGRATE_RECLAIMABLE type is for allocations that can be
      reclaimed on demand, but not moved.  i.e.  they can be migrated by deleting
      them and re-reading the information from elsewhere.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e12ba74d
    • J
      fix the max path calculation in radix-tree.c · 26fb1589
      Jeff Moyer 提交于
      A while back, Nick Piggin introduced a patch to reduce the node memory
      usage for small files (commit cfd9b7df):
      
      -#define RADIX_TREE_MAP_SHIFT	6
      +#define RADIX_TREE_MAP_SHIFT	(CONFIG_BASE_SMALL ? 4 : 6)
      
      Unfortunately, he didn't take into account the fact that the
      calculation of the maximum path was based on an assumption of having
      to round up:
      
      #define RADIX_TREE_MAX_PATH (RADIX_TREE_INDEX_BITS/RADIX_TREE_MAP_SHIFT + 2)
      
      So, if CONFIG_BASE_SMALL is set, you will end up with a
      RADIX_TREE_MAX_PATH that is one greater than necessary.  The practical
      upshot of this is just a bit of wasted memory (one long in the
      height_to_maxindex array, an extra pre-allocated radix tree node per
      cpu, and extra stack usage in a couple of functions), but it seems
      worth getting right.
      
      It's also worth noting that I never build with CONFIG_BASE_SMALL.
      What I did to test this was duplicate the code in a small user-space
      program and check the results of the calculations for max path and the
      contents of the height_to_maxindex array.
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Acked-by: NNick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      26fb1589
    • N
      radix-tree: use indirect bit · c0bc9875
      Nick Piggin 提交于
      Rather than sign direct radix-tree pointers with a special bit, sign the
      indirect one that hangs off the root.  This means that, given a lookup_slot
      operation, the invalid result will be differentiated from the valid
      (previously, valid results could have the bit either set or clear).
      
      This does not affect slot lookups which occur under lock -- they can never
      return an invalid result.  Is needed in future for lockless pagecache.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c0bc9875
    • F
      radixtree: introduce radix_tree_next_hole() · 6df8ba4f
      Fengguang Wu 提交于
      Introduce radix_tree_next_hole(root, index, max_scan) to scan radix tree for
      the first hole.  It will be used in interleaved readahead.
      
      The implementation is dumb and obviously correct.  It can help debug(and
      document) the possible smart one in future.
      
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NFengguang Wu <wfg@mail.ustc.edu.cn>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6df8ba4f
  18. 20 7月, 2007 1 次提交
    • P
      mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt 提交于
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
      20c2df83
  19. 14 7月, 2007 1 次提交
  20. 10 5月, 2007 1 次提交
    • R
      Add suspend-related notifications for CPU hotplug · 8bb78442
      Rafael J. Wysocki 提交于
      Since nonboot CPUs are now disabled after tasks and devices have been
      frozen and the CPU hotplug infrastructure is used for this purpose, we need
      special CPU hotplug notifications that will help the CPU-hotplug-aware
      subsystems distinguish normal CPU hotplug events from CPU hotplug events
      related to a system-wide suspend or resume operation in progress.  This
      patch introduces such notifications and causes them to be used during
      suspend and resume transitions.  It also changes all of the
      CPU-hotplug-aware subsystems to take these notifications into consideration
      (for now they are handled in the same way as the corresponding "normal"
      ones).
      
      [oleg@tv-sign.ru: cleanups]
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8bb78442
  21. 08 12月, 2006 3 次提交
    • I
      [PATCH] hotplug CPU: clean up hotcpu_notifier() use · 02316067
      Ingo Molnar 提交于
      There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
      prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
      generating compiler warnings of unused symbols, hence forcing people to add
      #ifdefs.
      
      the compiler can skip truly unused functions just fine:
      
          text    data     bss     dec     hex filename
       1624412  728710 3674856 6027978  5bfaca vmlinux.before
       1624412  728710 3674856 6027978  5bfaca vmlinux.after
      
      [akpm@osdl.org: topology.c fix]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      02316067
    • N
      [PATCH] radix-tree: RCU lockless readside · 7cf9c2c7
      Nick Piggin 提交于
      Make radix tree lookups safe to be performed without locks.  Readers are
      protected against nodes being deleted by using RCU based freeing.  Readers
      are protected against new node insertion by using memory barriers to ensure
      the node itself will be properly written before it is visible in the radix
      tree.
      
      Each radix tree node keeps a record of their height (above leaf nodes).
      This height does not change after insertion -- when the radix tree is
      extended, higher nodes are only inserted in the top.  So a lookup can take
      the pointer to what is *now* the root node, and traverse down it even if
      the tree is concurrently extended and this node becomes a subtree of a new
      root.
      
      "Direct" pointers (tree height of 0, where root->rnode points directly to
      the data item) are handled by using the low bit of the pointer to signal
      whether rnode is a direct pointer or a pointer to a radix tree node.
      
      When a reader wants to traverse the next branch, they will take a copy of
      the pointer.  This pointer will be either NULL (and the branch is empty) or
      non-NULL (and will point to a valid node).
      
      [akpm@osdl.org: cleanups]
      [Lee.Schermerhorn@hp.com: bugfixes, comments, simplifications]
      [clameter@sgi.com: build fix]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7cf9c2c7
    • C
      [PATCH] slab: remove kmem_cache_t · e18b890b
      Christoph Lameter 提交于
      Replace all uses of kmem_cache_t with struct kmem_cache.
      
      The patch was generated using the following script:
      
      	#!/bin/sh
      	#
      	# Replace one string by another in all the kernel sources.
      	#
      
      	set -e
      
      	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
      		quilt add $file
      		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
      		mv /tmp/$$ $file
      		quilt refresh
      	done
      
      The script was run like this
      
      	sh replace kmem_cache_t "struct kmem_cache"
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e18b890b
  22. 11 10月, 2006 1 次提交
  23. 26 6月, 2006 1 次提交
  24. 23 6月, 2006 3 次提交
    • P
      [PATCH] buglet in radix_tree_tag_set · 4c91c364
      Peter Zijlstra 提交于
      The comment states: 'Setting a tag on a not-present item is a BUG.' Hence
      if 'index' is larger than the maxindex; the item _cannot_ be presen; it
      should also be a BUG.
      
      Also, this allows the following statement (assume a fresh tree):
      
        radix_tree_tag_set(root, 16, 1);
      
      to fail silently, but when preceded by:
      
        radix_tree_insert(root, 32, item);
      
      it would BUG, because the height has been extended by the insert.
      
      In neither case was 16 present.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4c91c364
    • N
      [PATCH] radix-tree: small · cfd9b7df
      Nick Piggin 提交于
      Reduce radix tree node memory usage by about a factor of 4 for small files
      (< 64K).  There are pointer traversal and memory usage costs for large
      files with dense pagecache.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      cfd9b7df
    • N
      [PATCH] radix-tree: direct data · 612d6c19
      Nick Piggin 提交于
      The ability to have height 0 radix trees (a direct pointer to the data item
      rather than going through a full node->slot) quietly disappeared with
      old-2.6-bkcvs commit ffee171812d51652f9ba284302d9e5c5cc14bdfd.  On 64-bit
      machines this causes nearly 600 bytes to be used for every <= 4K file in
      pagecache.
      
      Re-introduce this feature, root tags stored in spare ->gfp_mask bits.
      
      Simplify radix_tree_delete's complex tag clearing arrangement (which would
      become even more complex) by just falling back to tag clearing functions
      (the pagecache radix-tree never uses this path anyway, so the icache
      savings will mean it's actually a speedup).
      
      On my 4GB G5, this saves 8MB RAM per kernel kernel source+object tree in
      pagecache.
      
      Pagecache lookup, insertion, and removal speed for small files will also be
      improved.
      
      This makes RCU radix tree harder, but it's worth it.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      612d6c19
  25. 26 3月, 2006 1 次提交
  26. 17 2月, 2006 1 次提交
    • N
      [PATCH] Fix over-zealous tag clearing in radix_tree_delete · 90f9dd8f
      NeilBrown 提交于
      If a tag is set for a node being deleted from a radix_tree, then that
      tag gets cleared from the parent of the node, even if it is set for some
      siblings of the node begin deleted.
      
      This patch changes the logic to include a test for any_tag_set similar
      to the logic a little futher down.  Care is taken to ensure that
      'nr_cleared_tags' remains equals to the number of entries in the 'tags'
      array which are set to '0' (which means that this tag is not set in the
      tree below pathp->node, and should be cleared at pathp->node and
      possibly above.
      
      [ Nick says: "Linus FYI, I was able to modify the radix tree test
        harness to catch the bug and can no longer trigger it after the fix.
        Resulting code passes all other harness tests as well of course." ]
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Acked-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      90f9dd8f
  27. 09 1月, 2006 1 次提交