1. 20 Aug, 2010 1 commit
  2. 10 Aug, 2010 2 commits
    • mm: implement writeback livelock avoidance using page tagging · f446daae
      Jan Kara committed
      We try to avoid livelocks of writeback when someone steadily creates dirty
      pages in a mapping we are writing out.  For memory-cleaning writeback,
      using nr_to_write works reasonably well but we cannot really use it for
      data integrity writeback.  This patch tries to solve the problem.
      
      The idea is simple: Tag all pages that should be written back with a
      special tag (TOWRITE) in the radix tree.  This can be done rather quickly
      and thus livelocks should not happen in practice.  Then we start doing the
      hard work of locking pages and sending them to disk only for those pages
      that have TOWRITE tag set.
      
      Note: Adding a new radix tree tag grows the radix tree node from 288 to 296
      bytes for 32-bit archs and from 552 to 560 bytes for 64-bit archs.
      However, the number of slab/slub items per page remains the same (13 and 7
      respectively).
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
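The two-phase idea in this commit message can be sketched in userspace C. This is a minimal model, not the kernel code: the flat arrays stand in for the radix tree tags, and `toy_mapping`, `toy_tag_for_writeback` and `toy_write_tagged` are hypothetical names invented for the illustration. It only shows why a page dirtied after the tagging pass does not extend the current writeback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8	/* hypothetical size of the toy mapping */

/* Toy mapping: one flag array per tag instead of a real radix tree. */
struct toy_mapping {
	bool dirty[NPAGES];
	bool towrite[NPAGES];
};

/* Phase 1: cheap pass that marks every DIRTY page in [start, end] as TOWRITE. */
static size_t toy_tag_for_writeback(struct toy_mapping *m, size_t start, size_t end)
{
	size_t tagged = 0;

	assert(start <= end && end < NPAGES);
	for (size_t i = start; i <= end; i++) {
		if (m->dirty[i]) {
			m->towrite[i] = true;
			tagged++;
		}
	}
	return tagged;
}

/* Phase 2: the expensive work, restricted to pages that carry TOWRITE. */
static size_t toy_write_tagged(struct toy_mapping *m, size_t start, size_t end)
{
	size_t written = 0;

	assert(start <= end && end < NPAGES);
	for (size_t i = start; i <= end; i++) {
		if (!m->towrite[i])
			continue;	/* dirtied after tagging: left for a later pass */
		m->towrite[i] = false;
		m->dirty[i] = false;	/* stands in for locking the page and submitting I/O */
		written++;
	}
	return written;
}
```

In this model the amount of work in phase 2 is fixed once phase 1 returns, so a writer that keeps dirtying new pages cannot make the loop run forever.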
    • radix-tree: implement function radix_tree_range_tag_if_tagged · ebf8aa44
      Jan Kara committed
      Implement function for setting one tag if another tag is set for each item
      in given range.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
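As a rough illustration of "set one tag if another tag is set for each item in a given range", here is a sketch over a flat array. It is not the radix tree implementation: `toy_tree` and `toy_range_tag_if_tagged` are hypothetical, and the batching parameter `nr_to_tag` and the updated start index are assumptions added so that a caller could tag a large range in bounded steps.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ITEMS 16	/* hypothetical number of slots in the toy tree */

struct toy_tree {
	unsigned int tags[TOY_ITEMS];	/* bit n set means tag n is set on that item */
};

/*
 * For each item in [*first, last] that has iftag set, also set settag.
 * Stops after nr_to_tag items were tagged; *first is advanced to the
 * next index to scan so that the caller can resume. Returns the number
 * of items tagged by this call.
 */
static size_t toy_range_tag_if_tagged(struct toy_tree *t, size_t *first,
				      size_t last, size_t nr_to_tag,
				      unsigned int iftag, unsigned int settag)
{
	size_t tagged = 0;
	size_t i = *first;

	assert(last < TOY_ITEMS);
	while (i <= last && tagged < nr_to_tag) {
		if (t->tags[i] & (1u << iftag)) {
			t->tags[i] |= 1u << settag;
			tagged++;
		}
		i++;
	}
	*first = i;
	return tagged;
}
```

A caller would loop until the returned start index passes `last`, dropping and retaking its lock between batches if it needs to.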
  3. 10 Apr, 2010 1 commit
    • radix_tree_tag_get() is not as safe as the docs make out [ver #2] · ce82653d
      David Howells committed
      radix_tree_tag_get() is not safe to use concurrently with radix_tree_tag_set()
      or radix_tree_tag_clear().  The problem is that the double tag_get() in
      radix_tree_tag_get():
      
      		if (!tag_get(node, tag, offset))
      			saw_unset_tag = 1;
      		if (height == 1) {
      			int ret = tag_get(node, tag, offset);
      
      may see the value change due to the action of set/clear.  RCU is no protection
      against this as no pointers are being changed, no nodes are being replaced
      according to a COW protocol - set/clear alter the node directly.
      
      The documentation in linux/radix-tree.h, however, says that
      radix_tree_tag_get() is an exception to the rule that "any function modifying
      the tree or tags (...) must exclude other modifications, and exclude any
      functions reading the tree".
      
      The problem is that the next statement in radix_tree_tag_get() checks that the
      tag doesn't vary over time:
      
      			BUG_ON(ret && saw_unset_tag);
      
      This has been seen happening in FS-Cache:
      
      	https://www.redhat.com/archives/linux-cachefs/2010-April/msg00013.html
      
      To this end, remove the BUG_ON() from radix_tree_tag_get() and note in various
      comments that the value of the tag may change whilst the RCU read lock is held,
      and thus that the return value of radix_tree_tag_get() may not be relied upon
      unless radix_tree_tag_set/clear() and radix_tree_delete() are excluded from
      running concurrently with it.
      Reported-by: Romain DEGEZ <romain.degez@smartjog.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
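The consequence for readers can be sketched as follows. This is a simplified userspace model, not the kernel function: `toy_path` and `toy_tag_get` are hypothetical, and the per-level flags stand in for the tag bits along one root-to-leaf path. The point is only that each level is read once and the values are never cross-checked, so a transiently inconsistent path yields a possibly stale answer instead of a crash.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_HEIGHT 3	/* hypothetical tree height */

/* One tag flag per level on the path from the root (index 0) to the leaf. */
struct toy_path {
	volatile unsigned char tag[TOY_HEIGHT];
};

/*
 * Return the leaf's tag as observed. A concurrent set/clear may change
 * any level while we walk, so the levels are not compared against each
 * other and the result is only a hint unless writers are excluded.
 */
static bool toy_tag_get(const struct toy_path *p)
{
	bool set = false;

	assert(p);
	for (int level = 0; level < TOY_HEIGHT; level++)
		set = p->tag[level] != 0;	/* one read per level, no consistency check */
	return set;	/* the last value read is the leaf's */
}
```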
  4. 17 Jun, 2009 1 commit
  5. 06 Jan, 2009 1 commit
    • mm lockless pagecache barrier fix · e8c82c2e
      Nick Piggin committed
      An XFS workload showed up a bug in the lockless pagecache patch. Basically it
      would go into an "infinite" loop, although it would sometimes be able to break
      out of the loop! The reason is a missing compiler barrier in the "increment
      reference count unless it was zero" case of the lockless pagecache protocol in
      the gang lookup functions.
      
      This would cause the compiler to use a cached value of struct page pointer to
      retry the operation with, rather than reload it. So the page might have been
      removed from pagecache and freed (refcount==0) but the lookup would not correctly
      notice the page is no longer in pagecache, and keep attempting to increment the
      refcount and failing, until the page gets reallocated for something else. This
      isn't a data corruption because the condition will be detected if the page has
      been reallocated. However it can result in a lockup.
      
      Linus points out that ACCESS_ONCE is also required in that pointer load, even
      if its absence is not causing a bug on our particular build. The most general
      way to solve this is just to put an rcu_dereference in radix_tree_deref_slot.
      
      Assembly of find_get_pages,
      before:
      .L220:
              movq    (%rbx), %rax    #* ivtmp.1162, tmp82
              movq    (%rax), %rdi    #, prephitmp.1149
      .L218:
              testb   $1, %dil        #, prephitmp.1149
              jne     .L217   #,
              testq   %rdi, %rdi      # prephitmp.1149
              je      .L203   #,
              cmpq    $-1, %rdi       #, prephitmp.1149
              je      .L217   #,
              movl    8(%rdi), %esi   # <variable>._count.counter, c
              testl   %esi, %esi      # c
              je      .L218   #,
      
      after:
      .L212:
              movq    (%rbx), %rax    #* ivtmp.1109, tmp81
              movq    (%rax), %rdi    #, ret
              testb   $1, %dil        #, ret
              jne     .L211   #,
              testq   %rdi, %rdi      # ret
              je      .L197   #,
              cmpq    $-1, %rdi       #, ret
              je      .L211   #,
              movl    8(%rdi), %esi   # <variable>._count.counter, c
              testl   %esi, %esi      # c
              je      .L212   #,
      
      (notice the obvious infinite loop in the first example, if page->count remains 0)
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
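The shape of the fixed retry loop can be sketched in userspace C. This is an illustration under assumptions, not the pagecache code: `TOY_ACCESS_ONCE` is a stand-in for the kernel macro (it relies on the GCC/Clang `__typeof__` extension), `toy_page`, `toy_get_page_unless_zero` and `toy_lookup` are hypothetical, the refcount update is not atomic, and the loop is bounded by `max_tries` only so that the demo terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's ACCESS_ONCE(): force a real load on every use. */
#define TOY_ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct toy_page {
	int count;	/* 0 means the page has been freed */
};

/* "Increment reference count unless it was zero" (non-atomic toy version). */
static int toy_get_page_unless_zero(struct toy_page *page)
{
	if (page->count == 0)
		return 0;
	page->count++;
	return 1;
}

/*
 * Look up the page stored in *slot and take a reference on it.
 * The slot is reloaded on every pass; retrying with a cached pointer
 * to a freed page is the bug described in the commit message.
 */
static struct toy_page *toy_lookup(struct toy_page **slot, int max_tries)
{
	assert(slot);
	while (max_tries-- > 0) {
		struct toy_page *page = TOY_ACCESS_ONCE(*slot);

		if (page == NULL)
			return NULL;	/* slot is empty */
		if (toy_get_page_unless_zero(page))
			return page;	/* reference taken */
		/* refcount was zero: go around and load the slot again */
	}
	return NULL;	/* demo-only bound reached */
}
```

Without the forced reload, a compiler is free to hoist the load of `*slot` out of the loop, which matches the "before" assembly where the jump target sits after the two `movq` loads.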
  6. 27 Jul, 2008 1 commit
  7. 03 Feb, 2008 1 commit
  8. 17 Oct, 2007 2 commits
  9. 09 May, 2007 1 commit
  10. 08 Dec, 2006 1 commit
    • [PATCH] radix-tree: RCU lockless readside · 7cf9c2c7
      Nick Piggin committed
      Make radix tree lookups safe to be performed without locks.  Readers are
      protected against nodes being deleted by using RCU based freeing.  Readers
      are protected against new node insertion by using memory barriers to ensure
      the node itself will be properly written before it is visible in the radix
      tree.
      
      Each radix tree node keeps a record of its height (above leaf nodes).
      This height does not change after insertion -- when the radix tree is
      extended, higher nodes are only inserted in the top.  So a lookup can take
      the pointer to what is *now* the root node, and traverse down it even if
      the tree is concurrently extended and this node becomes a subtree of a new
      root.
      
      "Direct" pointers (tree height of 0, where root->rnode points directly to
      the data item) are handled by using the low bit of the pointer to signal
      whether rnode is a direct pointer or a pointer to a radix tree node.
      
      When a reader wants to traverse the next branch, they will take a copy of
      the pointer.  This pointer will be either NULL (and the branch is empty) or
      non-NULL (and will point to a valid node).
      
      [akpm@osdl.org: cleanups]
      [Lee.Schermerhorn@hp.com: bugfixes, comments, simplifications]
      [clameter@sgi.com: build fix]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
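The low-bit trick for "direct" pointers can be sketched as below. This is a generic pointer-tagging example, not the kernel's helpers: `TOY_DIRECT_PTR` and the three functions are hypothetical names, and the sketch assumes data items are at least 2-byte aligned so that bit 0 of a valid pointer is always clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_DIRECT_PTR ((uintptr_t)1)	/* hypothetical marker stored in bit 0 */

/* Mark a data item pointer as "direct" before storing it in the root slot. */
static void *toy_ptr_to_direct(void *item)
{
	assert(((uintptr_t)item & TOY_DIRECT_PTR) == 0);	/* requires aligned items */
	return (void *)((uintptr_t)item | TOY_DIRECT_PTR);
}

/* Does the root slot hold a data item rather than a tree node? */
static bool toy_is_direct(const void *slot)
{
	return ((uintptr_t)slot & TOY_DIRECT_PTR) != 0;
}

/* Strip the marker to recover the original data item pointer. */
static void *toy_direct_to_ptr(void *slot)
{
	return (void *)((uintptr_t)slot & ~TOY_DIRECT_PTR);
}
```

A lockless reader that loads the root slot once can then decide from that single value whether to return the item or to descend into a node, without consulting a separate height field that might change underneath it.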
  11. 04 Dec, 2006 1 commit
  12. 23 Jun, 2006 1 commit
    • [PATCH] radix-tree: direct data · 612d6c19
      Nick Piggin committed
      The ability to have height 0 radix trees (a direct pointer to the data item
      rather than going through a full node->slot) quietly disappeared with
      old-2.6-bkcvs commit ffee171812d51652f9ba284302d9e5c5cc14bdfd.  On 64-bit
      machines this causes nearly 600 bytes to be used for every <= 4K file in
      pagecache.
      
      Re-introduce this feature, with root tags stored in spare ->gfp_mask bits.
      
      Simplify radix_tree_delete's complex tag clearing arrangement (which would
      become even more complex) by just falling back to tag clearing functions
      (the pagecache radix-tree never uses this path anyway, so the icache
      savings will mean it's actually a speedup).
      
      On my 4GB G5, this saves 8MB RAM per kernel source+object tree in
      pagecache.
      
      Pagecache lookup, insertion, and removal speed for small files will also be
      improved.
      
      This makes RCU radix tree harder, but it's worth it.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
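Storing root tags in spare `->gfp_mask` bits might look roughly like the sketch below. The layout is an assumption made for illustration: `TOY_GFP_BITS_SHIFT`, `TOY_MAX_TAGS`, `toy_root` and the helper names are hypothetical, and the real number of allocation-flag bits depends on the kernel version.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_GFP_BITS_SHIFT 20	/* hypothetical: bits below this hold allocation flags */
#define TOY_MAX_TAGS 2		/* hypothetical number of tags */

struct toy_root {
	unsigned int gfp_mask;	/* low bits: allocation flags, spare high bits: root tags */
	void *rnode;		/* node pointer, or a direct data item for height 0 */
};

static void toy_root_tag_set(struct toy_root *root, unsigned int tag)
{
	assert(tag < TOY_MAX_TAGS);
	root->gfp_mask |= 1u << (tag + TOY_GFP_BITS_SHIFT);
}

static void toy_root_tag_clear(struct toy_root *root, unsigned int tag)
{
	assert(tag < TOY_MAX_TAGS);
	root->gfp_mask &= ~(1u << (tag + TOY_GFP_BITS_SHIFT));
}

static bool toy_root_tag_get(const struct toy_root *root, unsigned int tag)
{
	assert(tag < TOY_MAX_TAGS);
	return (root->gfp_mask & (1u << (tag + TOY_GFP_BITS_SHIFT))) != 0;
}

/* Allocation flags with the tag bits masked off again. */
static unsigned int toy_root_gfp_mask(const struct toy_root *root)
{
	return root->gfp_mask & ((1u << TOY_GFP_BITS_SHIFT) - 1);
}
```

With this arrangement a height-0 tree needs no node at all to remember whether its single item is tagged, which is where the per-file memory saving comes from.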
  13. 26 Mar, 2006 1 commit
  14. 09 Jan, 2006 1 commit
  15. 07 Nov, 2005 1 commit
  16. 28 Oct, 2005 1 commit
  17. 09 Oct, 2005 1 commit
  18. 11 Sep, 2005 1 commit
  19. 17 Apr, 2005 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds committed
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!