1. 28 Mar, 2012 3 commits
  2. 27 Mar, 2012 13 commits
    • Btrfs: deal with read errors on extent buffers differently · ea466794
      Josef Bacik committed
      Since we need to read and write extent buffers in their entirety we can't use
      the normal bio_readpage_error stuff since it only works on a per page basis.  So
      instead make it so that if we see an io error in endio we just mark the eb as
      having an IO error and then in btree_read_extent_buffer_pages we will manually
      try other mirrors and then overwrite the bad mirror if we find a good copy.
      This works with larger than page size blocks.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
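The retry-and-repair flow described in this commit message can be sketched in plain userspace C. This is a simplified model, not the btrfs code: `struct mirror`, `read_mirror` and `read_block_with_repair` are hypothetical names, the mirrors are in-memory arrays, and the sketch repairs every failed copy it has seen, which may differ from what the kernel does.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_MIRRORS 3
#define EB_SIZE 16

/* Hypothetical in-memory stand-in for one copy of a tree block. */
struct mirror {
    bool io_error;          /* reads from this copy fail */
    char data[EB_SIZE];
};

/* Read one whole block from one mirror; returns false on IO error. */
static bool read_mirror(const struct mirror *m, char *out)
{
    if (m->io_error)
        return false;
    memcpy(out, m->data, EB_SIZE);
    return true;
}

/*
 * Try each mirror in turn. When a mirror succeeds after earlier ones
 * failed, overwrite the failed copies with the good data.
 * Returns the index of the mirror that was read, or -1 if all failed.
 */
static int read_block_with_repair(struct mirror mirrors[NUM_MIRRORS], char *out)
{
    bool failed[NUM_MIRRORS] = { false };

    assert(mirrors && out);
    for (int i = 0; i < NUM_MIRRORS; i++) {
        if (!read_mirror(&mirrors[i], out)) {
            failed[i] = true;   /* remember the bad copy, keep trying */
            continue;
        }
        for (int j = 0; j < i; j++) {
            if (failed[j]) {
                memcpy(mirrors[j].data, out, EB_SIZE);
                mirrors[j].io_error = false;
            }
        }
        return i;
    }
    return -1;
}
```

The point of the sketch is that the unit of retry is the whole block rather than a single page, which is why a per-page error path does not fit.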
    • Btrfs: don't use threaded IO completion helpers for metadata writes · f3f266ab
      Chris Mason committed
      The metadata write IO completion code is now simple enough that we
      don't need the threaded helpers anymore.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: adjust the write_lock_level as we unlock · f7c79f30
      Chris Mason committed
      btrfs_search_slot sometimes needs write locks on high levels of
      the tree.  It remembers the highest level that needs a write lock
      and will use that for all future searches through the tree in a given
      call.
      
      But, very often we'll just cow the top level or the level below and we
      won't really need write locks on the root again after that.  This patch
      changes things to adjust the write lock requirement as it unlocks
      levels.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
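A rough model of lowering the write lock requirement while unlocking might look like the following. It is a sketch under assumptions: `struct path_sketch`, `unlock_up_sketch` and the `min_write_lock_level` floor are illustrative names, and the real locking calls are replaced by a plain array of flags.

```c
#include <assert.h>

#define MAX_LEVEL 8

/* Hypothetical stand-in for a search path: nonzero means "level is locked". */
struct path_sketch {
    int locks[MAX_LEVEL];
};

/*
 * Unlock every held level at or above lowest_unlock. Each time a level
 * inside the write-locked range is released, lower *write_lock_level to
 * just below it, but never below min_write_lock_level.
 */
static void unlock_up_sketch(struct path_sketch *p, int lowest_unlock,
                             int min_write_lock_level, int *write_lock_level)
{
    assert(lowest_unlock >= 0);
    for (int i = lowest_unlock; i < MAX_LEVEL; i++) {
        if (!p->locks[i])
            continue;
        p->locks[i] = 0;
        if (i > min_write_lock_level && i <= *write_lock_level)
            *write_lock_level = i - 1;
    }
}
```

After the upper levels are released, later searches within the same call only need write locks up to the lowered level, so the root can be taken with a read lock again.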
    • Btrfs: loop waiting on writeback · a098d8e8
      Chris Mason committed
      lock_extent_buffer_for_io needs to loop around and make sure the
      writeback bits are not set.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: add the ability to cache a pointer into the eb · cfed81a0
      Chris Mason committed
      This cuts down on the CPU time used by map_private_extent_buffer
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: ensure an entire eb is written at once · 0b32f4bb
      Josef Bacik committed
      This patch simplifies how we track our extent buffers.  Previously we could exit
      writepages with only having written half of an extent buffer, which meant we had
      to track the state of the pages and the state of the extent buffers differently.
      Now we only read in entire extent buffers and write out entire extent buffers,
      this allows us to simply set bits in our bflags to indicate the state of the eb
      and we no longer have to do things like track uptodate with our iotree.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
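Tracking the state of a whole extent buffer with bits in a single flags word can be illustrated as below. The flag names and helpers (`EB_DIRTY`, `eb_start_write`, and so on) are assumptions made for this sketch and are not taken from the patch; the kernel would use atomic bit operations rather than plain ORs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-buffer state bits, one word for the whole eb. */
enum eb_flag_sketch { EB_UPTODATE, EB_DIRTY, EB_WRITEBACK };

struct eb_sketch {
    unsigned long bflags;
};

static void eb_set(struct eb_sketch *eb, enum eb_flag_sketch f)
{
    eb->bflags |= 1UL << f;
}

static void eb_clear(struct eb_sketch *eb, enum eb_flag_sketch f)
{
    eb->bflags &= ~(1UL << f);
}

static bool eb_test(const struct eb_sketch *eb, enum eb_flag_sketch f)
{
    return (eb->bflags >> f) & 1UL;
}

/* Begin writing the whole buffer: dirty becomes writeback in one step. */
static bool eb_start_write(struct eb_sketch *eb)
{
    if (!eb_test(eb, EB_DIRTY))
        return false;           /* nothing to write */
    eb_clear(eb, EB_DIRTY);
    eb_set(eb, EB_WRITEBACK);
    return true;
}

/* Called once IO for every page of the buffer has completed. */
static void eb_end_write(struct eb_sketch *eb)
{
    eb_clear(eb, EB_WRITEBACK);
}
```

Because the buffer is always read and written as a unit, one word of state is enough and no per-page bookkeeping has to agree with it.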
    • Btrfs: introduce mark_extent_buffer_accessed · 5df4235e
      Josef Bacik committed
      Because an eb can have multiple pages we need to make sure that all pages within
      the eb are marked as accessed, since releasepage can be called against any page
      in the eb.  This will keep us from possibly evicting hot eb's when we're doing
      larger than pagesize eb's.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: introduce free_extent_buffer_stale · 3083ee2e
      Josef Bacik committed
      Because btrfs cow's we can end up with extent buffers that are no longer
      necessary just sitting around in memory.  So instead of evicting these pages, we
      could end up evicting things we actually care about.  Thus we have
      free_extent_buffer_stale for use when we are freeing tree blocks.  This will
      make it so that the ref for the eb being in the radix tree is dropped as soon as
      possible and then is freed when the refcount hits 0 instead of waiting to be
      released by releasepage.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: only use the existing eb if it's count isn't 0 · 115391d2
      Josef Bacik committed
      We can run into a problem where we find an eb for our existing page already on
      the radix tree but it has a ref count of 0.  It hasn't been removed by RCU
      yet, so this can cause issues where we will use the EB after free.  So do
      atomic_inc_not_zero on the exists->refs and if it is zero just do
      synchronize_rcu() and try again.  We won't have to worry about new allocators
      coming in since they will block on the page lock at this point.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
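The "take a reference only if the count is not already zero" step can be approximated in userspace with C11 atomics. `ref_inc_not_zero` is a hypothetical analogue written for this sketch; it is not the kernel's atomic_inc_not_zero implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Increment *refs unless it is zero. Returns true if a reference was
 * taken, false if the object is already on its way to being freed.
 */
static bool ref_inc_not_zero(atomic_int *refs)
{
    int old = atomic_load(refs);

    while (old != 0) {
        /* On failure, old is refreshed with the current value. */
        if (atomic_compare_exchange_weak(refs, &old, old + 1))
            return true;
    }
    return false;
}
```

A caller that gets `false` must not touch the object. In the commit above the response is to wait for the RCU grace period and look the buffer up again.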
    • Btrfs: set page->private to the eb · 4f2de97a
      Josef Bacik committed
      We spend a lot of time looking up extent buffers from pages when we could just
      store the pointer to the eb the page is associated with in page->private.  This
      patch does just that, and it makes things a little simpler and reduces a bit of
      CPU overhead involved with doing metadata IO.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: allow metadata blocks larger than the page size · 727011e0
      Chris Mason committed
      A few years ago the btrfs code to support blocks larger than
      the page size was disabled to fix a few corner cases in the
      page cache handling.  This fixes the code to properly support
      large metadata blocks again.
      
      Since current kernels will crash early and often with larger
      metadata blocks, this adds an incompat bit so that older kernels
      can't mount it.
      
      This also does away with different blocksizes for nodes and leaves.
      You get a single block size for all tree blocks.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
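The effect of an incompat bit can be shown with a small mask check: a filesystem may only be mounted if every incompat bit set on disk is one the running kernel understands. The bit names and values below are placeholders chosen for this sketch, not the real btrfs flag definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define FEAT_INCOMPAT_A         (UINT64_C(1) << 0)
#define FEAT_INCOMPAT_BIG_META  (UINT64_C(1) << 1)

/* Refuse the mount if the disk uses any incompat feature we do not know. */
static bool can_mount(uint64_t disk_incompat, uint64_t supported)
{
    return (disk_incompat & ~supported) == 0;
}
```

An older kernel whose supported mask lacks the large-metadata bit would reject such a filesystem at mount time instead of crashing later.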
    • Btrfs: remove search_start and search_end from find_free_extent and callers · 81c9ad23
      Josef Bacik committed
      We have been passing nothing but (u64)-1 to find_free_extent for search_end in
      all of the callers, so it's completely useless, and we've always been passing 0
      in as search_start, so just remove them as function arguments and move
      search_start into find_free_extent.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: remove the ideal caching code · 285ff5af
      Josef Bacik committed
      This is a relic from before we had the disk space cache and it was to make
      bootup times when you had btrfs as root not be so damned slow.  Now that we have
      the disk space cache this isn't a problem anymore and really having this code
      causes unneeded fragmentation and complexity, so just remove it.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
  3. 19 Mar, 2012 1 commit
  4. 17 Mar, 2012 4 commits
  5. 11 Mar, 2012 5 commits
  6. 10 Mar, 2012 2 commits
    • aio: fix the "too late munmap()" race · c7b28555
      Al Viro committed
      Current code has put_ioctx() called asynchronously from aio_fput_routine();
      that's done *after* we have killed the request that used to pin ioctx,
      so there's nothing to stop io_destroy() waiting in wait_for_all_aios()
      from progressing.  As the result, we can end up with async call of
      put_ioctx() being the last one and possibly happening during exit_mmap()
      or elf_core_dump(), neither of which expects stray munmap() being done
      to them...
      
      We do need to prevent _freeing_ ioctx until aio_fput_routine() is done
      with that, but that's all we care about - neither io_destroy() nor
      exit_aio() will progress past wait_for_all_aios() until aio_fput_routine()
      does really_put_req(), so the ioctx teardown won't be done until then
      and we don't care about the contents of ioctx past that point.
      
      Since actual freeing of these suckers is RCU-delayed, we don't need to
      bump ioctx refcount when request goes into list for async removal.
      All we need is rcu_read_lock held just over the ->ctx_lock-protected
      area in aio_fput_routine().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Benjamin LaHaise <bcrl@kvack.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aio: fix io_setup/io_destroy race · 86b62a2c
      Al Viro committed
      Have ioctx_alloc() return an extra reference, so that caller would drop it
      on success and not bother with re-grabbing it on failure exit.  The current
      code is obviously broken - io_destroy() from another thread that managed
      to guess the address io_setup() would've returned would free ioctx right
      under us; gets especially interesting if aio_context_t * we pass to
      io_setup() points to PROT_READ mapping, so put_user() fails and we end
      up doing io_destroy() on kioctx another thread has just got freed...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
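The idea of an allocator that hands back an extra reference can be modelled with a toy refcount. This is a single-threaded sketch with made-up names (`ctx_sketch`, `ctx_alloc_sketch`, `ctx_put_sketch`); the real code uses the kernel's own refcounting and locking, which are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical context object with a plain (non-atomic) refcount. */
struct ctx_sketch {
    int refs;
};

/*
 * Allocate a context holding two references: one owned by the lookup
 * table that other threads can reach, one owned by the caller.
 */
static struct ctx_sketch *ctx_alloc_sketch(void)
{
    struct ctx_sketch *c = malloc(sizeof(*c));

    if (c)
        c->refs = 2;
    return c;
}

/* Drop one reference; returns true if this was the last and c was freed. */
static bool ctx_put_sketch(struct ctx_sketch *c)
{
    assert(c->refs > 0);
    if (--c->refs == 0) {
        free(c);
        return true;
    }
    return false;
}
```

With the extra reference, a destroy that races in from another thread only drops the table's reference, so the object stays valid until the setup path drops its own, whether setup succeeded or failed.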
  7. 07 Mar, 2012 2 commits
  8. 06 Mar, 2012 4 commits
  9. 05 Mar, 2012 1 commit
    • vfs: move dentry_cmp from <linux/dcache.h> to fs/dcache.c · 5483f18e
      Linus Torvalds committed
      It's only used inside fs/dcache.c, and we're going to play games with it
      for the word-at-a-time patches.  This time we really don't even want to
      export it, because it really is an internal function to fs/dcache.c, and
      has been since it was introduced.
      
      Having it in that extremely hot header file (it's included in pretty
      much everything, thanks to <linux/fs.h>) is a disaster for testing
      different versions, and is utterly pointless.
      
      We really should have some kind of header file diet thing, where we
      figure out which parts of header files are really better off private and
      only result in more expensive compiles.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 03 Mar, 2012 5 commits