1. 18 Mar, 2011 8 commits
  2. 12 Mar, 2011 1 commit
    • Btrfs: break out of shrink_delalloc earlier · 36e39c40
      Chris Mason committed
      Josef had changed shrink_delalloc to exit after three shrink
      attempts, which wasn't quite enough because new writers could
      race in and steal free space.
      
      But it also fixed deadlocks and stalls as we tried to recover
      delalloc reservations.  The code was tweaked to loop 1024
      times, and would reset the counter any time a small amount
      of progress was made.  This was too drastic, and with a
      lot of writers we can end up stuck in shrink_delalloc forever.
      
      The shrink_delalloc loop is fairly complex because the caller is looping
      too, and the caller will go ahead and force a transaction commit to make
      sure we reclaim space.
      
      This reworks things to exit shrink_delalloc when we've forced some
      writeback and the delalloc reservations have gone down.  This means
      the writeback has not just started but has also finished at
      least some of the metadata changes required to reclaim delalloc
      space.
      
      If we've got this wrong, we're returning ENOSPC too early, which
      is a big improvement over the current behavior of hanging the machine.
      
      Test 224 in xfstests hammers on this nicely, and with 1000 writers
      trying to fill a 1GB drive we get our first ENOSPC at 93% full.  The
      other writers are able to continue until we get 100%.
      
      This is a worst case test for btrfs because the 1000 writers are doing
      small IO, and the small FS size means we don't have a lot of room
      for metadata chunks.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
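The exit condition described in "break out of shrink_delalloc earlier" can be sketched in user-space C. This is a minimal simulation, not the kernel code: `space_sim`, `force_writeback_sim` and `shrink_sim` are hypothetical names, and the per-pass reclaim amount is an assumed stand-in for real writeback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the delalloc reservation state. */
struct space_sim {
    uint64_t reserved;         /* bytes currently reserved for delalloc */
    uint64_t reclaim_per_pass; /* bytes one forced writeback pass frees */
};

/* Pretend to force writeback; returns true if writeback was started. */
static bool force_writeback_sim(struct space_sim *s)
{
    uint64_t freed = s->reclaim_per_pass < s->reserved
                         ? s->reclaim_per_pass : s->reserved;
    s->reserved -= freed;
    return true;
}

/*
 * Loop at most max_passes times, but leave as soon as a forced
 * writeback has actually lowered the reservation. The caller is
 * assumed to loop as well and to force a commit if needed.
 * Returns the number of passes made.
 */
static int shrink_sim(struct space_sim *s, int max_passes)
{
    int passes = 0;

    assert(max_passes > 0);
    while (passes < max_passes) {
        uint64_t before = s->reserved;
        bool wrote = force_writeback_sim(s);

        passes++;
        if (wrote && s->reserved < before)
            break; /* progress made: hand control back to the caller */
    }
    return passes;
}
```

With a reclaim rate of 10 bytes per pass the loop exits after one pass; with a rate of 0 it gives up after `max_passes` instead of spinning forever.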
  3. 11 Mar, 2011 2 commits
  4. 09 Mar, 2011 1 commit
  5. 08 Mar, 2011 1 commit
    • Btrfs: deal with short returns from copy_from_user · 31339acd
      Chris Mason committed
      When copy_from_user is only able to copy some of the bytes we requested,
      we may end up creating a partially up to date page.  To avoid garbage in
      the page, we need to treat a partial copy as a zero length copy.
      
      This makes the rest of the file_write code drop the page and
      retry the whole copy instead of marking the partially up to
      date page as dirty.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      cc: stable@kernel.org
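The rule from "deal with short returns from copy_from_user" (a partial copy counts as a zero length copy) can be written as a one-line helper. This is a sketch only; `usable_copy_len` is a hypothetical name, and the real kernel code may apply the rule under additional conditions not stated in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given how many bytes were requested and how
 * many the copy actually delivered, return the length the write path
 * should act on. A short copy is reported as 0 so the caller drops
 * the page and retries the whole copy.
 */
static size_t usable_copy_len(size_t requested, size_t copied)
{
    assert(copied <= requested);
    return copied < requested ? 0 : copied;
}
```

A full 4096-byte copy is accepted as is, while a copy that delivered only 100 of 4096 bytes is treated as having copied nothing.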
  6. 07 Mar, 2011 1 commit
    • Btrfs: fix regressions in copy_from_user handling · b1bf862e
      Chris Mason committed
      Commit 914ee295 fixed deadlocks in
      btrfs_file_write where we would catch page faults on pages we had
      locked.
      
      But, there were a few problems:
      
      1) The x86-32 iov_iter_copy_from_user_atomic code always fails to copy
      data when the amount to copy is more than 4K and the offset to start
      copying from is not page aligned.  The result was btrfs_file_write
      looping forever retrying the iov_iter_copy_from_user_atomic call.
      
      We deal with this by changing btrfs_file_write to drop down to single
      page copies when iov_iter_copy_from_user_atomic starts returning failure.
      
      2) The btrfs_file_write code was leaking delalloc reservations when
      iov_iter_copy_from_user_atomic returned zero.  The looping above would
      result in the entire filesystem running out of delalloc reservations and
      constantly trying to flush things to disk.
      
      3) btrfs_file_write will lock down page cache pages, make sure
      any writeback is finished, do the copy_from_user and then release them.
      Before the loop runs we check the first and last pages in the write to
      see if they are only being partially modified.  If the start or end of
      the write isn't aligned, we make sure the corresponding pages are
      up to date so that we don't introduce garbage into the file.
      
      With the copy_from_user changes, we're allowing the VM to reclaim the
      pages after a partial update from copy_from_user, but we're not
      making sure the page cache page is up to date when we loop around to
      resume the write.
      
      We deal with this by pushing the up to date checks down into the page
      prep code.  This fits better with how the rest of file_write works.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
      cc: stable@kernel.org
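Point 1 of "fix regressions in copy_from_user handling" (dropping down to single page copies once the atomic copy starts failing) can be illustrated with a user-space simulation. Everything here is assumed for illustration: `flaky_copy` merely models the x86-32 failure described above, and `copy_with_fallback` is a hypothetical caller, not btrfs_file_write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u

/*
 * Models the failure described above: the copy delivers nothing when
 * it is longer than one page and starts at an unaligned offset.
 */
static size_t flaky_copy(char *dst, const char *src, size_t offset, size_t len)
{
    if (len > SIM_PAGE_SIZE && (offset % SIM_PAGE_SIZE) != 0)
        return 0;
    memcpy(dst + offset, src, len);
    return len;
}

/*
 * Copy len bytes to dst starting at offset. After the first failed
 * copy, switch to chunks that never cross a page boundary.
 * Returns the number of bytes copied.
 */
static size_t copy_with_fallback(char *dst, const char *src,
                                 size_t offset, size_t len)
{
    size_t done = 0;
    int single_page = 0;

    while (done < len) {
        size_t pos = offset + done;
        size_t want = len - done;

        if (single_page) {
            size_t room = SIM_PAGE_SIZE - (pos % SIM_PAGE_SIZE);
            if (want > room)
                want = room;
        }

        size_t got = flaky_copy(dst, src + done, pos, want);
        if (got == 0) {
            if (single_page)
                break;       /* even a single-page copy failed: give up */
            single_page = 1; /* retry with page-sized pieces */
            continue;
        }
        done += got;
    }
    return done;
}
```

An 8000-byte write at offset 100 fails as one large copy but completes as two page-bounded copies, so the caller no longer loops forever.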
  7. 24 Feb, 2011 1 commit
    • Btrfs: fix fiemap bugs with delalloc · ec29ed5b
      Chris Mason committed
      The Btrfs fiemap code wasn't properly returning delalloc extents,
      so applications that trust fiemap to decide if there are holes in the
      file see holes instead of delalloc.
      
      This reworks the btrfs fiemap code, adding a get_extent helper that
      searches for delalloc ranges and also adding a helper for extent_fiemap
      that skips past holes in the file.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  8. 17 Feb, 2011 6 commits
  9. 15 Feb, 2011 6 commits
    • Btrfs: check return value of alloc_extent_map() · c26a9203
      Tsutomu Itoh committed
      Add checks on the return value of alloc_extent_map() in several places.
      In addition, alloc_extent_map() returns only a valid address or NULL,
      so checking with IS_ERR() is unnecessary.  Remove the IS_ERR() checks.
      Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
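The pattern behind "check return value of alloc_extent_map()" is that an allocator which returns either a valid pointer or NULL needs a plain NULL check, not an error-pointer check. A user-space sketch, with `em_sim`, `alloc_em_sim` and `setup_extent_sim` as hypothetical stand-ins:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an extent map object. */
struct em_sim {
    unsigned long long start;
    unsigned long long len;
};

/* Returns a zeroed object or NULL, never an encoded error pointer. */
static struct em_sim *alloc_em_sim(int simulate_failure)
{
    if (simulate_failure)
        return NULL;
    return calloc(1, sizeof(struct em_sim));
}

/* Caller checks for NULL and maps it to -ENOMEM. */
static int setup_extent_sim(int simulate_failure)
{
    struct em_sim *em = alloc_em_sim(simulate_failure);

    if (!em)
        return -ENOMEM;

    em->start = 0;
    em->len = 4096;
    free(em);
    return 0;
}
```

The failing path returns -ENOMEM to the caller instead of dereferencing a NULL pointer.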
    • Btrfs - Fix memory leak in btrfs_init_new_device() · 67100f25
      Ilya Dryomov committed
      Memory allocated by calling kstrdup() should be freed.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: prevent heap corruption in btrfs_ioctl_space_info() · 51788b1b
      Dan Rosenberg committed
      Commit bf5fc093 refactored
      btrfs_ioctl_space_info() and introduced several security issues.
      
      space_args.space_slots is an unsigned 64-bit type controlled by a
      possibly unprivileged caller.  The comparison as a signed int type
      allows providing values that are treated as negative and cause the
      subsequent allocation size calculation to wrap, or be truncated to 0.
      By providing a size that's truncated to 0, kmalloc() will return
      ZERO_SIZE_PTR.  It's also possible to provide a value smaller than the
      slot count.  The subsequent loop ignores the allocation size when
      copying data in, resulting in a heap overflow or write to ZERO_SIZE_PTR.
      
      The fix changes the slot count type and comparison typecast to u64,
      which prevents truncation or signedness errors, and also ensures that we
      don't copy more data than we've allocated in the subsequent loop.  Note
      that zero-size allocations are no longer possible since there is already
      an explicit check for space_args.space_slots being 0 and truncation of
      this value is no longer an issue.
      Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Reviewed-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
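The signedness and truncation problem in "prevent heap corruption in btrfs_ioctl_space_info()" can be reproduced in isolation. This is an illustration, not the kernel code: `clamp_slots_int` and `clamp_slots_u64` are hypothetical helpers, and the narrowing result assumes a 32-bit `int` on a compiler that keeps the low 32 bits (the conversion is implementation-defined in C, and GCC and Clang behave this way).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old-style clamp: the caller-controlled 64-bit value is narrowed to
 * int before the comparison, so large values can become 0 or negative.
 */
static int clamp_slots_int(uint64_t requested, int available)
{
    int req = (int)requested; /* implementation-defined narrowing */

    return req < available ? req : available;
}

/* Fixed clamp: compare as unsigned 64-bit, no narrowing involved. */
static uint64_t clamp_slots_u64(uint64_t requested, uint64_t available)
{
    return requested < available ? requested : available;
}
```

With 5 slots available, a request of 0x100000000 clamps to 0 in the signed version (a zero-size allocation) and 0xFFFFFFFF clamps to -1, while the u64 version yields 5 in both cases.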
    • Btrfs: Fix balance panic · 6848ad64
      Yan, Zheng committed
      Mark the cloned backref_node as checked in clone_backref_node()
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: don't release pages when we can't clear the uptodate bits · e3f24cc5
      Chris Mason committed
      Btrfs tracks uptodate state in an rbtree as well as in the
      page bits.  This is supposed to enable us to use block sizes other than
      the page size, but there are a few parts still missing before that
      completely works.
      
      But, our readpage routine trusts this additional range based tracking
      of uptodateness, much in the same way the buffer head up to date bits
      are trusted for the other filesystems.
      
      The problem is that sometimes we need to allocate memory in order to
      split records in the rbtree, even when we are just clearing bits.  This
      can be difficult when our clearing function is called with GFP_ATOMIC, which
      can happen in the releasepage path.
      
      So, what happens today looks like this:
      
      releasepage called with GFP_ATOMIC
      btrfs_releasepage calls clear_extent_bit
      clear_extent_bit fails to allocate ram, leaving the up to date bit set
      btrfs_releasepage returns success
      
      The end result is the page being gone, but btrfs thinking the range is
      up to date.   Later on if someone tries to read that same page, the
      btrfs readpage code will return immediately thinking the page is already
      up to date.
      
      This commit fixes things to fail the releasepage when we can't clear the
      extent state bits.  It covers both data pages and metadata tree blocks.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
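The fix in "don't release pages when we can't clear the uptodate bits" comes down to propagating the clear failure into the release decision. A user-space sketch under stated assumptions: `range_sim`, `clear_uptodate_sim` and `try_release_sim` are hypothetical, and the `alloc_ok` flag stands in for an atomic allocation that may fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical range-based uptodate tracking for one page. */
struct range_sim {
    bool uptodate;
};

/*
 * Clearing the bit may need memory to split a tracking record.
 * alloc_ok simulates whether that allocation succeeds.
 * Returns 0 on success, -1 if the bit could not be cleared.
 */
static int clear_uptodate_sim(struct range_sim *r, bool alloc_ok)
{
    if (!alloc_ok)
        return -1;
    r->uptodate = false;
    return 0;
}

/*
 * Returns 1 if the page may be released, 0 if it must be kept.
 * The release is refused when the tracking state could not be
 * cleared, so the state never claims a vanished page is up to date.
 */
static int try_release_sim(struct range_sim *r, bool alloc_ok)
{
    if (clear_uptodate_sim(r, alloc_ok) < 0)
        return 0;
    return 1;
}
```

When the simulated allocation fails, the release is refused and the page stays; a later attempt with memory available clears the bit and succeeds.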
    • Btrfs: fix page->private races · eb14ab8e
      Chris Mason committed
      There is a race where btrfs_releasepage can drop the
      page->private contents just as alloc_extent_buffer is setting
      up pages for metadata.  Because of how the Btrfs page flags work,
      this results in us skipping the crc on the page during IO.
      
      This patch solves the race by waiting until after the extent buffer
      is inserted into the radix tree before it sets page private.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  10. 08 Feb, 2011 1 commit
  11. 06 Feb, 2011 4 commits
  12. 01 Feb, 2011 5 commits
  13. 29 Jan, 2011 3 commits
    • Btrfs: handle no memory properly in prepare_pages · 7adf5dfb
      Josef Bacik committed
      Instead of doing a BUG_ON(1) in prepare_pages if grab_cache_page() fails, just
      loop through the pages we've already grabbed and unlock and release them, then
      return -ENOMEM like we should.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
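The unwind described in "handle no memory properly in prepare_pages" (release what was already grabbed, then return -ENOMEM) can be sketched in user-space C. The names `grab_page_sim`, `put_page_sim`, `prepare_pages_sim` and the `fail_at` parameter are hypothetical; `malloc` stands in for grabbing a page cache page.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_pages; /* counts pages currently held, for checking leaks */

/* Simulated page grab; fails at index fail_at (pass -1 to never fail). */
static void *grab_page_sim(int index, int fail_at)
{
    void *p;

    if (index == fail_at)
        return NULL;
    p = malloc(64);
    if (p)
        live_pages++;
    return p;
}

static void put_page_sim(void *p)
{
    free(p);
    live_pages--;
}

/*
 * Grab num_pages pages. On failure, release every page grabbed so
 * far and return -ENOMEM instead of crashing.
 */
static int prepare_pages_sim(void **pages, int num_pages, int fail_at)
{
    for (int i = 0; i < num_pages; i++) {
        pages[i] = grab_page_sim(i, fail_at);
        if (!pages[i]) {
            while (--i >= 0) {
                put_page_sim(pages[i]);
                pages[i] = NULL;
            }
            return -ENOMEM;
        }
    }
    return 0;
}
```

A failure on the third of four pages returns -ENOMEM and leaves no pages held.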
    • Btrfs: do error checking in btrfs_del_csums · ad0397a7
      Josef Bacik committed
      Got a report of a box panicking because we got a NULL eb in read_extent_buffer.
      His fs was borked and btrfs_search_path returned EIO, but we don't check for
      errors so the box panicked.  Yes I know this will just make something higher up
      the stack panic, but that's a problem for future Josef.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: use the global block reserve if we cannot reserve space · 68a82277
      Josef Bacik committed
      We call use_block_rsv right before we make an allocation in order to make sure
      we have enough space.  Now normally people have called btrfs_start_transaction()
      with the appropriate amount of space that we need, so we just use some of that
      pre-reserved space and move along happily.  The problem is where people use
      btrfs_join_transaction(), which doesn't actually reserve any space.  So we try
      and reserve space here, but we cannot flush delalloc, so this forces us to
      return -ENOSPC when in reality we have plenty of space.  The most common symptom
      is seeing a bunch of "couldn't dirty inode" messages in syslog.  With
      xfstests 224 we end up falling back to start_transaction and then doing all the
      flush delalloc stuff which causes us to hang for a very long time.
      
      So instead steal from the global reserve, which is what this is meant for
      anyway.  With this patch and the other 2 I have sent xfstests 224 now passes
      successfully.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
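The fallback in "use the global block reserve if we cannot reserve space" can be sketched as two pools tried in order. This is a simplified user-space model: `rsv_sim`, `take_bytes` and `use_rsv_sim` are hypothetical names, and the intermediate attempt to reserve fresh space mentioned in the commit message is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical block reserve: just a byte counter. */
struct rsv_sim {
    uint64_t reserved;
};

/* Take bytes from one reserve; -ENOSPC if it does not hold enough. */
static int take_bytes(struct rsv_sim *rsv, uint64_t bytes)
{
    if (rsv->reserved < bytes)
        return -ENOSPC;
    rsv->reserved -= bytes;
    return 0;
}

/*
 * Use the transaction's own reserve when it has space; otherwise
 * steal from the global reserve rather than failing right away.
 */
static int use_rsv_sim(struct rsv_sim *local, struct rsv_sim *global,
                       uint64_t bytes)
{
    if (take_bytes(local, bytes) == 0)
        return 0;
    return take_bytes(global, bytes);
}
```

A joined transaction with an empty local reserve still succeeds as long as the global reserve can cover the request; -ENOSPC is returned only when both are short.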