1. 05 Jun, 2009 1 commit
    • Btrfs: Fix oops and use after free during space balancing · 44fb5511
      Committed by Chris Mason
      The btrfs allocator uses list_for_each to walk the available block
      groups when searching for free blocks.  It starts off with a hint
      to help find the best block group for a given allocation.
      
      The hint is resolved into a block group, but we don't properly check
      to make sure the block group we find isn't in the middle of being
      freed due to filesystem shrinking or balancing.  If it is being
      freed, the list pointers in it are bogus and can't be trusted.  But,
      the code happily goes along and uses them in the list_for_each loop,
      leading to all kinds of fun.
      
      The fix used here is to check to make sure the block group we find really
      is on the list before we use it.  list_del_init is used when removing
      it from the list, so we can do a proper check.
      
      The allocation clustering code has a similar bug where it will trust
      the block group in the current free space cluster.  If our allocation
      flags have changed (going from single spindle dup to raid1 for example)
      because the drives in the FS have changed, we're not allowed to use
      the old block group any more.
      
      The fix used here is to check the current cluster against the
      current allocation flags.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
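The list check described in this commit can be illustrated with a small userspace sketch. It assumes a simplified imitation of the kernel's circular list: `struct block_group` and `hint_is_usable()` are hypothetical names chosen for illustration, not the btrfs code, and a real implementation would also need to hold the appropriate lock while checking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace imitation of the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    assert(entry != NULL && head != NULL);
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink the entry and re-initialise it so that it points at itself. */
static void list_del_init(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    init_list_head(entry);
}

/* An entry that points at itself is not linked into any list. */
static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Hypothetical stand-in for a block group (illustration only). */
struct block_group {
    int id;
    struct list_head list;
};

/*
 * Because removal uses list_del_init(), a removed group has self-pointing
 * list pointers, so list_empty() on its own entry tells us whether it is
 * still safe to use as the starting point of a list walk.
 */
static bool hint_is_usable(const struct block_group *bg)
{
    return bg != NULL && !list_empty(&bg->list);
}
```

A plain unlink would leave stale pointers behind in the removed entry, which is why the check only works when removal re-initialises the entry.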
  2. 04 Jun, 2009 1 commit
  3. 15 May, 2009 6 commits
  4. 28 Apr, 2009 2 commits
    • Btrfs: look for acls during btrfs_read_locked_inode · 46a53cca
      Committed by Chris Mason
      This changes btrfs_read_locked_inode() to peek ahead in the btree for acl items.
      If it is certain a given inode has no acls, it will set the in memory acl
      fields to null to avoid acl lookups completely.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: fix acl caching · 7b1a14bb
      Committed by Chris Mason
      Linus noticed the btrfs code to cache acls wasn't properly caching
      a NULL acl when the inode didn't have any acls.  This meant the common
      case of no acls resulted in expensive btree searches every time the
      kernel checked permissions (which is quite often).
      
      This is a modified version of Linus' original patch:
      
      Properly set initial acl fields to BTRFS_ACL_NOT_CACHED in the inode.
      This forces an acl lookup when permission checks are done.
      
      Fix btrfs_get_acl to avoid lookups and locking when the inode acls fields
      are set to null.
      
      Fix btrfs_get_acl to use the right return value from __btrfs_getxattr
      when deciding to cache a NULL acl.  It was storing a NULL acl when
      __btrfs_getxattr returned -ENOENT, but __btrfs_getxattr was actually returning
      -ENODATA for this case.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
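The caching rule in this commit can be sketched in userspace as follows. This is a minimal illustration under stated assumptions: `ACL_NOT_CACHED`, `struct inode_sketch`, `lookup_acl()` and `get_acl()` are hypothetical names that only mimic the idea of a "not cached" sentinel that is distinct from a cached NULL; they are not the btrfs functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct acl {
    int placeholder;
};

/* Sentinel for "never looked up"; NULL instead means "known to have no acl". */
#define ACL_NOT_CACHED ((struct acl *)-1)

/* Hypothetical in-memory inode (illustration only). */
struct inode_sketch {
    struct acl *cached_acl;  /* ACL_NOT_CACHED, NULL, or a real acl */
    int has_acl_on_disk;     /* simulates what the expensive lookup finds */
    int lookups;             /* counts expensive lookups */
};

static struct acl stored_acl;

/* Stand-in for the expensive lookup: 0 on success, -ENODATA if no acl. */
static int lookup_acl(struct inode_sketch *inode, struct acl **out)
{
    inode->lookups++;
    if (!inode->has_acl_on_disk)
        return -ENODATA;
    *out = &stored_acl;
    return 0;
}

static struct acl *get_acl(struct inode_sketch *inode)
{
    struct acl *acl = NULL;
    int ret;

    assert(inode != NULL);

    /* Fast path: anything but the sentinel is a valid cached answer. */
    if (inode->cached_acl != ACL_NOT_CACHED)
        return inode->cached_acl;

    ret = lookup_acl(inode, &acl);
    if (ret == 0)
        inode->cached_acl = acl;
    else if (ret == -ENODATA)
        inode->cached_acl = NULL;  /* cache the negative result */
    /* Any other error leaves the sentinel in place so the lookup is retried. */

    return ret == 0 ? acl : NULL;
}
```

The important detail is matching the error code the lookup really returns; testing for a different code means the negative result is never cached and every permission check repeats the search.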
  5. 27 Apr, 2009 6 commits
  6. 25 Apr, 2009 6 commits
  7. 22 Apr, 2009 1 commit
    • Btrfs: fix btrfs fallocate oops and deadlock · 546888da
      Committed by Chris Mason
      Btrfs fallocate was incorrectly starting a transaction with a lock held
      on the extent_io tree for the file, which could deadlock.  Strictly
      speaking it was using join_transaction which would be safe, but it is better
      to move the transaction outside of the lock.
      
      When preallocated extents are overwritten, btrfs_mark_buffer_dirty was
      being called on an unlocked buffer.  This was triggering an assertion and
      oops because the lock is supposed to be held.
      
      The bug was calling btrfs_mark_buffer_dirty on a leaf after btrfs_del_item had
      been run.  btrfs_del_item takes care of dirtying things, so the solution is
      to skip the btrfs_mark_buffer_dirty call in this case.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  8. 21 Apr, 2009 4 commits
    • Btrfs: use the right node in reada_for_balance · 8c594ea8
      Committed by Chris Mason
      reada_for_balance was using the wrong index into the path node array,
      so it wasn't reading the right blocks.  We never directly used the
      results of the read done by this function because the btree search is
      started over at the end.
      
      This fixes reada_for_balance to reada in the correct node and to
      avoid searching past the last slot in the node.  It also makes sure to
      hold the parent lock while we are finding the nodes to read.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: fix oops on page->mapping->host during writepage · 11c8349b
      Committed by Chris Mason
      The extent_io writepage call updates the writepage index in the inode
      as it makes progress.  But, it was doing the update after unlocking the page,
      which isn't legal because page->mapping can't be trusted once the page
      is unlocked.
      
      This led to an oops, especially common with compression turned on.  The
      fix here is to update the writeback index before unlocking the page.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: add a priority queue to the async thread helpers · d313d7a3
      Committed by Chris Mason
      Btrfs is using WRITE_SYNC_PLUG to send down synchronous IOs with a
      higher priority.  But, the checksumming helper threads prevent it
      from being fully effective.
      
      There are two problems.  First, a big queue of pending checksumming
      will delay the synchronous IO behind other lower priority writes.  Second,
      the checksumming uses an ordered async work queue.  The ordering makes sure
      that IOs are sent to the block layer in the same order they are sent
      to the checksumming threads.  Usually this gives us less seeky IO.
      
      But, when we start mixing IO priorities, the lower priority IO can delay
      the higher priority IO.
      
      This patch solves both problems by adding a high priority list to the async
      helper threads, and a new btrfs_set_work_high_prio(), which is used
      to put a new async work item onto the higher priority list.
      
      The ordering is still done on high priority IO, but all of the high
      priority bios are ordered separately from the low priority bios.  This
      ordering is purely an IO optimization; it is not involved in data
      or metadata integrity.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
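The two-list scheme described in this commit can be sketched as a pair of FIFO queues, where the high priority queue is always drained first and order is preserved within each queue. This is a simplified single-threaded illustration: `struct worker`, `queue_work()`, `next_work()` and `set_work_high_prio()` are hypothetical names loosely modelled on the description above, not the btrfs async-thread code, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 16

struct work_item {
    int id;
    int high_prio;  /* set before queueing to request the high priority list */
};

/* Fixed-size FIFO; head and tail count items ever dequeued and enqueued. */
struct fifo {
    struct work_item *items[QUEUE_CAP];
    size_t head, tail;
};

struct worker {
    struct fifo high;    /* latency-critical work */
    struct fifo normal;  /* everything else */
};

static void worker_init(struct worker *wk)
{
    wk->high.head = wk->high.tail = 0;
    wk->normal.head = wk->normal.tail = 0;
}

/* Hypothetical analogue of marking a work item as high priority. */
static void set_work_high_prio(struct work_item *w)
{
    w->high_prio = 1;
}

/* Append to the list matching the item's priority; -1 if that list is full. */
static int queue_work(struct worker *wk, struct work_item *w)
{
    struct fifo *q;

    assert(wk != NULL && w != NULL);
    q = w->high_prio ? &wk->high : &wk->normal;
    if (q->tail - q->head == QUEUE_CAP)
        return -1;
    q->items[q->tail % QUEUE_CAP] = w;
    q->tail++;
    return 0;
}

/* High priority items always go first; each list stays in FIFO order. */
static struct work_item *next_work(struct worker *wk)
{
    struct fifo *q;
    struct work_item *w;

    if (wk->high.head != wk->high.tail)
        q = &wk->high;
    else if (wk->normal.head != wk->normal.tail)
        q = &wk->normal;
    else
        return NULL;

    w = q->items[q->head % QUEUE_CAP];
    q->head++;
    return w;
}
```

Keeping two separate FIFOs, rather than one sorted queue, preserves submission order inside each priority class while letting high priority work skip ahead of the low priority backlog.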
    • Btrfs: use WRITE_SYNC for synchronous writes · ffbd517d
      Committed by Chris Mason
      Part of reducing fsync/O_SYNC/O_DIRECT latencies is using WRITE_SYNC for
      writes we plan on waiting on in the near future.  This patch
      mirrors recent changes in other filesystems and the generic code to
      use WRITE_SYNC when WB_SYNC_ALL is passed and to use WRITE_SYNC for
      other latency critical writes.
      
      Btrfs uses async worker threads for checksumming before the write is done,
      and then again to actually submit the bios.  The bio submission code just
      runs a per-device list of bios that need to be sent down the pipe.
      
      This list is split into low priority and high priority lists so the
      WRITE_SYNC IO happens first.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  9. 15 Apr, 2009 10 commits
  10. 14 Apr, 2009 3 commits