1. 21 2月, 2013 5 次提交
    • M
      Btrfs: use the inode own lock to protect its delalloc_bytes · df0af1a5
      Miao Xie 提交于
      We need not use a global lock to protect the delalloc_bytes of the
      inode, just use its own lock. In this way, we can reduce the lock
      contention and ->delalloc_lock will just protect delalloc inode
      list.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      df0af1a5
    • M
      Btrfs: use percpu counter for fs_info->delalloc_bytes · 963d678b
      Miao Xie 提交于
      fs_info->delalloc_bytes is accessed very frequently, so use percpu
      counter instead of the u64 variant for it to reduce the lock
      contention.
      
      This patch also fixed the problem that we access the variant
      without the lock protection.At worst, we would not flush the
      delalloc inodes, and just return ENOSPC error when we still have
      some free space in the fs.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      963d678b
    • M
      Btrfs: use percpu counter for dirty metadata count · e2d84521
      Miao Xie 提交于
      ->dirty_metadata_bytes is accessed very frequently, so use percpu
      counter instead of the u64 variant to reduce the contention of
      the lock.
      
      This patch also fixed the problem that we access it without
      lock protection in __btrfs_btree_balance_dirty(), which may
      cause we skip the dirty pages flush.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      e2d84521
    • M
      Btrfs: protect fs_info->alloc_start · c018daec
      Miao Xie 提交于
      fs_info->alloc_start is a 64bits variant, can be accessed by
      multi-task, but it is not protected strictly, it can be changed
      while we are accessing it. On 32bit machine, we will get wrong
      value because we access it by two instructions.(In fact, it is
      also possible that the same problem happens on the 64bit machine,
      because the compiler may split the 64bit operation into two 32bit
      operation.)
      
      For example:
      Assuming -> alloc_start is 0x0000 0000 0001 0000 at the beginning,
      then we remount and set ->alloc_start to 0x0000 0100 0000 0000.
      	Task0 			Task1
      				load high 32 bits
      	set high 32 bits
      	set low 32 bits
      				load low 32 bits
      
      Task1 will get 0.
      
      This patch fixes this problem by using two locks to protect it
      	fs_info->chunk_mutex
      	sb->s_umount
      On the read side, we just need get one of these two locks, and on
      the write side, we must lock all of them.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      c018daec
    • M
      Btrfs: add a comment for fs_info->max_inline · 8c6a3ee6
      Miao Xie 提交于
      Though ->max_inline is a 64bit variant, and may be accessed by
      multi-task, but it is just suggestive number, so we needn't add
      anything to protect fs_info->max_inline, just add a comment to
      explain wny we don't use a lock to protect it.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      8c6a3ee6
  2. 20 2月, 2013 26 次提交
  3. 16 2月, 2013 1 次提交
    • D
      btrfs: access superblock via pagecache in scan_one_device · 6f60cbd3
      David Sterba 提交于
      btrfs_scan_one_device is calling set_blocksize() which can race
      with a concurrent process making dirty page cache pages.  It can end up
      dropping dirty page cache pages on the floor, which isn't very nice when
      someone is just running btrfs dev scan to find filesystems on the
      box.
      
      Now that udev is registering btrfs devices as it discovers them, we can
      actually end up racing with our own mkfs program too.  When this
      happens, we drop some of the important blocks written by mkfs.
      
      This commit changes scan_one_device to read the super out of the page
      cache instead of trying to use bread.  This way we don't have to care
      about the blocksize of the device.
      
      This also drops the invalidate_bdev() call.  It wasn't very polite to
      invalidate during the scan either.  mkfs is putting the super into the
      page cache, there's no reason to invalidate at this point.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      6f60cbd3
  4. 15 2月, 2013 1 次提交
  5. 07 2月, 2013 1 次提交
  6. 06 2月, 2013 6 次提交
    • J
      Btrfs: fix EDQUOT handling in btrfs_delalloc_reserve_metadata · eb6b88d9
      Jan Schmidt 提交于
      When btrfs_qgroup_reserve returned a failure, we were missing a counter
      operation for BTRFS_I(inode)->outstanding_extents++, leading to warning
      messages about outstanding extents and space_info->bytes_may_use != 0.
      Additionally, the error handling code didn't take into account that we
      dropped the inode lock which might require more cleanup.
      
      Luckily, all the cleanup code we need is already there and can be shared
      with reserve_metadata_bytes, which is exactly what this patch does.
      Reported-by: NLev Vainblat <lev@zadarastorage.com>
      Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      eb6b88d9
    • C
    • J
      Btrfs: fix possible stale data exposure · 59fe4f41
      Josef Bacik 提交于
      We specifically do not update the disk i_size if there are ordered extents
      outstanding for any area between the current disk_i_size and our ordered
      extent so that we do not expose stale data.  The problem is the check we
      have only checks if the ordered extent starts at or after the current
      disk_i_size, which doesn't take into account an ordered extent that starts
      before the current disk_i_size and ends past the disk_i_size.  Fix this by
      checking if the extent ends past the disk_i_size.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      59fe4f41
    • J
      Btrfs: fix missing i_size update · 5d1f4020
      Josef Bacik 提交于
      If we have an ordered extent before the ordered extent we are currently
      completing that is after the current disk_i_size we will put our i_size
      update into that ordered extent so that we do not expose stale data.  The
      problem is that if our disk i_size is updated past the previous ordered
      extent we won't update the i_size with the pending i_size update.  So check
      the pending i_size update and if its above the current disk i_size we need
      to go ahead and try to update.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      5d1f4020
    • L
      Btrfs: fix race between snapshot deletion and getting inode · 6f1c3605
      Liu Bo 提交于
      While running snapshot testscript created by Mitch and David,
      the race between autodefrag and snapshot deletion can lead to
      corruption of dead_root list so that we can get crash on
      btrfs_clean_old_snapshots().
      
      And besides autodefrag, scrub also does the same thing, ie. read
      root first and get inode.
      
      Here is the story(take autodefrag as an example):
      (1) when we delete a snapshot or subvolume, it will set its root's
      refs to zero and do a iput() on its own inode, and if this inode happens
      to be the only active in-meory one in root's inode rbtree, it will add
      itself to the global dead_roots list for later cleanup.
      
      (2) after (1), the autodefrag thread may read another inode for defrag
      and the inode is just in the deleted snapshot/subvolume, but all of these
      are without checking if the root is still valid(refs > 0).  So the end up
      result is adding the deleted snapshot/subvolume's root to the global
      dead_roots list AGAIN.
      
      Fortunately, we already have a srcu lock to avoid the race, ie. subvol_srcu.
      
      So all we need to do is to take the lock to protect 'read root and get inode',
      since we synchronize to wait for the rcu grace period before adding something
      to the global dead_roots list.
      Reported-by: NMitch Harder <mitch.harder@sabayonlinux.org>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      6f1c3605
    • M
      Btrfs: fix missing release of the space/qgroup reservation in start_transaction() · 843fcf35
      Miao Xie 提交于
      When we fail to start a transaction, we need to release the reserved free space
      and qgroup space, fix it.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      843fcf35