1. 21 Feb 2013, 3 commits
    • Btrfs: use percpu counter for dirty metadata count · e2d84521
      Miao Xie committed
      ->dirty_metadata_bytes is accessed very frequently, so use a percpu
      counter instead of a plain u64 variable to reduce lock contention.
      
      This patch also fixes the problem that we accessed it without lock
      protection in __btrfs_btree_balance_dirty(), which could cause the
      dirty-page flush to be skipped.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
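      For readers unfamiliar with the kernel's percpu_counter API, here is a
      minimal sketch (not the btrfs code itself) of the pattern the commit
      describes; the demo_* names and the dirty_metadata_thresh field are
      illustrative placeholders:

      #include <linux/percpu_counter.h>
      #include <linux/gfp.h>
      #include <linux/types.h>

      struct demo_fs_info {
          struct percpu_counter dirty_metadata_bytes;
          s64 dirty_metadata_thresh;
      };

      static int demo_init(struct demo_fs_info *fs)
      {
          /* recent kernels take a GFP argument; very old ones omit it */
          return percpu_counter_init(&fs->dirty_metadata_bytes, 0, GFP_KERNEL);
      }

      static void demo_account_dirty(struct demo_fs_info *fs, s64 bytes)
      {
          /* cheap per-CPU add; no global lock on the fast path */
          percpu_counter_add(&fs->dirty_metadata_bytes, bytes);
      }

      static bool demo_should_flush(struct demo_fs_info *fs)
      {
          /*
           * percpu_counter_compare() only sums the per-CPU deltas when the
           * approximate value is too close to the threshold to decide.
           */
          return percpu_counter_compare(&fs->dirty_metadata_bytes,
                                        fs->dirty_metadata_thresh) > 0;
      }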
    • Btrfs: protect fs_info->alloc_start · c018daec
      Miao Xie committed
      fs_info->alloc_start is a 64-bit variable that can be accessed by
      multiple tasks, but it is not strictly protected, so it can change
      while we are accessing it. On a 32-bit machine we can read a wrong
      value because the access takes two instructions. (In fact, the same
      problem can also happen on a 64-bit machine, because the compiler
      may split the 64-bit operation into two 32-bit operations.)
      
      For example:
      Assuming ->alloc_start is 0x0000 0000 0001 0000 at the beginning,
      then we remount and set ->alloc_start to 0x0000 0100 0000 0000.
      	Task0 			Task1
      				load high 32 bits
      	set high 32 bits
      	set low 32 bits
      				load low 32 bits
      
      Task1 will get 0.
      
      This patch fixes this problem by using two locks to protect it:
      	fs_info->chunk_mutex
      	sb->s_umount
      On the read side, we just need to take one of these two locks, and on
      the write side, we must take both of them.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
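      As an illustration of the locking scheme, here is a minimal sketch (not
      the btrfs code); demo_info, lock_a and lock_b are placeholder names, and
      in btrfs the second lock (sb->s_umount) is actually an rw_semaphore
      rather than a mutex:

      #include <linux/mutex.h>
      #include <linux/types.h>

      struct demo_info {
          struct mutex lock_a;    /* stands in for fs_info->chunk_mutex */
          struct mutex lock_b;    /* stands in for sb->s_umount */
          u64 alloc_start;        /* mutex_init() on both locks at setup is assumed */
      };

      static u64 demo_read_alloc_start(struct demo_info *info)
      {
          u64 val;

          /* a reader only needs one of the two locks */
          mutex_lock(&info->lock_a);
          val = info->alloc_start;
          mutex_unlock(&info->lock_a);
          return val;
      }

      static void demo_set_alloc_start(struct demo_info *info, u64 new_start)
      {
          /*
           * a writer must hold both locks, so it excludes every reader no
           * matter which of the two locks that reader chose; the 64-bit
           * store can never be observed half-done
           */
          mutex_lock(&info->lock_a);
          mutex_lock(&info->lock_b);
          info->alloc_start = new_start;
          mutex_unlock(&info->lock_b);
          mutex_unlock(&info->lock_a);
      }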
    • Btrfs: add a comment for fs_info->max_inline · 8c6a3ee6
      Miao Xie committed
      Though ->max_inline is a 64-bit variable that may be accessed by
      multiple tasks, it is only an advisory number, so we need not add
      anything to protect fs_info->max_inline; just add a comment to
      explain why we don't use a lock to protect it.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  2. 20 Feb 2013, 5 commits
  3. 18 Dec 2012, 1 commit
    • Btrfs: fix hash overflow handling · 9c52057c
      Chris Mason committed
      The handling for directory crc hash overflows was fairly obscure,
      split_leaf returns EOVERFLOW when we try to extend the item and that is
      supposed to bubble up to userland.  For a while it did so, but along the
      way we added better handling of errors and forced the FS readonly if we
      hit IO errors during the directory insertion.
      
      Along the way, we started testing only for EEXIST and the EOVERFLOW case
      was dropped.  The end result is that we may force the FS readonly if we
      catch a directory hash bucket overflow.
      
      This fixes a few problem spots.  First I add tests for EOVERFLOW in the
      places where we can safely just return the error up the chain.
      
      btrfs_rename is harder though, because it tries to insert the new
      directory item only after it has already unlinked anything the rename
      was going to overwrite.  Rather than adding very complex logic, I added
      a helper to test for the hash overflow case early while it is still safe
      to bail out.
      
      Snapshot and subvolume creation had a similar problem, so they are using
      the new helper now too.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Reported-by: Pascal Junod <pascal@junod.info>
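      For illustration, a minimal sketch (with stand-in demo_* names, not the
      actual patch) of the error-propagation rule described above: a CRC-hash
      bucket collision (-EOVERFLOW) must be returned to the caller just like
      -EEXIST, and only real IO/metadata errors should take the abort /
      force-read-only path:

      #include <linux/errno.h>
      #include <linux/printk.h>

      /* stand-in for the low-level directory insert; here it simply
       * simulates a hash-bucket collision */
      static int demo_insert_dir_item(const char *name, int name_len)
      {
          return -EOVERFLOW;
      }

      static int demo_add_link(const char *name, int name_len)
      {
          int ret = demo_insert_dir_item(name, name_len);

          /* expected, recoverable answers: bubble them up unchanged */
          if (ret == -EEXIST || ret == -EOVERFLOW)
              return ret;
          /* anything else is a genuine failure */
          if (ret)
              pr_err("demo: unexpected error %d, would force read-only\n", ret);
          return ret;
      }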
  4. 17 Dec 2012, 7 commits
  5. 13 Dec 2012, 11 commits
  6. 12 Dec 2012, 2 commits
    • Btrfs: make delalloc inodes be flushed by multi-task · 8ccf6f19
      Miao Xie committed
      This patch introduces a new worker pool named "flush_workers". When we
      want to force all inodes with pending delalloc to disk, we queue those
      inodes into the work queue of this pool; that way, the inodes are
      flushed by multiple tasks.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
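      The idea in miniature, shown with the generic kernel workqueue API
      rather than btrfs' internal worker-pool abstraction (all demo_* names
      are illustrative): each delalloc inode gets its own work item, so the
      flushing runs on several workers instead of one task walking the list:

      #include <linux/workqueue.h>
      #include <linux/slab.h>
      #include <linux/fs.h>

      struct demo_flush_work {
          struct work_struct work;
          struct inode *inode;
      };

      static void demo_flush_one(struct work_struct *work)
      {
          struct demo_flush_work *fw =
              container_of(work, struct demo_flush_work, work);

          filemap_flush(fw->inode->i_mapping);   /* start delalloc writeback */
          iput(fw->inode);
          kfree(fw);
      }

      static int demo_queue_inode_flush(struct workqueue_struct *flush_workers,
                                        struct inode *inode)
      {
          struct demo_flush_work *fw = kmalloc(sizeof(*fw), GFP_NOFS);

          if (!fw)
              return -ENOMEM;
          fw->inode = igrab(inode);              /* hold the inode while queued */
          if (!fw->inode) {
              kfree(fw);
              return -ENOENT;
          }
          INIT_WORK(&fw->work, demo_flush_one);
          queue_work(flush_workers, &fw->work);
          return 0;
      }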
    • Btrfs: improve the noflush reservation · 08e007d2
      Miao Xie committed
      In some places (such as inode eviction), we simply cannot flush the
      reserved delalloc space, but flushing the delayed directory index and
      delayed inode is fine; yet we don't try to flush those things and just
      give up when there is not enough space to be reserved. This patch
      fixes this problem.
      
      We define three types of flush operations: NO_FLUSH, FLUSH_LIMIT and
      FLUSH_ALL. If we are inside a transaction, we must not flush anything
      or a deadlock could happen, so NO_FLUSH is used. If flushing the
      reserved delalloc space would cause a deadlock, FLUSH_LIMIT is used.
      In all other cases FLUSH_ALL is used, and we flush everything.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
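      A sketch of the three flush levels as an enum plus dispatch; the
      demo_* identifiers are placeholders (the upstream names carry a
      BTRFS_RESERVE_ prefix), and the two flush helpers are empty stubs:

      /* placeholder flushers; the real work lives in the reservation code */
      static void demo_flush_delayed_items(void) { }
      static void demo_flush_delalloc(void) { }

      enum demo_reserve_flush {
          DEMO_NO_FLUSH,      /* inside a transaction: any flushing could deadlock */
          DEMO_FLUSH_LIMIT,   /* delayed items/inodes only; delalloc would deadlock */
          DEMO_FLUSH_ALL,     /* flush everything, including delalloc */
      };

      static void demo_flush_space(enum demo_reserve_flush flush)
      {
          switch (flush) {
          case DEMO_NO_FLUSH:
              break;                          /* caller simply fails the reservation */
          case DEMO_FLUSH_LIMIT:
              demo_flush_delayed_items();     /* cheap, deadlock-free work only */
              break;
          case DEMO_FLUSH_ALL:
              demo_flush_delayed_items();
              demo_flush_delalloc();          /* may block on writeback */
              break;
          }
      }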
  7. 26 Oct 2012, 1 commit
  8. 24 Oct 2012, 1 commit
  9. 09 Oct 2012, 3 commits
    • Btrfs: make filesystem read-only when submitting barrier fails · 5af3e8cc
      Stefan Behrens committed
      So far the return code of barrier_all_devices() is ignored, which
      means that errors are ignored. The result can be a corrupt
      filesystem which is not consistent.
      This commit adds code to evaluate the return code of
      barrier_all_devices(). The normal btrfs_error() mechanism is used to
      switch the filesystem into read-only mode when errors are detected.
      
      In order to decide whether barrier_all_devices() should return
      error or success, the number of disks that are allowed to fail the
      barrier submission is calculated. This calculation accounts for the
      worst RAID level of metadata, system and data. If single, dup or
      RAID0 is in use, a single disk error is already considered to be
      fatal. Otherwise a single disk error is tolerated.
      
      The calculation of the number of disks that are tolerated to fail
      the barrier operation is performed when the filesystem gets mounted,
      when a balance operation is started and finished, and when devices
      are added or removed.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
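      A self-contained sketch of the tolerance calculation described above
      (names and profile handling are simplified; the real helper inspects
      the actual block group profiles of data, system and metadata):

      enum demo_profile { DEMO_SINGLE, DEMO_DUP, DEMO_RAID0, DEMO_RAID1, DEMO_RAID10 };

      static int demo_num_tolerated_barrier_failures(enum demo_profile data,
                                                     enum demo_profile sys,
                                                     enum demo_profile meta)
      {
          enum demo_profile profiles[3] = { data, sys, meta };
          int tolerated = 1;      /* mirrored profiles survive one lost disk */
          int i;

          for (i = 0; i < 3; i++) {
              /* single, dup and RAID0 cannot lose any device */
              if (profiles[i] == DEMO_SINGLE || profiles[i] == DEMO_DUP ||
                  profiles[i] == DEMO_RAID0)
                  tolerated = 0;
          }
          return tolerated;       /* the worst profile wins */
      }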
    • btrfs: extended inode refs · f186373f
      Mark Fasheh committed
      This patch adds basic support for extended inode refs. This includes support
      for link and unlink of the refs, which basically gets us support for rename
      as well.
      
      Inode creation does not need changing - extended refs are only added after
      the ref array is full.
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
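      For context, the on-disk extended ref item introduced by this series
      has roughly the following shape (reproduced from memory, so treat it
      as indicative rather than authoritative, and the demo_ name is a
      placeholder); unlike the classic inline ref array, each extref is
      keyed by a hash of parent and name, so the link count is no longer
      limited by what fits in a single item:

      #include <linux/types.h>

      struct demo_inode_extref {
          __le64 parent_objectid;   /* objectid of the parent directory */
          __le64 index;             /* dir index of this name in the parent */
          __le16 name_len;
          __u8   name[];            /* the name follows inline */
      } __attribute__((__packed__));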
    • btrfs: move transaction aborts to the point of failure · 005d6427
      David Sterba committed
      Call btrfs_abort_transaction as early as possible when an error
      condition is detected; that way the line number reported is useful,
      and we are no longer clueless about which error path led to the abort.
      Signed-off-by: David Sterba <dsterba@suse.cz>
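      A small illustration of the pattern (demo_* names are stand-ins for
      the real btrfs transaction helpers): the abort happens at the exact
      statement that failed, so the location it records is the useful one:

      struct demo_trans {
          int aborted;    /* first error seen, 0 if none */
      };

      static void demo_abort_transaction(struct demo_trans *trans, int err)
      {
          /* the real helper also logs the file/line of its call site */
          if (!trans->aborted)
              trans->aborted = err;
      }

      static int demo_two_step_update(struct demo_trans *trans,
                                      int (*step1)(struct demo_trans *),
                                      int (*step2)(struct demo_trans *))
      {
          int ret;

          ret = step1(trans);
          if (ret) {
              demo_abort_transaction(trans, ret); /* abort right at the failing call */
              return ret;
          }
          ret = step2(trans);
          if (ret)
              demo_abort_transaction(trans, ret); /* ditto: no distant cleanup label */
          return ret;
      }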
  10. 04 Oct 2012, 1 commit
  11. 02 Oct 2012, 5 commits
    • Btrfs: delay block group item insertion · ea658bad
      Josef Bacik committed
      So we have lots of places where we try to preallocate chunks in order
      to make sure we have enough space as we make our allocations.  This
      has historically meant constantly tweaking when we should allocate a
      new chunk, and we have gotten this horribly wrong, so we way
      over-allocate either metadata or data.  To try and keep this from
      happening, we make the block group item insertion happen out of band
      at the end of a transaction.  This allows us to create chunks even if
      we are trying to make an allocation for the extent tree.  With this
      patch my enospc tests run faster (didn't expect this) and use the
      disk space more efficiently (this is what I wanted).  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
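      A minimal sketch of the "out of band" insertion (all demo_* names are
      illustrative): new block groups are parked on a per-transaction list,
      and their items are inserted in one pass near the end of the
      transaction instead of synchronously inside the allocation path:

      #include <linux/list.h>
      #include <linux/types.h>

      struct demo_block_group {
          struct list_head new_bg_list;
          u64 start;
          u64 length;
      };

      struct demo_transaction {
          struct list_head new_bgs;   /* block groups created in this transaction */
      };

      static void demo_note_new_block_group(struct demo_transaction *trans,
                                            struct demo_block_group *bg)
      {
          /* chunk allocation only records the new block group here; no
           * extent-tree insertion happens yet, so this is safe even while
           * allocating on behalf of the extent tree itself */
          list_add_tail(&bg->new_bg_list, &trans->new_bgs);
      }

      static void demo_create_pending_block_groups(struct demo_transaction *trans)
      {
          struct demo_block_group *bg, *tmp;

          /* run once near the end of the transaction: insert every pending
           * block group item in a single pass, then drop it from the list */
          list_for_each_entry_safe(bg, tmp, &trans->new_bgs, new_bg_list)
              list_del_init(&bg->new_bg_list);
      }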
    • Btrfs: cleanup for unused ref cache stuff · 0647d6bd
      liubo committed
      As the ref cache has been removed from btrfs, there are no longer any
      users of its lock and its check.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
    • Btrfs: fix unprotected ->log_batch · 2ecb7923
      Miao Xie committed
      We forgot to protect ->log_batch when syncing a file; this patch fixes
      the problem by using atomic operations. ->log_batch is only used to
      check whether there are parallel sync operations, so it is unnecessary
      to reset it to 0 after the sync of the current log tree completes.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
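      A minimal sketch of the fix (demo_root is an illustrative stand-in for
      btrfs_root): with ->log_batch as an atomic_t, concurrent fsync()ers can
      bump and read it without any extra lock:

      #include <linux/atomic.h>
      #include <linux/types.h>

      struct demo_root {
          atomic_t log_batch;
      };

      static void demo_join_log_batch(struct demo_root *root)
      {
          atomic_inc(&root->log_batch);   /* announce one more batch member */
      }

      static bool demo_more_writers_joined(struct demo_root *root, int batch_seen)
      {
          /* another fsync joined if the counter moved since we sampled it */
          return atomic_read(&root->log_batch) != batch_seen;
      }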
    • Btrfs: add a new "type" field into the block reservation structure · 66d8f3dd
      Miao Xie committed
      Sometimes we need to choose the reservation method according to the
      type of the block reservation, such as the reservation for a delayed
      inode update. Currently we identify the type just by comparing the
      address of the reservation variable, which is very ugly for a
      temporary reservation because we have to compare it against all the
      common reservation variables. So we add a new "type" field to record
      the type of the reservation.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
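      An illustrative enum/struct for the new tag (the demo_* identifiers are
      placeholders; the upstream names carry a BTRFS_BLOCK_RSV_ prefix):

      #include <linux/types.h>

      enum demo_rsv_type {
          DEMO_RSV_GLOBAL,
          DEMO_RSV_DELALLOC,
          DEMO_RSV_TRANS,
          DEMO_RSV_DELOPS,    /* delayed inode / dir-index updates */
          DEMO_RSV_TEMP,      /* short-lived reservation created on demand */
      };

      struct demo_block_rsv {
          u64 size;
          u64 reserved;
          unsigned short type;    /* one of enum demo_rsv_type */
      };

      static bool demo_is_delayed_update_rsv(const struct demo_block_rsv *rsv)
      {
          /* test the tag instead of comparing rsv against the addresses of
           * all the well-known global reservations */
          return rsv->type == DEMO_RSV_DELOPS;
      }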
    • Btrfs: btrfs_drop_extent_cache should never fail · 7014cdb4
      Josef Bacik committed
      I noticed this while doing the fsync work: we allocate split extents
      if we drop an extent range that is in the middle of an existing
      extent.  This BUG()'s if we fail to allocate memory, but the fact is
      this is just a cache; we will simply regenerate the cache if we need
      it, and the important part is that we free the range we are given.
      This can be done without allocations, so if we fail to allocate the
      splits we just skip the splitting stage, free our em, and look for
      more extents to drop.  This also makes btrfs_drop_extent_cache return
      void since nobody was checking the return value anyway.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
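      A toy model of the new control flow (names and types are illustrative,
      not btrfs code): dropping a range from a cache must not fail, so when
      the allocation needed for the "split out the middle" case fails we fall
      back to forgetting the whole cached extent, which is always safe for a
      cache that can be rebuilt:

      #include <linux/slab.h>
      #include <linux/types.h>

      struct demo_cached_extent {
          u64 start;
          u64 len;        /* len == 0 means "nothing cached" */
      };

      static void demo_drop_cached_range(struct demo_cached_extent *em,
                                         u64 drop_start, u64 drop_len)
      {
          struct demo_cached_extent *tail = kmalloc(sizeof(*tail), GFP_NOFS);
          bool middle = drop_start > em->start &&
                        drop_start + drop_len < em->start + em->len;

          if (middle && tail) {
              /* keep both pieces around the dropped middle */
              tail->start = drop_start + drop_len;
              tail->len = em->start + em->len - tail->start;
              em->len = drop_start - em->start;
              /* tail would be inserted back into the cache here */
              return;
          }
          /* allocation failed or no middle split needed: drop it all */
          em->len = 0;
          kfree(tail);
      }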