  1. 07 May 2013 (1 commit)
    • btrfs: clean up transaction abort messages · 08748810
      David Sterba authored
      The transaction abort stacktrace is printed only once per module
      lifetime, but we'd like to see it each time it happens per mounted
      filesystem.  Introduce a fs_state flag that records it.
      
      Tweak the messages around abort:
      * add error number to the first abort
      * print the exact negative errno from btrfs_decode_error
      * clean up btrfs_decode_error and callers
      * no dots at the end of the messages
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      08748810
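      A minimal sketch of the once-per-filesystem reporting described
      above, assuming a BTRFS_FS_STATE_TRANS_ABORTED-style bit in
      fs_info->fs_state (the flag name and message text here are
      illustrative, not necessarily the exact patch):

          #include <linux/bitops.h>

          /* Report only the first abort per mounted fs: the bit test
           * and set are one atomic step, so exactly one caller wins. */
          static void report_abort_once(struct btrfs_fs_info *fs_info,
                                        int errno)
          {
                  if (!test_and_set_bit(BTRFS_FS_STATE_TRANS_ABORTED,
                                        &fs_info->fs_state))
                          WARN(1, "btrfs: transaction aborted (error %d)",
                               errno);
          }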
  2. 01 Mar 2013 (1 commit)
  3. 21 Feb 2013 (13 commits)
    • Btrfs: fix remount vs autodefrag · dc81cdc5
      Miao Xie authored
      If we remount the fs to disable auto defragment, or remount it
      R/O, we should stop the auto defragment.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      dc81cdc5
    • btrfs: define BTRFS_MAGIC as a u64 value · cdb4c574
      Zach Brown authored
      super.magic is an le64 but it's treated as an unterminated string when
      compared against BTRFS_MAGIC which is defined as a string.  Instead
      define BTRFS_MAGIC as a normal hex value and use endian helpers to
      compare it to the super's magic.
      
      I tested this by mounting an fs made before the change and made sure
      that it didn't introduce sparse errors.  This matches a similar cleanup
      that is pending in btrfs-progs.  David Sterba pointed out that we should
      fix the kernel side as well :).
      Signed-off-by: Zach Brown <zab@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      cdb4c574
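      The shape of the change, sketched here: the magic is the 8-byte
      ASCII string "_BHRfS_M" read as a little-endian u64, and the
      comparison goes through a generated accessor that does the
      le64_to_cpu conversion (btrfs_super_magic() is assumed to exist
      as in the kernel's setget helpers):

          /* before: #define BTRFS_MAGIC "_BHRfS_M" plus a memcmp */
          #define BTRFS_MAGIC 0x4D5F53665248425FULL /* ascii _BHRfS_M, no null */

          static bool magic_matches(const struct btrfs_super_block *sb)
          {
                  /* sb->magic is __le64 on disk; the accessor converts
                   * to CPU byte order before the compare */
                  return btrfs_super_magic(sb) == BTRFS_MAGIC;
          }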
    • Btrfs: place ordered operations on a per transaction list · 569e0f35
      Josef Bacik authored
      Miao made the ordered operations stuff run async, which introduced a
      deadlock where we could get somebody (sync) racing in and committing the
      transaction while a commit was already happening.  The new committer would
      try and flush ordered operations which would hang waiting for the commit to
      finish because it is done asynchronously and no longer inherits the caller's
      trans handle.  To fix this we need to make the ordered operations list a per
      transaction list.  We can get new inodes added to the ordered operation list
      by truncating them and then having another process writing to them, so this
      makes it so that anybody trying to add an ordered operation _must_ start a
      transaction in order to add itself to the list, which will keep new inodes
      from getting added to the ordered operations list after we start committing.
      This should fix the deadlock and also keeps us from doing a lot more work
      than we need to during commit.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      569e0f35
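      A sketch of the structural move (field, lock, and helper names
      are illustrative, not the exact patch): the list hangs off the
      transaction, and joining it requires holding a trans handle, so
      nothing can be added once that transaction's commit has begun.

          struct btrfs_transaction {
                  /* ... */
                  spinlock_t ordered_ops_lock;
                  struct list_head ordered_operations; /* per transaction now */
          };

          static void queue_ordered_op(struct btrfs_trans_handle *trans,
                                       struct btrfs_inode *inode)
          {
                  struct btrfs_transaction *cur = trans->transaction;

                  spin_lock(&cur->ordered_ops_lock);
                  if (list_empty(&inode->ordered_operations))
                          list_add_tail(&inode->ordered_operations,
                                        &cur->ordered_operations);
                  spin_unlock(&cur->ordered_ops_lock);
          }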
    • btrfs: add cancellation points to defrag · 210549eb
      David Sterba authored
      The defrag operation can take very long, so we want a way to
      cancel it. The code checks for a pending signal at safe points in
      the defrag loops and returns EAGAIN. This means a user can press
      ^C after running 'btrfs fi defrag'; it works for both defrag
      modes, files and root.
      
      Returning from the command was instant in my light tests, but may take
      longer depending on the aging factor of the filesystem.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      210549eb
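      The cancellation point is just a signal check at a safe spot in
      the loop; a minimal sketch (the helper name is illustrative):

          #include <linux/sched.h>  /* signal_pending(), current */

          static int defrag_check_cancel(void)
          {
                  /* ^C on 'btrfs fi defrag' raises SIGINT; unwind with
                   * -EAGAIN so the ioctl returns promptly */
                  return signal_pending(current) ? -EAGAIN : 0;
          }

          /* in the per-range defrag loop:
           *         ret = defrag_check_cancel();
           *         if (ret)
           *                 break;
           */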
    • Btrfs: steal from global reserve if we are cleaning up orphans · 5d80366e
      Josef Bacik authored
      Sometimes xfstest 83 will fail to remount the scratch device because we've
      gotten ourselves so full that we cannot clean up the orphan items.  In this
      case, check whether we're doing the orphan cleanup and, if we are, allow
      stealing the reservation from the global block rsv.  With this patch I've
      not been able to reproduce the failed mount problem.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      5d80366e
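      Illustrative only, not the exact patch: when a reservation fails
      and the root is mid orphan-cleanup, fall back to the global block
      rsv (the state test and migrate helper are assumed from that
      era's btrfs helpers):

          static int reserve_orphan_metadata(struct btrfs_root *root,
                                             struct btrfs_block_rsv *rsv,
                                             u64 num_bytes)
          {
                  int ret = btrfs_block_rsv_add(root, rsv, num_bytes);

                  /* ENOSPC during orphan cleanup: steal from the
                   * global reserve so the mount can finish */
                  if (ret &&
                      root->orphan_cleanup_state == ORPHAN_CLEANUP_STARTED)
                          ret = btrfs_block_rsv_migrate(
                                  &root->fs_info->global_block_rsv,
                                  rsv, num_bytes);
                  return ret;
          }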
    • btrfs: remove cache only arguments from defrag path · de78b51a
      Eric Sandeen authored
      The entry point at the defrag ioctl always sets "cache only" to 0;
      the codepaths haven't run for a long time as far as I can
      tell.  Chris says they're dead code, so remove them.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      de78b51a
    • btrfs: handle null fs_info in btrfs_panic() · aa43a17c
      Eric Sandeen authored
      At least backref_tree_panic() can apparently pass
      in a null fs_info, so handle that in __btrfs_panic
      to get the message out on the console.
      
      The btrfs_panic macro also uses fs_info, but that's
      largely pointless; it's testing to see if
      BTRFS_MOUNT_PANIC_ON_FATAL_ERROR is not set.
      But if it *were* set, __btrfs_panic() would have,
      well, panicked, and we wouldn't be here testing it!
      So just BUG() at this point.
      
      And since we only use fs_info once now, just use it
      directly.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      aa43a17c
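      The resulting macro shape, as the message describes it (a sketch;
      the argument order of __btrfs_panic is assumed):

          #define btrfs_panic(fs_info, errno, fmt, args...)              \
          do {                                                           \
                  __btrfs_panic(fs_info, __func__, __LINE__, errno,      \
                                fmt, ##args);                            \
                  /* if panic-on-fatal-error were set, __btrfs_panic()   \
                   * would never return, so reaching here means BUG */   \
                  BUG();                                                 \
          } while (0)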
    • Btrfs: use bit operation for ->fs_state · 87533c47
      Miao Xie authored
      There is no lock protecting fs_info->fs_state, which introduces
      problems: when several tasks modify it concurrently, one task's
      update can be overwritten by another's. For example:
      	Task0 - CPU0		Task1 - CPU1
      	mov %fs_state rax
      	or $0x1 rax
      				mov %fs_state rax
      				or $0x2 rax
      	mov rax %fs_state
      				mov rax %fs_state
      The expected value is 3, but in fact, it is 2.
      
      Though this problem doesn't happen now (because there is only one
      flag currently), the code is error prone; if we add other flags,
      the above problem is certain to happen.
      
      Now we use bit operations instead, which fixes the problem, makes
      the code more robust, and makes it easy to add new flags.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      87533c47
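      The lock-free fix in miniature: fs_state becomes a bit mask
      manipulated with the atomic bitops API, so concurrent setters
      cannot lose each other's updates (BTRFS_FS_STATE_ERROR is the bit
      index; fs_state is assumed to be an unsigned long):

          #include <linux/bitops.h>

          /* before (racy):  fs_info->fs_state |= SOME_FLAG; */
          static void mark_fs_error(struct btrfs_fs_info *fs_info)
          {
                  set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
          }

          static bool fs_has_error(struct btrfs_fs_info *fs_info)
          {
                  return test_bit(BTRFS_FS_STATE_ERROR,
                                  &fs_info->fs_state);
          }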
    • Btrfs: use seqlock to protect fs_info->avail_{data, metadata, system}_alloc_bits · de98ced9
      Miao Xie authored
      There is no lock protecting
        fs_info->avail_{data, metadata, system}_alloc_bits,
      which may lead to problems such as readers seeing wrong profile
      information, so add a seqlock to protect them.
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      de98ced9
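      The classic seqlock pattern the commit applies (the lock name
      profiles_lock is an assumption here): the writer serializes and
      bumps the sequence; readers retry if a writer raced with them.

          #include <linux/seqlock.h>

          static void add_data_alloc_bits(struct btrfs_fs_info *fs_info,
                                          u64 extra_flags)
          {
                  write_seqlock(&fs_info->profiles_lock);
                  fs_info->avail_data_alloc_bits |= extra_flags;
                  write_sequnlock(&fs_info->profiles_lock);
          }

          static u64 get_data_alloc_bits(struct btrfs_fs_info *fs_info)
          {
                  unsigned int seq;
                  u64 flags;

                  do {
                          seq = read_seqbegin(&fs_info->profiles_lock);
                          flags = fs_info->avail_data_alloc_bits;
                  } while (read_seqretry(&fs_info->profiles_lock, seq));
                  return flags;
          }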
    • Btrfs: use percpu counter for fs_info->delalloc_bytes · 963d678b
      Miao Xie authored
      fs_info->delalloc_bytes is accessed very frequently, so use a
      percpu counter instead of a plain u64 to reduce the lock
      contention.
      
      This patch also fixes a case where we accessed the variable
      without lock protection. At worst, we would fail to flush the
      delalloc inodes and just return ENOSPC even though the fs still
      had some free space.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      963d678b
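      The percpu pattern in brief: hot-path updates touch only a
      CPU-local counter, and the rare readers pay for the cross-CPU
      sum. A sketch (error handling of the init omitted; newer kernels
      also take a GFP argument in percpu_counter_init):

          #include <linux/percpu_counter.h>

          /* at mount:  percpu_counter_init(&fs_info->delalloc_bytes, 0); */

          static void delalloc_track(struct btrfs_fs_info *fs_info, s64 len)
          {
                  /* hot path: touches only this CPU's slot, no shared lock */
                  percpu_counter_add(&fs_info->delalloc_bytes, len);
          }

          static bool have_delalloc(struct btrfs_fs_info *fs_info)
          {
                  /* rare reader: pays for the cross-CPU sum */
                  return percpu_counter_sum_positive(
                          &fs_info->delalloc_bytes) > 0;
          }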
    • Btrfs: use percpu counter for dirty metadata count · e2d84521
      Miao Xie authored
      ->dirty_metadata_bytes is accessed very frequently, so use a
      percpu counter instead of a plain u64 to reduce the lock
      contention.
      
      This patch also fixes an access without lock protection in
      __btrfs_btree_balance_dirty(), which could cause us to skip
      flushing the dirty pages.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      e2d84521
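      The threshold test in __btrfs_btree_balance_dirty() can then be
      an approximate percpu compare, cheap and safe without any global
      lock; a sketch of that check:

          #include <linux/percpu_counter.h>

          static bool metadata_needs_flush(struct btrfs_fs_info *fs_info,
                                           s64 thresh)
          {
                  /* > 0 when the summed counter exceeds thresh;
                   * tolerates the normal percpu batching error */
                  return percpu_counter_compare(
                          &fs_info->dirty_metadata_bytes, thresh) > 0;
          }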
    • Btrfs: protect fs_info->alloc_start · c018daec
      Miao Xie authored
      fs_info->alloc_start is a 64-bit variable that can be accessed
      by multiple tasks, but it is not strictly protected and can
      change while we are reading it. On a 32-bit machine we can read
      a torn value, because the access takes two instructions. (In
      fact the same problem can happen on a 64-bit machine too,
      because the compiler may split a 64-bit operation into two
      32-bit operations.)
      
      For example:
      Assuming ->alloc_start is 0x0000 0000 0001 0000 at the beginning,
      then we remount and set ->alloc_start to 0x0000 0100 0000 0000.
      	Task0 			Task1
      				load high 32 bits
      	set high 32 bits
      	set low 32 bits
      				load low 32 bits
      
      Task1 will get 0.
      
      This patch fixes the problem by using two locks to protect it:
      	fs_info->chunk_mutex
      	sb->s_umount
      On the read side, we just need to take one of these two locks;
      on the write side, we must take both of them.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      c018daec
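      The asymmetric rule sketched in code: writers hold both locks, so
      a reader holding either one sees a stable 64-bit value (the
      remount path already holds sb->s_umount):

          #include <linux/mutex.h>

          static u64 get_alloc_start(struct btrfs_fs_info *fs_info)
          {
                  u64 start;

                  /* read side: either chunk_mutex or s_umount suffices */
                  mutex_lock(&fs_info->chunk_mutex);
                  start = fs_info->alloc_start;
                  mutex_unlock(&fs_info->chunk_mutex);
                  return start;
          }

          /* write side (remount): s_umount is already held, so taking
           * chunk_mutex too excludes readers on both locks */
          static void set_alloc_start(struct btrfs_fs_info *fs_info, u64 start)
          {
                  mutex_lock(&fs_info->chunk_mutex);
                  fs_info->alloc_start = start;
                  mutex_unlock(&fs_info->chunk_mutex);
          }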
    • Btrfs: add a comment for fs_info->max_inline · 8c6a3ee6
      Miao Xie authored
      Though ->max_inline is a 64-bit variable that may be accessed by
      multiple tasks, it is only an advisory number, so we needn't add
      locking for fs_info->max_inline; just add a comment to explain
      why we don't use a lock to protect it.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      8c6a3ee6
  4. 20 Feb 2013 (5 commits)
  5. 02 Feb 2013 (3 commits)
    • Btrfs: Add a stripe cache to raid56 · 4ae10b3a
      Chris Mason authored
      The stripe cache allows us to avoid extra read/modify/write cycles
      by caching the pages we read off the disk.  Pages are cached when:
      
      * They are read in during a read/modify/write cycle
      
      * They are written during a read/modify/write cycle
      
      * They are involved in a parity rebuild
      
      Pages are not cached if we're doing a full stripe write.  We're
      assuming that a full stripe write won't be followed by another
      partial stripe write any time soon.
      
      This provides a substantial boost in performance for workloads that
      synchronously modify adjacent offsets in the file, and for the parity
      rebuild use case in general.
      
      The size of the stripe cache isn't tunable (yet) and is set at 1024
      entries.
      
      Example on flash: dd if=/dev/zero of=/mnt/xxx bs=4K oflag=direct
      
      Without the stripe cache -- 2.1 MB/s
      With the stripe cache    -- 21 MB/s
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      4ae10b3a
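      The cache discipline in miniature (types and names here are
      hypothetical; only the LRU idea and the 1024-entry cap come from
      the commit): newly cached stripes go to the front, and past the
      cap the least-recently-used entry is dropped.

          #include <linux/list.h>
          #include <linux/spinlock.h>

          #define STRIPE_CACHE_SIZE 1024  /* fixed, not yet tunable */

          struct stripe_cache {
                  spinlock_t lock;
                  struct list_head lru;   /* front = most recent */
                  int nr;
          };

          struct cached_stripe {
                  struct list_head lru;   /* INIT_LIST_HEAD'd at alloc */
                  /* cached data/parity pages ... */
          };

          /* assumes 'cs' is not already counted in the cache */
          static void cache_stripe(struct stripe_cache *cache,
                                   struct cached_stripe *cs)
          {
                  spin_lock(&cache->lock);
                  list_move(&cs->lru, &cache->lru);
                  if (++cache->nr > STRIPE_CACHE_SIZE) {
                          struct cached_stripe *victim =
                                  list_last_entry(&cache->lru,
                                                  struct cached_stripe, lru);
                          list_del_init(&victim->lru); /* free its pages */
                          cache->nr--;
                  }
                  spin_unlock(&cache->lock);
          }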
    • Btrfs: RAID5 and RAID6 · 53b381b3
      David Woodhouse authored
      This builds on David Woodhouse's original Btrfs raid5/6 implementation.
      The code has changed quite a bit, blame Chris Mason for any bugs.
      
      Read/modify/write is done after the higher levels of the filesystem have
      prepared a given bio.  This means the higher layers are not responsible
      for building full stripes, and they don't need to query for the topology
      of the extents that may get allocated during delayed allocation runs.
      It also means different files can easily share the same stripe.
      
      But, it does expose us to incorrect parity if we crash or lose power
      while doing a read/modify/write cycle.  This will be addressed in a
      later commit.
      
      Scrub is unable to repair crc errors on raid5/6 chunks.
      
      Discard does not work on raid5/6 (yet)
      
      The stripe size is fixed at 64KiB per disk.  This will be tunable
      in a later commit.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      53b381b3
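      The arithmetic behind the parity rebuild, for reference: RAID5's
      parity stripe is the XOR of the data stripes, so any one missing
      stripe is the XOR of all the survivors (RAID6's second parity, Q,
      uses GF(2^8) math via the kernel's lib/raid6). A sketch with the
      fixed 64KiB per-disk stripe size:

          #include <linux/string.h>

          #define BTRFS_STRIPE_LEN (64 * 1024)

          static void raid5_gen_parity(u8 *parity, u8 * const *data,
                                       int nr_data)
          {
                  int d, i;

                  memset(parity, 0, BTRFS_STRIPE_LEN);
                  for (d = 0; d < nr_data; d++)
                          for (i = 0; i < BTRFS_STRIPE_LEN; i++)
                                  parity[i] ^= data[d][i];
          }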
    • Btrfs: add rw argument to merge_bio_hook() · 64a16701
      David Woodhouse authored
      We'll want to merge writes so they can fill a full RAID[56] stripe, but
      not necessarily reads.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      64a16701
  6. 18 Dec 2012 (1 commit)
    • Btrfs: fix hash overflow handling · 9c52057c
      Chris Mason authored
      The handling for directory crc hash overflows was fairly obscure,
      split_leaf returns EOVERFLOW when we try to extend the item and that is
      supposed to bubble up to userland.  For a while it did so, but along the
      way we added better handling of errors and forced the FS readonly if we
      hit IO errors during the directory insertion.
      
      Along the way, we started testing only for EEXIST and the EOVERFLOW case
      was dropped.  The end result is that we may force the FS readonly if we
      catch a directory hash bucket overflow.
      
      This fixes a few problem spots.  First I add tests for EOVERFLOW in the
      places where we can safely just return the error up the chain.
      
      btrfs_rename is harder though, because it tries to insert the new
      directory item only after it has already unlinked anything the rename
      was going to overwrite.  Rather than adding very complex logic, I added
      a helper to test for the hash overflow case early while it is still safe
      to bail out.
      
      Snapshot and subvolume creation had a similar problem, so they are using
      the new helper now too.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Reported-by: Pascal Junod <pascal@junod.info>
      9c52057c
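      The early-out shape in btrfs_rename(), sketched; the helper name
      and signature are assumed from the commit's description:

          /* probe for a full hash bucket before anything is unlinked,
           * while bailing out is still safe */
          ret = btrfs_check_dir_item_collision(dest, new_dir->i_ino,
                                               new_dentry->d_name.name,
                                               new_dentry->d_name.len);
          if (ret)
                  return ret;  /* typically -EOVERFLOW */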
  7. 17 Dec 2012 (7 commits)
  8. 13 Dec 2012 (9 commits)