1. 01 9月, 2013 8 次提交
  2. 02 7月, 2013 2 次提交
    • J
      Btrfs: make the chunk allocator completely tree lockless · 6df9a95e
      Josef Bacik 提交于
      When adjusting the enospc rules for relocation I ran into a deadlock because we
      were relocating the only system chunk and that forced us to try and allocate a
      new system chunk while holding locks in the chunk tree, which caused us to
      deadlock.  To fix this I've moved all of the dev extent addition and chunk
      addition out to the delayed chunk completion stuff.  We still keep the in-memory
      stuff which makes sure everything is consistent.
      
      One change I had to make was to search the commit root of the device tree to
      find a free dev extent, and hold onto any chunk em's that we allocated in that
      transaction so we do not allocate the same dev extent twice.  This has the side
      effect of fixing a bug with balance that has been there ever since balance
      existed.  Basically you can free a block group and it's dev extent and then
      immediately allocate that dev extent for a new block group and write stuff to
      that dev extent, all within the same transaction.  So if you happen to crash
      during a balance you could come back to a completely broken file system.  This
      patch should keep these sort of things from happening in the future since we
      won't be able to allocate free'd dev extents until after the transaction
      commits.  This has passed all of the xfstests and my super annoying stress test
      followed by a balance.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      6df9a95e
    • M
      Btrfs: fix wrong mirror number tuning · a70c6172
      Miao Xie 提交于
      Now reading the data from the target device of the replace operation is allowed,
      so the mirror number that is greater than the stripes number of a chunk is valid,
      we will tune it when we find there is no target device later. Fix it.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      a70c6172
  3. 14 6月, 2013 3 次提交
  4. 18 5月, 2013 2 次提交
    • C
      Btrfs: use a btrfs bioset instead of abusing bio internals · 9be3395b
      Chris Mason 提交于
      Btrfs has been pointer tagging bi_private and using bi_bdev
      to store the stripe index and mirror number of failed IOs.
      
      As bios bubble back up through the call chain, we use these
      to decide if and how to retry our IOs.  They are also used
      to count IO failures on a per device basis.
      
      Recently a bio tracepoint was added lead to crashes because
      we were abusing bi_bdev.
      
      This commit adds a btrfs bioset, and creates explicit fields
      for the mirror number and stripe index.  The plan is to
      extend this structure for all of the fields currently in
      struct btrfs_bio, which will mean one less kmalloc in
      our IO path.
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      Reported-by: NTejun Heo <tj@kernel.org>
      9be3395b
    • A
      Correct allowed raid levels on balance. · 8250dabe
      Andreas Philipp 提交于
      Raid5 with 3 devices is well defined while the old logic allowed
      raid5 only with a minimum of 4 devices when converting the block group
      profile via btrfs balance. Creating a raid5 with just three devices
      using mkfs.btrfs worked always as expected. This is now fixed and the
      whole logic is rewritten.
      Signed-off-by: NAndreas Philipp <philipp.andreas@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      8250dabe
  5. 07 5月, 2013 7 次提交
    • E
      btrfs: make static code static & remove dead code · 48a3b636
      Eric Sandeen 提交于
      Big patch, but all it does is add statics to functions which
      are in fact static, then remove the associated dead-code fallout.
      
      removed functions:
      
      btrfs_iref_to_path()
      __btrfs_lookup_delayed_deletion_item()
      __btrfs_search_delayed_insertion_item()
      __btrfs_search_delayed_deletion_item()
      find_eb_for_page()
      btrfs_find_block_group()
      range_straddles_pages()
      extent_range_uptodate()
      btrfs_file_extent_length()
      btrfs_scrub_cancel_devid()
      btrfs_start_transaction_lflush()
      
      btrfs_print_tree() is left because it is used for debugging.
      btrfs_start_transaction_lflush() and btrfs_reada_detach() are
      left for symmetry.
      
      ulist.c functions are left, another patch will take care of those.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      48a3b636
    • J
      Btrfs: don't BUG_ON() in btrfs_num_copies · fb7669b5
      Josef Bacik 提交于
      A user sent me a btrfs-image that was panicing because of some corruption.  This
      is because we pass in a bogus value to btrfs_num_copies, and it panics.  Instead
      just return 1.  We only call btrfs_num_copies to see if there are other copies
      to try and read for things, so if we just return 1 it will make the callers exit
      out with an appropriate error value.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      fb7669b5
    • J
      Btrfs: deal with bad mappings in btrfs_map_block · 9bb91873
      Josef Bacik 提交于
      Martin Steigerwald reported a BUG_ON() in btrfs_map_block where we didn't find
      a chunk for a particular block we were trying to map.  This happened because the
      block was bogus.  We shouldn't be BUG_ON()'ing in this case, just print a
      message and return an error.  This came from reada_add_block and it appears to
      deal with an error fine so we should be good there.  Thanks,
      Reported-by: NMartin Steigerwald <Martin@lichtvoll.de>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      9bb91873
    • M
      Btrfs: use a lock to protect incompat/compat flag of the super block · ceda0864
      Miao Xie 提交于
      The following case will make the incompat/compat flag of the super block
      be recovered.
       Task1					|Task2
       flags = btrfs_super_incompat_flags();	|
      					|flags = btrfs_super_incompat_flags();
       flags |= new_flag1;			|
      					|flags |= new_flag2;
       btrfs_set_super_incompat_flags(flags);	|
      					|btrfs_set_super_incompat_flags(flags);
      the new_flag1 is recovered.
      
      In order to avoid this problem, we introduce a lock named super_lock into
      the btrfs_fs_info structure. If we want to update incompat/compat flags
      of the super block, we must hold it.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      ceda0864
    • E
      btrfs: ignore device open failures in __btrfs_open_devices · f63e0cca
      Eric Sandeen 提交于
      This:
      
         # mkfs.btrfs /dev/sdb{1,2} ; wipefs -a /dev/sdb1; mount /dev/sdb2 /mnt/test
      
      would lead to a blkdev open/close mismatch when the mount fails, and
      a permanently busy (opened O_EXCL) sdb2:
      
         # wipefs -a /dev/sdb2
         wipefs: error: /dev/sdb2: probing initialization failed: Device or resource busy
      
      It's because btrfs_open_devices() may open some devices, fail on
      the last one, and return that failure stored in "ret."   The mount
      then fails, but the caller then does not clean up the open devices.
      
      Chris assures me that:
      
      "btrfs_open_devices just means: go off and open every bdev you can from
      this uuid.  It should return success if we opened any of them at all."
      
      So change the logic to ignore any open failures; just skip processing
      of that device.  Later on it's decided whether we have enough devices
      to continue.
      Reported-by: NJan Safranek <jsafrane@redhat.com>
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      f63e0cca
    • J
      Btrfs: fix bad extent logging · 09a2a8f9
      Josef Bacik 提交于
      A user sent me a btrfs-image of a file system that was panicing on mount during
      the log recovery.  I had originally thought these problems were from a bug in
      the free space cache code, but that was just a symptom of the problem.  The
      problem is if your application does something like this
      
      [prealloc][prealloc][prealloc]
      
      the internal extent maps will merge those all together into one extent map, even
      though on disk they are 3 separate extents.  So if you go to write into one of
      these ranges the extent map will be right since we use the physical extent when
      doing the write, but when we log the extents they will use the wrong sizes for
      the remainder prealloc space.  If this doesn't happen to trip up the free space
      cache (which it won't in a lot of cases) then you will get bogus entries in your
      extent tree which will screw stuff up later.  The data and such will still work,
      but everything else is broken.  This patch fixes this by not allowing extents
      that are on the modified list to be merged.  This has the side effect that we
      are no longer adding everything to the modified list all the time, which means
      we now have to call btrfs_drop_extents every time we log an extent into the
      tree.  So this allows me to drop all this speciality code I was using to get
      around calling btrfs_drop_extents.  With this patch the testcase I've created no
      longer creates a bogus file system after replaying the log.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      09a2a8f9
    • S
      Btrfs: Include the device in most error printk()s · c2cf52eb
      Simon Kirby 提交于
      With more than one btrfs volume mounted, it can be very difficult to find
      out which volume is hitting an error. btrfs_error() will print this, but
      it is currently rigged as more of a fatal error handler, while many of
      the printk()s are currently for debugging and yet-unhandled cases.
      
      This patch just changes the functions where the device information is
      already available. Some cases remain where the root or fs_info is not
      passed to the function emitting the error.
      
      This may introduce some confusion with volumes backed by multiple devices
      emitting errors referring to the primary device in the set instead of the
      one on which the error occurred.
      
      Use btrfs_printk(fs_info, format, ...) rather than writing the device
      string every time, and introduce macro wrappers ala XFS for brevity.
      Since the function already cannot be used for continuations, print a
      newline as part of the btrfs_printk() message rather than at each caller.
      Signed-off-by: NSimon Kirby <sim@hostway.ca>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      c2cf52eb
  6. 24 3月, 2013 1 次提交
    • K
      block: Use bio_sectors() more consistently · aa8b57aa
      Kent Overstreet 提交于
      Bunch of places in the code weren't using it where they could be -
      this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx
      into a struct bvec_iter.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: "Ed L. Cashin" <ecashin@coraid.com>
      CC: Nick Piggin <npiggin@kernel.dk>
      CC: Jiri Kosina <jkosina@suse.cz>
      CC: Jim Paris <jim@jtan.com>
      CC: Geoff Levand <geoff@infradead.org>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: dm-devel@redhat.com
      CC: Neil Brown <neilb@suse.de>
      CC: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: NEd Cashin <ecashin@coraid.com>
      aa8b57aa
  7. 22 3月, 2013 1 次提交
    • J
      Btrfs: handle a bogus chunk tree nicely · 835d974f
      Josef Bacik 提交于
      If you restore a btrfs-image file system and try to mount that file system we'll
      panic.  That's because btrfs-image restores and just makes one big chunk to
      envelope the whole disk, since they are really only meant to be messed with by
      our btrfs-progs.  So fix up btrfs_rmap_block and the callers of it for mount so
      that we no longer panic but instead just return an error and fail to mount.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      835d974f
  8. 15 3月, 2013 1 次提交
    • E
      btrfs: use rcu_barrier() to wait for bdev puts at unmount · bc178622
      Eric Sandeen 提交于
      Doing this would reliably fail with -EBUSY for me:
      
      # mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
      ...
      unable to open /dev/sdb2: Device or resource busy
      
      because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.
      
      Using systemtap to track bdev gets & puts shows a kworker thread doing a
      blkdev put after mkfs attempts a get; this is left over from the unmount
      path:
      
      btrfs_close_devices
      	__btrfs_close_devices
      		call_rcu(&device->rcu, free_device);
      			free_device
      				INIT_WORK(&device->rcu_work, __free_device);
      				schedule_work(&device->rcu_work);
      
      so unmount might complete before __free_device fires & does its blkdev_put.
      
      Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait
      until all blkdev_put()s are done, and the device is truly free once
      unmount completes.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      bc178622
  9. 07 3月, 2013 1 次提交
  10. 05 3月, 2013 1 次提交
  11. 01 3月, 2013 1 次提交
  12. 27 2月, 2013 1 次提交
    • Q
      btrfs: cleanup for open-coded alignment · fda2832f
      Qu Wenruo 提交于
      Though most of the btrfs codes are using ALIGN macro for page alignment,
      there are still some codes using open-coded alignment like the
      following:
      ------
              u64 mask = ((u64)root->stripesize - 1);
              u64 ret = (val + mask) & ~mask;
      ------
      Or even hidden one:
      ------
              num_bytes = (end - start + blocksize) & ~(blocksize - 1);
      ------
      
      Sometimes these open-coded alignment is not so easy to understand for
      newbie like me.
      
      This commit changes the open-coded alignment to the ALIGN macro for a
      better readability.
      
      Also there is a previous patch from David Sterba with similar changes,
      but the patch is for 3.2 kernel and seems not merged.
      http://www.spinics.net/lists/linux-btrfs/msg12747.html
      
      Cc: David Sterba <dave@jikos.cz>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      fda2832f
  13. 21 2月, 2013 9 次提交
  14. 20 2月, 2013 1 次提交
  15. 16 2月, 2013 1 次提交
    • D
      btrfs: access superblock via pagecache in scan_one_device · 6f60cbd3
      David Sterba 提交于
      btrfs_scan_one_device is calling set_blocksize() which can race
      with a concurrent process making dirty page cache pages.  It can end up
      dropping dirty page cache pages on the floor, which isn't very nice when
      someone is just running btrfs dev scan to find filesystems on the
      box.
      
      Now that udev is registering btrfs devices as it discovers them, we can
      actually end up racing with our own mkfs program too.  When this
      happens, we drop some of the important blocks written by mkfs.
      
      This commit changes scan_one_device to read the super out of the page
      cache instead of trying to use bread.  This way we don't have to care
      about the blocksize of the device.
      
      This also drops the invalidate_bdev() call.  It wasn't very polite to
      invalidate during the scan either.  mkfs is putting the super into the
      page cache, there's no reason to invalidate at this point.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      6f60cbd3