1. 02 Jul, 2013 1 commit
  2. 14 Jun, 2013 3 commits
  3. 18 May, 2013 2 commits
    • Btrfs: use a btrfs bioset instead of abusing bio internals · 9be3395b
      Committed by Chris Mason
      Btrfs has been pointer tagging bi_private and using bi_bdev
      to store the stripe index and mirror number of failed IOs.
      
      As bios bubble back up through the call chain, we use these
      to decide if and how to retry our IOs.  They are also used
      to count IO failures on a per device basis.
      
      A recently added bio tracepoint led to crashes because
      we were abusing bi_bdev.
      
      This commit adds a btrfs bioset, and creates explicit fields
      for the mirror number and stripe index.  The plan is to
      extend this structure for all of the fields currently in
      struct btrfs_bio, which will mean one less kmalloc in
      our IO path.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Reported-by: Tejun Heo <tj@kernel.org>
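The idea in 9be3395b, keeping the mirror number and stripe index in explicit fields instead of tagged pointers, can be sketched in user-space C. This is only an illustration under stated assumptions: `fake_bio`, `demo_io_bio`, `demo_io_bio_from`, and `demo_failed_mirror` are hypothetical names, the bioset allocation is replaced by a plain struct, and embedding the bio inside the wrapper is an assumption about the layout rather than something the commit message spells out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the block layer's struct bio. */
struct fake_bio {
    void *bi_private;   /* left untouched: no pointer tagging */
};

/*
 * Hypothetical wrapper: per-IO state lives in explicit fields next to
 * the embedded bio, so both come from a single allocation.
 */
struct demo_io_bio {
    unsigned int mirror_num;
    unsigned int stripe_index;
    struct fake_bio bio;    /* embedded, not pointed to */
};

/* Recover the wrapper from a pointer to its embedded bio. */
static struct demo_io_bio *demo_io_bio_from(struct fake_bio *bio)
{
    assert(bio != NULL);
    return (struct demo_io_bio *)((char *)bio -
                                  offsetof(struct demo_io_bio, bio));
}

/* Completion-side helper: which mirror did this IO target? */
static unsigned int demo_failed_mirror(struct fake_bio *bio)
{
    return demo_io_bio_from(bio)->mirror_num;
}
```

Code further up the call chain only ever sees the `fake_bio` pointer, yet it can still reach the retry information without overloading fields that other subsystems, such as a tracepoint, may read.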
    • Correct allowed raid levels on balance. · 8250dabe
      Committed by Andreas Philipp
      Raid5 with 3 devices is well defined while the old logic allowed
      raid5 only with a minimum of 4 devices when converting the block group
      profile via btrfs balance. Creating a raid5 with just three devices
      using mkfs.btrfs always worked as expected. This is now fixed and the
      whole logic is rewritten.
      Signed-off-by: Andreas Philipp <philipp.andreas@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  4. 07 May, 2013 7 commits
    • btrfs: make static code static & remove dead code · 48a3b636
      Committed by Eric Sandeen
      Big patch, but all it does is add statics to functions which
      are in fact static, then remove the associated dead-code fallout.
      
      removed functions:
      
      btrfs_iref_to_path()
      __btrfs_lookup_delayed_deletion_item()
      __btrfs_search_delayed_insertion_item()
      __btrfs_search_delayed_deletion_item()
      find_eb_for_page()
      btrfs_find_block_group()
      range_straddles_pages()
      extent_range_uptodate()
      btrfs_file_extent_length()
      btrfs_scrub_cancel_devid()
      btrfs_start_transaction_lflush()
      
      btrfs_print_tree() is left because it is used for debugging.
      btrfs_start_transaction_lflush() and btrfs_reada_detach() are
      left for symmetry.
      
      ulist.c functions are left, another patch will take care of those.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: don't BUG_ON() in btrfs_num_copies · fb7669b5
      Committed by Josef Bacik
      A user sent me a btrfs-image that was panicking because of some corruption.  This
      is because we pass in a bogus value to btrfs_num_copies, and it panics.  Instead
      just return 1.  We only call btrfs_num_copies to see if there are other copies
      to try and read for things, so if we just return 1 it will make the callers exit
      out with an appropriate error value.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
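The behavior described in fb7669b5, returning 1 instead of crashing when the lookup fails, might look roughly like the following user-space sketch. The names `demo_map`, `demo_lookup`, and `demo_num_copies` are hypothetical, and the flat array stands in for the real mapping tree, which is not shown in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk mapping: a logical range and its number of copies. */
struct demo_map {
    unsigned long long start;
    unsigned long long len;
    int num_copies;
};

/* Find the mapping that covers 'logical', or NULL if none does. */
static const struct demo_map *demo_lookup(const struct demo_map *maps,
                                          size_t n,
                                          unsigned long long logical)
{
    assert(maps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (logical >= maps[i].start &&
            logical - maps[i].start < maps[i].len)
            return &maps[i];
    }
    return NULL;
}

/*
 * Report how many copies exist for 'logical'. A failed lookup is treated
 * as "one copy" rather than as a fatal condition, so callers find no
 * alternative mirror to try and fail the read with an ordinary error.
 */
static int demo_num_copies(const struct demo_map *maps, size_t n,
                           unsigned long long logical)
{
    const struct demo_map *m = demo_lookup(maps, n, logical);

    if (!m)
        return 1;   /* bogus address: do not crash */
    return m->num_copies;
}
```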
    • Btrfs: deal with bad mappings in btrfs_map_block · 9bb91873
      Committed by Josef Bacik
      Martin Steigerwald reported a BUG_ON() in btrfs_map_block where we didn't find
      a chunk for a particular block we were trying to map.  This happened because the
      block was bogus.  We shouldn't be BUG_ON()'ing in this case, just print a
      message and return an error.  This came from reada_add_block and it appears to
      deal with an error fine so we should be good there.  Thanks,
      Reported-by: Martin Steigerwald <Martin@lichtvoll.de>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: use a lock to protect incompat/compat flag of the super block · ceda0864
      Committed by Miao Xie
      In the following case, one update to the incompat/compat flags of the
      super block is overwritten by another.
       Task1					|Task2
       flags = btrfs_super_incompat_flags();	|
      					|flags = btrfs_super_incompat_flags();
       flags |= new_flag1;			|
      					|flags |= new_flag2;
       btrfs_set_super_incompat_flags(flags);	|
      					|btrfs_set_super_incompat_flags(flags);
      new_flag1 is lost, because Task2 stores flags that it read before Task1's update.
      
      In order to avoid this problem, we introduce a lock named super_lock into
      the btrfs_fs_info structure. If we want to update incompat/compat flags
      of the super block, we must hold it.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
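The fix in ceda0864 amounts to doing the read-modify-write of the flags under a lock. A minimal user-space sketch follows, assuming a pthread mutex as a stand-in for the kernel lock; `demo_fs_info`, `demo_fs_info_init`, and `demo_set_incompat_flag` are hypothetical names, and only the field name `super_lock` comes from the commit message.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of btrfs_fs_info. */
struct demo_fs_info {
    pthread_mutex_t super_lock;   /* guards incompat_flags updates */
    uint64_t incompat_flags;
};

static void demo_fs_info_init(struct demo_fs_info *fs)
{
    assert(fs != NULL);
    pthread_mutex_init(&fs->super_lock, NULL);
    fs->incompat_flags = 0;
}

/*
 * Read, modify, and write back the flags while holding the lock, so a
 * concurrent updater cannot store a stale copy over this change.
 */
static void demo_set_incompat_flag(struct demo_fs_info *fs, uint64_t flag)
{
    uint64_t flags;

    pthread_mutex_lock(&fs->super_lock);
    flags = fs->incompat_flags;   /* read   */
    flags |= flag;                /* modify */
    fs->incompat_flags = flags;   /* write  */
    pthread_mutex_unlock(&fs->super_lock);
}
```

With the lock held across all three steps, the interleaving shown in the commit message cannot occur: the second task's read happens only after the first task's write.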
    • btrfs: ignore device open failures in __btrfs_open_devices · f63e0cca
      Committed by Eric Sandeen
      This:
      
         # mkfs.btrfs /dev/sdb{1,2} ; wipefs -a /dev/sdb1; mount /dev/sdb2 /mnt/test
      
      would lead to a blkdev open/close mismatch when the mount fails, and
      a permanently busy (opened O_EXCL) sdb2:
      
         # wipefs -a /dev/sdb2
         wipefs: error: /dev/sdb2: probing initialization failed: Device or resource busy
      
      It's because btrfs_open_devices() may open some devices, fail on
      the last one, and return that failure stored in "ret".  The mount
      then fails, but the caller does not clean up the open devices.
      
      Chris assures me that:
      
      "btrfs_open_devices just means: go off and open every bdev you can from
      this uuid.  It should return success if we opened any of them at all."
      
      So change the logic to ignore any open failures; just skip processing
      of that device.  Later on it's decided whether we have enough devices
      to continue.
      Reported-by: Jan Safranek <jsafrane@redhat.com>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
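The changed logic in f63e0cca, skipping devices that fail to open instead of returning the failure, can be sketched as below. This is a user-space illustration only: `demo_device`, `demo_open_one`, and `demo_open_devices` are hypothetical, and the `openable` field simply simulates whether an open would succeed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device record; 'openable' simulates open success. */
struct demo_device {
    const char *path;
    int openable;
    int is_open;
};

/* Hypothetical open helper: returns 0 on success, -1 on failure. */
static int demo_open_one(struct demo_device *dev)
{
    if (!dev->openable)
        return -1;
    dev->is_open = 1;
    return 0;
}

/*
 * Open every device that can be opened. A failure on one device is not
 * propagated; that device is simply skipped. The count of opened devices
 * is returned so the caller can decide later whether it has enough.
 */
static size_t demo_open_devices(struct demo_device *devs, size_t n)
{
    size_t opened = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (demo_open_one(&devs[i]) != 0)
            continue;   /* ignore the failure and move on */
        opened++;
    }
    return opened;
}
```

Because no error from an individual open escapes the loop, a failure on the last device can no longer mask the fact that earlier devices were opened and still need to be closed.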
    • Btrfs: fix bad extent logging · 09a2a8f9
      Committed by Josef Bacik
      A user sent me a btrfs-image of a file system that was panicking on mount during
      the log recovery.  I had originally thought these problems were from a bug in
      the free space cache code, but that was just a symptom of the problem.  The
      problem is if your application does something like this
      
      [prealloc][prealloc][prealloc]
      
      the internal extent maps will merge those all together into one extent map, even
      though on disk they are 3 separate extents.  So if you go to write into one of
      these ranges the extent map will be right since we use the physical extent when
      doing the write, but when we log the extents they will use the wrong sizes for
      the remainder prealloc space.  If this doesn't happen to trip up the free space
      cache (which it won't in a lot of cases) then you will get bogus entries in your
      extent tree which will screw stuff up later.  The data and such will still work,
      but everything else is broken.  This patch fixes this by not allowing extents
      that are on the modified list to be merged.  This has the side effect that we
      are no longer adding everything to the modified list all the time, which means
      we now have to call btrfs_drop_extents every time we log an extent into the
      tree.  So this allows me to drop all this speciality code I was using to get
      around calling btrfs_drop_extents.  With this patch the testcase I've created no
      longer creates a bogus file system after replaying the log.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: Include the device in most error printk()s · c2cf52eb
      Committed by Simon Kirby
      With more than one btrfs volume mounted, it can be very difficult to find
      out which volume is hitting an error. btrfs_error() will print this, but
      it is currently rigged as more of a fatal error handler, while many of
      the printk()s are currently for debugging and yet-unhandled cases.
      
      This patch just changes the functions where the device information is
      already available. Some cases remain where the root or fs_info is not
      passed to the function emitting the error.
      
      This may introduce some confusion with volumes backed by multiple devices
      emitting errors referring to the primary device in the set instead of the
      one on which the error occurred.
      
      Use btrfs_printk(fs_info, format, ...) rather than writing the device
      string every time, and introduce macro wrappers à la XFS for brevity.
      Since the function already cannot be used for continuations, print a
      newline as part of the btrfs_printk() message rather than at each caller.
      Signed-off-by: Simon Kirby <sim@hostway.ca>
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  5. 22 Mar, 2013 1 commit
    • Btrfs: handle a bogus chunk tree nicely · 835d974f
      Committed by Josef Bacik
      If you restore a btrfs-image file system and try to mount that file system we'll
      panic.  That's because btrfs-image restores and just makes one big chunk to
      envelop the whole disk, since such images are really only meant to be messed with by
      our btrfs-progs.  So fix up btrfs_rmap_block and the callers of it for mount so
      that we no longer panic but instead just return an error and fail to mount.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  6. 15 Mar, 2013 1 commit
    • btrfs: use rcu_barrier() to wait for bdev puts at unmount · bc178622
      Committed by Eric Sandeen
      Doing this would reliably fail with -EBUSY for me:
      
      # mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
      ...
      unable to open /dev/sdb2: Device or resource busy
      
      because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.
      
      Using systemtap to track bdev gets & puts shows a kworker thread doing a
      blkdev put after mkfs attempts a get; this is left over from the unmount
      path:
      
      btrfs_close_devices
      	__btrfs_close_devices
      		call_rcu(&device->rcu, free_device);
      			free_device
      				INIT_WORK(&device->rcu_work, __free_device);
      				schedule_work(&device->rcu_work);
      
      so unmount might complete before __free_device fires & does its blkdev_put.
      
      Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait
      until all blkdev_put()s are done, and the device is truly free once
      unmount completes.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  7. 07 Mar, 2013 1 commit
  8. 05 Mar, 2013 1 commit
  9. 01 Mar, 2013 1 commit
  10. 27 Feb, 2013 1 commit
    • btrfs: cleanup for open-coded alignment · fda2832f
      Committed by Qu Wenruo
      Though most btrfs code uses the ALIGN macro for page alignment,
      some code still uses open-coded alignment like the
      following:
      ------
              u64 mask = ((u64)root->stripesize - 1);
              u64 ret = (val + mask) & ~mask;
      ------
      Or an even more hidden one:
      ------
              num_bytes = (end - start + blocksize) & ~(blocksize - 1);
      ------
      
      Sometimes this open-coded alignment is not so easy to understand for
      a newbie like me.
      
      This commit changes the open-coded alignment to the ALIGN macro for
      better readability.
      
      There is also a previous patch from David Sterba with similar changes,
      but that patch is for the 3.2 kernel and does not seem to have been merged.
      http://www.spinics.net/lists/linux-btrfs/msg12747.html
      
      Cc: David Sterba <dave@jikos.cz>
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
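The equivalence that fda2832f relies on can be checked in a few lines of user-space C. The sketch below is modeled on the kernel's ALIGN() but written as a plain function; `demo_align`, `demo_open_coded`, and `demo_hidden` are hypothetical names, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round x up to the next multiple of a, where a is a power of two.
 * Same arithmetic as the open-coded forms quoted in the commit message.
 */
static uint64_t demo_align(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);   /* power of two only */
    return (x + a - 1) & ~(a - 1);
}

/* First open-coded form from the commit message. */
static uint64_t demo_open_coded(uint64_t val, uint64_t stripesize)
{
    uint64_t mask = stripesize - 1;

    return (val + mask) & ~mask;
}

/*
 * The "hidden" form. Since end - start + blocksize equals
 * (end - start + 1) + (blocksize - 1), it rounds the inclusive length
 * end - start + 1 up to a multiple of blocksize.
 */
static uint64_t demo_hidden(uint64_t start, uint64_t end, uint64_t blocksize)
{
    return (end - start + blocksize) & ~(blocksize - 1);
}
```

Seen this way, the hidden form is simply `demo_align(end - start + 1, blocksize)`, which is the kind of intent the macro makes explicit.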
  11. 21 Feb, 2013 9 commits
  12. 20 Feb, 2013 1 commit
  13. 16 Feb, 2013 1 commit
    • btrfs: access superblock via pagecache in scan_one_device · 6f60cbd3
      Committed by David Sterba
      btrfs_scan_one_device is calling set_blocksize() which can race
      with a concurrent process making dirty page cache pages.  It can end up
      dropping dirty page cache pages on the floor, which isn't very nice when
      someone is just running btrfs dev scan to find filesystems on the
      box.
      
      Now that udev is registering btrfs devices as it discovers them, we can
      actually end up racing with our own mkfs program too.  When this
      happens, we drop some of the important blocks written by mkfs.
      
      This commit changes scan_one_device to read the super out of the page
      cache instead of trying to use bread.  This way we don't have to care
      about the blocksize of the device.
      
      This also drops the invalidate_bdev() call.  It wasn't very polite to
      invalidate during the scan either.  mkfs is putting the super into the
      page cache, there's no reason to invalidate at this point.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  14. 05 Feb, 2013 1 commit
  15. 02 Feb, 2013 2 commits
    • Btrfs: RAID5 and RAID6 · 53b381b3
      Committed by David Woodhouse
      This builds on David Woodhouse's original Btrfs raid5/6 implementation.
      The code has changed quite a bit, blame Chris Mason for any bugs.
      
      Read/modify/write is done after the higher levels of the filesystem have
      prepared a given bio.  This means the higher layers are not responsible
      for building full stripes, and they don't need to query for the topology
      of the extents that may get allocated during delayed allocation runs.
      It also means different files can easily share the same stripe.
      
      But, it does expose us to incorrect parity if we crash or lose power
      while doing a read/modify/write cycle.  This will be addressed in a
      later commit.
      
      Scrub is unable to repair crc errors on raid5/6 chunks.
      
      Discard does not work on raid5/6 (yet)
      
      The stripe size is fixed at 64KiB per disk.  This will be tunable
      in a later commit.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • btrfs: don't try to notify udev about missing devices · 3c911608
      Committed by Eric Sandeen
      If we remove a missing device, bdev is null, and if we
      send that off to btrfs_kobject_uevent we'll panic.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  16. 25 Jan, 2013 1 commit
  17. 22 Jan, 2013 1 commit
  18. 20 Jan, 2013 1 commit
    • Btrfs: bring back balance pause/resume logic · ed0fb78f
      Committed by Ilya Dryomov
      Balance pause/resume logic got broken by 5ac00add (went into 3.8-rc1
      as part of dev-replace merge).  Offending commit took a stab at making
      mutually exclusive volume operations (add_dev, rm_dev, resize, balance,
      replace_dev) not block behind volume_mutex if another such operation is
      in progress and instead return an error right away.  Balancing front-end
      relied on the blocking behaviour, so the fix is ugly, but short of a
      complete rework, it's the best we can do.
      Reported-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
  19. 15 Jan, 2013 1 commit
  20. 17 Dec, 2012 3 commits
    • Btrfs: put raid properties into global table · 31e50229
      Committed by Liu Bo
      Raid properties can be shared among raid calculation code, so we can put
      them into a global table to keep it simple.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
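The table-driven approach in 31e50229 can be illustrated with a small user-space sketch. Everything here is a simplification under stated assumptions: `demo_raid_type`, `demo_raid_attr`, `demo_raid_table`, and `demo_ncopies` are hypothetical names, and the table carries only a name and a copy count, not the full set of properties the real table holds.

```c
#include <assert.h>

/* Hypothetical profile index; a subset chosen for illustration. */
enum demo_raid_type {
    DEMO_RAID_SINGLE,
    DEMO_RAID_RAID0,
    DEMO_RAID_RAID1,
    DEMO_RAID_DUP,
    DEMO_RAID_RAID10,
    DEMO_NR_RAID_TYPES
};

struct demo_raid_attr {
    const char *name;
    int ncopies;    /* copies kept of each piece of data */
};

/* One global, read-only table shared by every calculation site. */
static const struct demo_raid_attr demo_raid_table[DEMO_NR_RAID_TYPES] = {
    [DEMO_RAID_SINGLE] = { "single", 1 },
    [DEMO_RAID_RAID0]  = { "raid0",  1 },
    [DEMO_RAID_RAID1]  = { "raid1",  2 },
    [DEMO_RAID_DUP]    = { "dup",    2 },
    [DEMO_RAID_RAID10] = { "raid10", 2 },
};

/* Lookup replaces a per-call-site if/else chain on the profile. */
static int demo_ncopies(enum demo_raid_type type)
{
    assert((unsigned int)type < DEMO_NR_RAID_TYPES);
    return demo_raid_table[type].ncopies;
}
```

Adding a property then means adding one field and one column of initializers, instead of touching every function that branches on the profile.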
    • Btrfs: log changed inodes based on the extent map tree · 70c8a91c
      Committed by Josef Bacik
      We don't really need to copy extents from the source tree since we have all
      of the information already available to us in the extent_map tree.  So
      instead just write the extents straight to the log tree and don't bother to
      copy the extent items from the source tree.
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • btrfs: Notify udev when removing device · b8b8ff59
      Committed by Lukas Czerner
      Currently udev does not know about the device being removed from the
      file system. This may result in the situation where we're unable to
      mount the file system by UUID or by LABEL because the by-uuid and
      by-label links may still point to the device which is no longer part of
      the btrfs file system and hence does not have any btrfs super block.
      
      It can be easily reproduced by the following:
      
      mkfs.btrfs -L bugfs /dev/loop[0-6]
      mount /dev/loop0 /mnt/test
      btrfs device delete /dev/loop0 /mnt/test
      umount /mnt/test
      
      mount LABEL=bugfs /mnt/test <---- this fails
      
      then see:
      
      ls -l /dev/disk/by-label/bugfs
      
      which will still point to /dev/loop0
      
      We did not notice this before because libblkid would send the udev
      event for us when it noticed that the link did not fit reality;
      however, it does not do that anymore and relies completely on udev
      information.
      
      Fix this by sending the KOBJ_CHANGE event to the bdev kobject after
      successful device removal.
      
      Note that this does not affect device addition, because we will open the
      device prior to the addition from userspace and udev will notice that and
      reread the device afterwards.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>