1. 07 May, 2013 5 commits
    • Btrfs: deal with bad mappings in btrfs_map_block · 9bb91873
      Josef Bacik committed
      Martin Steigerwald reported a BUG_ON() in btrfs_map_block where we didn't find
      a chunk for a particular block we were trying to map.  This happened because the
      block was bogus.  We shouldn't be BUG_ON()'ing in this case, just print a
      message and return an error.  This came from reada_add_block and it appears to
      deal with an error fine so we should be good there.  Thanks,
      Reported-by: Martin Steigerwald <Martin@lichtvoll.de>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
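The change described above (report a bogus block and return an error instead of hitting BUG_ON()) can be sketched in userspace as follows. This is a minimal illustration, not the kernel code: `struct chunk`, `map_block`, the message text, and the choice of `-EINVAL` are all assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified chunk record: the logical range [start, start + len). */
struct chunk {
    uint64_t start;
    uint64_t len;
};

/*
 * Look up the chunk that contains `logical`. On success store it in *out and
 * return 0. If no chunk covers the address, print a message and return
 * -EINVAL (an assumed error code) instead of aborting, so the caller can
 * handle the failure.
 */
static int map_block(const struct chunk *chunks, size_t n, uint64_t logical,
                     const struct chunk **out)
{
    assert(out != NULL);

    for (size_t i = 0; i < n; i++) {
        if (logical >= chunks[i].start &&
            logical - chunks[i].start < chunks[i].len) {
            *out = &chunks[i];
            return 0;
        }
    }
    fprintf(stderr, "unable to find a chunk for logical %llu\n",
            (unsigned long long)logical);
    return -EINVAL;
}
```

A caller such as a readahead path can then check the return value and skip the block rather than crash.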
    • Btrfs: use a lock to protect incompat/compat flag of the super block · ceda0864
      Miao Xie committed
      In the following case, an update to the incompat/compat flags of the
      super block is overwritten and lost:
       Task1					|Task2
       flags = btrfs_super_incompat_flags();	|
      					|flags = btrfs_super_incompat_flags();
       flags |= new_flag1;			|
      					|flags |= new_flag2;
       btrfs_set_super_incompat_flags(flags);	|
      					|btrfs_set_super_incompat_flags(flags);
      Task2 writes back a value that does not contain new_flag1, so new_flag1 is lost.
      
      In order to avoid this problem, we introduce a lock named super_lock into
      the btrfs_fs_info structure. If we want to update incompat/compat flags
      of the super block, we must hold it.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
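The lost-update problem in the Task1/Task2 table goes away once the whole read-modify-write sequence runs under one lock. The sketch below shows the idea in userspace with a tiny C11 spinlock. The struct layout, the helper name `set_incompat_flag`, and the use of `atomic_flag` are assumptions for illustration only; the commit itself says nothing more than that a lock named super_lock is added to btrfs_fs_info.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of btrfs_fs_info. */
struct fs_info {
    atomic_flag super_lock;     /* protects incompat_flags */
    uint64_t incompat_flags;
};

/* Set `flag` with the read, the OR, and the write-back all under super_lock. */
static void set_incompat_flag(struct fs_info *fs, uint64_t flag)
{
    assert(fs != NULL);

    /* Spin until the lock is acquired. */
    while (atomic_flag_test_and_set_explicit(&fs->super_lock,
                                             memory_order_acquire))
        ;

    uint64_t flags = fs->incompat_flags;   /* read   */
    flags |= flag;                         /* modify */
    fs->incompat_flags = flags;            /* write  */

    atomic_flag_clear_explicit(&fs->super_lock, memory_order_release);
}
```

Because no other task can read the flags between the read and the write-back, two concurrent callers can no longer overwrite each other's bits.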
    • btrfs: ignore device open failures in __btrfs_open_devices · f63e0cca
      Eric Sandeen committed
      This:
      
         # mkfs.btrfs /dev/sdb{1,2} ; wipefs -a /dev/sdb1; mount /dev/sdb2 /mnt/test
      
      would lead to a blkdev open/close mismatch when the mount fails, and
      a permanently busy (opened O_EXCL) sdb2:
      
         # wipefs -a /dev/sdb2
         wipefs: error: /dev/sdb2: probing initialization failed: Device or resource busy
      
      It's because btrfs_open_devices() may open some devices, fail on
      the last one, and return that failure stored in "ret."   The mount
      then fails, but the caller then does not clean up the open devices.
      
      Chris assures me that:
      
      "btrfs_open_devices just means: go off and open every bdev you can from
      this uuid.  It should return success if we opened any of them at all."
      
      So change the logic to ignore any open failures; just skip processing
      of that device.  Later on it's decided whether we have enough devices
      to continue.
      Reported-by: Jan Safranek <jsafrane@redhat.com>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
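The "skip failed devices, decide later whether enough were opened" logic can be sketched as below. This is an illustrative userspace model, not the kernel function: `open_fn`, `open_devices`, and `fake_open` are hypothetical names, and the fake opener simply pretends that paths ending in '1' cannot be opened.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: returns 0 if the device opened, non-zero on failure. */
typedef int (*open_fn)(const char *path);

/*
 * Try to open every device in `paths`. A failure on one device is skipped
 * rather than propagated. The return value is the number of devices opened,
 * so the caller can decide later whether that is enough to continue.
 */
static size_t open_devices(const char *const *paths, size_t n, open_fn open_one)
{
    size_t opened = 0;

    assert(open_one != NULL);

    for (size_t i = 0; i < n; i++) {
        if (open_one(paths[i]) != 0)
            continue;           /* ignore this device and keep going */
        opened++;
    }
    return opened;
}

/* Illustrative opener: pretends any path ending in '1' was wiped. */
static int fake_open(const char *path)
{
    size_t len = strlen(path);

    return (len > 0 && path[len - 1] == '1') ? -1 : 0;
}
```

With the two-device example from the commit message, the wiped device is skipped and one device is reported as opened, so no failure code from the last device leaks out to the mount path.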
    • Btrfs: fix bad extent logging · 09a2a8f9
      Josef Bacik committed
      A user sent me a btrfs-image of a file system that was panicking on mount during
      the log recovery.  I had originally thought these problems were from a bug in
      the free space cache code, but that was just a symptom of the problem.  The
      problem is if your application does something like this
      
      [prealloc][prealloc][prealloc]
      
      the internal extent maps will merge those all together into one extent map, even
      though on disk they are 3 separate extents.  So if you go to write into one of
      these ranges the extent map will be right since we use the physical extent when
      doing the write, but when we log the extents they will use the wrong sizes for
      the remainder prealloc space.  If this doesn't happen to trip up the free space
      cache (which it won't in a lot of cases) then you will get bogus entries in your
      extent tree which will screw stuff up later.  The data and such will still work,
      but everything else is broken.  This patch fixes this by not allowing extents
      that are on the modified list to be merged.  This has the side effect that we
      are no longer adding everything to the modified list all the time, which means
      we now have to call btrfs_drop_extents every time we log an extent into the
      tree.  So this allows me to drop all this speciality code I was using to get
      around calling btrfs_drop_extents.  With this patch the testcase I've created no
      longer creates a bogus file system after replaying the log.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: Include the device in most error printk()s · c2cf52eb
      Simon Kirby committed
      With more than one btrfs volume mounted, it can be very difficult to find
      out which volume is hitting an error. btrfs_error() will print this, but
      it is currently rigged as more of a fatal error handler, while many of
      the printk()s are currently for debugging and yet-unhandled cases.
      
      This patch just changes the functions where the device information is
      already available. Some cases remain where the root or fs_info is not
      passed to the function emitting the error.
      
      This may introduce some confusion with volumes backed by multiple devices
      emitting errors referring to the primary device in the set instead of the
      one on which the error occurred.
      
      Use btrfs_printk(fs_info, format, ...) rather than writing the device
      string every time, and introduce macro wrappers ala XFS for brevity.
      Since the function already cannot be used for continuations, print a
      newline as part of the btrfs_printk() message rather than at each caller.
      Signed-off-by: Simon Kirby <sim@hostway.ca>
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  2. 22 Mar, 2013 1 commit
    • Btrfs: handle a bogus chunk tree nicely · 835d974f
      Josef Bacik committed
      If you restore a btrfs-image file system and try to mount that file system we'll
      panic.  That's because btrfs-image restores and just makes one big chunk to
      envelop the whole disk, since they are really only meant to be messed with by
      our btrfs-progs.  So fix up btrfs_rmap_block and the callers of it for mount so
      that we no longer panic but instead just return an error and fail to mount.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  3. 15 Mar, 2013 1 commit
    • btrfs: use rcu_barrier() to wait for bdev puts at unmount · bc178622
      Eric Sandeen committed
      Doing this would reliably fail with -EBUSY for me:
      
      # mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
      ...
      unable to open /dev/sdb2: Device or resource busy
      
      because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.
      
      Using systemtap to track bdev gets & puts shows a kworker thread doing a
      blkdev put after mkfs attempts a get; this is left over from the unmount
      path:
      
      btrfs_close_devices
      	__btrfs_close_devices
      		call_rcu(&device->rcu, free_device);
      			free_device
      				INIT_WORK(&device->rcu_work, __free_device);
      				schedule_work(&device->rcu_work);
      
      so unmount might complete before __free_device fires & does its blkdev_put.
      
      Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait
      until all blkdev_put()s are done, and the device is truly free once
      unmount completes.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  4. 07 Mar, 2013 1 commit
  5. 05 Mar, 2013 1 commit
  6. 01 Mar, 2013 1 commit
  7. 27 Feb, 2013 1 commit
    • btrfs: cleanup for open-coded alignment · fda2832f
      Qu Wenruo committed
      Though most of the btrfs code uses the ALIGN macro for page alignment,
      some places still use open-coded alignment like the
      following:
      ------
              u64 mask = ((u64)root->stripesize - 1);
              u64 ret = (val + mask) & ~mask;
      ------
      Or an even less obvious one:
      ------
              num_bytes = (end - start + blocksize) & ~(blocksize - 1);
      ------
      
      Sometimes this open-coded alignment is not so easy to understand for
      a newbie like me.
      
      This commit changes the open-coded alignment to the ALIGN macro for
      better readability.
      
      There is also a previous patch from David Sterba with similar changes,
      but that patch is for the 3.2 kernel and seems not to have been merged.
      http://www.spinics.net/lists/linux-btrfs/msg12747.html
      
      Cc: David Sterba <dave@jikos.cz>
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
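To see why the two open-coded forms quoted in this commit message are equivalent to an ALIGN-style round-up, the sketch below writes all three side by side in userspace C. `align_up` is a hypothetical helper that mirrors the usual power-of-two round-up; the other two functions copy the expressions from the message. The second form rounds up the length of a range whose `end` is treated as inclusive, that is `end - start + 1`.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* First open-coded form from the commit message. */
static uint64_t open_coded_mask(uint64_t val, uint64_t stripesize)
{
    uint64_t mask = stripesize - 1;

    return (val + mask) & ~mask;
}

/*
 * Second, less obvious form: algebraically the same as
 * align_up(end - start + 1, blocksize).
 */
static uint64_t open_coded_range(uint64_t start, uint64_t end,
                                 uint64_t blocksize)
{
    return (end - start + blocksize) & ~(blocksize - 1);
}
```

The `+ blocksize` in the second form is `+ 1` for the inclusive end plus `blocksize - 1` for the round-up, which is the detail the named helper makes explicit.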
  8. 21 Feb, 2013 9 commits
  9. 20 Feb, 2013 1 commit
  10. 16 Feb, 2013 1 commit
    • btrfs: access superblock via pagecache in scan_one_device · 6f60cbd3
      David Sterba committed
      btrfs_scan_one_device is calling set_blocksize() which can race
      with a concurrent process making dirty page cache pages.  It can end up
      dropping dirty page cache pages on the floor, which isn't very nice when
      someone is just running btrfs dev scan to find filesystems on the
      box.
      
      Now that udev is registering btrfs devices as it discovers them, we can
      actually end up racing with our own mkfs program too.  When this
      happens, we drop some of the important blocks written by mkfs.
      
      This commit changes scan_one_device to read the super out of the page
      cache instead of trying to use bread.  This way we don't have to care
      about the blocksize of the device.
      
      This also drops the invalidate_bdev() call.  It wasn't very polite to
      invalidate during the scan either.  mkfs is putting the super into the
      page cache, there's no reason to invalidate at this point.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  11. 05 Feb, 2013 1 commit
  12. 02 Feb, 2013 2 commits
    • Btrfs: RAID5 and RAID6 · 53b381b3
      David Woodhouse committed
      This builds on David Woodhouse's original Btrfs raid5/6 implementation.
      The code has changed quite a bit, blame Chris Mason for any bugs.
      
      Read/modify/write is done after the higher levels of the filesystem have
      prepared a given bio.  This means the higher layers are not responsible
      for building full stripes, and they don't need to query for the topology
      of the extents that may get allocated during delayed allocation runs.
      It also means different files can easily share the same stripe.
      
      But, it does expose us to incorrect parity if we crash or lose power
      while doing a read/modify/write cycle.  This will be addressed in a
      later commit.
      
      Scrub is unable to repair crc errors on raid5/6 chunks.
      
      Discard does not work on raid5/6 (yet)
      
      The stripe size is fixed at 64KiB per disk.  This will be tunable
      in a later commit.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • btrfs: don't try to notify udev about missing devices · 3c911608
      Eric Sandeen committed
      If we remove a missing device, bdev is null, and if we
      send that off to btrfs_kobject_uevent we'll panic.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  13. 25 Jan, 2013 1 commit
  14. 22 Jan, 2013 1 commit
  15. 20 Jan, 2013 1 commit
    • Btrfs: bring back balance pause/resume logic · ed0fb78f
      Ilya Dryomov committed
      Balance pause/resume logic got broken by 5ac00add (went into 3.8-rc1
      as part of the dev-replace merge).  The offending commit took a stab at making
      mutually exclusive volume operations (add_dev, rm_dev, resize, balance,
      replace_dev) not block behind volume_mutex if another such operation is
      in progress and instead return an error right away.  Balancing front-end
      relied on the blocking behaviour, so the fix is ugly, but short of a
      complete rework, it's the best we can do.
      Reported-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
  16. 15 Jan, 2013 1 commit
  17. 17 Dec, 2012 4 commits
    • Btrfs: put raid properties into global table · 31e50229
      Liu Bo committed
      Raid properties can be shared among the raid calculation code, so we can put
      them into a global table to keep it simple.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: log changed inodes based on the extent map tree · 70c8a91c
      Josef Bacik committed
      We don't really need to copy extents from the source tree since we have all
      of the information already available to us in the extent_map tree.  So
      instead just write the extents straight to the log tree and don't bother to
      copy the extent items from the source tree.
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • btrfs: Notify udev when removing device · b8b8ff59
      Lukas Czerner committed
      Currently udev does not know about the device being removed from the
      file system. This may result in the situation where we're unable to
      mount the file system by UUID or by LABEL because the by-uuid and
      by-label links may still point to the device which is no longer part of
      the btrfs file system and hence does not have any btrfs super block.
      
      It can be easily reproduced by the following:
      
      mkfs.btrfs -L bugfs /dev/loop[0-6]
      mount /dev/loop0 /mnt/test
      btrfs device delete /dev/loop0 /mnt/test
      umount /mnt/test
      
      mount LABEL=bugfs /mnt/test <---- this fails
      
      then see:
      
      ls -l /dev/disk/by-label/bugfs
      
      which will still point to the /dev/loop0
      
      We did not notice this before because libblkid would send the udev
      event for us when it noticed that the link did not match reality;
      however, it does not do that anymore and relies completely on udev
      information.
      
      Fix this by sending the KOBJ_CHANGE event to the bdev kobject after
      successful device removal.
      
      Note that this does not affect device addition, because we will open the
      device prior to the addition from userspace and udev will notice that and
      reread the device afterwards.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: fix a build warning for an unused label · f9c83748
      Stefan Behrens committed
      This issue was detected by the "0-DAY kernel build testing".
      
      fs/btrfs/volumes.c: In function 'btrfs_rm_device':
      fs/btrfs/volumes.c:1505:1: warning: label 'error_close' defined but not used [-Wunused-label]
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  18. 13 Dec, 2012 7 commits
    • Btrfs: allow repair code to include target disk when searching mirrors · ad6d620e
      Stefan Behrens committed
      Make the target disk of a running device replace operation
      available for reading. This is only used as a last resort for
      the defect repair procedure. And it is dependent on the location
      of the data block to read, because during an ongoing device
      replace operation, the target drive is only partially filled
      with the filesystem data.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: optionally avoid reads from device replace source drive · 30d9861f
      Stefan Behrens committed
      It is desirable to be able to configure the device replace
      procedure to avoid reading the source drive (the one to be
      copied) whenever possible. This is useful when the number of
      read errors on this disk is high, because it would delay the
      copy procedure a lot. Therefore there is an option to avoid
      reading from the source disk unless the repair procedure
      really needs to access it. The regular read req asks for
      mapping the block with mirror_num == 0, in this case the
      source disk is avoided whenever possible. The repair code
      selects the mirror_num explicitly (mirror_num != 0), this
      case is not changed by this commit.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: changes to live filesystem are also written to replacement disk · 472262f3
      Stefan Behrens committed
      During a running dev replace operation, all write requests to
      the live filesystem are duplicated to also write to the target
      drive. Therefore btrfs_map_block() is changed to duplicate
      stripes that are written to the source disk of a device replace
      procedure to be written to the target disk as well.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: introduce GET_READ_MIRRORS functionality for btrfs_map_block() · 29a8d9a0
      Stefan Behrens committed
      Before this commit, btrfs_map_block() was called with REQ_WRITE
      in order to retrieve the list of mirrors for a disk block.
      This needs to be changed for the device replace procedure since
      it makes a difference whether you are asking for read mirrors
      or for locations to write to.
      GET_READ_MIRRORS is introduced as a new interface to call
      btrfs_map_block().
      In the current commit, the functionality is not yet changed,
      only the interface for GET_READ_MIRRORS is introduced and all
      the places that should use this new interface are adapted.
      
      The reason that REQ_WRITE cannot be abused anymore to retrieve
      a list of read mirrors is that during a running dev replace
      operation all write requests to the live filesystem are
      duplicated to also write to the target drive.
      Keep in mind that the target disk is only partially a valid
      copy of the source disk while the operation is ongoing. All
      writes go to the target disk, but not all reads would return
      valid data on the target disk. Therefore it is not possible
      anymore to abuse a REQ_WRITE interface to find valid mirrors
      for a REQ_READ.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: change core code of btrfs to support the device replace operations · 8dabb742
      Stefan Behrens committed
      This commit contains all the essential changes to the core code
      of Btrfs for support of the device replace procedure.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: add new sources for device replace code · e93c89c1
      Stefan Behrens committed
      This adds a new file to the sources together with the header file
      and the changes to ioctl.h and ctree.h that are required by the
      new C source file. Additionally, 4 new functions are added to
      volumes.c that deal with device creation and destruction.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: handle errors from btrfs_map_bio() everywhere · 61891923
      Stefan Behrens committed
      With the addition of the device replace procedure, it is possible
      for btrfs_map_bio(READ) to report an error. This happens when the
      specific mirror is requested which is located on the target disk,
      and the copy operation has not yet copied this block. Hence the
      block cannot be read and this error state is indicated by
      returning EIO.
      Some background information follows now. A new mirror is added
      while the device replace procedure is running.
      btrfs_get_num_copies() returns one more, and
      btrfs_map_bio(GET_READ_MIRROR) adds one more mirror if a disk
      location is involved that was already handled by the device
      replace copy operation. The assigned mirror num is the highest
      mirror number, e.g. the value 3 in case of RAID1.
      If btrfs_map_bio() is invoked with mirror_num == 0 (i.e., select
      any mirror), the copy on the target drive is never selected
      because that disk shall be able to perform the write requests as
      quickly as possible. The parallel execution of read requests would
      only slow down the disk copy procedure. The second case is that
      btrfs_map_bio() is called with mirror_num > 0. This is done from
      the repair code only. In this case, the highest mirror num is
      assigned to the target disk, since it is used last. And when this
      mirror is not available because the copy procedure has not yet
      handled this area, an error is returned. Everywhere in the code
      the handling of such errors is added now.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
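The mirror-selection rules in the message above can be modelled with a small function. This is a simplified sketch under stated assumptions, not the kernel logic: `select_read_mirror` and its parameters are hypothetical, ordinary mirrors are assumed to be numbered 1..num_regular with the replace target taking the next (highest) number, and "any mirror" is reduced to always picking mirror 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * mirror_num == 0 means "any mirror". Returns the mirror number to read
 * from, or -EIO when the replace target is requested explicitly but the
 * copy operation has not yet reached the block.
 */
static int select_read_mirror(int mirror_num, int num_regular,
                              bool block_copied_to_target)
{
    int target = num_regular + 1;   /* highest mirror number */

    assert(num_regular >= 1 && mirror_num >= 0 && mirror_num <= target);

    if (mirror_num == 0)
        return 1;                   /* a regular mirror; never the target */
    if (mirror_num == target && !block_copied_to_target)
        return -EIO;                /* block not on the target yet */
    return mirror_num;              /* explicit request from repair code */
}
```

For RAID1 (two regular mirrors) the target is mirror 3, so a repair read of mirror 3 fails with -EIO until the copy has covered that block, which is the error callers now have to handle.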