1. 11 3月, 2014 1 次提交
    • M
      Btrfs: fix use-after-free in the finishing procedure of the device replace · c404e0dc
      Miao Xie 提交于
      During device replace test, we hit a null pointer deference (It was very easy
      to reproduce it by running xfstests' btrfs/011 on the devices with the virtio
      scsi driver). There were two bugs that caused this problem:
      - We might allocate new chunks on the replaced device after we updated
        the mapping tree. And we forgot to replace the source device in those
        mapping of the new chunks.
      - We might get the mapping information which including the source device
        before the mapping information update. And then submit the bio which was
        based on that mapping information after we freed the source device.
      
      For the first bug, we can fix it by doing mapping tree update and source
      device remove in the same context of the chunk mutex. The chunk mutex is
      used to protect the allocable device list, the above method can avoid
      the new chunk allocation, and after we remove the source device, all
      the new chunks will be allocated on the new device. So it can fix
      the first bug.
      
      For the second bug, we need make sure all flighting bios are finished and
      no new bios are produced during we are removing the source device. To fix
      this problem, we introduced a global @bio_counter, we not only inc/dec
      @bio_counter outsize of map_blocks, but also inc it before submitting bio
      and dec @bio_counter when ending bios.
      
      Since Raid56 is a little different and device replace dosen't support raid56
      yet, it is not addressed in the patch and I add comments to make sure we will
      fix it in the future.
      Reported-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      c404e0dc
  2. 12 11月, 2013 2 次提交
  3. 01 9月, 2013 4 次提交
  4. 02 7月, 2013 1 次提交
    • J
      Btrfs: make the chunk allocator completely tree lockless · 6df9a95e
      Josef Bacik 提交于
      When adjusting the enospc rules for relocation I ran into a deadlock because we
      were relocating the only system chunk and that forced us to try and allocate a
      new system chunk while holding locks in the chunk tree, which caused us to
      deadlock.  To fix this I've moved all of the dev extent addition and chunk
      addition out to the delayed chunk completion stuff.  We still keep the in-memory
      stuff which makes sure everything is consistent.
      
      One change I had to make was to search the commit root of the device tree to
      find a free dev extent, and hold onto any chunk em's that we allocated in that
      transaction so we do not allocate the same dev extent twice.  This has the side
      effect of fixing a bug with balance that has been there ever since balance
      existed.  Basically you can free a block group and it's dev extent and then
      immediately allocate that dev extent for a new block group and write stuff to
      that dev extent, all within the same transaction.  So if you happen to crash
      during a balance you could come back to a completely broken file system.  This
      patch should keep these sort of things from happening in the future since we
      won't be able to allocate free'd dev extents until after the transaction
      commits.  This has passed all of the xfstests and my super annoying stress test
      followed by a balance.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      6df9a95e
  5. 14 6月, 2013 1 次提交
  6. 18 5月, 2013 1 次提交
    • C
      Btrfs: use a btrfs bioset instead of abusing bio internals · 9be3395b
      Chris Mason 提交于
      Btrfs has been pointer tagging bi_private and using bi_bdev
      to store the stripe index and mirror number of failed IOs.
      
      As bios bubble back up through the call chain, we use these
      to decide if and how to retry our IOs.  They are also used
      to count IO failures on a per device basis.
      
      Recently a bio tracepoint was added lead to crashes because
      we were abusing bi_bdev.
      
      This commit adds a btrfs bioset, and creates explicit fields
      for the mirror number and stripe index.  The plan is to
      extend this structure for all of the fields currently in
      struct btrfs_bio, which will mean one less kmalloc in
      our IO path.
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      Reported-by: NTejun Heo <tj@kernel.org>
      9be3395b
  7. 07 5月, 2013 1 次提交
    • E
      btrfs: make static code static & remove dead code · 48a3b636
      Eric Sandeen 提交于
      Big patch, but all it does is add statics to functions which
      are in fact static, then remove the associated dead-code fallout.
      
      removed functions:
      
      btrfs_iref_to_path()
      __btrfs_lookup_delayed_deletion_item()
      __btrfs_search_delayed_insertion_item()
      __btrfs_search_delayed_deletion_item()
      find_eb_for_page()
      btrfs_find_block_group()
      range_straddles_pages()
      extent_range_uptodate()
      btrfs_file_extent_length()
      btrfs_scrub_cancel_devid()
      btrfs_start_transaction_lflush()
      
      btrfs_print_tree() is left because it is used for debugging.
      btrfs_start_transaction_lflush() and btrfs_reada_detach() are
      left for symmetry.
      
      ulist.c functions are left, another patch will take care of those.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      48a3b636
  8. 20 2月, 2013 1 次提交
  9. 02 2月, 2013 1 次提交
    • D
      Btrfs: RAID5 and RAID6 · 53b381b3
      David Woodhouse 提交于
      This builds on David Woodhouse's original Btrfs raid5/6 implementation.
      The code has changed quite a bit, blame Chris Mason for any bugs.
      
      Read/modify/write is done after the higher levels of the filesystem have
      prepared a given bio.  This means the higher layers are not responsible
      for building full stripes, and they don't need to query for the topology
      of the extents that may get allocated during delayed allocation runs.
      It also means different files can easily share the same stripe.
      
      But, it does expose us to incorrect parity if we crash or lose power
      while doing a read/modify/write cycle.  This will be addressed in a
      later commit.
      
      Scrub is unable to repair crc errors on raid5/6 chunks.
      
      Discard does not work on raid5/6 (yet)
      
      The stripe size is fixed at 64KiB per disk.  This will be tunable
      in a later commit.
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      53b381b3
  10. 17 12月, 2012 1 次提交
  11. 13 12月, 2012 9 次提交
  12. 29 8月, 2012 1 次提交
  13. 24 7月, 2012 2 次提交
  14. 03 7月, 2012 2 次提交
  15. 15 6月, 2012 1 次提交
    • J
      Btrfs: use rcu to protect device->name · 606686ee
      Josef Bacik 提交于
      Al pointed out that we can just toss out the old name on a device and add a
      new one arbitrarily, so anybody who uses device->name in printk could
      possibly use free'd memory.  Instead of adding locking around all of this he
      suggested doing it with RCU, so I've introduced a struct rcu_string that
      does just that and have gone through and protected all accesses to
      device->name that aren't under the uuid_mutex with rcu_read_lock().  This
      protects us and I will use it for dealing with removing the device that we
      used to mount the file system in a later patch.  Thanks,
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      606686ee
  16. 30 5月, 2012 3 次提交
  17. 22 3月, 2012 1 次提交
  18. 17 1月, 2012 7 次提交
    • I
      Btrfs: add balance progress reporting · 19a39dce
      Ilya Dryomov 提交于
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      19a39dce
    • I
      Btrfs: allow for canceling restriper · a7e99c69
      Ilya Dryomov 提交于
      Implement an ioctl for canceling restriper.  Currently we wait until
      relocation of the current block group is finished, in future this can be
      done by triggering a commit.  Balance item is deleted and no memory
      about the interrupted balance is kept.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      a7e99c69
    • I
      Btrfs: allow for pausing restriper · 837d5b6e
      Ilya Dryomov 提交于
      Implement an ioctl for pausing restriper.  This pauses the relocation,
      but balance is still considered to be "in progress": balance item is
      not deleted, other volume operations cannot be started, etc.  If paused
      in the middle of profile changing operation we will continue making
      allocations with the target profile.
      
      Add a hook to close_ctree() to pause restriper and free its data
      structures on unmount.  (It's safe to unmount when restriper is in
      "paused" state, we will resume with the same parameters on the next
      mount)
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      837d5b6e
    • I
      Btrfs: recover balance on mount · 59641015
      Ilya Dryomov 提交于
      On mount, if balance item is found, resume balance in a separate
      kernel thread.
      
      Try to be smart to continue roughly where previous balance (or convert)
      was interrupted.  For chunk types that were being converted to some
      profile we turn on soft convert, in case of a simple balance we turn on
      usage filter and relocate only less-than-90%-full chunks of that type.
      These are just heuristics but they help quite a bit, and can be improved
      in future.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      59641015
    • I
      Btrfs: soft profile changing mode (aka soft convert) · cfa4c961
      Ilya Dryomov 提交于
      When doing convert from one profile to another if soft mode is on
      restriper won't touch chunks that already have the profile we are
      converting to.  This is useful if e.g. half of the FS was converted
      earlier.
      
      The soft mode switch is (like every other filter) per-type.  This means
      that we can convert for example meta chunks the "hard" way while
      converting data chunks selectively with soft switch.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      cfa4c961
    • I
      Btrfs: implement online profile changing · e4d8ec0f
      Ilya Dryomov 提交于
      Profile changing is done by launching a balance with
      BTRFS_BALANCE_CONVERT bits set and target fields of respective
      btrfs_balance_args structs initialized.  Profile reducing code in this
      case will pick restriper's target profile if it's available instead of
      doing a blind reduce.  If target profile is not yet available it goes
      back to a plain reduce.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      e4d8ec0f
    • I
      Btrfs: virtual address space subset filter · ea67176a
      Ilya Dryomov 提交于
      Select chunks which have at least one byte located inside a given
      [vstart, vend) virtual address space range.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      ea67176a