1. 15 Jun, 2012 1 commit
    • J
      Btrfs: add btrfs_next_old_leaf · 3d7806ec
      Jan Schmidt committed on
      To make sense of the tree mod log, the backref walker needs not only
      btrfs_search_old_slot. It also called btrfs_next_leaf, which in turn called
      btrfs_search_slot and so searched the current tree instead of the old
      version. This did not give the correct result.
      
      This commit adds btrfs_next_old_leaf, a drop-in replacement for
      btrfs_next_leaf with a time_seq parameter. If it is zero, it behaves exactly
      like btrfs_next_leaf. If it is non-zero, it will use btrfs_search_old_slot
      with this time_seq parameter.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      3d7806ec
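The time_seq dispatch described in this commit message can be pictured with a small model. This is only a sketch: `next_old_leaf_model`, `search_current`, and `search_old` are hypothetical stand-ins, not the kernel functions, and they record which lookup path was chosen instead of walking a real tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two search paths; not kernel code. */
enum search_kind { SEARCH_CURRENT, SEARCH_OLD };

struct search_result {
    enum search_kind kind;  /* which lookup path was taken */
    uint64_t time_seq;      /* sequence number used by the old-tree search */
};

/* Models the plain search against the current tree. */
static struct search_result search_current(void)
{
    struct search_result r = { SEARCH_CURRENT, 0 };
    return r;
}

/* Models the search against the rewound tree at a given sequence number. */
static struct search_result search_old(uint64_t time_seq)
{
    assert(time_seq != 0);  /* zero is reserved for "use the current tree" */
    struct search_result r = { SEARCH_OLD, time_seq };
    return r;
}

/* Zero behaves like the plain variant; non-zero selects the old-tree search. */
static struct search_result next_old_leaf_model(uint64_t time_seq)
{
    if (time_seq == 0)
        return search_current();
    return search_old(time_seq);
}
```

Calling `next_old_leaf_model(0)` takes the current-tree path, while any non-zero value is passed through to the old-tree search unchanged.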
  2. 31 May, 2012 1 commit
    • J
      Btrfs: use delayed ref sequence numbers for all fs-tree updates · 95a06077
      Jan Schmidt committed on
      The sequence number for delayed refs is needed to postpone certain delayed
      refs for a very short period while walking backrefs. Before the tree
      modification log, we thought we'd only have to hold back those references
      that don't have a counter operation.
      
      Now that we have the tree mod log, we're rewinding fs tree blocks to a
      defined consistent state. We cannot know in advance for which tree block
      we'll be doing rewind operations later. Therefore, we must postpone all the
      delayed refs for fs-tree blocks, even those having a counter operation.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      95a06077
  3. 30 May, 2012 5 commits
    • S
      Btrfs: set ioprio of scrub readahead to idle · 3d136a11
      Stefan Behrens committed on
      Reduce ioprio class of scrub readahead threads to idle priority.
      This setting is fixed. This priority has shown the best performance
      during all measurements.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      3d136a11
    • S
      Btrfs: read device stats on mount, write modified ones during commit · 733f4fbb
      Stefan Behrens committed on
      The device statistics are written into the device tree with each
      transaction commit. Only modified statistics are written.
      When a filesystem is mounted, the device statistics for each involved
      device are read from the device tree and used to initialize the
      counters.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      733f4fbb
    • J
      Btrfs: fix how we deal with the orphan block rsv · 8a35d95f
      Josef Bacik committed on
      Ceph was hitting this race where we would remove an inode from the per-root
      orphan list before we would release the space we had reserved for the inode.
      We actually don't need a list or anything, we just need to make sure the
      root doesn't try to free up the orphan reserve until after the inodes have
      released their reservations.  So use an atomic counter instead of a list on
      the root and only decrement the counter after we've released our
      reservation.  I've tested this as well as several others and we no longer
      see the warnings that you would see while running ceph.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      8a35d95f
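The ordering rule in this message (release the reservation first, decrement the counter second) can be sketched with C11 atomics. The struct and function names below are hypothetical and do not mirror the kernel's data structures. Only the counter is atomic here, so the byte accounting would need its own synchronization in concurrent code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical miniature of a root; field names are illustrative only. */
struct root_model {
    atomic_int orphan_inodes;  /* inodes still holding an orphan reservation */
    long orphan_reserved;      /* bytes reserved on behalf of those inodes */
};

/* An inode becomes an orphan: count it and account for its reserved space. */
static void orphan_add(struct root_model *root, long bytes)
{
    atomic_fetch_add(&root->orphan_inodes, 1);
    root->orphan_reserved += bytes;
}

/* Release the inode's space first, and only then drop the counter. */
static void orphan_del(struct root_model *root, long bytes)
{
    assert(atomic_load(&root->orphan_inodes) > 0);
    root->orphan_reserved -= bytes;
    atomic_fetch_sub(&root->orphan_inodes, 1);
}

/* The root may free its orphan reserve only once no inode is counted. */
static bool can_free_orphan_rsv(struct root_model *root)
{
    return atomic_load(&root->orphan_inodes) == 0;
}
```

Because the counter drops only after the space is released, a reader that observes zero in `can_free_orphan_rsv` cannot race with an inode that is still holding its reservation, which is the property the list-based version lacked.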
    • J
      Btrfs: add btrfs_search_old_slot · 5d9e75c4
      Jan Schmidt committed on
      The tree modification log together with the current state of the tree gives
      a consistent, old version of the tree. btrfs_search_old_slot is used to
      search through this old version and return old (dummy!) extent buffers.
      Naturally, this function cannot do any tree modifications.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      5d9e75c4
    • J
      Btrfs: add tree modification log functions · bd989ba3
      Jan Schmidt committed on
      The tree mod log will log modifications made to fs-tree nodes. Most
      modifications are done by autobalance of the tree. Such changes are recorded
      as long as a block entry exists. When released, the log is cleaned.
      
      With the tree modification log, it's possible to reconstruct a consistent
      old state of the tree. This is required to do backref walking on a busy
      file system.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      bd989ba3
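To illustrate how a modification log plus the current state yields a consistent old state, here is a toy model. It is a sketch under heavy simplification: a "node" is a fixed array of keys, the only logged operation is overwriting a slot, and every name (`node_model`, `mod_log`, `logged_set`, `rewind_copy`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODE_SLOTS 4
#define LOG_MAX 16

struct node_model { uint64_t keys[NODE_SLOTS]; };

/* One logged modification: which slot changed and what it held before. */
struct mod_entry {
    uint64_t seq;
    int slot;
    uint64_t old_key;
};

struct mod_log {
    struct mod_entry entries[LOG_MAX];
    size_t count;
    uint64_t next_seq;
};

/* Overwrite a slot and record the previous value; returns the new sequence. */
static uint64_t logged_set(struct mod_log *log, struct node_model *node,
                           int slot, uint64_t key)
{
    assert(log->count < LOG_MAX && slot >= 0 && slot < NODE_SLOTS);
    struct mod_entry *e = &log->entries[log->count++];
    e->seq = ++log->next_seq;
    e->slot = slot;
    e->old_key = node->keys[slot];
    node->keys[slot] = key;
    return e->seq;
}

/* Build a dummy copy of the node as it looked at time_seq by undoing,
 * newest first, every change logged after that sequence number. */
static struct node_model rewind_copy(const struct mod_log *log,
                                     const struct node_model *node,
                                     uint64_t time_seq)
{
    struct node_model old = *node;
    for (size_t i = log->count; i > 0; i--) {
        const struct mod_entry *e = &log->entries[i - 1];
        if (e->seq <= time_seq)
            break;
        old.keys[e->slot] = e->old_key;
    }
    return old;
}
```

The rewind operates on a copy and never touches the live node, mirroring the idea that a search through the old version returns dummy buffers and cannot modify the tree.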
  4. 26 May, 2012 3 commits
  5. 19 Apr, 2012 1 commit
  6. 13 Apr, 2012 1 commit
  7. 28 Mar, 2012 1 commit
  8. 27 Mar, 2012 6 commits
  9. 22 Mar, 2012 6 commits
  10. 15 Feb, 2012 1 commit
  11. 17 Jan, 2012 9 commits
    • I
      Btrfs: allow for canceling restriper · a7e99c69
      Ilya Dryomov committed on
      Implement an ioctl for canceling restriper.  Currently we wait until
      relocation of the current block group is finished, in future this can be
      done by triggering a commit.  Balance item is deleted and no memory
      about the interrupted balance is kept.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      a7e99c69
    • I
      Btrfs: allow for pausing restriper · 837d5b6e
      Ilya Dryomov committed on
      Implement an ioctl for pausing restriper.  This pauses the relocation,
      but balance is still considered to be "in progress": balance item is
      not deleted, other volume operations cannot be started, etc.  If paused
      in the middle of profile changing operation we will continue making
      allocations with the target profile.
      
      Add a hook to close_ctree() to pause restriper and free its data
      structures on unmount.  (It's safe to unmount when restriper is in
      "paused" state, we will resume with the same parameters on the next
      mount)
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      837d5b6e
    • I
      Btrfs: add skip_balance mount option · 9555c6c1
      Ilya Dryomov committed on
      Since restriper kthread starts involuntarily on mount and can suck cpu
      and memory bandwidth, add a mount option to forcefully skip it.  The
      restriper in that case hangs around in paused state and can be resumed
      from userspace when it's convenient.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      9555c6c1
    • I
      Btrfs: save balance parameters to disk · 0940ebf6
      Ilya Dryomov committed on
      Introduce a new btree objectid for storing balance item.  The reason is
      to be able to resume restriper after a crash with the same parameters.
      Balance item has a very high objectid and goes into tree of tree roots.
      
      The key for the new item is as follows:
      
      	[ BTRFS_BALANCE_OBJECTID ; BTRFS_BALANCE_ITEM_KEY ; 0 ]
      
      Older kernels simply ignore it so it's safe to mount with an older
      kernel and then go back to the newer one.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      0940ebf6
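The key layout above is an (objectid, type, offset) triple, and keys compare in that order, so an item with a very high objectid sorts at the end of the tree it lives in. The sketch below shows that comparison. The `MODEL_` constants are placeholder values chosen for illustration and should not be read as the definitions in the kernel headers; the struct and helper names are hypothetical too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder constants for illustration only (assumed, not authoritative). */
#define MODEL_BALANCE_OBJECTID ((uint64_t)-4)  /* a "very high" objectid */
#define MODEL_BALANCE_ITEM_KEY 248u

struct key_model {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

/* Compare keys field by field: objectid first, then type, then offset. */
static int key_cmp(const struct key_model *a, const struct key_model *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Build the [ objectid ; type ; 0 ] key for the balance item. */
static struct key_model balance_key(void)
{
    struct key_model k = { MODEL_BALANCE_OBJECTID, MODEL_BALANCE_ITEM_KEY, 0 };
    return k;
}

/* Returns non-zero when the balance key sorts after the given key. */
static int balance_sorts_after(const struct key_model *other)
{
    struct key_model bal = balance_key();
    assert(other != NULL);
    return key_cmp(&bal, other) > 0;
}
```

With these placeholder values, any key whose objectid is an ordinary small number compares lower than the balance key.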
    • I
      Btrfs: do not reduce profile in do_chunk_alloc() · 70922617
      Ilya Dryomov committed on
      Every caller of do_chunk_alloc() feeds it the reduced allocation
      profile, so stop trying to reduce it one more time.  Instead check the
      validity of the passed profile.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      70922617
    • I
      Btrfs: add basic restriper infrastructure · c9e9f97b
      Ilya Dryomov committed on
      Add basic restriper infrastructure: extended balancing ioctl and all
      related ioctl data structures, add data structure for tracking
      restriper's state to fs_info, etc.  The semantics of the old balancing
      ioctl are fully preserved.
      
      Explicitly disallow any volume operations when balance is in progress.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      c9e9f97b
    • I
      Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit · a46d11a8
      Ilya Dryomov committed on
      Right now on-disk BTRFS_BLOCK_GROUP_* profile bits are used for
      avail_{data,metadata,system}_alloc_bits fields, which gather info about
      available allocation profiles in the FS.  When chunk is created or read
      from disk, its profile is OR'ed with the corresponding avail_alloc_bits
      field.  Since SINGLE is denoted by 0 in the on-disk format, currently
      there is no way to tell when such chunks become available.  Restriper
      needs that information, so add a separate bit for SINGLE profile.
      
      This bit is going to be in-memory only, it should never be written out
      to disk, so it's not a disk format change.  However to avoid remappings
      in future, reserve corresponding on-disk bit.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      a46d11a8
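The problem with a profile encoded as 0 is that OR-ing it into a bitmask leaves no trace. A minimal sketch of the fix follows, assuming illustrative bit positions that are not copied from the kernel headers; all `MODEL_` names and helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values, not copied from the kernel headers. */
#define MODEL_PROFILE_RAID0     (1ULL << 3)
#define MODEL_PROFILE_RAID1     (1ULL << 4)
#define MODEL_PROFILE_SINGLE    0ULL          /* on-disk encoding: no bit set */
#define MODEL_AVAIL_BIT_SINGLE  (1ULL << 48)  /* in-memory only marker */

/* Translate an on-disk profile into the bits tracked in memory. */
static uint64_t to_avail_bits(uint64_t profile)
{
    return profile == MODEL_PROFILE_SINGLE ? MODEL_AVAIL_BIT_SINGLE : profile;
}

/* Record that a chunk with this profile was created or read from disk. */
static void note_chunk(uint64_t *avail, uint64_t profile)
{
    assert(avail != NULL);
    *avail |= to_avail_bits(profile);
}

/* With the extra bit, the presence of SINGLE chunks becomes observable. */
static bool single_available(uint64_t avail)
{
    return (avail & MODEL_AVAIL_BIT_SINGLE) != 0;
}
```

OR-ing `MODEL_PROFILE_SINGLE` directly into a mask changes nothing, whereas routing it through `to_avail_bits` sets a dedicated bit that never has to reach the disk.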
    • I
      Btrfs: introduce masks for chunk type and profile · 52ba6929
      Ilya Dryomov committed on
      Chunk's type and profile are encoded in u64 flags field.  Introduce
      masks to easily access them.  Also fix the type of BTRFS_BLOCK_GROUP_*
      constants, it should be ULL.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      52ba6929
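Splitting a u64 flags field into a type part and a profile part with masks can be sketched as follows. The `FLAG_` bit positions and names are assumptions made for this example, not the kernel definitions. The `ULL` suffix matters because shifting a plain `int` constant cannot reach the upper bits of a 64-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout for illustration; note the ULL suffix on every constant. */
#define FLAG_TYPE_DATA      (1ULL << 0)
#define FLAG_TYPE_SYSTEM    (1ULL << 1)
#define FLAG_TYPE_METADATA  (1ULL << 2)
#define FLAG_PROF_RAID0     (1ULL << 3)
#define FLAG_PROF_RAID1     (1ULL << 4)
#define FLAG_PROF_DUP       (1ULL << 5)
#define FLAG_PROF_RAID10    (1ULL << 6)

#define FLAG_TYPE_MASK \
    (FLAG_TYPE_DATA | FLAG_TYPE_SYSTEM | FLAG_TYPE_METADATA)
#define FLAG_PROFILE_MASK \
    (FLAG_PROF_RAID0 | FLAG_PROF_RAID1 | FLAG_PROF_DUP | FLAG_PROF_RAID10)

/* The two masks must select disjoint bits of the flags field. */
static_assert((FLAG_TYPE_MASK & FLAG_PROFILE_MASK) == 0,
              "type and profile masks must not overlap");

/* Extract only the chunk type bits (data, system, metadata). */
static uint64_t chunk_type(uint64_t flags)
{
    return flags & FLAG_TYPE_MASK;
}

/* Extract only the profile bits; 0 here stands for the single profile. */
static uint64_t chunk_profile(uint64_t flags)
{
    return flags & FLAG_PROFILE_MASK;
}
```

For a flags value such as `FLAG_TYPE_METADATA | FLAG_PROF_RAID1`, the two helpers return the type and the profile independently of each other.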
    • I
      Btrfs: get rid of *_alloc_profile fields · 6fef8df1
      Ilya Dryomov committed on
      {data,metadata,system}_alloc_profile fields have been unused for a long
      time now.  Get rid of them.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      6fef8df1
  12. 09 Jan, 2012 1 commit
  13. 22 Dec, 2011 3 commits
    • A
      Btrfs: mark delayed refs as for cow · 66d7e7f0
      Arne Jansen committed on
      Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value
      from every call site. The for_cow parameter will later on be used to
      determine if a ref will change anything with respect to qgroups.
      
      Delayed refs coming from relocation are always counted as for_cow, as they
      don't change subvol quota.
      
      Also pass in the fs_info for later use.
      
      btrfs_find_all_roots() will use this as an optimization, as changes that are
      for_cow will not change anything with respect to which root points to a
      certain leaf. Thus, we don't need to add the current sequence number to
      those delayed refs.
      Signed-off-by: Arne Jansen <sensille@gmx.net>
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      66d7e7f0
    • J
      Btrfs: added helper btrfs_next_item() · c7d22a3c
      Jan Schmidt committed on
      btrfs_next_item() makes the btrfs path point to the next item, crossing leaf
      boundaries if needed.
      Signed-off-by: Arne Jansen <sensille@gmx.net>
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      c7d22a3c
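Advancing a path to the next item while crossing leaf boundaries can be modelled with a linked chain of leaves. This is a simplified sketch, not the kernel helper: `leaf_model`, `path_model`, and `next_item_model` are hypothetical names, and real leaves are reached through the tree, not through a `next` pointer.

```c
#include <assert.h>
#include <stddef.h>

#define LEAF_ITEMS 4

/* Toy leaf: a few items plus a pointer to the following leaf, if any. */
struct leaf_model {
    int nritems;
    int items[LEAF_ITEMS];
    struct leaf_model *next;
};

/* Toy path: the leaf currently pointed to and the slot within it. */
struct path_model {
    struct leaf_model *leaf;
    int slot;
};

/* Move to the next item. Returns 0 on success, 1 if no further item exists.
 * When the slot runs past the end of the leaf, continue in the next leaf
 * (skipping empty ones). On a return of 1 the path is left past the last item. */
static int next_item_model(struct path_model *p)
{
    assert(p != NULL && p->leaf != NULL);
    p->slot++;
    while (p->slot >= p->leaf->nritems) {
        if (p->leaf->next == NULL)
            return 1;
        p->leaf = p->leaf->next;
        p->slot = 0;
    }
    return 0;
}
```

A caller can loop on the return value to iterate over every item without caring where one leaf ends and the next begins.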
    • S
      Btrfs: integrate integrity check module into btrfs · 21adbd5c
      Stefan Behrens committed on
      This is the last part of the patch series. It modifies the btrfs
      code to use the integrity check module if configured to do so
      with the define BTRFS_FS_CHECK_INTEGRITY. If this define is not set,
      the only effective change is that code is added that handles the
      mount option to activate the integrity check. If the mount option is
      set and the define BTRFS_FS_CHECK_INTEGRITY is not set, that code
      complains in the log and the mount fails with EINVAL.
      
      Add the mount option to activate the usage of the integrity check
      code.
      Add invocation of btrfs integrity check code init and cleanup
      function on mount and umount, respectively.
      Add hook to call btrfs integrity check code version of
      submit_bh/submit_bio.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      21adbd5c
  14. 16 Dec, 2011 1 commit
    • J
      Btrfs: deal with enospc from dirtying inodes properly · 22c44fe6
      Josef Bacik committed on
      Now that we're properly keeping track of delayed inode space we've been getting
      a lot of warnings out of btrfs_dirty_inode() when running xfstests 83.  This is
      because a bunch of people call mark_inode_dirty, which is void so we can't
      return ENOSPC.  This needs to be fixed in a few areas:
      
      1) file_update_time - this updates the mtime and such when writing to a file,
      which will call mark_inode_dirty.  So copy file_update_time into btrfs so we can
      call btrfs_dirty_inode directly and return an error if we get one appropriately.
      
      2) fix symlinks to use btrfs_setattr for ->setattr.  For some reason we weren't
      setting ->setattr for symlinks, even though we should have been.  This catches
      one of the cases where we were getting errors in mark_inode_dirty.
      
      3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly
      instead of mark_inode_dirty.  This lets us return errors properly for truncate
      and chown/anything related to setattr.
      
      4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and
      print an error if we have one.  The only remaining user we can't control for
      this is touch_atime(), but we don't really want to keep people from walking
      down the tree if we don't have space to save the atime update, so just complain
      but don't worry about it.
      
      With this patch xfstests 83 complains a handful of times instead of hundreds of
      times.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      22c44fe6