1. 19 3月, 2010 2 次提交
    • C
      Btrfs: return keys for large items to the search ioctl · 90fdde14
      Chris Mason 提交于
      The search ioctl was skipping large items entirely (ones that are too
      big for the results buffer).  This changes things to at least copy
      the item header so that we can send information about the item back to
      userland.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      90fdde14
    • C
      Btrfs: fix key checks and advance in the search ioctl · abc6e134
      Chris Mason 提交于
      The search ioctl was working well for finding tree roots, but using it for
      generic searches requires a few changes to how the keys are advanced.
      This treats the search control min fields for objectid, type and offset
      more like a key, where we drop the offset to zero once we bump the type,
      etc.
      
      The downside of this is that we are changing the min_type and min_offset
      fields during the search, and so the ioctl caller needs extra checks to make sure
      the keys in the result are the ones it wanted.
      
      This also changes key_in_sk to use btrfs_comp_cpu_keys, just to make
      things more readable.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      abc6e134
  2. 17 3月, 2010 2 次提交
  3. 15 3月, 2010 8 次提交
    • A
      btrfs: use memparse · 91748467
      Akinobu Mita 提交于
      Use memparse() instead of its own private implementation.
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: linux-btrfs@vger.kernel.org
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      91748467
    • J
      Btrfs: add a "df" ioctl for btrfs · 1406e432
      Josef Bacik 提交于
      df is a very loaded question in btrfs.  This gives us a way to get the per-space
      usage information so we can tell exactly what is in use where.  This will help
      us figure out ENOSPC problems, and help users better understand where their disk
      space is going.
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      1406e432
    • J
      Btrfs: cache the extent state everywhere we possibly can V2 · 2ac55d41
      Josef Bacik 提交于
      This patch just goes through and fixes everybody that does
      
      lock_extent()
      blah
      unlock_extent()
      
      to use
      
      lock_extent_bits()
      blah
      unlock_extent_cached()
      
      and pass around a extent_state so we only have to do the searches once per
      function.  This gives me about a 3 mb/s boots on my random write test.  I have
      not converted some things, like the relocation and ioctl's, since they aren't
      heavily used and the relocation stuff is in the middle of being re-written.  I
      also changed the clear_extent_bit() to only unset the cached state if we are
      clearing EXTENT_LOCKED and related stuff, so we can do things like this
      
      lock_extent_bits()
      clear delalloc bits
      unlock_extent_cached()
      
      without losing our cached state.  I tested this thoroughly and turned on
      LEAK_DEBUG to make sure we weren't leaking extent states, everything worked out
      fine.
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      2ac55d41
    • C
      Btrfs: add new defrag-range ioctl. · 1e701a32
      Chris Mason 提交于
      The btrfs defrag ioctl was limited to doing the entire file.  This
      commit adds a new interface that can defrag a specific range inside
      the file.
      
      It can also force compression on the file, allowing you to selectively
      compress individual files after they were created, even when mount -o
      compress isn't turned on.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      1e701a32
    • C
      Btrfs: be more selective in the defrag ioctl · 940100a4
      Chris Mason 提交于
      The btrfs defrag ioctl had some bugs around delalloc accounting, and it
      wasn't properly skipping pages that were not in the mapping.
      
      It wasn't properly clearing the page checked flag, which could make the
      writeback code ignore the page forever while pinning it as dirty.
      
      This commit fixes those problems and makes defrag a little smarter.  It
      skips holes and it doesn't waste time defragging large extents.  If a
      tiny extent comes before a very large extent, it will defrag both of
      them to make sure the tiny extent ends up next to something big.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      940100a4
    • J
      Btrfs: add ioctl and incompat flag to set the default mount subvol · 6ef5ed0d
      Josef Bacik 提交于
      This patch needs to go along with my previous patch.  This lets us set the
      default dir item's location to whatever root we want to use as our default
      mounting subvol.  With this we don't have to use mount -o subvol=<tree id>
      anymore to mount a different subvol, we can just set the new one and it will
      just magically work.  I've done some moderate testing with this, mostly just
      switching the default mount around, mounting subvols and the default mount at
      the same time and such, everything seems to work.  Thanks,
      
      Older kernels would generally be able to still mount the filesystem with the
      default subvolume set, but it would result in a different volume being mounted,
      which could be an even more unpleasant suprise for users.  So if you set your
      default subvolume, you can't go back to older kernels.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      6ef5ed0d
    • C
      Btrfs: add search and inode lookup ioctls · ac8e9819
      Chris Mason 提交于
      The search ioctl is a generic tool for doing btree searches from
      userland applications.  The first user of the search ioctl is a
      subvolume listing feature, but we'll also use it to find new
      files in a subvolume.
      
      The search ioctl allows you to specify min and max keys to search for,
      along with min and max transid.  It returns the items along with a
      header that includes the item key.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      ac8e9819
    • T
      Btrfs: add a function to lookup a directory path by following backrefs · 98d377a0
      TARUISI Hiroaki 提交于
      This will be used by the inode lookup ioctl.
      Signed-off-by: NTARUISI Hiroaki <taruishi.hiroak@jp.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      98d377a0
  4. 18 12月, 2009 2 次提交
  5. 16 12月, 2009 1 次提交
  6. 09 10月, 2009 2 次提交
  7. 30 9月, 2009 1 次提交
  8. 29 9月, 2009 1 次提交
    • J
      Btrfs: proper -ENOSPC handling · 9ed74f2d
      Josef Bacik 提交于
      At the start of a transaction we do a btrfs_reserve_metadata_space() and
      specify how many items we plan on modifying.  Then once we've done our
      modifications and such, just call btrfs_unreserve_metadata_space() for
      the same number of items we reserved.
      
      For keeping track of metadata needed for data I've had to add an extent_io op
      for when we merge extents.  This lets us track space properly when we are doing
      sequential writes, so we don't end up reserving way more metadata space than
      what we need.
      
      The only place where the metadata space accounting is not done is in the
      relocation code.  This is because Yan is going to be reworking that code in the
      near future, so running btrfs-vol -b could still possibly result in a ENOSPC
      related panic.  This patch also turns off the metadata_ratio stuff in order to
      allow users to more efficiently use their disk space.
      
      This patch makes it so we track how much metadata we need for an inode's
      delayed allocation extents by tracking how many extents are currently
      waiting for allocation.  It introduces two new callbacks for the
      extent_io tree's, merge_extent_hook and split_extent_hook.  These help
      us keep track of when we merge delalloc extents together and split them
      up.  Reservations are handled prior to any actually dirty'ing occurs,
      and then we unreserve after we dirty.
      
      btrfs_unreserve_metadata_for_delalloc() will make the appropriate
      unreservations as needed based on the number of reservations we
      currently have and the number of extents we currently have.  Doing the
      reservation outside of doing any of the actual dirty'ing lets us do
      things like filemap_flush() the inode to try and force delalloc to
      happen, or as a last resort actually start allocation on all delalloc
      inodes in the fs.  This has survived dbench, fs_mark and an fsx torture
      test.
      Signed-off-by: NJosef Bacik <jbacik@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      9ed74f2d
  9. 22 9月, 2009 3 次提交
  10. 12 9月, 2009 1 次提交
    • C
      Btrfs: Fix extent replacment race · a1ed835e
      Chris Mason 提交于
      Data COW means that whenever we write to a file, we replace any old
      extent pointers with new ones.  There was a window where a readpage
      might find the old extent pointers on disk and cache them in the
      extent_map tree in ram in the middle of a given write replacing them.
      
      Even though both the readpage and the write had their respective bytes
      in the file locked, the extent readpage inserts may cover more bytes than
      it had locked down.
      
      This commit closes the race by keeping the new extent pinned in the extent
      map tree until after the on-disk btree is properly setup with the new
      extent pointers.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      a1ed835e
  11. 13 7月, 2009 1 次提交
  12. 03 7月, 2009 1 次提交
  13. 11 6月, 2009 1 次提交
  14. 10 6月, 2009 2 次提交
    • C
      Btrfs: implement FS_IOC_GETFLAGS/SETFLAGS/GETVERSION · 6cbff00f
      Christoph Hellwig 提交于
      Add support for the standard attributes set via chattr and read via
      lsattr.  Currently we store the attributes in the flags value in
      the btrfs inode, but I wonder whether we should split it into two so
      that we don't have to keep converting between the two formats.
      
      Remove the btrfs_clear_flag/btrfs_set_flag/btrfs_test_flag macros
      as they were confusing the existing code and got in the way of the
      new additions.
      
      Also add the FS_IOC_GETVERSION ioctl for getting i_generation as it's
      trivial.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      6cbff00f
    • Y
      Btrfs: Mixed back reference (FORWARD ROLLING FORMAT CHANGE) · 5d4f98a2
      Yan Zheng 提交于
      This commit introduces a new kind of back reference for btrfs metadata.
      Once a filesystem has been mounted with this commit, IT WILL NO LONGER
      BE MOUNTABLE BY OLDER KERNELS.
      
      When a tree block in subvolume tree is cow'd, the reference counts of all
      extents it points to are increased by one.  At transaction commit time,
      the old root of the subvolume is recorded in a "dead root" data structure,
      and the btree it points to is later walked, dropping reference counts
      and freeing any blocks where the reference count goes to 0.
      
      The increments done during cow and decrements done after commit cancel out,
      and the walk is a very expensive way to go about freeing the blocks that
      are no longer referenced by the new btree root.  This commit reduces the
      transaction overhead by avoiding the need for dead root records.
      
      When a non-shared tree block is cow'd, we free the old block at once, and the
      new block inherits old block's references. When a tree block with reference
      count > 1 is cow'd, we increase the reference counts of all extents
      the new block points to by one, and decrease the old block's reference count by
      one.
      
      This dead tree avoidance code removes the need to modify the reference
      counts of lower level extents when a non-shared tree block is cow'd.
      But we still need to update back ref for all pointers in the block.
      This is because the location of the block is recorded in the back ref
      item.
      
      We can solve this by introducing a new type of back ref. The new
      back ref provides information about pointer's key, level and in which
      tree the pointer lives. This information allow us to find the pointer
      by searching the tree. The shortcoming of the new back ref is that it
      only works for pointers in tree blocks referenced by their owner trees.
      
      This is mostly a problem for snapshots, where resolving one of these
      fuzzy back references would be O(number_of_snapshots) and quite slow.
      The solution used here is to use the fuzzy back references in the common
      case where a given tree block is only referenced by one root,
      and use the full back references when multiple roots have a reference
      on a given block.
      
      This commit adds per subvolume red-black tree to keep trace of cached
      inodes. The red-black tree helps the balancing code to find cached
      inodes whose inode numbers within a given range.
      
      This commit improves the balancing code by introducing several data
      structures to keep the state of balancing. The most important one
      is the back ref cache. It caches how the upper level tree blocks are
      referenced. This greatly reduce the overhead of checking back ref.
      
      The improved balancing code scales significantly better with a large
      number of snapshots.
      
      This is a very large commit and was written in a number of
      pieces.  But, they depend heavily on the disk format change and were
      squashed together to make sure git bisect didn't end up in a
      bad state wrt space balancing or the format change.
      Signed-off-by: NYan Zheng <zheng.yan@oracle.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      5d4f98a2
  15. 15 5月, 2009 1 次提交
  16. 27 4月, 2009 1 次提交
  17. 25 4月, 2009 1 次提交
    • C
      Btrfs: fix fallocate deadlock on inode extent lock · e980b50c
      Chris Mason 提交于
      The btrfs fallocate call takes an extent lock on the entire range
      being fallocated, and then runs through insert_reserved_extent on each
      extent as they are allocated.
      
      The problem with this is that btrfs_drop_extents may decide to try
      and take the same extent lock fallocate was already holding.  The solution
      used here is to push down knowledge of the range that is already locked
      going into btrfs_drop_extents.
      
      It turns out that at least one other caller had the same bug.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      e980b50c
  18. 21 4月, 2009 1 次提交
  19. 01 4月, 2009 1 次提交
  20. 21 2月, 2009 1 次提交
    • J
      Btrfs: add better -ENOSPC handling · 6a63209f
      Josef Bacik 提交于
      This is a step in the direction of better -ENOSPC handling.  Instead of
      checking the global bytes counter we check the space_info bytes counters to
      make sure we have enough space.
      
      If we don't we go ahead and try to allocate a new chunk, and then if that fails
      we return -ENOSPC.  This patch adds two counters to btrfs_space_info,
      bytes_delalloc and bytes_may_use.
      
      bytes_delalloc account for extents we've actually setup for delalloc and will
      be allocated at some point down the line. 
      
      bytes_may_use is to keep track of how many bytes we may use for delalloc at
      some point.  When we actually set the extent_bit for the delalloc bytes we
      subtract the reserved bytes from the bytes_may_use counter.  This keeps us from
      not actually being able to allocate space for any delalloc bytes.
      Signed-off-by: NJosef Bacik <jbacik@redhat.com>
      
      
      
      6a63209f
  21. 21 1月, 2009 1 次提交
  22. 06 1月, 2009 3 次提交
  23. 19 12月, 2008 1 次提交
  24. 12 12月, 2008 1 次提交
    • Y
      Btrfs: fix leaking block group on balance · d2fb3437
      Yan Zheng 提交于
      The block group structs are referenced in many different
      places, and it's not safe to free while balancing.  So, those block
      group structs were simply leaked instead.
      
      This patch replaces the block group pointer in the inode with the starting byte
      offset of the block group and adds reference counting to the block group
      struct.
      Signed-off-by: NYan Zheng <zheng.yan@oracle.com>
      d2fb3437