1. 27 Jul 2020, 2 commits
    • btrfs: start deprecation of mount option inode_cache · b547a88e
      Committed by David Sterba
      Removal of the functionality is estimated for 5.11; until then the
      option will still be parsed but will have no effect.
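
      As a hypothetical sketch (function name and message are made up here,
      not taken from the actual patch), a deprecated option that is still
      parsed but has no effect boils down to emitting a warning instead of
      setting a flag:

          /* Hypothetical sketch only: the token stays recognized by the
           * mount option parser but merely warns instead of enabling the
           * inode number cache. */
          static void handle_inode_cache_option(struct btrfs_fs_info *info)
          {
                  btrfs_warn(info,
                             "the 'inode_cache' option is deprecated, has no effect and will be removed in 5.11");
          }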
      
      Reasons for deprecation and removal:
      
      - very poor naming choice for the mount option: it's supposed to cache
        and reuse the inode _numbers_, but it sounds like some generic cache
        for inodes
      
      - the only known use case where this option would make sense is on a
        32-bit architecture where inode numbers in one subvolume would be
        exhausted due to the 32-bit inode::i_ino
      
      - the cache is stored on disk, consumes space, needs to be loaded and
        written back
      
      - new inode number allocation is slower due to lookups into the cache
        (compared to a simple increment which is the default)
      
      - uses the free-space-cache code that is going to be deprecated as well
        in the future
      
      Known problems:
      
      - since 2011, returning EEXIST when there's not enough space in a page
        to store all checksums, see commit 4b9465cb ("Btrfs: add mount -o
        inode_cache")
      
      Remaining issues:
      
      - if the option was enabled, new inodes were created and the option was
        then disabled again, the cache is still stored on the devices and
        there's currently no way to remove it
      Signed-off-by: David Sterba <dsterba@suse.com>
      b547a88e
    • btrfs: introduce "rescue=" mount option · 74ef0018
      Committed by Qu Wenruo
      This patch introduces a new "rescue=" mount option group for all mount
      options related to data recovery.
      
      Different rescue sub options are separated by ':', e.g.
      "ro,rescue=nologreplay:usebackuproot".
      
      The original plan was to use ';', but ';' needs to be escaped/quoted,
      or it will be interpreted by bash, similar to '|'.
      
      Obviously, users can also specify rescue options one by one, like:
      "ro,rescue=nologreplay,rescue=usebackuproot".
      
      The following mount options are converted to "rescue=", and the old
      mount options are deprecated but still available for compatibility
      purposes:
      
      - usebackuproot
        Now it's "rescue=usebackuproot"
      
      - nologreplay
        Now it's "rescue=nologreplay"
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      74ef0018
  2. 02 Jul 2020, 1 commit
  3. 25 May 2020, 4 commits
  4. 24 Mar 2020, 10 commits
  5. 13 Feb 2020, 1 commit
  6. 03 Feb 2020, 1 commit
    • btrfs: do not zero f_bavail if we have available space · d55966c4
      Committed by Josef Bacik
      There was some logic added a while ago to clear out f_bavail in statfs()
      if we did not have enough free metadata space to satisfy our global
      reserve.  This was incorrect at the time, however it didn't really pose
      a problem for normal file systems because we would often allocate chunks
      when we got this low on free metadata space, and thus wouldn't really
      hit this case unless we were actually full.
      
      Fast forward to today and now we are much better about not allocating
      metadata chunks all of the time.  Couple this with d792b0f1 ("btrfs:
      always reserve our entire size for the global reserve"), which means
      we'll easily have a larger global reserve than our free space, and we
      are now more likely to trip over this while still having plenty of
      space.
      
      Fix this by skipping this logic if the global rsv's space_info is not
      full.  space_info->full is 0 unless we've attempted to allocate a chunk
      for that space_info and that has failed.  If this happens then the space
      for the global reserve is definitely sacred and we need to report
      b_avail == 0, but before then we can just use our calculated b_avail.
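
      Condensed into a small illustrative helper (names and types are made up
      for this sketch, not the exact statfs() code):

          #include <stdbool.h>
          #include <stdint.h>

          /* Only hide available space once a metadata chunk allocation for
           * the global reserve's space_info has actually failed; otherwise
           * keep the calculated value. */
          static uint64_t adjusted_f_bavail(uint64_t calculated_bavail,
                                            uint64_t total_free_meta,
                                            uint64_t global_rsv_size,
                                            bool space_info_full)
          {
                  if (total_free_meta < global_rsv_size && space_info_full)
                          return 0;
                  return calculated_bavail;
          }
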
      Reported-by: Martin Steigerwald <martin@lichtvoll.de>
      Fixes: ca8a51b3 ("btrfs: statfs: report zero available if metadata are exhausted")
      CC: stable@vger.kernel.org # 4.5+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Tested-By: Martin Steigerwald <martin@lichtvoll.de>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d55966c4
  7. 20 Jan 2020, 2 commits
    • btrfs: add the beginning of async discard, discard workqueue · b0643e59
      Committed by Dennis Zhou
      When discard is enabled, every time a pinned extent is released back to
      the block_group's free space cache, a discard is issued for the extent.
      This is an overeager approach when it comes to discarding and helping
      the SSD maintain enough free space to prevent severe garbage collection
      situations.
      
      This adds the beginning of async discard. Instead of issuing a discard
      before returning an extent to the free space cache, the extent is just
      marked as untrimmed. The block_group is then added to an LRU which feeds
      into a workqueue that issues discards at a much slower rate. Full
      discarding of unused block groups is still done and will be addressed in
      a future patch of the series.
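
      A rough kernel-style sketch of that idea (structure and names here are
      illustrative, not the code added by this patch): freed block groups go
      to the tail of a list and a delayed work item drains it slowly:

          /* Illustrative only: block groups with untrimmed extents are queued
           * and discards are issued from delayed work at a much slower rate
           * than transaction commit. */
          struct discard_ctl {
                  struct workqueue_struct *workers;
                  struct delayed_work work;
                  struct list_head block_groups;  /* LRU of block groups to trim */
                  spinlock_t lock;
          };

          static void queue_for_async_discard(struct discard_ctl *ctl,
                                              struct btrfs_block_group *bg)
          {
                  spin_lock(&ctl->lock);
                  /* most recently freed goes to the tail, trimming starts at the head */
                  list_move_tail(&bg->discard_list, &ctl->block_groups);
                  spin_unlock(&ctl->lock);

                  /* rate limit: schedule the next discard pass instead of trimming now */
                  queue_delayed_work(ctl->workers, &ctl->work,
                                     msecs_to_jiffies(10000));
          }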
      
      For now, we don't persist the discard state of extents and bitmaps.
      Therefore, our failure recovery mode will be to consider extents
      untrimmed. This lets us handle failure and unmounting as one and the
      same.
      
      On a number of Facebook webservers, I collected data every minute
      accounting the time we spent in btrfs_finish_extent_commit() (col. 1)
      and in btrfs_commit_transaction() (col. 2). btrfs_finish_extent_commit()
      is where we discard extents synchronously before returning them to the
      free space cache.
      
      discard=sync:
                       p99 total per minute       p99 total per minute
            Drive   |   extent_commit() (ms)  |    commit_trans() (ms)
          ---------------------------------------------------------------
           Drive A  |           434           |          1170
           Drive B  |           880           |          2330
           Drive C  |          2943           |          3920
           Drive D  |          4763           |          5701
      
      discard=async:
                       p99 total per minute       p99 total per minute
            Drive   |   extent_commit() (ms)  |    commit_trans() (ms)
          --------------------------------------------------------------
           Drive A  |           134           |           956
           Drive B  |            64           |          1972
           Drive C  |            59           |          1032
           Drive D  |            62           |          1200
      
      While it's not great that the stats are cumulative over 1m, all of these
      servers are running the same workload and the delta between the two is
      substantial. We are spending significantly less time in
      btrfs_finish_extent_commit() which is responsible for discarding.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      b0643e59
    • btrfs: rename DISCARD mount option to to DISCARD_SYNC · 46b27f50
      Committed by Dennis Zhou
      This series introduces async discard, which will use the flag
      DISCARD_ASYNC, so rename the original flag to DISCARD_SYNC as that
      discard is done synchronously in transaction commit.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      46b27f50
  8. 19 Nov 2019, 6 commits
  9. 18 Nov 2019, 5 commits
  10. 23 Oct 2019, 1 commit
  11. 09 Sep 2019, 3 commits
  12. 02 Jul 2019, 3 commits
  13. 01 Jul 2019, 1 commit