1. 16 Aug, 2017 7 commits
  2. 24 Jul, 2017 2 commits
    • Btrfs: fix early ENOSPC due to delalloc · 17024ad0
      Committed by Omar Sandoval
      If a lot of metadata is reserved for outstanding delayed allocations, we
      rely on shrink_delalloc() to reclaim metadata space in order to fulfill
      reservation tickets. However, shrink_delalloc() has a shortcut where if
      it determines that space can be overcommitted, it will stop early. This
      made sense before the ticketed enospc system, but now it means that
      shrink_delalloc() will often not reclaim enough space to fulfill any
      tickets, leading to an early ENOSPC. (Reservation tickets don't care
      about being able to overcommit, they need every byte accounted for.)
      
      Fix it by getting rid of the shortcut so that shrink_delalloc() reclaims
      all of the metadata it is supposed to. This fixes early ENOSPCs we were
      seeing when doing a btrfs receive to populate a new filesystem, as well
      as early ENOSPCs Christoph saw when doing a big cp -r onto Btrfs.
      
      Fixes: 957780eb ("Btrfs: introduce ticketed enospc infrastructure")
      Tested-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
      Cc: stable@vger.kernel.org
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
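
To make the effect of the shortcut concrete, here is a minimal toy model in C. It is a sketch only: `toy_space`, `toy_can_overcommit`, `toy_shrink`, and `toy_ticket_ok` are hypothetical names invented for illustration and do not reflect the kernel's data structures or the real shrink_delalloc() logic. It assumes a ticket can only be satisfied by space that has actually been released, while the shortcut stops reclaim as soon as overcommitting would be allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct toy_space {
    uint64_t total;      /* metadata space that really exists */
    uint64_t reserved;   /* bytes held by outstanding delalloc */
    uint64_t overcommit; /* extra bytes we are allowed to promise */
};

/* True if `bytes` more could be promised by overcommitting. */
static bool toy_can_overcommit(const struct toy_space *s, uint64_t bytes)
{
    return s->reserved + bytes <= s->total + s->overcommit;
}

/*
 * Reclaim up to `to_reclaim` bytes in chunks of `step`.
 * With `shortcut` set, stop as soon as overcommit would succeed
 * (the old behaviour described above). Returns bytes reclaimed.
 */
static uint64_t toy_shrink(struct toy_space *s, uint64_t to_reclaim,
                           uint64_t step, bool shortcut)
{
    uint64_t reclaimed = 0;

    assert(step > 0);
    while (reclaimed < to_reclaim && s->reserved > 0) {
        if (shortcut && toy_can_overcommit(s, to_reclaim))
            break; /* early exit: nothing more is released */

        uint64_t n = step < s->reserved ? step : s->reserved;
        if (n > to_reclaim - reclaimed)
            n = to_reclaim - reclaimed;
        s->reserved -= n;
        reclaimed += n;
    }
    return reclaimed;
}

/* A ticket needs real free bytes; promises do not count. */
static bool toy_ticket_ok(const struct toy_space *s, uint64_t bytes)
{
    assert(s->reserved <= s->total);
    return s->total - s->reserved >= bytes;
}
```

Calling `toy_shrink()` on a fully reserved space with `shortcut` set returns without releasing anything whenever overcommit is possible, so `toy_ticket_ok()` stays false; with `shortcut` cleared, the requested bytes are released and the ticket can be granted.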
    • btrfs: fix lockup in find_free_extent with read-only block groups · 14443937
      Committed by Jeff Mahoney
      If we have a block group that is all of the following:
      1) uncached in memory
      2) is read-only
      3) has a disk cache state that indicates we need to recreate the cache
      
      AND the file system has enough free space fragmentation such that the
      request for an extent of a given size can't be honored;
      
      AND the system has a single CPU core;
      
      AND it's the block group with the highest starting offset such that
      there are no opportunities (like reading from disk) for the loop to
      yield the CPU;
      
      We can end up with a lockup.
      
      The root cause is simple.  Once we're in the position that we've read in
      all of the other block groups directly and none of those block groups
      can honor the request, there are no more opportunities to sleep.  We end
      up trying to start a caching thread which never gets run if we only have
      one core.  This *should* present as a hung task waiting on the caching
      thread to make some progress, but it doesn't.  Instead, it degrades into
      a busy loop because of the placement of the read-only check.
      
      During the first pass through the loop, block_group->cached will be set
      to BTRFS_CACHE_STARTED and have_caching_bg will be set.  Then we hit the
      read-only check and short circuit the loop.  We're not yet in
      LOOP_CACHING_WAIT, so we skip that loop back before going through the
      loop again for other raid groups.
      
      Then we move to LOOP_CACHING_WAIT state.
      
      During this pass through the loop, ->cached will still be
      BTRFS_CACHE_STARTED, which means it's not cached, so we'll enter
      cache_block_group, do a lot of nothing, and return, and also set
      have_caching_bg again.  Then we hit the read-only check and short circuit
      the loop.  The same thing happens as before except now we DO trigger
      the LOOP_CACHING_WAIT && have_caching_bg check and loop back up to the
      top.  We do this forever.
      
      There are two fixes in this patch since they address the same underlying
      bug.
      
      The first is to add a cond_resched to the end of the loop to ensure
      that the caching thread always has an opportunity to run.  This will
      fix the soft lockup issue, but find_free_extent will still loop doing
      nothing until the thread has completed.
      
      The second is to move the read-only check to the top of the loop.  We're
      never going to return an allocation within a read-only block group so
      we may as well skip it early.  The check for ->cached == BTRFS_CACHE_ERROR
      would cause the same problem except that BTRFS_CACHE_ERROR is considered
      a "done" state and we won't re-set have_caching_bg again.
      
      Many thanks to Stephan Kulow <coolo@suse.de> for his excellent help in
      the testing process.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
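
The looping behaviour described in this commit message can be sketched with a small state machine in C. This is a heavily simplified illustration, not the kernel's find_free_extent(): all `toy_` names are hypothetical, only one read-only block group and two loop states are modelled, the caching thread is assumed never to run, and `max_passes` exists only so the "forever" case terminates in a test. The cond_resched part of the fix is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the states named in the message. */
enum toy_cache { TOY_CACHE_NO, TOY_CACHE_STARTED, TOY_CACHE_FINISHED };
enum toy_loop { TOY_LOOP_CACHING_NOWAIT, TOY_LOOP_CACHING_WAIT };

struct toy_bg {
    enum toy_cache cached;
    bool ro; /* read-only block group */
};

/*
 * Search a single block group for space. Returns the number of passes
 * made before giving up, capped at max_passes. With ro_check_first set,
 * the read-only check runs before any caching work (the second fix).
 */
static int toy_find_free_extent(struct toy_bg *bg, bool ro_check_first,
                                int max_passes)
{
    enum toy_loop loop = TOY_LOOP_CACHING_NOWAIT;
    int passes = 0;

    assert(max_passes > 0);
    while (passes < max_passes) {
        bool have_caching_bg = false;

        passes++;
        if (!(ro_check_first && bg->ro)) {
            if (bg->cached != TOY_CACHE_FINISHED) {
                /* "start" caching; the thread never gets to run */
                bg->cached = TOY_CACHE_STARTED;
                have_caching_bg = true;
            }
            /* late read-only check: a ro group yields no allocation */
        }

        if (loop == TOY_LOOP_CACHING_WAIT && have_caching_bg)
            continue; /* loop back to the top and wait for caching */
        if (loop == TOY_LOOP_CACHING_NOWAIT) {
            loop = TOY_LOOP_CACHING_WAIT;
            continue;
        }
        break; /* nothing left to try */
    }
    return passes;
}
```

With the late check, every pass sets `have_caching_bg` again, so the function only stops because of the `max_passes` cap. With the early check, the read-only group is skipped before any caching work, `have_caching_bg` stays false, and the search gives up after two passes.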
  3. 30 Jun, 2017 10 commits
  4. 20 Jun, 2017 7 commits
  5. 01 Jun, 2017 2 commits
    • btrfs: fix race with relocation recovery and fs_root setup · a9b3311e
      Committed by Jeff Mahoney
      If we have to recover relocation during mount, we'll ultimately have to
      evict the orphan inode.  That goes through the reservation dance, where
      priority_reclaim_metadata_space and flush_space expect fs_info->fs_root
      to be valid.  That's the next thing to be set up during mount, so we
      crash, almost always in flush_space trying to join the transaction,
      but priority_reclaim_metadata_space is possible as well.  This call
      path has been problematic in the past WRT whether ->fs_root is valid
      yet.  Commit 957780eb (Btrfs: introduce ticketed enospc
      infrastructure) added new users that are called in the direct path
      instead of the async path that had already been worked around.
      
      The thing is that we don't actually need the fs_root, specifically, for
      anything.  We either use it to determine whether the root is the
      chunk_root for use in choosing an allocation profile or as a root to pass
      to btrfs_join_transaction before immediately committing it.  Anything that
      isn't the chunk root works in the former case and any root works in
      the latter.
      
      A simple fix is to use a root we know will always be there: the
      extent_root.
      
      Cc: <stable@vger.kernel.org> # v4.8+
      Fixes: 957780eb (Btrfs: introduce ticketed enospc infrastructure)
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
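
The reasoning behind choosing the extent_root can be illustrated with a short C sketch. The types and helpers below (`toy_root`, `toy_fs_info`, `toy_flush_root_old`, `toy_flush_root_new`, `toy_is_chunk_root`) are hypothetical and only mimic the shape of the problem: during early mount one pointer is not yet set, and the reclaim path needs some root that is both valid and not the chunk root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the roots mentioned in the message. */
struct toy_root {
    int id;
};

struct toy_fs_info {
    struct toy_root *chunk_root;
    struct toy_root *extent_root;
    struct toy_root *fs_root; /* set late during mount */
};

/* Old choice: may be NULL if reclaim runs before fs_root is set up. */
static struct toy_root *toy_flush_root_old(const struct toy_fs_info *fs)
{
    return fs->fs_root;
}

/* New choice: a root assumed to be available before recovery runs. */
static struct toy_root *toy_flush_root_new(const struct toy_fs_info *fs)
{
    return fs->extent_root;
}

/* The only property the profile decision needs from the root. */
static bool toy_is_chunk_root(const struct toy_fs_info *fs,
                              const struct toy_root *root)
{
    return root == fs->chunk_root;
}
```

In a state where `fs_root` is still NULL, the old helper hands back a NULL pointer that a caller would dereference, while the new helper returns a valid root that is not the chunk root, which is all either use described above requires.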
    • btrfs: fix memory leak in update_space_info failure path · 896533a7
      Committed by Jeff Mahoney
      If we fail to add the space_info kobject, we'll leak the memory
      for the percpu counter.
      
      Fixes: 6ab0a202 (btrfs: publish allocation data in sysfs)
      Cc: <stable@vger.kernel.org> # v3.14+
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
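
The leak pattern is a common one in error paths: a sub-allocation made before a later step that can fail must be released when that step fails. The following C sketch shows the pattern in user space. Everything here is hypothetical: `toy_alloc`/`toy_free` count live allocations, `toy_register` stands in for adding the kobject, and the `counter` field stands in for the percpu counter. It is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Number of allocations not yet freed; lets a test observe leaks. */
static int toy_live_allocs;

static void *toy_alloc(size_t n)
{
    void *p = malloc(n);
    if (p)
        toy_live_allocs++;
    return p;
}

static void toy_free(void *p)
{
    if (p) {
        toy_live_allocs--;
        free(p);
    }
}

struct toy_space_info {
    long *counter; /* stand-in for the percpu counter */
};

/* Stand-in for registering the sysfs object; fails on demand. */
static int toy_register(struct toy_space_info *si, bool fail)
{
    (void)si;
    return fail ? -1 : 0;
}

/*
 * Create a space_info. When registration fails, the `fixed` variant
 * releases the counter before freeing the container; the unfixed
 * variant frees only the container and leaks the counter.
 */
static struct toy_space_info *toy_create(bool fail_register, bool fixed)
{
    struct toy_space_info *si = toy_alloc(sizeof(*si));

    if (!si)
        return NULL;
    si->counter = toy_alloc(sizeof(*si->counter));
    if (!si->counter) {
        toy_free(si);
        return NULL;
    }
    if (toy_register(si, fail_register) != 0) {
        if (fixed)
            toy_free(si->counter); /* the missing cleanup */
        toy_free(si);
        return NULL;
    }
    return si;
}
```

After a failed `toy_create()` without the cleanup, `toy_live_allocs` remains at 1, which is the leaked counter; with the cleanup it returns to 0.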
  6. 18 Apr, 2017 6 commits
  7. 02 Mar, 2017 1 commit
  8. 28 Feb, 2017 5 commits