1. 07 Jun, 2019, 1 commit
    • btrfs: Always trim all unallocated space in btrfs_trim_free_extents · 8103d10b
      Authored by Nikolay Borisov
      This patch removes support for the range parameters of the FITRIM ioctl
      when trimming unallocated space on devices. This is necessary since
      ranges passed from user space are generally interpreted as logical
      addresses, whereas btrfs_trim_free_extents used to interpret them as
      device physical extents. This could result in counter-intuitive
      behavior for users, so it's best to remove that support altogether.
      
      Additionally, the existing range support had a bug: if an offset that
      overflows u64, e.g. -1 (parsed as the u64 value 18446744073709551615),
      was passed to FITRIM, then wrong data was fed into btrfs_issue_discard.
      This in turn led to a wrap-around when aligning the passed range,
      resulting in the wrong regions being discarded, which leads to data
      corruption (a minimal sketch of the wrap-around follows this commit
      message).
      
      Fixes: c2d1b3aa ("btrfs: Honour FITRIM range constraints during free space trim")
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
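      A minimal user-space sketch of the wrap-around described above, not the
      actual btrfs code: align_up and the 1MiB alignment are illustrative
      assumptions. ALIGN-style arithmetic on a u64 offset near U64_MAX
      silently overflows, so the "aligned" start of the range lands far below
      the offset the user asked for.

      #include <stdint.h>
      #include <stdio.h>

      #define SZ_1M (1024ULL * 1024ULL)

      /* same idea as the kernel's ALIGN() macro for power-of-two alignment */
      static uint64_t align_up(uint64_t x, uint64_t a)
      {
          return (x + a - 1) & ~(a - 1);
      }

      int main(void)
      {
          uint64_t start = UINT64_MAX;   /* a FITRIM offset of -1 */
          uint64_t aligned = align_up(start, SZ_1M);

          /* prints 0: the addition overflowed, so the trimmed range would
           * now begin at device offset 0 instead of being rejected */
          printf("aligned start = %llu\n", (unsigned long long)aligned);
          return 0;
      }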
  2. 16 May, 2019, 2 commits
  3. 02 May, 2019, 1 commit
    • btrfs: reserve delalloc metadata differently · c8eaeac7
      Authored by Josef Bacik
      With the per-inode block reserves we started refilling the reserve based
      on the calculated size of the outstanding csum bytes and extents for the
      inode, including the amount we were adding with the new operation.
      
      However, generic/224 exposed a problem with this approach.  With 1000
      files all writing at the same time we ended up with a bunch of bytes
      being reserved but unusable.
      
      When you write to a file we reserve space for the csum leaves for those
      bytes, the number of extent items required to cover those bytes, and a
      single transaction item for updating the inode at ordered extent finish
      for that range of bytes.  This is held until the ordered extent finishes
      and we release all of the reserved space.
      
      If a second write comes in at this point we would add a single
      reservation for the new outstanding extent and however many
      reservations are needed for its csum leaves. We then compute the delta
      between how much we have already reserved and how much the total
      outstanding size now requires, and attempt to reserve that delta. If
      the first write finishes it will not release any space, because the
      space it had reserved for the initial write is still needed for the
      second write. However, some space would have been used, as we have
      added csums and extent items and dirtied the inode. Our reserved space
      would be > 0 but less than the total needed reserved space.
      
      This is just for a single inode; now consider generic/224. This test
      has 1000 inodes writing in parallel to a very small file system, 1GiB.
      In my testing this usually means we get about a 120MiB metadata area
      to work with, more than enough to allow the writes to continue, but
      not enough if all of the inodes are stuck trying to reserve the slack
      space while continuing to hold their leftovers from their initial
      writes.
      
      Fix this by pre-reserving _only_ the space we are currently trying to
      add. Then, once that reservation succeeds, modify the inode's csum
      count and outstanding extents, and add the newly reserved space to the
      inode's block_rsv (see the sketch after this commit message). This
      allows us to actually pass generic/224 without running out of metadata
      space.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
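      A hedged sketch of the reservation flow this commit describes, using
      hypothetical names: inode_rsv, meta_needed_for, reserve_metadata_bytes
      and the size constants are illustrative, not the real btrfs API. The
      point it shows is the ordering: reserve only what the current write
      adds, and fold it into the inode's counters and block_rsv only after
      that reservation succeeds.

      #include <stdint.h>
      #include <stdio.h>

      typedef uint64_t u64;

      struct inode_rsv {
          u64 reserved;             /* bytes currently held in the inode's block_rsv */
          u64 outstanding_extents;  /* extents whose ordered extents have not finished */
          u64 csum_bytes;           /* bytes that still need csum items */
      };

      /* hypothetical: metadata a write of 'len' bytes needs (extent items,
       * csum leaves, one inode update); the constants are made up */
      static u64 meta_needed_for(u64 len)
      {
          return 3 * 16384 + (len / 4096) * 32;
      }

      /* hypothetical stand-in for carving space out of the global metadata
       * pool; pretend it always succeeds here */
      static int reserve_metadata_bytes(u64 bytes)
      {
          (void)bytes;
          return 0;
      }

      /* the flow from the commit message: pre-reserve only what this write
       * adds, then update the inode's accounting on success */
      static int reserve_for_write(struct inode_rsv *rsv, u64 len)
      {
          u64 to_reserve = meta_needed_for(len);

          if (reserve_metadata_bytes(to_reserve))
              return -1;   /* would be -ENOSPC in the real code */

          rsv->outstanding_extents += 1;
          rsv->csum_bytes += len;
          rsv->reserved += to_reserve;
          return 0;
      }

      int main(void)
      {
          struct inode_rsv rsv = { 0, 0, 0 };

          /* two overlapping writes to the same inode, as in the
           * generic/224 scenario described above */
          reserve_for_write(&rsv, 1 << 20);
          reserve_for_write(&rsv, 1 << 20);

          printf("reserved=%llu extents=%llu csum_bytes=%llu\n",
                 (unsigned long long)rsv.reserved,
                 (unsigned long long)rsv.outstanding_extents,
                 (unsigned long long)rsv.csum_bytes);
          return 0;
      }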
  4. 30 Apr, 2019, 36 commits