1. 09 2月, 2021 3 次提交
  2. 10 12月, 2020 6 次提交
  3. 08 12月, 2020 5 次提交
  4. 07 10月, 2020 2 次提交
  5. 20 8月, 2020 1 次提交
    • F
      btrfs: fix space cache memory leak after transaction abort · bbc37d6e
      Filipe Manana 提交于
      If a transaction aborts it can cause a memory leak of the pages array of
      a block group's io_ctl structure. The following steps explain how that can
      happen:
      
      1) Transaction N is committing, currently in state TRANS_STATE_UNBLOCKED
         and it's about to start writing out dirty extent buffers;
      
      2) Transaction N + 1 already started and another task, task A, just called
         btrfs_commit_transaction() on it;
      
      3) Block group B was dirtied (extents allocated from it) by transaction
         N + 1, so when task A calls btrfs_start_dirty_block_groups(), at the
         very beginning of the transaction commit, it starts writeback for the
         block group's space cache by calling btrfs_write_out_cache(), which
         allocates the pages array for the block group's io_ctl with a call to
         io_ctl_init(). Block group A is added to the io_list of transaction
         N + 1 by btrfs_start_dirty_block_groups();
      
      4) While transaction N's commit is writing out the extent buffers, it gets
         an IO error and aborts transaction N, also setting the file system to
         RO mode;
      
      5) Task A has already returned from btrfs_start_dirty_block_groups(), is at
         btrfs_commit_transaction() and has set transaction N + 1 state to
         TRANS_STATE_COMMIT_START. Immediately after that it checks that the
         filesystem was turned to RO mode, due to transaction N's abort, and
         jumps to the "cleanup_transaction" label. After that we end up at
         btrfs_cleanup_one_transaction() which calls btrfs_cleanup_dirty_bgs().
         That helper finds block group B in the transaction's io_list but it
         never releases the pages array of the block group's io_ctl, resulting in
         a memory leak.
      
      In fact at the point when we are at btrfs_cleanup_dirty_bgs(), the pages
      array points to pages that were already released by us at
      __btrfs_write_out_cache() through the call to io_ctl_drop_pages(). We end
      up freeing the pages array only after waiting for the ordered extent to
      complete through btrfs_wait_cache_io(), which calls io_ctl_free() to do
      that. But in the transaction abort case we don't wait for the space cache's
      ordered extent to complete through a call to btrfs_wait_cache_io(), so
      that's why we end up with a memory leak - we wait for the ordered extent
      to complete indirectly by shutting down the work queues and waiting for
      any jobs in them to complete before returning from close_ctree().
      
      We can solve the leak simply by freeing the pages array right after
      releasing the pages (with the call to io_ctl_drop_pages()) at
      __btrfs_write_out_cache(), since we will never use it anymore after that
      and the pages array points to already released pages at that point, which
      is currently not a problem since no one will use it after that, but not a
      good practice anyway since it can easily lead to use-after-free issues.
      
      So fix this by freeing the pages array right after releasing the pages at
      __btrfs_write_out_cache().
      
      This issue can often be reproduced with test case generic/475 from fstests
      and kmemleak can detect it and reports it with the following trace:
      
      unreferenced object 0xffff9bbf009fa600 (size 512):
        comm "fsstress", pid 38807, jiffies 4298504428 (age 22.028s)
        hex dump (first 32 bytes):
          00 a0 7c 4d 3d ed ff ff 40 a0 7c 4d 3d ed ff ff  ..|M=...@.|M=...
          80 a0 7c 4d 3d ed ff ff c0 a0 7c 4d 3d ed ff ff  ..|M=.....|M=...
        backtrace:
          [<00000000f4b5cfe2>] __kmalloc+0x1a8/0x3e0
          [<0000000028665e7f>] io_ctl_init+0xa7/0x120 [btrfs]
          [<00000000a1f95b2d>] __btrfs_write_out_cache+0x86/0x4a0 [btrfs]
          [<00000000207ea1b0>] btrfs_write_out_cache+0x7f/0xf0 [btrfs]
          [<00000000af21f534>] btrfs_start_dirty_block_groups+0x27b/0x580 [btrfs]
          [<00000000c3c23d44>] btrfs_commit_transaction+0xa6f/0xe70 [btrfs]
          [<000000009588930c>] create_subvol+0x581/0x9a0 [btrfs]
          [<000000009ef2fd7f>] btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
          [<00000000474e5187>] __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
          [<00000000708ee349>] btrfs_ioctl_snap_create_v2+0xb0/0xf0 [btrfs]
          [<00000000ea60106f>] btrfs_ioctl+0x12c/0x3130 [btrfs]
          [<000000005c923d6d>] __x64_sys_ioctl+0x83/0xb0
          [<0000000043ace2c9>] do_syscall_64+0x33/0x80
          [<00000000904efbce>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      CC: stable@vger.kernel.org # 4.9+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      bbc37d6e
  6. 11 8月, 2020 1 次提交
    • J
      btrfs: only search for left_info if there is no right_info in try_merge_free_space · bf53d468
      Josef Bacik 提交于
      In try_to_merge_free_space we attempt to find entries to the left and
      right of the entry we are adding to see if they can be merged.  We
      search for an entry past our current info (saved into right_info), and
      then if right_info exists and it has a rb_prev() we save the rb_prev()
      into left_info.
      
      However there's a slight problem in the case that we have a right_info,
      but no entry previous to that entry.  At that point we will search for
      an entry just before the info we're attempting to insert.  This will
      simply find right_info again, and assign it to left_info, making them
      both the same pointer.
      
      Now if right_info _can_ be merged with the range we're inserting, we'll
      add it to the info and free right_info.  However further down we'll
      access left_info, which was right_info, and thus get a use-after-free.
      
      Fix this by only searching for the left entry if we don't find a right
      entry at all.
      
      The CVE referenced had a specially crafted file system that could
      trigger this use-after-free. However with the tree checker improvements
      we no longer trigger the conditions for the UAF.  But the original
      conditions still apply, hence this fix.
      
      Reference: CVE-2019-19448
      Fixes: 96303081 ("Btrfs: use hybrid extents+bitmap rb tree for free space")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      bf53d468
  7. 27 7月, 2020 3 次提交
  8. 25 5月, 2020 5 次提交
    • F
      btrfs: turn space cache writeout failure messages into debug messages · bbcd1f4d
      Filipe Manana 提交于
      Since commit 1afb648e ("btrfs: use standard debug config option to
      enable free-space-cache debug prints"), we started to log error messages
      that were never logged before since there was no DEBUG macro defined
      anywhere. This started to make test case btrfs/187 to fail very often,
      as it greps for any btrfs error messages in dmesg/syslog and fails if
      any is found:
      
      (...)
      btrfs/186 1s ...  2s
      btrfs/187       - output mismatch (see .../results//btrfs/187.out.bad)
          \--- tests/btrfs/187.out     2019-05-17 12:48:32.537340749 +0100
          \+++ /home/fdmanana/git/hub/xfstests/results//btrfs/187.out.bad ...
          \@@ -1,3 +1,8 @@
           QA output created by 187
           Create a readonly snapshot of 'SCRATCH_MNT' in 'SCRATCH_MNT/snap1'
           Create a readonly snapshot of 'SCRATCH_MNT' in 'SCRATCH_MNT/snap2'
          +[268364.139958] BTRFS error (device sdc): failed to write free space cache for block group 30408704
          +[268380.156503] BTRFS error (device sdc): failed to write free space cache for block group 30408704
          +[268380.161703] BTRFS error (device sdc): failed to write free space cache for block group 30408704
          +[268380.253180] BTRFS error (device sdc): failed to write free space cache for block group 30408704
          ...
          (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/187.out ...
      btrfs/188 4s ...  2s
      (...)
      
      The space cache write failures happen due to ENOSPC when attempting to
      update the free space cache items in the root tree. This happens because
      when starting or joining a transaction we don't know how many block
      groups we will end up changing (due to extent allocation or release) and
      therefore never reserve space for updating free space cache items.
      More often than not, the free space cache writeout succeeds since the
      metadata space info is not yet full nor very close to being full, but
      when it is, the space cache writeout fails with ENOSPC.
      
      Occasional failures to write space caches are not considered critical
      since they can be rebuilt when mounting the filesystem or the next
      attempt to write a free space cache in the next transaction commit might
      succeed, so we used to hide those error messages with a preprocessor
      check for the existence of the DEBUG macro that was never enabled
      anywhere.
      
      A few other generic test cases also trigger the error messages due to
      ENOSPC failure when writing free space caches as well, however they don't
      fail since they don't grep dmesg/syslog for any btrfs specific error
      messages.
      
      So change the messages from 'error' level to 'debug' level, as it doesn't
      make much sense to have error messages triggered only if the debug macro
      is enabled plus, more importantly, the error is not serious nor highly
      unexpected.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      bbcd1f4d
    • F
      btrfs: include error on messages about failure to write space/inode caches · 2e69a7a6
      Filipe Manana 提交于
      Currently the error messages logged when we fail to write a free space
      cache or an inode cache are not very useful as they don't mention what
      was the error. So include the error number in the messages.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      2e69a7a6
    • D
      btrfs: simplify iget helpers · 0202e83f
      David Sterba 提交于
      The inode lookup starting at btrfs_iget takes the full location key,
      while only the objectid is used to match the inode, because the lookup
      happens inside the given root thus the inode number is unique.
      The entire location key is properly set up in btrfs_init_locked_inode.
      
      Simplify the helpers and pass only inode number, renaming it to 'ino'
      instead of 'objectid'. This allows to remove temporary variables key,
      saving some stack space.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      0202e83f
    • F
      btrfs: move the block group freeze/unfreeze helpers into block-group.c · 684b752b
      Filipe Manana 提交于
      The helpers btrfs_freeze_block_group() and btrfs_unfreeze_block_group()
      used to be named btrfs_get_block_group_trimming() and
      btrfs_put_block_group_trimming() respectively.
      
      At the time they were added to free-space-cache.c, by commit e33e17ee
      ("btrfs: add missing discards when unpinning extents with -o discard")
      because all the trimming related functions were in free-space-cache.c.
      
      Now that the helpers were renamed and are used in scrub context as well,
      move them to block-group.c, a much more logical location for them.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      684b752b
    • F
      btrfs: rename member 'trimming' of block group to a more generic name · 6b7304af
      Filipe Manana 提交于
      Back in 2014, commit 04216820 ("Btrfs: fix race between fs trimming
      and block group remove/allocation"), I added the 'trimming' member to the
      block group structure. Its purpose was to prevent races between trimming
      and block group deletion/allocation by pinning the block group in a way
      that prevents its logical address and device extents from being reused
      while trimming is in progress for a block group, so that if another task
      deletes the block group and then another task allocates a new block group
      that gets the same logical address and device extents while the trimming
      task is still in progress.
      
      After the previous fix for scrub (patch "btrfs: fix a race between scrub
      and block group removal/allocation"), scrub now also has the same needs that
      trimming has, so the member name 'trimming' no longer makes sense.
      Since there is already a 'pinned' member in the block group that refers
      to space reservations (pinned bytes), rename the member to 'frozen',
      add a comment on top of it to describe its general purpose and rename
      the helpers to increment and decrement the counter as well, to match
      the new member name.
      
      The next patch in the series will move the helpers into a more suitable
      file (from free-space-cache.c to block-group.c).
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      6b7304af
  9. 24 3月, 2020 6 次提交
  10. 20 1月, 2020 8 次提交