1. 11 4月, 2015 2 次提交
    • C
      Btrfs: fix use after free when close_ctree frees the orphan_rsv · cdfb080e
      Chris Mason 提交于
      Near the end of close_ctree, we're calling btrfs_free_block_rsv
      to free up the orphan rsv.  The problem is this call updates the
      space_info, which has already been freed.
      
      This adds a new __ function that directly calls kfree instead of trying
      to update the space infos.
      Signed-off-by: NChris Mason <clm@fb.com>
      cdfb080e
    • C
      Btrfs: allow block group cache writeout outside critical section in commit · 1bbc621e
      Chris Mason 提交于
      We loop through all of the dirty block groups during commit and write
      the free space cache.  In order to make sure the cache is currect, we do
      this while no other writers are allowed in the commit.
      
      If a large number of block groups are dirty, this can introduce long
      stalls during the final stages of the commit, which can block new procs
      trying to change the filesystem.
      
      This commit changes the block group cache writeout to take appropriate
      locks and allow it to run earlier in the commit.  We'll still have to
      redo some of the block groups, but it means we can get most of the work
      out of the way without blocking the entire FS.
      Signed-off-by: NChris Mason <clm@fb.com>
      1bbc621e
  2. 27 3月, 2015 1 次提交
  3. 14 3月, 2015 1 次提交
  4. 04 3月, 2015 2 次提交
  5. 21 2月, 2015 1 次提交
  6. 17 2月, 2015 15 次提交
  7. 03 2月, 2015 3 次提交
    • F
      Btrfs: fix race between transaction commit and empty block group removal · d4b450cd
      Filipe Manana 提交于
      Committing a transaction can race with automatic removal of empty block
      groups (cleaner kthread), leading to a BUG_ON() in the transaction
      commit code while running btrfs_finish_extent_commit(). The following
      sequence diagram shows how it can happen:
      
                 CPU 1                                       CPU 2
      
      btrfs_commit_transaction()
        fs_info->running_transaction = NULL
        btrfs_finish_extent_commit()
          find_first_extent_bit()
            -> found range for block group X
               in fs_info->freed_extents[]
      
                                                     btrfs_delete_unused_bgs()
                                                       -> found block group X
      
                                                       Removed block group X's range
                                                       from fs_info->freed_extents[]
      
                                                       btrfs_remove_chunk()
                                                          btrfs_remove_block_group(bg X)
      
          unpin_extent_range(bg X range)
             btrfs_lookup_block_group(bg X)
                -> returns NULL
                  -> BUG_ON()
      
      The trace that results from the BUG_ON() is:
      
      [48665.187808] ------------[ cut here ]------------
      [48665.188032] kernel BUG at fs/btrfs/extent-tree.c:5675!
      [48665.188032] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
      [48665.188032] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop parport_pc evdev microcode
      [48665.197388] CPU: 2 PID: 31211 Comm: kworker/u32:16 Tainted: G        W      3.19.0-rc5-btrfs-next-4+ #1
      [48665.197388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [48665.197388] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
      [48665.197388] task: ffff880222011810 ti: ffff8801b56a4000 task.ti: ffff8801b56a4000
      [48665.197388] RIP: 0010:[<ffffffffa0350d05>]  [<ffffffffa0350d05>] unpin_extent_range+0x6a/0x1ba [btrfs]
      [48665.197388] RSP: 0018:ffff8801b56a7b88  EFLAGS: 00010246
      [48665.197388] RAX: 0000000000000000 RBX: ffff8802143a6000 RCX: ffff8802220120c8
      [48665.197388] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8800a3c140b0
      [48665.197388] RBP: ffff8801b56a7bd8 R08: 0000000000000003 R09: 0000000000000000
      [48665.197388] R10: 0000000000000000 R11: 000000000000bbac R12: 0000000012e8e000
      [48665.197388] R13: ffff8800a3c14000 R14: 0000000000000000 R15: 0000000000000000
      [48665.197388] FS:  0000000000000000(0000) GS:ffff88023ec40000(0000) knlGS:0000000000000000
      [48665.197388] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [48665.197388] CR2: 00007f065e42f270 CR3: 0000000206f70000 CR4: 00000000000006e0
      [48665.197388] Stack:
      [48665.197388]  ffff8801b56a7bd8 0000000012ea0000 01ff8800a3c14138 0000000012e9ffff
      [48665.197388]  ffff880141df3dd8 ffff8802143a6000 ffff8800a3c14138 ffff880141df3df0
      [48665.197388]  ffff880141df3dd8 0000000000000000 ffff8801b56a7c08 ffffffffa0354227
      [48665.197388] Call Trace:
      [48665.197388]  [<ffffffffa0354227>] btrfs_finish_extent_commit+0xb0/0xd9 [btrfs]
      [48665.197388]  [<ffffffffa0366b4b>] btrfs_commit_transaction+0x791/0x92c [btrfs]
      [48665.197388]  [<ffffffffa0352432>] flush_space+0x43d/0x452 [btrfs]
      [48665.197388]  [<ffffffff814295c3>] ? _raw_spin_unlock+0x28/0x33
      [48665.197388]  [<ffffffffa035255f>] btrfs_async_reclaim_metadata_space+0x118/0x164 [btrfs]
      [48665.197388]  [<ffffffff81059917>] ? process_one_work+0x14b/0x3ab
      [48665.197388]  [<ffffffff810599ac>] process_one_work+0x1e0/0x3ab
      [48665.197388]  [<ffffffff81079fa9>] ? trace_hardirqs_off+0xd/0xf
      [48665.197388]  [<ffffffff8105a55b>] worker_thread+0x210/0x2d0
      [48665.197388]  [<ffffffff8105a34b>] ? rescuer_thread+0x2c3/0x2c3
      [48665.197388]  [<ffffffff8105e5c0>] kthread+0xef/0xf7
      [48665.197388]  [<ffffffff81429682>] ? _raw_spin_unlock_irq+0x2d/0x39
      [48665.197388]  [<ffffffff8105e4d1>] ? __kthread_parkme+0xad/0xad
      [48665.197388]  [<ffffffff81429dec>] ret_from_fork+0x7c/0xb0
      [48665.197388]  [<ffffffff8105e4d1>] ? __kthread_parkme+0xad/0xad
      [48665.197388] Code: 85 f6 74 14 49 8b 06 49 03 46 09 49 39 c4 72 1d 4c 89 f7 e8 83 ec ff ff 4c 89 e6 4c 89 ef e8 1e f1 ff ff 48 85 c0 49 89 c6 75 02 <0f> 0b 49 8b 1e 49 03 5e 09 48 8b
      [48665.197388] RIP  [<ffffffffa0350d05>] unpin_extent_range+0x6a/0x1ba [btrfs]
      [48665.197388]  RSP <ffff8801b56a7b88>
      [48665.272246] ---[ end trace b9c6ab9957521376 ]---
      
      Fix this by ensuring that unpining the block group's range in
      btrfs_finish_extent_commit() is done in a synchronized fashion
      with removing the block group's range from freed_extents[]
      in btrfs_delete_unused_bgs()
      
      This race got introduced with the change:
      
          Btrfs: remove empty block groups automatically
          commit 47ab2a6cSigned-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d4b450cd
    • D
      btrfs: add checks for sys_chunk_array sizes · ce7fca5f
      David Sterba 提交于
      Verify that possible minimum and maximum size is set, validity of
      contents is checked in btrfs_read_sys_array.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      ce7fca5f
    • D
      btrfs: more superblock checks, lower bounds on devices and sectorsize/nodesize · 75d6ad38
      David Sterba 提交于
      I received a few crafted images from Jiri, all got through the recently
      added superblock checks. The lower bounds checks for num_devices and
      sector/node -sizes were missing and caused a crash during mount.
      
      Tools for symbolic code execution were used to prepare the images
      contents.
      Reported-by: NJiri Slaby <jslaby@suse.cz>
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      75d6ad38
  8. 22 1月, 2015 5 次提交
  9. 21 1月, 2015 3 次提交
  10. 15 1月, 2015 2 次提交
  11. 13 12月, 2014 4 次提交
  12. 11 12月, 2014 1 次提交
    • F
      Btrfs: fix fs corruption on transaction abort if device supports discard · 678886bd
      Filipe Manana 提交于
      When we abort a transaction we iterate over all the ranges marked as dirty
      in fs_info->freed_extents[0] and fs_info->freed_extents[1], clear them
      from those trees, add them back (unpin) to the free space caches and, if
      the fs was mounted with "-o discard", perform a discard on those regions.
      Also, after adding the regions to the free space caches, a fitrim ioctl call
      can see those ranges in a block group's free space cache and perform a discard
      on the ranges, so the same issue can happen without "-o discard" as well.
      
      This causes corruption, affecting one or multiple btree nodes (in the worst
      case leaving the fs unmountable) because some of those ranges (the ones in
      the fs_info->pinned_extents tree) correspond to btree nodes/leafs that are
      referred by the last committed super block - breaking the rule that anything
      that was committed by a transaction is untouched until the next transaction
      commits successfully.
      
      I ran into this while running in a loop (for several hours) the fstest that
      I recently submitted:
      
        [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim
      
      The corruption always happened when a transaction aborted and then fsck complained
      like this:
      
         _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
         *** fsck.btrfs output ***
         Check tree block failed, want=94945280, have=0
         Check tree block failed, want=94945280, have=0
         Check tree block failed, want=94945280, have=0
         Check tree block failed, want=94945280, have=0
         Check tree block failed, want=94945280, have=0
         read block failed check_tree_block
         Couldn't open file system
      
      In this case 94945280 corresponded to the root of a tree.
      Using frace what I observed was the following sequence of steps happened:
      
         1) transaction N started, fs_info->pinned_extents pointed to
            fs_info->freed_extents[0];
      
         2) node/eb 94945280 is created;
      
         3) eb is persisted to disk;
      
         4) transaction N commit starts, fs_info->pinned_extents now points to
            fs_info->freed_extents[1], and transaction N completes;
      
         5) transaction N + 1 starts;
      
         6) eb is COWed, and btrfs_free_tree_block() called for this eb;
      
         7) eb range (94945280 to 94945280 + 16Kb) is added to
            fs_info->pinned_extents (fs_info->freed_extents[1]);
      
         8) Something goes wrong in transaction N + 1, like hitting ENOSPC
            for example, and the transaction is aborted, turning the fs into
            readonly mode. The stack trace I got for example:
      
            [112065.253935]  [<ffffffff8140c7b6>] dump_stack+0x4d/0x66
            [112065.254271]  [<ffffffff81042984>] warn_slowpath_common+0x7f/0x98
            [112065.254567]  [<ffffffffa0325990>] ? __btrfs_abort_transaction+0x50/0x10b [btrfs]
            [112065.261674]  [<ffffffff810429e5>] warn_slowpath_fmt+0x48/0x50
            [112065.261922]  [<ffffffffa032949e>] ? btrfs_free_path+0x26/0x29 [btrfs]
            [112065.262211]  [<ffffffffa0325990>] __btrfs_abort_transaction+0x50/0x10b [btrfs]
            [112065.262545]  [<ffffffffa036b1d6>] btrfs_remove_chunk+0x537/0x58b [btrfs]
            [112065.262771]  [<ffffffffa033840f>] btrfs_delete_unused_bgs+0x1de/0x21b [btrfs]
            [112065.263105]  [<ffffffffa0343106>] cleaner_kthread+0x100/0x12f [btrfs]
            (...)
            [112065.264493] ---[ end trace dd7903a975a31a08 ]---
            [112065.264673] BTRFS: error (device sdc) in btrfs_remove_chunk:2625: errno=-28 No space left
            [112065.264997] BTRFS info (device sdc): forced readonly
      
         9) The clear kthread sees that the BTRFS_FS_STATE_ERROR bit is set in
            fs_info->fs_state and calls btrfs_cleanup_transaction(), which in
            turn calls btrfs_destroy_pinned_extent();
      
         10) Then btrfs_destroy_pinned_extent() iterates over all the ranges
             marked as dirty in fs_info->freed_extents[], and for each one
             it calls discard, if the fs was mounted with "-o discard", and
             adds the range to the free space cache of the respective block
             group;
      
         11) btrfs_trim_block_group(), invoked from the fitrim ioctl code path,
             sees the free space entries and performs a discard;
      
         12) After an umount and mount (or fsck), our eb's location on disk was full
             of zeroes, and it should have been untouched, because it was marked as
             dirty in the fs_info->pinned_extents tree, and therefore used by the
             trees that the last committed superblock points to.
      
      Fix this by not performing a discard and not adding the ranges to the free space
      caches - it's useless from this point since the fs is now in readonly mode and
      we won't write free space caches to disk anymore (otherwise we would leak space)
      nor any new superblock. By not adding the ranges to the free space caches, it
      prevents other code paths from allocating that space and write to it as well,
      therefore being safer and simpler.
      
      This isn't a new problem, as it's been present since 2011 (git commit
      acce952b).
      
      Cc: stable@vger.kernel.org  # any kernel released after 2011-01-06
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      678886bd