1. 17 Dec, 2018 16 commits
  2. 06 Nov, 2018 2 commits
    • Btrfs: fix deadlock on tree root leaf when finding free extent · 4222ea71
      Committed by Filipe Manana
      When we are writing out a free space cache, during the transaction commit
      phase, we can end up in a deadlock which results in a stack trace like the
      following:
      
       schedule+0x28/0x80
       btrfs_tree_read_lock+0x8e/0x120 [btrfs]
       ? finish_wait+0x80/0x80
       btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
       btrfs_search_slot+0xf6/0x9f0 [btrfs]
       ? evict_refill_and_join+0xd0/0xd0 [btrfs]
       ? inode_insert5+0x119/0x190
       btrfs_lookup_inode+0x3a/0xc0 [btrfs]
       ? kmem_cache_alloc+0x166/0x1d0
       btrfs_iget+0x113/0x690 [btrfs]
       __lookup_free_space_inode+0xd8/0x150 [btrfs]
       lookup_free_space_inode+0x5b/0xb0 [btrfs]
       load_free_space_cache+0x7c/0x170 [btrfs]
       ? cache_block_group+0x72/0x3b0 [btrfs]
       cache_block_group+0x1b3/0x3b0 [btrfs]
       ? finish_wait+0x80/0x80
       find_free_extent+0x799/0x1010 [btrfs]
       btrfs_reserve_extent+0x9b/0x180 [btrfs]
       btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
       __btrfs_cow_block+0x11d/0x500 [btrfs]
       btrfs_cow_block+0xdc/0x180 [btrfs]
       btrfs_search_slot+0x3bd/0x9f0 [btrfs]
       btrfs_lookup_inode+0x3a/0xc0 [btrfs]
       ? kmem_cache_alloc+0x166/0x1d0
       btrfs_update_inode_item+0x46/0x100 [btrfs]
       cache_save_setup+0xe4/0x3a0 [btrfs]
       btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
       btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
      
      At cache_save_setup() we need to update the inode item of a block group's
      cache which is located in the tree root (fs_info->tree_root), which means
      that it may result in COWing a leaf from that tree. If that happens we
      need to find a free metadata extent and while looking for one, if we find
      a block group which was not cached yet we attempt to load its cache by
      calling cache_block_group(). However this function will try to load the
      inode of the free space cache, which requires finding the matching inode
      item in the tree root - if that inode item is located in the same leaf as
      the inode item of the space cache we are updating at cache_save_setup(),
      we end up in a deadlock, since we try to obtain a read lock on the same
      extent buffer that we previously write locked.
      
      So fix this by using the tree root's commit root when searching for a
      block group's free space cache inode item when we are attempting to load
      a free space cache. This is safe since block groups once loaded stay in
      memory forever, as well as their caches, so after they are first loaded
      we will never need to read their inode items again. For new block groups,
      once they are created they get their ->cached field set to
      BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
      Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
      Fixes: 9d66e233 ("Btrfs: load free space cache if it exists")
      Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
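
The deadlock described in this commit message can be pictured with a small userspace model. This is only a sketch and not the kernel code: `toy_leaf` and every `toy_*` function are hypothetical names, the lock is a simplified non-recursive reader/writer lock built from try-lock operations so the example returns instead of hanging, and it assumes that a commit root search reads the leaf without taking its lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an extent buffer lock: one writer or many readers,
 * and not recursive (the holder cannot take it a second time). */
struct toy_leaf {
    int readers;
    bool write_locked;
};

static bool toy_try_write_lock(struct toy_leaf *leaf)
{
    if (leaf->write_locked || leaf->readers > 0)
        return false;
    leaf->write_locked = true;
    return true;
}

static void toy_write_unlock(struct toy_leaf *leaf)
{
    assert(leaf->write_locked);
    leaf->write_locked = false;
}

static bool toy_try_read_lock(struct toy_leaf *leaf)
{
    if (leaf->write_locked)
        return false;
    leaf->readers++;
    return true;
}

static void toy_read_unlock(struct toy_leaf *leaf)
{
    assert(leaf->readers > 0);
    leaf->readers--;
}

/* Models looking up the free space cache inode item in a tree root leaf.
 * Returns false when the read lock cannot be taken; with a blocking lock
 * held by the same task, that would be a wait that never ends. */
static bool toy_lookup_cache_inode(struct toy_leaf *leaf,
                                   bool search_commit_root)
{
    if (search_commit_root)
        return true; /* assumption: commit root is read without locking */
    if (!toy_try_read_lock(leaf))
        return false;
    toy_read_unlock(leaf);
    return true;
}

/* Models the nested call chain: write lock the leaf to update an inode
 * item, then (through extent allocation) look up another inode item that
 * lives in the same leaf. */
static bool toy_update_then_lookup(struct toy_leaf *leaf,
                                   bool search_commit_root)
{
    bool ok;

    if (!toy_try_write_lock(leaf))
        return false;
    ok = toy_lookup_cache_inode(leaf, search_commit_root);
    toy_write_unlock(leaf);
    return ok;
}
```

Calling `toy_update_then_lookup(&leaf, false)` on a zero-initialized `toy_leaf` models the reported hang and returns false, while passing `true` models the commit root search and returns true.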
    • Btrfs: fix cur_offset in the error case for nocow · 506481b2
      Committed by Robbie Ko
      When cow_file_range() fails, it has already unlocked the related
      resources for the range [start..end), so the unlock must not be repeated
      in run_delalloc_nocow().
      
      In some cases (e.g. cur_offset <= end && cow_start != -1), cur_offset is
      not updated correctly, so move the cur_offset update to before the call
      to cow_file_range().
      
        kernel BUG at mm/page-writeback.c:2663!
        Internal error: Oops - BUG: 0 [#1] SMP
        CPU: 3 PID: 31525 Comm: kworker/u8:7 Tainted: P O
        Hardware name: Realtek_RTD1296 (DT)
        Workqueue: writeback wb_workfn (flush-btrfs-1)
        task: ffffffc076db3380 ti: ffffffc02e9ac000 task.ti: ffffffc02e9ac000
        PC is at clear_page_dirty_for_io+0x1bc/0x1e8
        LR is at clear_page_dirty_for_io+0x14/0x1e8
        pc : [<ffffffc00033c91c>] lr : [<ffffffc00033c774>] pstate: 40000145
        sp : ffffffc02e9af4f0
        Process kworker/u8:7 (pid: 31525, stack limit = 0xffffffc02e9ac020)
        Call trace:
        [<ffffffc00033c91c>] clear_page_dirty_for_io+0x1bc/0x1e8
        [<ffffffbffc514674>] extent_clear_unlock_delalloc+0x1e4/0x210 [btrfs]
        [<ffffffbffc4fb168>] run_delalloc_nocow+0x3b8/0x948 [btrfs]
        [<ffffffbffc4fb948>] run_delalloc_range+0x250/0x3a8 [btrfs]
        [<ffffffbffc514c0c>] writepage_delalloc.isra.21+0xbc/0x1d8 [btrfs]
        [<ffffffbffc516048>] __extent_writepage+0xe8/0x248 [btrfs]
        [<ffffffbffc51630c>] extent_write_cache_pages.isra.17+0x164/0x378 [btrfs]
        [<ffffffbffc5185a8>] extent_writepages+0x48/0x68 [btrfs]
        [<ffffffbffc4f5828>] btrfs_writepages+0x20/0x30 [btrfs]
        [<ffffffc00033d758>] do_writepages+0x30/0x88
        [<ffffffc0003ba0f4>] __writeback_single_inode+0x34/0x198
        [<ffffffc0003ba6c4>] writeback_sb_inodes+0x184/0x3c0
        [<ffffffc0003ba96c>] __writeback_inodes_wb+0x6c/0xc0
        [<ffffffc0003bac20>] wb_writeback+0x1b8/0x1c0
        [<ffffffc0003bb0f0>] wb_workfn+0x150/0x250
        [<ffffffc0002b0014>] process_one_work+0x1dc/0x388
        [<ffffffc0002b02f0>] worker_thread+0x130/0x500
        [<ffffffc0002b6344>] kthread+0x10c/0x110
        [<ffffffc000284590>] ret_from_fork+0x10/0x40
        Code: d503201f a9025bb5 a90363b7 f90023b9 (d4210000)
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Robbie Ko <robbieko@synology.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
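
The double unlock described in this commit message can be sketched with a toy model that counts how often each page in a range is unlocked. This is not the kernel code: the `toy_*` names and the page array are hypothetical, ranges are half-open as in the message, and the assumption that the moved update sets `cur_offset` to `end` is mine, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NPAGES 8

/* Counts how many times each page of the range has been unlocked. */
struct toy_pages {
    int unlock_count[TOY_NPAGES];
};

/* Hypothetical failing COW helper: on failure it unlocks its whole
 * range [start, end) itself, as the commit message says of
 * cow_file_range(). */
static int toy_cow_range_fail(struct toy_pages *p, int start, int end)
{
    for (int i = start; i < end; i++)
        p->unlock_count[i]++;
    return -1;
}

/* Caller's error path: unlock whatever it believes is still locked,
 * that is [cur_offset, end). */
static void toy_error_cleanup(struct toy_pages *p, int ret,
                              int cur_offset, int end)
{
    if (ret && cur_offset < end)
        for (int i = cur_offset; i < end; i++)
            p->unlock_count[i]++;
}

/* Runs the failing path and returns the highest unlock count seen.
 * A result above 1 means some page was unlocked twice. */
static int toy_run_nocow(struct toy_pages *p, int cow_start,
                         int cur_offset, int end, bool update_first)
{
    int ret, max = 0;

    assert(0 <= cow_start && cow_start <= cur_offset);
    assert(cur_offset <= end && end <= TOY_NPAGES);
    memset(p, 0, sizeof(*p));

    if (update_first)
        cur_offset = end; /* assumed form of the moved update */
    ret = toy_cow_range_fail(p, cow_start, end);
    toy_error_cleanup(p, ret, cur_offset, end);

    for (int i = 0; i < TOY_NPAGES; i++)
        if (p->unlock_count[i] > max)
            max = p->unlock_count[i];
    return max;
}
```

With `cow_start = 2`, `cur_offset = 4` and `end = 8`, the model unlocks pages 4 to 7 twice when `update_first` is false, and each page at most once when it is true.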
  3. 19 Oct, 2018 2 commits
  4. 17 Oct, 2018 1 commit
  5. 15 Oct, 2018 17 commits
  6. 11 Oct, 2018 1 commit
  7. 23 Aug, 2018 1 commit
    • Btrfs: sync log after logging new name · d4682ba0
      Committed by Filipe Manana
      When we add a new name for an inode which was logged in the current
      transaction, we update the inode in the log so that its new name and
      ancestors are added to the log. However when we do this we do not persist
      the log, so the changes remain in memory only, and as a consequence, any
      ancestors that were created in the current transaction are updated such
      that future calls to btrfs_inode_in_log() return true. This leads to a
      subsequent fsync against such new ancestor directories returning
      immediately, without persisting the log, therefore after a power failure
      the new ancestor directories do not exist, despite fsync being called
      against them explicitly.
      
      Example:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ mkdir /mnt/A
        $ mkdir /mnt/B
        $ mkdir /mnt/A/C
        $ touch /mnt/B/foo
        $ xfs_io -c "fsync" /mnt/B/foo
        $ ln /mnt/B/foo /mnt/A/C/foo
        $ xfs_io -c "fsync" /mnt/A
        <power failure>
      
      After the power failure, directory "A" does not exist, despite the explicit
      fsync on it.
      
      Instead of fixing this by changing the behaviour of the explicit fsync on
      directory "A" to persist the log instead of doing nothing, make the logging
      of the new file name (which happens when creating a hard link or renaming)
      persist the log. This approach not only is simpler, not requiring addition
      of new fields to the inode in memory structure, but also gives us the same
      behaviour as ext4, xfs and f2fs (possibly other filesystems too).
      
      A test case for fstests follows soon.
      
      Fixes: 12fcfd22 ("Btrfs: tree logging unlink/rename fixes")
      Reported-by: Vijay Chidambaram <vvijay03@gmail.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
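
The interaction between logging a new name and a later fsync, as described in this commit message, can be reduced to two flags per directory. This is a deliberately tiny sketch with hypothetical names (`toy_dir`, `toy_log_new_name`, `toy_fsync`); the `in_log` flag plays the role of what btrfs_inode_in_log() reports, and `persisted` says whether the log entry reached disk.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_dir {
    bool in_log;    /* the in-memory state says "already logged" */
    bool persisted; /* the log entry has actually reached disk */
};

/* Logging a new name (hard link or rename) marks the ancestor as logged.
 * sync_log models the fix: persist the log right away. */
static void toy_log_new_name(struct toy_dir *ancestor, bool sync_log)
{
    ancestor->in_log = true;
    if (sync_log)
        ancestor->persisted = true;
}

/* An explicit fsync returns immediately if the directory is believed to
 * be in the log already; otherwise it logs and persists it. */
static void toy_fsync(struct toy_dir *dir)
{
    if (dir->in_log)
        return;
    dir->in_log = true;
    dir->persisted = true;
}

/* Replays the example: link a file under a new directory, fsync the
 * directory, then lose power. Returns whether the directory survives. */
static bool toy_survives_power_failure(bool sync_log)
{
    struct toy_dir a = { false, false };

    toy_log_new_name(&a, sync_log);
    toy_fsync(&a);
    assert(a.in_log); /* the fsync always leaves the directory marked */
    return a.persisted;
}
```

In this model `toy_survives_power_failure(false)` returns false, matching the lost directory "A", and `toy_survives_power_failure(true)` returns true.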