1. 27 Sep, 2016 8 commits
  2. 26 Sep, 2016 16 commits
  3. 22 Sep, 2016 2 commits
  4. 06 Sep, 2016 2 commits
    • btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress · ce129655
      Wang Xiaoguang committed
      In btrfs_async_reclaim_metadata_space(), we use the ticket's address to
      determine whether asynchronous metadata reclaim work is making progress.
      
      	ticket = list_first_entry(&space_info->tickets,
      				  struct reserve_ticket, list);
      	if (last_ticket == ticket) {
      		flush_state++;
      	} else {
      		last_ticket = ticket;
      		flush_state = FLUSH_DELAYED_ITEMS_NR;
      		if (commit_cycles)
      			commit_cycles--;
      	}
      
      But this is wrong: we should not rely on the address of a local variable
      for this check, because different tickets may have the same address. In my
      test environment, I used dd to write one 168MB file to a 256MB fs and found
      that, for this file, every time wait_reserve_ticket() was called, the local
      variable ticket had the same address.
      
      For the code above, assume a previous ticket's address is addrA, so
      last_ticket is addrA. btrfs_async_reclaim_metadata_space() finishes this
      ticket and wakes it up, then another ticket is added with the same address
      addrA. Now last_ticket equals the current ticket, so the current ticket's
      flush work starts from the current flush_state rather than the initial
      FLUSH_DELAYED_ITEMS_NR, which may result in ENOSPC issues (I have seen this
      on my test machine).
      Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
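The address-reuse problem in this commit message can be modeled in user space. The following is a minimal sketch, not the kernel patch itself: the struct layouts, the helper names (progressed_by_address, progressed_by_id, satisfy_and_reuse_address) and the assumption that tickets_id is incremented whenever a ticket is satisfied are all illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the btrfs structures. */
struct space_info {
	uint64_t tickets_id;       /* assumed: bumped when a ticket is satisfied */
	const void *first_ticket;  /* address of the ticket at the list head */
};

struct reclaim_state {
	const void *last_ticket;   /* old scheme: remembered address */
	uint64_t last_tickets_id;  /* new scheme: remembered counter */
};

/* Old scheme: progress is inferred from the head ticket's address. */
static int progressed_by_address(struct reclaim_state *st,
				 const struct space_info *si)
{
	if (st->last_ticket == si->first_ticket)
		return 0;
	st->last_ticket = si->first_ticket;
	return 1;
}

/* New scheme: progress is inferred from a monotonically increasing id. */
static int progressed_by_id(struct reclaim_state *st,
			    const struct space_info *si)
{
	if (st->last_tickets_id == si->tickets_id)
		return 0;
	st->last_tickets_id = si->tickets_id;
	return 1;
}

/* Models "ticket satisfied, and the next ticket reuses the same address". */
static void satisfy_and_reuse_address(struct space_info *si)
{
	si->tickets_id++;
	/* first_ticket deliberately keeps pointing at the same address. */
}
```

In this model the address comparison reports no progress after a ticket has been satisfied and replaced by one at the same address, while the counter comparison still detects the change.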
    • Btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns · cbd60aa7
      Chris Mason committed
      We use a btrfs_log_ctx structure to pass information into the
      tree log commit, and get error values out.  It gets added to a per
      log-transaction list which we walk when things go bad.
      
      Commit d1433deb added an optimization to skip waiting for the log
      commit, but didn't take root_log_ctx out of the list.  This
      patch makes sure we remove things before exiting.
      Signed-off-by: Chris Mason <clm@fb.com>
      Fixes: d1433deb
      cc: stable@vger.kernel.org # 3.15+
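The rule this commit enforces, that a context must leave the shared list on every exit path, can be illustrated with a small user-space sketch. The list helpers, struct log_ctx and sync_log() below are hypothetical stand-ins, and the sketch assumes the context lives on the caller's stack frame, which is why leaving it on the list would be dangerous.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list, similar in spirit to list_head. */
struct node { struct node *prev, *next; };

static void list_init(struct node *h) { h->prev = h->next = h; }
static int list_is_empty(const struct node *h) { return h->next == h; }

static void list_add_tail(struct node *n, struct node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Hypothetical stand-in for btrfs_log_ctx. */
struct log_ctx { struct node list; int log_ret; };

/* Hypothetical stand-in for btrfs_sync_log(). */
static int sync_log(struct node *ctx_list, int can_skip_wait)
{
	struct log_ctx ctx = { .log_ret = 0 };  /* lives on this stack frame */

	list_init(&ctx.list);
	list_add_tail(&ctx.list, ctx_list);

	if (can_skip_wait) {
		/* Early exit: the ctx must still be unlinked first. */
		list_del_init(&ctx.list);
		return 0;
	}

	/* ... waiting for the log commit would happen here ... */
	list_del_init(&ctx.list);
	return ctx.log_ret;
}
```

If the early-exit branch returned without list_del_init(), the shared list would keep a pointer into a stack frame that no longer exists.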
  5. 05 Sep, 2016 1 commit
  6. 01 Sep, 2016 3 commits
  7. 25 Aug, 2016 8 commits
    • Btrfs: fix lockdep warning on deadlock against an inode's log mutex · 28a23593
      Filipe Manana committed
      Commit 44f714da ("Btrfs: improve performance on fsync against new
      inode after rename/unlink"), which landed in 4.8-rc2, introduced a
      possibility for a deadlock due to double locking of an inode's log mutex
      by the same task, which lockdep reports with:
      
      [23045.433975] =============================================
      [23045.434748] [ INFO: possible recursive locking detected ]
      [23045.435426] 4.7.0-rc6-btrfs-next-34+ #1 Not tainted
      [23045.436044] ---------------------------------------------
      [23045.436044] xfs_io/3688 is trying to acquire lock:
      [23045.436044]  (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
      [23045.436044]
                     but task is already holding lock:
      [23045.436044]  (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
      [23045.436044]
                     other info that might help us debug this:
      [23045.436044]  Possible unsafe locking scenario:
      
      [23045.436044]        CPU0
      [23045.436044]        ----
      [23045.436044]   lock(&ei->log_mutex);
      [23045.436044]   lock(&ei->log_mutex);
      [23045.436044]
                      *** DEADLOCK ***
      
      [23045.436044]  May be due to missing lock nesting notation
      
      [23045.436044] 3 locks held by xfs_io/3688:
      [23045.436044]  #0:  (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffffa035f2ae>] btrfs_sync_file+0x14e/0x425 [btrfs]
      [23045.436044]  #1:  (sb_internal#2){.+.+.+}, at: [<ffffffff8118446b>] __sb_start_write+0x5f/0xb0
      [23045.436044]  #2:  (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
      [23045.436044]
                     stack backtrace:
      [23045.436044] CPU: 4 PID: 3688 Comm: xfs_io Not tainted 4.7.0-rc6-btrfs-next-34+ #1
      [23045.436044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
      [23045.436044]  0000000000000000 ffff88022f5f7860 ffffffff8127074d ffffffff82a54b70
      [23045.436044]  ffffffff82a54b70 ffff88022f5f7920 ffffffff81092897 ffff880228015d68
      [23045.436044]  0000000000000000 ffffffff82a54b70 ffffffff829c3f00 ffff880228015d68
      [23045.436044] Call Trace:
      [23045.436044]  [<ffffffff8127074d>] dump_stack+0x67/0x90
      [23045.436044]  [<ffffffff81092897>] __lock_acquire+0xcbb/0xe4e
      [23045.436044]  [<ffffffff8109155f>] ? mark_lock+0x24/0x201
      [23045.436044]  [<ffffffff8109179a>] ? mark_held_locks+0x5e/0x74
      [23045.436044]  [<ffffffff81092de0>] lock_acquire+0x12f/0x1c3
      [23045.436044]  [<ffffffff81092de0>] ? lock_acquire+0x12f/0x1c3
      [23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
      [23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
      [23045.436044]  [<ffffffff814a51a4>] mutex_lock_nested+0x77/0x3a7
      [23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
      [23045.436044]  [<ffffffffa039705e>] ? btrfs_release_delayed_node+0xb/0xd [btrfs]
      [23045.436044]  [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
      [23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
      [23045.436044]  [<ffffffff810a0ed1>] ? vprintk_emit+0x453/0x465
      [23045.436044]  [<ffffffffa0385a61>] btrfs_log_inode+0x66e/0xc95 [btrfs]
      [23045.436044]  [<ffffffffa03c084d>] log_new_dir_dentries+0x26c/0x359 [btrfs]
      [23045.436044]  [<ffffffffa03865aa>] btrfs_log_inode_parent+0x4a6/0x628 [btrfs]
      [23045.436044]  [<ffffffffa0387552>] btrfs_log_dentry_safe+0x5a/0x75 [btrfs]
      [23045.436044]  [<ffffffffa035f464>] btrfs_sync_file+0x304/0x425 [btrfs]
      [23045.436044]  [<ffffffff811acaf4>] vfs_fsync_range+0x8c/0x9e
      [23045.436044]  [<ffffffff811acb22>] vfs_fsync+0x1c/0x1e
      [23045.436044]  [<ffffffff811acc79>] do_fsync+0x31/0x4a
      [23045.436044]  [<ffffffff811ace99>] SyS_fsync+0x10/0x14
      [23045.436044]  [<ffffffff814a88e5>] entry_SYSCALL_64_fastpath+0x18/0xa8
      [23045.436044]  [<ffffffff8108f039>] ? trace_hardirqs_off_caller+0x3f/0xaa
      
      An example reproducer for this is:
      
         $ mkfs.btrfs -f /dev/sdb
         $ mount /dev/sdb /mnt
         $ mkdir /mnt/dir
         $ touch /mnt/dir/foo
         $ sync
         $ mv /mnt/dir/foo /mnt/dir/bar
         $ touch /mnt/dir/foo
         $ xfs_io -c "fsync" /mnt/dir/bar
      
      This is because while logging the inode of file bar we end up logging its
      parent directory (since its inode has an unlink_trans field matching the
      current transaction id due to the rename operation), which in turn logs
      the inodes for all its new dentries. As a result, the new inode for the new
      file named foo gets logged, which in turn triggers another logging attempt
      for the inode we are fsync'ing, since that inode had an old name that
      corresponds to the name of the new inode.
      
      So fix this by ensuring that when logging the inode for a new dentry that
      has a name matching an old name of some other inode, we don't log again
      the original inode that we are fsync'ing.
      
      Fixes: 44f714da ("Btrfs: improve performance on fsync against new inode after rename/unlink")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
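The recursion described in this commit message (bar, then its parent directory, then the new foo, then bar again) can be reduced to a toy model. This is only a sketch under stated assumptions: struct inode_model, the next field and the apply_fix switch are hypothetical, the parent directory step is folded into a single link, and the mutex is modeled as a plain flag so the double acquisition shows up as an error code instead of a hang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inode model. */
struct inode_model {
	int log_mutex_held;        /* models a non-recursive mutex */
	struct inode_model *next;  /* inode whose logging is triggered next */
};

/*
 * Returns 0 on success, or -1 if the same "mutex" would be taken twice by
 * the same task, which on a real mutex is a self-deadlock.
 */
static int log_inode(struct inode_model *inode,
		     const struct inode_model *fsync_target, int apply_fix)
{
	int ret = 0;

	if (inode->log_mutex_held)
		return -1;
	inode->log_mutex_held = 1;

	/* With the fix, never log the inode being fsync'ed a second time. */
	if (inode->next && !(apply_fix && inode->next == fsync_target))
		ret = log_inode(inode->next, fsync_target, apply_fix);

	inode->log_mutex_held = 0;
	return ret;
}
```

With bar linked to foo and foo linked back to bar, the call without the fix reports the double acquisition, while the call with the fix completes.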
    • Btrfs: detect corruption when non-root leaf has zero item · 1ba98d08
      Liu Bo committed
      Right now we treat a leaf with zero items as valid because we could
      have an empty tree, that is, a root that is also a leaf without any
      items. However, when such a leaf is not a root, we can end up hitting
      the BUG_ON(1) in btrfs_extend_item() called by
      setup_inline_extent_backref().
      
      This patch treats that situation as corruption if the leaf is
      not its own root.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
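The rule from this commit message can be written as a small predicate. This is a sketch only: struct leaf_model, the root_bytenr parameter and the choice of -EIO as the error code are assumptions made for illustration, not the kernel's actual check.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical leaf model: where the block lives and how many items it has. */
struct leaf_model {
	uint64_t bytenr;
	uint32_t nritems;
};

/*
 * An empty leaf is acceptable only when it is the root of its tree (an
 * empty tree). An empty non-root leaf is reported as corruption.
 */
static int check_leaf(const struct leaf_model *leaf, uint64_t root_bytenr)
{
	if (leaf->nritems == 0) {
		if (leaf->bytenr != root_bytenr)
			return -EIO;
		return 0;
	}

	/* Per-item offset and size checks are omitted in this sketch. */
	return 0;
}
```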
    • Btrfs: check btree node's nritems · 053ab70f
      Liu Bo committed
      When a btree node (level = 1) has nritems equal to zero,
      we can end up with a panic due to insert_ptr()'s
      
      BUG_ON(slot > nritems);
      
      where slot is 1 and nritems is 0, as copy_for_split() calls
      insert_ptr(.., path->slots[1] + 1, ...);
      
      An invalid value causes the whole mess, so this adds a check
      for the btree node's nritems so that we stop reading the block
      when something is wrong.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
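A sketch of the kind of read-time check this commit message describes is shown below. The function names, the max_ptrs upper bound and the -EIO return value are assumptions for illustration; the commit message itself only mentions the nritems == 0 case.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Rejects a node whose item count is zero (the case from the commit
 * message) or larger than the block can hold (an assumed extra bound).
 */
static int check_node(uint32_t nritems, uint32_t max_ptrs)
{
	if (nritems == 0 || nritems > max_ptrs)
		return -EIO;
	return 0;
}

/* Models the precondition behind insert_ptr()'s BUG_ON(slot > nritems). */
static int insert_slot_is_valid(uint32_t slot, uint32_t nritems)
{
	return slot <= nritems;
}
```

With nritems equal to 0 and slot equal to 1, insert_slot_is_valid() fails, which is the state the BUG_ON catches; rejecting the node when it is read keeps that state from being reached.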
    • btrfs: don't create or leak aliased root while cleaning up orphans · 35bbb97f
      Jeff Mahoney committed
      commit 909c3a22 (Btrfs: fix loading of orphan roots leading to BUG_ON)
      avoids the BUG_ON but can add an aliased root to the dead_roots list or
      leak the root.
      
      Since we've already been loading roots into the radix tree, we should
      use it before looking the root up on disk.
      
      Cc: <stable@vger.kernel.org> # 4.5
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: fix em leak in find_first_block_group · 187ee58c
      Josef Bacik committed
      We need to call free_extent_map() on the em we look up.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
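The leak pattern fixed here is a lookup that takes a reference which is never dropped. A minimal reference-counting sketch follows; struct em_model, lookup_em(), free_em() and find_block_group_model() are hypothetical names that only model the get/put pairing, not the real extent map API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an extent map: only the refcount matters. */
struct em_model { int refs; };

/* Models a lookup that returns the map with an extra reference held. */
static struct em_model *lookup_em(struct em_model *em)
{
	if (em)
		em->refs++;
	return em;
}

/* Models the release call: drops one reference, NULL is tolerated. */
static void free_em(struct em_model *em)
{
	if (em)
		em->refs--;
}

/* Returns 1 if a mapping was found; the lookup reference is always dropped. */
static int find_block_group_model(struct em_model *tree_em)
{
	struct em_model *em = lookup_em(tree_em);
	int found = (em != NULL);

	free_em(em);  /* without this call, refs would stay elevated */
	return found;
}
```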
    • btrfs: do not background blkdev_put() · 14238819
      Anand Jain committed
      At the end of unmount/dev-delete, if the device's exclusive open is not
      actually closed, there might be a race with another userland program
      that is trying to open the device in exclusive mode, and that open
      may fail, e.g.:
            umount /btrfs; fsck /dev/x
            btrfs dev del /dev/x /btrfs; fsck /dev/x
      So running blkdev_put() in the background is not an option here.
      Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: clarify do_chunk_alloc()'s return value · 28b737f6
      Liu Bo committed
      Function start_transaction() can return ERR_PTR(1) when flush is
      BTRFS_RESERVE_FLUSH_LIMIT, so the call graph is
      
      start_transaction (return ERR_PTR(1))
        -> btrfs_block_rsv_add (return 1)
           -> reserve_metadata_bytes (return 1)
              -> flush_space (return 1)
                 -> do_chunk_alloc  (return 1)
      
      With BTRFS_RESERVE_FLUSH_LIMIT, if flush_space is already on the
      flush_state of ALLOC_CHUNK and it successfully allocates a new
      chunk, then instead of trying to reserve space again,
      reserve_metadata_bytes returns 1 immediately.
      
      The callers of start_transaction() usually only do an IS_ERR()
      check, which ERR_PTR(1) passes, so they panic when dereferencing
      a pointer that is actually ERR_PTR(1).
      
      The following patch fixes the above problem.
      "btrfs: flush_space: treat return value of do_chunk_alloc properly"
      https://patchwork.kernel.org/patch/7778651/
      
      This adds comments to clarify do_chunk_alloc()'s return value.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
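Why ERR_PTR(1) slips past an IS_ERR() check can be seen with a user-space re-creation of the kernel's error-pointer helpers. The definitions below are a sketch modeled on the kernel convention that only the top MAX_ERRNO addresses encode errors; they are reimplemented here for illustration and are not copied from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encodes an error code in a pointer value. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Decodes the error code stored in a pointer value. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for pointer values in the range [-MAX_ERRNO, -1]. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

A negative errno such as -12 lands in the top 4095 addresses and is detected, whereas the positive value 1 becomes the address 0x1, which IS_ERR() treats as an ordinary pointer.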
    • btrfs: fix fsfreeze hang caused by delayed iputs deal · 9e7cc91a
      Wang Xiaoguang committed
      When running fstests generic/068, we sometimes got the deadlock below:
        xfs_io          D ffff8800331dbb20     0  6697   6693 0x00000080
        ffff8800331dbb20 ffff88007acfc140 ffff880034d895c0 ffff8800331dc000
        ffff880032d243e8 fffffffeffffffff ffff880032d24400 0000000000000001
        ffff8800331dbb38 ffffffff816a9045 ffff880034d895c0 ffff8800331dbba8
        Call Trace:
        [<ffffffff816a9045>] schedule+0x35/0x80
        [<ffffffff816abab2>] rwsem_down_read_failed+0xf2/0x140
        [<ffffffff8118f5e1>] ? __filemap_fdatawrite_range+0xd1/0x100
        [<ffffffff8134f978>] call_rwsem_down_read_failed+0x18/0x30
        [<ffffffffa06631fc>] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs]
        [<ffffffff810d32b5>] percpu_down_read+0x35/0x50
        [<ffffffff81217dfc>] __sb_start_write+0x2c/0x40
        [<ffffffffa067f5d5>] start_transaction+0x2a5/0x4d0 [btrfs]
        [<ffffffffa067f857>] btrfs_join_transaction+0x17/0x20 [btrfs]
        [<ffffffffa068ba34>] btrfs_evict_inode+0x3c4/0x5d0 [btrfs]
        [<ffffffff81230a1a>] evict+0xba/0x1a0
        [<ffffffff812316b6>] iput+0x196/0x200
        [<ffffffffa06851d0>] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs]
        [<ffffffffa067f1d8>] btrfs_commit_transaction+0x928/0xa80 [btrfs]
        [<ffffffffa0646df0>] btrfs_freeze+0x30/0x40 [btrfs]
        [<ffffffff81218040>] freeze_super+0xf0/0x190
        [<ffffffff81229275>] do_vfs_ioctl+0x4a5/0x5c0
        [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
        [<ffffffff810038cf>] ? syscall_trace_enter_phase1+0x11f/0x140
        [<ffffffff81229409>] SyS_ioctl+0x79/0x90
        [<ffffffff81003c12>] do_syscall_64+0x62/0x110
        [<ffffffff816acbe1>] entry_SYSCALL64_slow_path+0x25/0x25
      
      From this warning, freeze_super() already holds SB_FREEZE_FS, but
      btrfs_freeze() will call btrfs_commit_transaction() again. If
      btrfs_commit_transaction() finds that it has delayed iputs to handle,
      it will call start_transaction(), which tries to take the SB_FREEZE_FS
      lock again, and a deadlock occurs.
      
      The root cause is that in btrfs, sync_filesystem(sb) does not make
      sure all metadata is updated. There may still be some code paths adding
      delayed iputs; see the sample race window below:
      
               CPU1                                  |         CPU2
      |-> freeze_super()                             |
          |-> sync_filesystem(sb);                   |
          |                                          |-> cleaner_kthread()
          |                                          |   |-> btrfs_delete_unused_bgs()
          |                                          |       |-> btrfs_remove_chunk()
          |                                          |           |-> btrfs_remove_block_group()
          |                                          |               |-> btrfs_add_delayed_iput()
          |                                          |
          |-> sb->s_writers.frozen = SB_FREEZE_FS;   |
          |-> sb_wait_write(sb, SB_FREEZE_FS);       |
          |   acquire SB_FREEZE_FS lock.             |
          |                                          |
          |-> btrfs_freeze()                         |
              |-> btrfs_commit_transaction()         |
                  |-> btrfs_run_delayed_iputs()      |
                  |   will handle delayed iputs,     |
                  |   that means start_transaction() |
                  |   will be called, which will try |
                  |   to get SB_FREEZE_FS lock.      |
      
      To fix this issue, introduce an "int fs_frozen" to record internally whether
      the fs has been frozen. If the fs has been frozen, we cannot handle delayed iputs.
      Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ add comment to btrfs_freeze ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
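The fs_frozen idea can be sketched as a flag that the commit path consults before running delayed iputs. Everything below is a simplified user-space model: struct fs_model, its counters and the function names are hypothetical, and a plain int stands in for whatever synchronization the real code uses.

```c
#include <assert.h>

/* Hypothetical filesystem model. */
struct fs_model {
	int fs_frozen;             /* 1 while the fs is frozen */
	int pending_iputs;         /* delayed iputs waiting to run */
	int transactions_started;  /* counts start_transaction() calls */
};

/* Each delayed iput may need to start a transaction. */
static void run_delayed_iputs(struct fs_model *fs)
{
	while (fs->pending_iputs > 0) {
		fs->transactions_started++;  /* would block if the fs were frozen */
		fs->pending_iputs--;
	}
}

/* The commit path skips delayed iputs while the fs is frozen. */
static void commit_transaction(struct fs_model *fs)
{
	if (!fs->fs_frozen)
		run_delayed_iputs(fs);
}

static void freeze(struct fs_model *fs)
{
	fs->fs_frozen = 1;
	commit_transaction(fs);
}

static void unfreeze(struct fs_model *fs)
{
	fs->fs_frozen = 0;
}
```

In this model the iputs queued before the freeze stay pending during the frozen commit and are handled by the first commit after unfreeze.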