1. 07 10月, 2020 1 次提交
    • Q
      btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations · e85fde51
      Qu Wenruo 提交于
      [BUG]
      When quota is enabled for TEST_DEV, generic/013 sometimes fails like this:
      
        generic/013 14s ... _check_dmesg: something found in dmesg (see xfstests-dev/results//generic/013.dmesg)
      
      And with the following metadata leak:
      
        BTRFS warning (device dm-3): qgroup 0/1370 has unreleased space, type 2 rsv 49152
        ------------[ cut here ]------------
        WARNING: CPU: 2 PID: 47912 at fs/btrfs/disk-io.c:4078 close_ctree+0x1dc/0x323 [btrfs]
        Call Trace:
         btrfs_put_super+0x15/0x17 [btrfs]
         generic_shutdown_super+0x72/0x110
         kill_anon_super+0x18/0x30
         btrfs_kill_super+0x17/0x30 [btrfs]
         deactivate_locked_super+0x3b/0xa0
         deactivate_super+0x40/0x50
         cleanup_mnt+0x135/0x190
         __cleanup_mnt+0x12/0x20
         task_work_run+0x64/0xb0
         __prepare_exit_to_usermode+0x1bc/0x1c0
         __syscall_return_slowpath+0x47/0x230
         do_syscall_64+0x64/0xb0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        ---[ end trace a6cfd45ba80e4e06 ]---
        BTRFS error (device dm-3): qgroup reserved space leaked
        BTRFS info (device dm-3): disk space caching is enabled
        BTRFS info (device dm-3): has skinny extents
      
      [CAUSE]
      The qgroup preallocated meta rsv operations of that offending root are:
      
        btrfs_delayed_inode_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=131072
        btrfs_delayed_inode_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=131072
        btrfs_subvolume_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=49152
        btrfs_delayed_inode_release_metadata: convert_meta_prealloc root=1370 num_bytes=-131072
        btrfs_delayed_inode_release_metadata: convert_meta_prealloc root=1370 num_bytes=-131072
      
      It's pretty obvious that, we reserve qgroup meta rsv in
      btrfs_subvolume_reserve_metadata(), but doesn't have corresponding
      release/convert calls in btrfs_subvolume_release_metadata().
      
      This leads to the leakage.
      
      [FIX]
      To fix this bug, we should follow what we're doing in
      btrfs_delalloc_reserve_metadata(), where we reserve qgroup space, and
      add it to block_rsv->qgroup_rsv_reserved.
      
      And free the qgroup reserved metadata space when releasing the
      block_rsv.
      
      To do this, we need to change the btrfs_subvolume_release_metadata() to
      accept btrfs_root, and record the qgroup_to_release number, and call
      btrfs_qgroup_convert_reserved_meta() for it.
      
      Fixes: 733e03a0 ("btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertrans")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e85fde51
  2. 25 5月, 2020 1 次提交
    • D
      btrfs: simplify root lookup by id · 56e9357a
      David Sterba 提交于
      The main function to lookup a root by its id btrfs_get_fs_root takes the
      whole key, while only using the objectid. The value of offset is preset
      to (u64)-1 but not actually used until btrfs_find_root that does the
      actual search.
      
      Switch btrfs_get_fs_root to use only objectid and remove all local
      variables that existed just for the lookup. The actual key for search is
      set up in btrfs_get_fs_root, reusing another key variable.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      56e9357a
  3. 24 3月, 2020 6 次提交
  4. 08 1月, 2020 1 次提交
  5. 09 9月, 2019 1 次提交
  6. 04 7月, 2019 1 次提交
  7. 09 5月, 2019 1 次提交
    • F
      Btrfs: do not abort transaction at btrfs_update_root() after failure to COW path · 72bd2323
      Filipe Manana 提交于
      Currently when we fail to COW a path at btrfs_update_root() we end up
      always aborting the transaction. However all the current callers of
      btrfs_update_root() are able to deal with errors returned from it, many do
      end up aborting the transaction themselves (directly or not, such as the
      transaction commit path), other BUG_ON() or just gracefully cancel whatever
      they were doing.
      
      When syncing the fsync log, we call btrfs_update_root() through
      tree-log.c:update_log_root(), and if it returns an -ENOSPC error, the log
      sync code does not abort the transaction, instead it gracefully handles
      the error and returns -EAGAIN to the fsync handler, so that it falls back
      to a transaction commit. Any other error different from -ENOSPC, makes the
      log sync code abort the transaction.
      
      So remove the transaction abort from btrfs_update_log() when we fail to
      COW a path to update the root item, so that if an -ENOSPC failure happens
      we avoid aborting the current transaction and have a chance of the fsync
      succeeding after falling back to a transaction commit.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203413
      Fixes: 79787eaa ("btrfs: replace many BUG_ONs with proper error handling")
      Cc: stable@vger.kernel.org # 4.4+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      72bd2323
  8. 30 4月, 2019 1 次提交
  9. 27 2月, 2019 1 次提交
    • J
      btrfs: check for refs on snapshot delete resume · 78c52d9e
      Josef Bacik 提交于
      There's a bug in snapshot deletion where we won't update the
      drop_progress key if we're in the UPDATE_BACKREF stage.  This is a
      problem because we could drop refs for blocks we know don't belong to
      ours.  If we crash or umount at the right time we could experience
      messages such as the following when snapshot deletion resumes
      
       BTRFS error (device dm-3): unable to find ref byte nr 66797568 parent 0 root 258  owner 1 offset 0
       ------------[ cut here ]------------
       WARNING: CPU: 3 PID: 16052 at fs/btrfs/extent-tree.c:7108 __btrfs_free_extent.isra.78+0x62c/0xb30 [btrfs]
       CPU: 3 PID: 16052 Comm: umount Tainted: G        W  OE     5.0.0-rc4+ #147
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011
       RIP: 0010:__btrfs_free_extent.isra.78+0x62c/0xb30 [btrfs]
       RSP: 0018:ffffc90005cd7b18 EFLAGS: 00010286
       RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
       RDX: ffff88842fade680 RSI: ffff88842fad6b18 RDI: ffff88842fad6b18
       RBP: ffffc90005cd7bc8 R08: 0000000000000000 R09: 0000000000000001
       R10: 0000000000000001 R11: ffffffff822696b8 R12: 0000000003fb4000
       R13: 0000000000000001 R14: 0000000000000102 R15: ffff88819c9d67e0
       FS:  00007f08bb138fc0(0000) GS:ffff88842fac0000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f8f5d861ea0 CR3: 00000003e99fe000 CR4: 00000000000006e0
       Call Trace:
       ? _raw_spin_unlock+0x27/0x40
       ? btrfs_merge_delayed_refs+0x356/0x3e0 [btrfs]
       __btrfs_run_delayed_refs+0x75a/0x13c0 [btrfs]
       ? join_transaction+0x2b/0x460 [btrfs]
       btrfs_run_delayed_refs+0xf3/0x1c0 [btrfs]
       btrfs_commit_transaction+0x52/0xa50 [btrfs]
       ? start_transaction+0xa6/0x510 [btrfs]
       btrfs_sync_fs+0x79/0x1c0 [btrfs]
       sync_filesystem+0x70/0x90
       generic_shutdown_super+0x27/0x120
       kill_anon_super+0x12/0x30
       btrfs_kill_super+0x16/0xa0 [btrfs]
       deactivate_locked_super+0x43/0x70
       deactivate_super+0x40/0x60
       cleanup_mnt+0x3f/0x80
       __cleanup_mnt+0x12/0x20
       task_work_run+0x8b/0xc0
       exit_to_usermode_loop+0xce/0xd0
       do_syscall_64+0x20b/0x210
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      To fix this simply mark dead roots we read from disk as DEAD and then
      set the walk_control->restarted flag so we know we have a restarted
      deletion.  From here whenever we try to drop refs for blocks we check to
      verify our ref is set on them, and if it is not we skip it.  Once we
      find a ref that is set we unset walk_control->restarted since the tree
      should be in a normal state from then on, and any problems we run into
      from there are different issues.  I tested this with an existing broken
      fs and my reproducer that creates a broken fs and it fixed both file
      systems.
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      78c52d9e
  10. 25 2月, 2019 1 次提交
  11. 06 8月, 2018 3 次提交
  12. 06 6月, 2018 1 次提交
    • D
      vfs: change inode times to use struct timespec64 · 95582b00
      Deepa Dinamani 提交于
      struct timespec is not y2038 safe. Transition vfs to use
      y2038 safe struct timespec64 instead.
      
      The change was made with the help of the following cocinelle
      script. This catches about 80% of the changes.
      All the header file and logic changes are included in the
      first 5 rules. The rest are trivial substitutions.
      I avoid changing any of the function signatures or any other
      filesystem specific data structures to keep the patch simple
      for review.
      
      The script can be a little shorter by combining different cases.
      But, this version was sufficient for my usecase.
      
      virtual patch
      
      @ depends on patch @
      identifier now;
      @@
      - struct timespec
      + struct timespec64
        current_time ( ... )
        {
      - struct timespec now = current_kernel_time();
      + struct timespec64 now = current_kernel_time64();
        ...
      - return timespec_trunc(
      + return timespec64_trunc(
        ... );
        }
      
      @ depends on patch @
      identifier xtime;
      @@
       struct \( iattr \| inode \| kstat \) {
       ...
      -       struct timespec xtime;
      +       struct timespec64 xtime;
       ...
       }
      
      @ depends on patch @
      identifier t;
      @@
       struct inode_operations {
       ...
      int (*update_time) (...,
      -       struct timespec t,
      +       struct timespec64 t,
      ...);
       ...
       }
      
      @ depends on patch @
      identifier t;
      identifier fn_update_time =~ "update_time$";
      @@
       fn_update_time (...,
      - struct timespec *t,
      + struct timespec64 *t,
       ...) { ... }
      
      @ depends on patch @
      identifier t;
      @@
      lease_get_mtime( ... ,
      - struct timespec *t
      + struct timespec64 *t
        ) { ... }
      
      @te depends on patch forall@
      identifier ts;
      local idexpression struct inode *inode_node;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn_update_time =~ "update_time$";
      identifier fn;
      expression e, E3;
      local idexpression struct inode *node1;
      local idexpression struct inode *node2;
      local idexpression struct iattr *attr1;
      local idexpression struct iattr *attr2;
      local idexpression struct iattr attr;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      @@
      (
      (
      - struct timespec ts;
      + struct timespec64 ts;
      |
      - struct timespec ts = current_time(inode_node);
      + struct timespec64 ts = current_time(inode_node);
      )
      
      <+... when != ts
      (
      - timespec_equal(&inode_node->i_xtime, &ts)
      + timespec64_equal(&inode_node->i_xtime, &ts)
      |
      - timespec_equal(&ts, &inode_node->i_xtime)
      + timespec64_equal(&ts, &inode_node->i_xtime)
      |
      - timespec_compare(&inode_node->i_xtime, &ts)
      + timespec64_compare(&inode_node->i_xtime, &ts)
      |
      - timespec_compare(&ts, &inode_node->i_xtime)
      + timespec64_compare(&ts, &inode_node->i_xtime)
      |
      ts = current_time(e)
      |
      fn_update_time(..., &ts,...)
      |
      inode_node->i_xtime = ts
      |
      node1->i_xtime = ts
      |
      ts = inode_node->i_xtime
      |
      <+... attr1->ia_xtime ...+> = ts
      |
      ts = attr1->ia_xtime
      |
      ts.tv_sec
      |
      ts.tv_nsec
      |
      btrfs_set_stack_timespec_sec(..., ts.tv_sec)
      |
      btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
      |
      - ts = timespec64_to_timespec(
      + ts =
      ...
      -)
      |
      - ts = ktime_to_timespec(
      + ts = ktime_to_timespec64(
      ...)
      |
      - ts = E3
      + ts = timespec_to_timespec64(E3)
      |
      - ktime_get_real_ts(&ts)
      + ktime_get_real_ts64(&ts)
      |
      fn(...,
      - ts
      + timespec64_to_timespec(ts)
      ,...)
      )
      ...+>
      (
      <... when != ts
      - return ts;
      + return timespec64_to_timespec(ts);
      ...>
      )
      |
      - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
      |
      - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
      + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
      |
      - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
      |
      node1->i_xtime1 =
      - timespec_trunc(attr1->ia_xtime1,
      + timespec64_trunc(attr1->ia_xtime1,
      ...)
      |
      - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
      + attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
      ...)
      |
      - ktime_get_real_ts(&attr1->ia_xtime1)
      + ktime_get_real_ts64(&attr1->ia_xtime1)
      |
      - ktime_get_real_ts(&attr.ia_xtime1)
      + ktime_get_real_ts64(&attr.ia_xtime1)
      )
      
      @ depends on patch @
      struct inode *node;
      struct iattr *attr;
      identifier fn;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      expression e;
      @@
      (
      - fn(node->i_xtime);
      + fn(timespec64_to_timespec(node->i_xtime));
      |
       fn(...,
      - node->i_xtime);
      + timespec64_to_timespec(node->i_xtime));
      |
      - e = fn(attr->ia_xtime);
      + e = fn(timespec64_to_timespec(attr->ia_xtime));
      )
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      )
      ...+>
      }
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      struct kstat *stat;
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier i_xtime =~ "^i_[acm]time$";
      identifier xtime =~ "^[acm]time$";
      identifier fn, ret;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(stat->xtime);
      ret = fn (...,
      - &stat->xtime);
      + &ts);
      )
      ...+>
      }
      
      @ depends on patch @
      struct inode *node;
      struct inode *node2;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier i_xtime3 =~ "^i_[acm]time$";
      struct iattr *attrp;
      struct iattr *attrp2;
      struct iattr attr ;
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      struct kstat *stat;
      struct kstat stat1;
      struct timespec64 ts;
      identifier xtime =~ "^[acmb]time$";
      expression e;
      @@
      (
      ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
      |
       node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       stat->xtime = node2->i_xtime1;
      |
       stat1.xtime = node2->i_xtime1;
      |
      ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
      |
      ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
      |
      - e = node->i_xtime1;
      + e = timespec64_to_timespec( node->i_xtime1 );
      |
      - e = attrp->ia_xtime1;
      + e = timespec64_to_timespec( attrp->ia_xtime1 );
      |
      node->i_xtime1 = current_time(...);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
       node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
      - node->i_xtime1 = e;
      + node->i_xtime1 = timespec_to_timespec64(e);
      )
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Cc: <anton@tuxera.com>
      Cc: <balbi@kernel.org>
      Cc: <bfields@fieldses.org>
      Cc: <darrick.wong@oracle.com>
      Cc: <dhowells@redhat.com>
      Cc: <dsterba@suse.com>
      Cc: <dwmw2@infradead.org>
      Cc: <hch@lst.de>
      Cc: <hirofumi@mail.parknet.co.jp>
      Cc: <hubcap@omnibond.com>
      Cc: <jack@suse.com>
      Cc: <jaegeuk@kernel.org>
      Cc: <jaharkes@cs.cmu.edu>
      Cc: <jslaby@suse.com>
      Cc: <keescook@chromium.org>
      Cc: <mark@fasheh.com>
      Cc: <miklos@szeredi.hu>
      Cc: <nico@linaro.org>
      Cc: <reiserfs-devel@vger.kernel.org>
      Cc: <richard@nod.at>
      Cc: <sage@redhat.com>
      Cc: <sfrench@samba.org>
      Cc: <swhiteho@redhat.com>
      Cc: <tj@kernel.org>
      Cc: <trond.myklebust@primarydata.com>
      Cc: <tytso@mit.edu>
      Cc: <viro@zeniv.linux.org.uk>
      95582b00
  13. 12 4月, 2018 1 次提交
  14. 22 1月, 2018 1 次提交
  15. 30 10月, 2017 1 次提交
  16. 21 8月, 2017 1 次提交
  17. 16 8月, 2017 1 次提交
  18. 17 7月, 2017 1 次提交
    • D
      VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) · bc98a42c
      David Howells 提交于
      Firstly by applying the following with coccinelle's spatch:
      
      	@@ expression SB; @@
      	-SB->s_flags & MS_RDONLY
      	+sb_rdonly(SB)
      
      to effect the conversion to sb_rdonly(sb), then by applying:
      
      	@@ expression A, SB; @@
      	(
      	-(!sb_rdonly(SB)) && A
      	+!sb_rdonly(SB) && A
      	|
      	-A != (sb_rdonly(SB))
      	+A != sb_rdonly(SB)
      	|
      	-A == (sb_rdonly(SB))
      	+A == sb_rdonly(SB)
      	|
      	-!(sb_rdonly(SB))
      	+!sb_rdonly(SB)
      	|
      	-A && (sb_rdonly(SB))
      	+A && sb_rdonly(SB)
      	|
      	-A || (sb_rdonly(SB))
      	+A || sb_rdonly(SB)
      	|
      	-(sb_rdonly(SB)) != A
      	+sb_rdonly(SB) != A
      	|
      	-(sb_rdonly(SB)) == A
      	+sb_rdonly(SB) == A
      	|
      	-(sb_rdonly(SB)) && A
      	+sb_rdonly(SB) && A
      	|
      	-(sb_rdonly(SB)) || A
      	+sb_rdonly(SB) || A
      	)
      
      	@@ expression A, B, SB; @@
      	(
      	-(sb_rdonly(SB)) ? 1 : 0
      	+sb_rdonly(SB)
      	|
      	-(sb_rdonly(SB)) ? A : B
      	+sb_rdonly(SB) ? A : B
      	)
      
      to remove left over excess bracketage and finally by applying:
      
      	@@ expression A, SB; @@
      	(
      	-(A & MS_RDONLY) != sb_rdonly(SB)
      	+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
      	|
      	-(A & MS_RDONLY) == sb_rdonly(SB)
      	+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
      	)
      
      to make comparisons against the result of sb_rdonly() (which is a bool)
      work correctly.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      bc98a42c
  19. 22 6月, 2017 1 次提交
  20. 18 4月, 2017 1 次提交
    • D
      btrfs: Use ktime_get_real_ts for root ctime · fa7aede2
      Deepa Dinamani 提交于
      btrfs_root_item maintains the ctime for root updates.  This is not part
      of vfs_inode.
      
      Since current_time() uses struct inode* as an argument as Linus
      suggested, this cannot be used to update root times unless, we modify
      the signature to use inode.
      
      Since btrfs uses nanosecond time granularity, it can also use
      ktime_get_real_ts directly to obtain timestamp for the root. It is
      necessary to use the timespec time api here because the same
      btrfs_set_stack_timespec_*() apis are used for vfs inode times as well.
      These can be transitioned to using timespec64 when btrfs internally
      changes to use timespec64 as well.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: NDavid Sterba <dsterba@suse.com>
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      fa7aede2
  21. 14 2月, 2017 1 次提交
  22. 06 12月, 2016 4 次提交
  23. 27 9月, 2016 1 次提交
  24. 25 8月, 2016 1 次提交
  25. 26 7月, 2016 1 次提交
  26. 26 5月, 2016 1 次提交
  27. 28 4月, 2016 1 次提交
  28. 04 3月, 2016 1 次提交
    • F
      Btrfs: fix loading of orphan roots leading to BUG_ON · 909c3a22
      Filipe Manana 提交于
      When looking for orphan roots during mount we can end up hitting a
      BUG_ON() (at root-item.c:btrfs_find_orphan_roots()) if a log tree is
      replayed and qgroups are enabled. This is because after a log tree is
      replayed, a transaction commit is made, which triggers qgroup extent
      accounting which in turn does backref walking which ends up reading and
      inserting all roots in the radix tree fs_info->fs_root_radix, including
      orphan roots (deleted snapshots). So after the log tree is replayed, when
      finding orphan roots we hit the BUG_ON with the following trace:
      
      [118209.182438] ------------[ cut here ]------------
      [118209.183279] kernel BUG at fs/btrfs/root-tree.c:314!
      [118209.184074] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [118209.185123] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic ppdev xor raid6_pq evdev sg parport_pc parport acpi_cpufreq tpm_tis tpm psmouse
      processor i2c_piix4 serio_raw pcspkr i2c_core button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata
      virtio_pci virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs]
      [118209.186318] CPU: 14 PID: 28428 Comm: mount Tainted: G        W       4.5.0-rc5-btrfs-next-24+ #1
      [118209.186318] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
      [118209.186318] task: ffff8801ec131040 ti: ffff8800af34c000 task.ti: ffff8800af34c000
      [118209.186318] RIP: 0010:[<ffffffffa04237d7>]  [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
      [118209.186318] RSP: 0018:ffff8800af34faa8  EFLAGS: 00010246
      [118209.186318] RAX: 00000000ffffffef RBX: 00000000ffffffef RCX: 0000000000000001
      [118209.186318] RDX: 0000000080000000 RSI: 0000000000000001 RDI: 00000000ffffffff
      [118209.186318] RBP: ffff8800af34fb08 R08: 0000000000000001 R09: 0000000000000000
      [118209.186318] R10: ffff8800af34f9f0 R11: 6db6db6db6db6db7 R12: ffff880171b97000
      [118209.186318] R13: ffff8801ca9d65e0 R14: ffff8800afa2e000 R15: 0000160000000000
      [118209.186318] FS:  00007f5bcb914840(0000) GS:ffff88023edc0000(0000) knlGS:0000000000000000
      [118209.186318] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [118209.186318] CR2: 00007f5bcaceb5d9 CR3: 00000000b49b5000 CR4: 00000000000006e0
      [118209.186318] Stack:
      [118209.186318]  fffffbffffffffff 010230ffffffffff 0101000000000000 ff84000000000000
      [118209.186318]  fbffffffffffffff 30ffffffffffffff 0000000000000101 ffff880082348000
      [118209.186318]  0000000000000000 ffff8800afa2e000 ffff8800afa2e000 0000000000000000
      [118209.186318] Call Trace:
      [118209.186318]  [<ffffffffa042e2db>] open_ctree+0x1e37/0x21b9 [btrfs]
      [118209.186318]  [<ffffffffa040a753>] btrfs_mount+0x97e/0xaed [btrfs]
      [118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
      [118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
      [118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
      [118209.186318]  [<ffffffffa0409f81>] btrfs_mount+0x1ac/0xaed [btrfs]
      [118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
      [118209.186318]  [<ffffffff8108c26b>] ? lockdep_init_map+0xb9/0x1b3
      [118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
      [118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
      [118209.186318]  [<ffffffff81195637>] do_mount+0x8a6/0x9e8
      [118209.186318]  [<ffffffff8119598d>] SyS_mount+0x77/0x9f
      [118209.186318]  [<ffffffff81493017>] entry_SYSCALL_64_fastpath+0x12/0x6b
      [118209.186318] Code: 64 00 00 85 c0 89 c3 75 24 f0 41 80 4c 24 20 20 49 8b bc 24 f0 01 00 00 4c 89 e6 e8 e8 65 00 00 85 c0 89 c3 74 11 83 f8 ef 75 02 <0f> 0b
      4c 89 e7 e8 da 72 00 00 eb 1c 41 83 bc 24 00 01 00 00 00
      [118209.186318] RIP  [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
      [118209.186318]  RSP <ffff8800af34faa8>
      [118209.230735] ---[ end trace 83938f987d85d477 ]---
      
      So fix this by not treating the error -EEXIST, returned when attempting
      to insert a root already inserted by the backref walking code, as an error.
      
      The following test case for xfstests reproduces the bug:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            _cleanup_flakey
            cd /
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
        . ./common/dmflakey
      
        # real QA test starts here
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_dm_target flakey
        _require_metadata_journaling $SCRATCH_DEV
      
        rm -f $seqres.full
      
        _scratch_mkfs >>$seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        _run_btrfs_util_prog quota enable $SCRATCH_MNT
      
        # Create 2 directories with one file in one of them.
        # We use these just to trigger a transaction commit later, moving the file from
        # directory a to directory b and doing an fsync against directory a.
        mkdir $SCRATCH_MNT/a
        mkdir $SCRATCH_MNT/b
        touch $SCRATCH_MNT/a/f
        sync
      
        # Create our test file with 2 4K extents.
        $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
      
        # Create a snapshot and delete it. This doesn't really delete the snapshot
        # immediately, just makes it inaccessible and invisible to user space, the
        # snapshot is deleted later by a dedicated kernel thread (cleaner kthread)
        # which is woke up at the next transaction commit.
        # A root orphan item is inserted into the tree of tree roots, so that if a
        # power failure happens before the dedicated kernel thread does the snapshot
        # deletion, the next time the filesystem is mounted it resumes the snapshot
        # deletion.
        _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
        _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap
      
        # Now overwrite half of the extents we wrote before. Because we made a snapshpot
        # before, which isn't really deleted yet (since no transaction commit happened
        # after we did the snapshot delete request), the non overwritten extents get
        # referenced twice, once by the default subvolume and once by the snapshot.
        $XFS_IO_PROG -c "pwrite -S 0xbb 4K 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
      
        # Now move file f from directory a to directory b and fsync directory a.
        # The fsync on the directory a triggers a transaction commit (because a file
        # was moved from it to another directory) and the file fsync leaves a log tree
        # with file extent items to replay.
        mv $SCRATCH_MNT/a/f $SCRATCH_MNT/a/b
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
      
        echo "File digest before power failure:"
        md5sum $SCRATCH_MNT/foobar | _filter_scratch
      
        # Now simulate a power failure and mount the filesystem to replay the log tree.
        # After the log tree was replayed, we used to hit a BUG_ON() when processing
        # the root orphan item for the deleted snapshot. This is because when processing
        # an orphan root the code expected to be the first code inserting the root into
        # the fs_info->fs_root_radix radix tree, while in reallity it was the second
        # caller attempting to do it - the first caller was the transaction commit that
        # took place after replaying the log tree, when updating the qgroup counters.
        _flakey_drop_and_remount
      
        echo "File digest before after failure:"
        # Must match what he got before the power failure.
        md5sum $SCRATCH_MNT/foobar | _filter_scratch
      
        _unmount_flakey
        status=0
        exit
      
      Fixes: 2d9e9776 ("Btrfs: use btrfs_get_fs_root in resolve_indirect_ref")
      Cc: stable@vger.kernel.org  # 4.4+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      909c3a22
  29. 18 2月, 2016 1 次提交
  30. 22 10月, 2015 1 次提交