1. 29 Oct, 2019 1 commit
    • Btrfs: check for the full sync flag while holding the inode lock during fsync · 1b921b5b
      Committed by Filipe Manana
      commit ba0b084ac309283db6e329785c1dc4f45fdbd379 upstream.
      
      We were checking for the full fsync flag in the inode before locking the
      inode, which is racy, since at that time it might not be set but
      after we acquire the inode lock some other task may have set it. One case where
      this can happen is on a system low on memory and some concurrent task
      failed to allocate an extent map and therefore set the full sync flag on
      the inode, to force the next fsync to work in full mode.
      
      A consequence of missing the full fsync flag set is hitting the problems
      fixed by commit 0c713cbab620 ("Btrfs: fix race between ranged fsync and
      writeback of adjacent ranges"), BUG_ON() when dropping extents from a log
      tree, hitting assertion failures at tree-log.c:copy_items() or all sorts
      of weird inconsistencies after replaying a log due to file extents items
      representing ranges that overlap.
      
      So just move the check such that it's done after locking the inode and
      before starting writeback again.
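The ordering problem can be illustrated with a small single-threaded model. This is only a sketch and not the btrfs code: `struct sim_inode`, `sim_lock()` and the `race_hook` callback are hypothetical names, and the hook stands in for a concurrent task that sets the full sync flag just before the lock is acquired.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode: a lock state and the full sync flag. */
struct sim_inode {
	bool locked;
	bool needs_full_sync;
};

/* Callback that models another task running just before we get the lock. */
typedef void (*race_hook)(struct sim_inode *inode);

void sim_set_full_sync(struct sim_inode *inode)
{
	inode->needs_full_sync = true;
}

void sim_lock(struct sim_inode *inode, race_hook before_lock)
{
	if (before_lock)
		before_lock(inode);
	assert(!inode->locked);
	inode->locked = true;
}

void sim_unlock(struct sim_inode *inode)
{
	assert(inode->locked);
	inode->locked = false;
}

/* Racy order: sample the flag, then lock. A late update is missed. */
bool full_sync_checked_before_lock(struct sim_inode *inode, race_hook hook)
{
	bool full_sync = inode->needs_full_sync;

	sim_lock(inode, hook);
	sim_unlock(inode);
	return full_sync;
}

/* Fixed order: lock first, then sample the flag while holding the lock. */
bool full_sync_checked_after_lock(struct sim_inode *inode, race_hook hook)
{
	bool full_sync;

	sim_lock(inode, hook);
	full_sync = inode->needs_full_sync;
	sim_unlock(inode);
	return full_sync;
}
```

When the hook sets the flag, the first function still reports a fast fsync while the second one reports a full fsync, which is the behaviour the moved check is meant to guarantee.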
      
      Fixes: 0c713cbab620 ("Btrfs: fix race between ranged fsync and writeback of adjacent ranges")
      CC: stable@vger.kernel.org # 5.2+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 26 Jul, 2019 1 commit
  3. 31 May, 2019 3 commits
    • Btrfs: fix data bytes_may_use underflow with fallocate due to failed quota reserve · 1084fc9a
      Committed by Robbie Ko
      [ Upstream commit 39ad317315887c2cb9a4347a93a8859326ddf136 ]
      
      When doing fallocate, we first add the range to the reserve_list and
      then reserve the quota.  If quota reservation fails, we'll release all
      reserved parts of reserve_list.
      
      However, cur_offset is not updated to indicate that this range has
      already been inserted into the list.  Therefore, the same range is freed
      twice.  Once at list_for_each_entry loop, and once at the end of the
      function.  This will result in WARN_ON on bytes_may_use when we free the
      remaining space.
      
      At the end, under the 'out' label we have a call to:
      
         btrfs_free_reserved_data_space(inode, data_reserved, alloc_start, alloc_end - cur_offset);
      
      The start offset, third argument, should be cur_offset.
      
      Everything from alloc_start to cur_offset was freed by the
      list_for_each_entry_safe_loop.
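The effect of the wrong start offset can be seen in a toy model of the reservation accounting. This is a sketch under stated assumptions, not the kernel code: `struct reserve_sim`, `sim_reserve()`, `sim_free()` and the fixed 4K block size are all hypothetical, and the range sizes are picked only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_BLOCK   4096u
#define SIM_NBLOCKS 16u

/* Hypothetical per-block reservation map plus a double-free counter. */
struct reserve_sim {
	bool reserved[SIM_NBLOCKS];
	int double_frees;
};

void sim_reserve(struct reserve_sim *r, uint64_t start, uint64_t len)
{
	for (uint64_t b = start / SIM_BLOCK; b < (start + len) / SIM_BLOCK; b++)
		r->reserved[b] = true;
}

void sim_free(struct reserve_sim *r, uint64_t start, uint64_t len)
{
	for (uint64_t b = start / SIM_BLOCK; b < (start + len) / SIM_BLOCK; b++) {
		if (!r->reserved[b])
			r->double_frees++;	/* block was already released */
		r->reserved[b] = false;
	}
}

/* Returns how many blocks the final cleanup frees a second time. */
int cleanup_double_frees(bool start_at_cur_offset)
{
	struct reserve_sim r = { { false }, 0 };
	const uint64_t alloc_start = 0;
	const uint64_t alloc_end = 8 * SIM_BLOCK;
	const uint64_t cur_offset = 4 * SIM_BLOCK;

	sim_reserve(&r, alloc_start, alloc_end - alloc_start);

	/* The list walk already released [alloc_start, cur_offset). */
	sim_free(&r, alloc_start, cur_offset - alloc_start);

	/* Final cleanup under the 'out' label, with either start offset. */
	sim_free(&r, start_at_cur_offset ? cur_offset : alloc_start,
		 alloc_end - cur_offset);

	return r.double_frees;
}
```

Starting the final free at alloc_start releases the first four blocks twice in this model, while starting at cur_offset releases each block exactly once.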
      
      Fixes: 18513091 ("btrfs: update btrfs_space_info's bytes_may_use timely")
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Robbie Ko <robbieko@synology.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Btrfs: fix race between ranged fsync and writeback of adjacent ranges · 92f907d7
      Committed by Filipe Manana
      commit 0c713cbab6200b0ab6473b50435e450a6e1de85d upstream.
      
      When we do a full fsync (the bit BTRFS_INODE_NEEDS_FULL_SYNC is set in the
      inode) that happens to be ranged, which happens during a msync() or writes
      for files opened with O_SYNC for example, we can end up with a corrupt log,
      due to different file extent items representing ranges that overlap with
      each other, or hit some assertion failures.
      
      When doing a ranged fsync we only flush delalloc and wait for ordered
      extents within that range. If, while we are logging items from our inode,
      ordered extents for adjacent ranges complete, we end up in a race that can
      make us insert file extent items that overlap with others we logged
      previously, or trigger the assertion failures.
      
      For example, if tree-log.c:copy_items() receives a leaf that has the
      following file extents items, all with a length of 4K and therefore there
      is an implicit hole in the range 68K to 72K - 1:
      
        (257 EXTENT_ITEM 64K), (257 EXTENT_ITEM 72K), (257 EXTENT_ITEM 76K), ...
      
      It copies them to the log tree. However due to the need to detect implicit
      holes, it may release the path, in order to look at the previous leaf to
      detect an implicit hole, and then later it will search again in the tree
      for the first file extent item key, with the goal of locking again the
      leaf (which might have changed due to concurrent changes to other inodes).
      
      However when it locks again the leaf containing the first key, the key
      corresponding to the extent at offset 72K may not be there anymore since
      there is an ordered extent for that range that is finishing (that is,
      somewhere in the middle of btrfs_finish_ordered_io()), and it just
      removed the file extent item but has not yet replaced it with a new file
      extent item, so the part of copy_items() that does hole detection will
      decide that there is a hole in the range starting from 68K to 76K - 1,
      and therefore insert a file extent item to represent that hole, having
      a key offset of 68K. After that we now have a log tree with 2 different
      extent items that have overlapping ranges:
      
       1) The file extent item copied before copy_items() released the path,
          which has a key offset of 72K and a length of 4K, representing the
          file range 72K to 76K - 1.
      
       2) And a file extent item representing a hole that has a key offset of
          68K and a length of 8K, representing the range 68K to 76K - 1. This
          item was inserted after releasing the path, and overlaps with the
          extent item inserted before.
      
      The overlapping extent items can cause all sorts of unpredictable and
      incorrect behaviour, either when replayed or if a fast (non full) fsync
      happens later, which can trigger a BUG_ON() when calling
      btrfs_set_item_key_safe() through __btrfs_drop_extents(), producing a
      trace like the following:
      
        [61666.783269] ------------[ cut here ]------------
        [61666.783943] kernel BUG at fs/btrfs/ctree.c:3182!
        [61666.784644] invalid opcode: 0000 [#1] PREEMPT SMP
        (...)
        [61666.786253] task: ffff880117b88c40 task.stack: ffffc90008168000
        [61666.786253] RIP: 0010:btrfs_set_item_key_safe+0x7c/0xd2 [btrfs]
        [61666.786253] RSP: 0018:ffffc9000816b958 EFLAGS: 00010246
        [61666.786253] RAX: 0000000000000000 RBX: 000000000000000f RCX: 0000000000030000
        [61666.786253] RDX: 0000000000000000 RSI: ffffc9000816ba4f RDI: ffffc9000816b937
        [61666.786253] RBP: ffffc9000816b998 R08: ffff88011dae2428 R09: 0000000000001000
        [61666.786253] R10: 0000160000000000 R11: 6db6db6db6db6db7 R12: ffff88011dae2418
        [61666.786253] R13: ffffc9000816ba4f R14: ffff8801e10c4118 R15: ffff8801e715c000
        [61666.786253] FS:  00007f6060a18700(0000) GS:ffff88023f5c0000(0000) knlGS:0000000000000000
        [61666.786253] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [61666.786253] CR2: 00007f6060a28000 CR3: 0000000213e69000 CR4: 00000000000006e0
        [61666.786253] Call Trace:
        [61666.786253]  __btrfs_drop_extents+0x5e3/0xaad [btrfs]
        [61666.786253]  ? time_hardirqs_on+0x9/0x14
        [61666.786253]  btrfs_log_changed_extents+0x294/0x4e0 [btrfs]
        [61666.786253]  ? release_extent_buffer+0x38/0xb4 [btrfs]
        [61666.786253]  btrfs_log_inode+0xb6e/0xcdc [btrfs]
        [61666.786253]  ? lock_acquire+0x131/0x1c5
        [61666.786253]  ? btrfs_log_inode_parent+0xee/0x659 [btrfs]
        [61666.786253]  ? arch_local_irq_save+0x9/0xc
        [61666.786253]  ? btrfs_log_inode_parent+0x1f5/0x659 [btrfs]
        [61666.786253]  btrfs_log_inode_parent+0x223/0x659 [btrfs]
        [61666.786253]  ? arch_local_irq_save+0x9/0xc
        [61666.786253]  ? lockref_get_not_zero+0x2c/0x34
        [61666.786253]  ? rcu_read_unlock+0x3e/0x5d
        [61666.786253]  btrfs_log_dentry_safe+0x60/0x7b [btrfs]
        [61666.786253]  btrfs_sync_file+0x317/0x42c [btrfs]
        [61666.786253]  vfs_fsync_range+0x8c/0x9e
        [61666.786253]  SyS_msync+0x13c/0x1c9
        [61666.786253]  entry_SYSCALL_64_fastpath+0x18/0xad
      
      A sample of a corrupt log tree leaf with overlapping extents I got from
      running btrfs/072:
      
            item 14 key (295 108 200704) itemoff 2599 itemsize 53
                    extent data disk bytenr 0 nr 0
                    extent data offset 0 nr 458752 ram 458752
            item 15 key (295 108 659456) itemoff 2546 itemsize 53
                    extent data disk bytenr 4343541760 nr 770048
                    extent data offset 606208 nr 163840 ram 770048
            item 16 key (295 108 663552) itemoff 2493 itemsize 53
                    extent data disk bytenr 4343541760 nr 770048
                    extent data offset 610304 nr 155648 ram 770048
            item 17 key (295 108 819200) itemoff 2440 itemsize 53
                    extent data disk bytenr 4334788608 nr 4096
                    extent data offset 0 nr 4096 ram 4096
      
      The file extent item at offset 659456 (item 15) ends at offset 823296
      (659456 + 163840) while the next file extent item (item 16) starts at
      offset 663552.
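The overlap in the dump above can be checked mechanically. The following is a minimal sketch, assuming the items are sorted by key offset and that only the key offset and the "nr" length of each item matter; `struct sim_extent` and `first_overlap()` are hypothetical names and not part of btrfs or its tree checker.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced view of a file extent item: key offset and length. */
struct sim_extent {
	uint64_t offset;
	uint64_t len;
};

/*
 * Return the array index of the first item that starts before its
 * predecessor ends, or -1 if no two neighbouring items overlap.
 */
int first_overlap(const struct sim_extent *items, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (items[i - 1].offset + items[i - 1].len > items[i].offset)
			return (int)i;
	}
	return -1;
}
```

Fed with the four items from the leaf dump (offsets 200704, 659456, 663552 and 819200), the function reports array position 2, which corresponds to item 16.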
      
      A different problem that the race can trigger is a failure in the
      assertions at tree-log.c:copy_items(), which expect that the first file
      extent item key we found before releasing the path exists after we have
      released the path and that the last key we found before releasing the path
      also exists after releasing the path:
      
        $ cat -n fs/btrfs/tree-log.c
        4080          if (need_find_last_extent) {
        4081                  /* btrfs_prev_leaf could return 1 without releasing the path */
        4082                  btrfs_release_path(src_path);
        4083                  ret = btrfs_search_slot(NULL, inode->root, &first_key,
        4084                                  src_path, 0, 0);
        4085                  if (ret < 0)
        4086                          return ret;
        4087                  ASSERT(ret == 0);
        (...)
        4103                  if (i >= btrfs_header_nritems(src_path->nodes[0])) {
        4104                          ret = btrfs_next_leaf(inode->root, src_path);
        4105                          if (ret < 0)
        4106                                  return ret;
        4107                          ASSERT(ret == 0);
        4108                          src = src_path->nodes[0];
        4109                          i = 0;
        4110                          need_find_last_extent = true;
        4111                  }
        (...)
      
      The second assertion implicitly expects that the last key before the path
      release still exists, because the surrounding while loop only stops after
      we have found that key. When this assertion fails it produces a stack like
      this:
      
        [139590.037075] assertion failed: ret == 0, file: fs/btrfs/tree-log.c, line: 4107
        [139590.037406] ------------[ cut here ]------------
        [139590.037707] kernel BUG at fs/btrfs/ctree.h:3546!
        [139590.038034] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
        [139590.038340] CPU: 1 PID: 31841 Comm: fsstress Tainted: G        W         5.0.0-btrfs-next-46 #1
        (...)
        [139590.039354] RIP: 0010:assfail.constprop.24+0x18/0x1a [btrfs]
        (...)
        [139590.040397] RSP: 0018:ffffa27f48f2b9b0 EFLAGS: 00010282
        [139590.040730] RAX: 0000000000000041 RBX: ffff897c635d92c8 RCX: 0000000000000000
        [139590.041105] RDX: 0000000000000000 RSI: ffff897d36a96868 RDI: ffff897d36a96868
        [139590.041470] RBP: ffff897d1b9a0708 R08: 0000000000000000 R09: 0000000000000000
        [139590.041815] R10: 0000000000000008 R11: 0000000000000000 R12: 0000000000000013
        [139590.042159] R13: 0000000000000227 R14: ffff897cffcbba88 R15: 0000000000000001
        [139590.042501] FS:  00007f2efc8dee80(0000) GS:ffff897d36a80000(0000) knlGS:0000000000000000
        [139590.042847] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [139590.043199] CR2: 00007f8c064935e0 CR3: 0000000232252002 CR4: 00000000003606e0
        [139590.043547] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [139590.043899] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [139590.044250] Call Trace:
        [139590.044631]  copy_items+0xa3f/0x1000 [btrfs]
        [139590.045009]  ? generic_bin_search.constprop.32+0x61/0x200 [btrfs]
        [139590.045396]  btrfs_log_inode+0x7b3/0xd70 [btrfs]
        [139590.045773]  btrfs_log_inode_parent+0x2b3/0xce0 [btrfs]
        [139590.046143]  ? do_raw_spin_unlock+0x49/0xc0
        [139590.046510]  btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
        [139590.046872]  btrfs_sync_file+0x3b6/0x440 [btrfs]
        [139590.047243]  btrfs_file_write_iter+0x45b/0x5c0 [btrfs]
        [139590.047592]  __vfs_write+0x129/0x1c0
        [139590.047932]  vfs_write+0xc2/0x1b0
        [139590.048270]  ksys_write+0x55/0xc0
        [139590.048608]  do_syscall_64+0x60/0x1b0
        [139590.048946]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [139590.049287] RIP: 0033:0x7f2efc4be190
        (...)
        [139590.050342] RSP: 002b:00007ffe743243a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
        [139590.050701] RAX: ffffffffffffffda RBX: 0000000000008d58 RCX: 00007f2efc4be190
        [139590.051067] RDX: 0000000000008d58 RSI: 00005567eca0f370 RDI: 0000000000000003
        [139590.051459] RBP: 0000000000000024 R08: 0000000000000003 R09: 0000000000008d60
        [139590.051863] R10: 0000000000000078 R11: 0000000000000246 R12: 0000000000000003
        [139590.052252] R13: 00000000003d3507 R14: 00005567eca0f370 R15: 0000000000000000
        (...)
        [139590.055128] ---[ end trace 193f35d0215cdeeb ]---
      
      So fix this race between a full ranged fsync and writeback of adjacent
      ranges by flushing all delalloc and waiting for all ordered extents to
      complete before logging the inode. This is the simplest way to solve the
      problem because currently the full fsync path does not deal with ranges
      at all (it assumes a full range from 0 to LLONG_MAX) and it always needs
      to look at adjacent ranges for hole detection. For use cases of ranged
      fsyncs this can make a few fsyncs slower, but on the other hand it can
      make some following fsyncs to other ranges do less work or nothing at
      all. A full fsync is rare anyway and happens only once after
      loading/creating an inode and once after less common operations such as a
      shrinking truncate.
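The idea of the fix, widening a ranged request to the whole file whenever a full fsync is needed, can be sketched as follows. This is an illustration under assumptions and not the patch itself: `struct sync_range` and `effective_sync_range()` are hypothetical names.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical pair describing the byte range an fsync will flush. */
struct sync_range {
	long long start;
	long long end;
};

/*
 * A fast fsync keeps the caller's range. A full fsync ignores it and
 * covers the whole file, from 0 to LLONG_MAX.
 */
struct sync_range effective_sync_range(bool full_sync, long long start,
				       long long end)
{
	struct sync_range r = { start, end };

	if (full_sync) {
		r.start = 0;
		r.end = LLONG_MAX;
	}
	return r;
}
```

With the range widened this way, all delalloc is flushed and all ordered extents have completed before logging starts, so no adjacent range can change underneath the hole detection.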
      
      This is an issue that has existed for a long time, and was often triggered by
      generic/127, because it does mmap'ed writes and msync (which triggers a
      ranged fsync). Adding support for the tree checker to detect overlapping
      extents (next patch in the series) and trigger a WARN() when such cases
      are found, and then calling btrfs_check_leaf_full() at the end of
      btrfs_insert_file_extent() made the issue much easier to detect. Running
      btrfs/072 with that change to the tree checker and making fsstress open
      files always with O_SYNC made it much easier to trigger the issue (as
      triggering it with generic/127 is very rare).
      
      CC: stable@vger.kernel.org # 3.16+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: don't double unlock on error in btrfs_punch_hole · ce21e658
      Committed by Josef Bacik
      commit 8fca955057b9c58467d1b231e43f19c4cf26ae8c upstream.
      
      If we have an error writing out a delalloc range in
      btrfs_punch_hole_lock_range we'll unlock the inode and then goto
      out_only_mutex, where we will again unlock the inode.  This is bad,
      don't do this.
      
      Fixes: f27451f2 ("Btrfs: add support for fallocate's zero range operation")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 06 Dec, 2018 1 commit
    • Btrfs: fix rare chances for data loss when doing a fast fsync · 715608db
      Committed by Filipe Manana
      commit aab15e8e upstream.
      
      After the simplification of the fast fsync path done recently by commit
      b5e6c3e1 ("btrfs: always wait on ordered extents at fsync time") and
      commit e7175a69 ("btrfs: remove the wait ordered logic in the
      log_one_extent path"), we got a very short time window where we can get
      extents logged without writeback completing first or extents logged
      without logging the respective data checksums. Both issues can only happen
      when doing a non-full (fast) fsync.
      
      As soon as we enter btrfs_sync_file() we trigger writeback, then lock the
      inode and then wait for the writeback to complete before starting to log
      the inode. However before we acquire the inode's lock and after we started
      writeback, it's possible that more writes happened and dirtied more pages.
      If that happened and those pages get writeback triggered while we are
      logging the inode (for example, the VM subsystem triggering it due to
      memory pressure, or another concurrent fsync), we end up seeing the
      respective extent maps in the inode's list of modified extents and will
      log matching file extent items without waiting for the respective
      ordered extents to complete, meaning that either of the following will
      happen:
      
      1) We log an extent after its writeback finishes but before its checksums
         are added to the csum tree, leading to -EIO errors when attempting to
         read the extent after a log replay.
      
      2) We log an extent before its writeback finishes.
         Therefore after the log replay we will have a file extent item pointing
         to an unwritten extent (and without the respective data checksums as
         well).
      
      This could not happen before the fast fsync path simplification, because
      for any extent we found in the list of modified extents, we would wait for
      its respective ordered extent to finish writeback or collect its checksums
      for logging if it did not complete yet.
      
      Fix this by triggering writeback again after acquiring the inode's lock
      and before waiting for ordered extents to complete.
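The window and the fix can be modelled with simple page counters. This is a toy sketch, assuming that no new writes can dirty pages once the inode lock is held; `struct file_sim`, `sim_start_writeback()`, `sim_wait_writeback()` and `sim_fsync()` are hypothetical names and not btrfs functions.

```c
#include <stdbool.h>

/* Hypothetical page accounting for one file. */
struct file_sim {
	int dirty_pages;	/* dirtied, writeback not started yet */
	int writeback_pages;	/* writeback started, not finished yet */
};

void sim_start_writeback(struct file_sim *f)
{
	f->writeback_pages += f->dirty_pages;
	f->dirty_pages = 0;
}

void sim_wait_writeback(struct file_sim *f)
{
	f->writeback_pages = 0;
}

/*
 * Returns the number of pages that are still dirty when logging starts.
 * Those pages could get writeback triggered while the inode is logged.
 */
int sim_fsync(struct file_sim *f, int late_writes, bool flush_after_lock)
{
	sim_start_writeback(f);

	/* Writes that race in after writeback started, before the lock. */
	f->dirty_pages += late_writes;

	/* The inode lock is taken here. */
	if (flush_after_lock)
		sim_start_writeback(f);

	sim_wait_writeback(f);
	return f->dirty_pages;
}
```

Without the second writeback trigger the late writes remain dirty when logging begins. With it, every page dirtied before the lock was taken has been written back and waited on.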
      
      Fixes: e7175a69 ("btrfs: remove the wait ordered logic in the log_one_extent path")
      Fixes: b5e6c3e1 ("btrfs: always wait on ordered extents at fsync time")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  5. 14 Nov, 2018 2 commits
    • btrfs: move the dio_sem higher up the callchain · 51c62a33
      Committed by Josef Bacik
      commit c495144b upstream.
      
      We're getting a lockdep splat because we take the dio_sem under the
      log_mutex.  What we really need is to protect fsync() from logging an
      extent map for an extent we never waited on higher up, so just guard the
      whole thing with dio_sem.
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411 Not tainted
      ------------------------------------------------------
      aio-dio-invalid/30928 is trying to acquire lock:
      0000000092621cfd (&mm->mmap_sem){++++}, at: get_user_pages_unlocked+0x5a/0x1e0
      
      but task is already holding lock:
      00000000cefe6b35 (&ei->dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #5 (&ei->dio_sem){++++}:
             lock_acquire+0xbd/0x220
             down_write+0x51/0xb0
             btrfs_log_changed_extents+0x80/0xa40
             btrfs_log_inode+0xbaf/0x1000
             btrfs_log_inode_parent+0x26f/0xa80
             btrfs_log_dentry_safe+0x50/0x70
             btrfs_sync_file+0x357/0x540
             do_fsync+0x38/0x60
             __ia32_sys_fdatasync+0x12/0x20
             do_fast_syscall_32+0x9a/0x2f0
             entry_SYSENTER_compat+0x84/0x96
      
      -> #4 (&ei->log_mutex){+.+.}:
             lock_acquire+0xbd/0x220
             __mutex_lock+0x86/0xa10
             btrfs_record_unlink_dir+0x2a/0xa0
             btrfs_unlink+0x5a/0xc0
             vfs_unlink+0xb1/0x1a0
             do_unlinkat+0x264/0x2b0
             do_fast_syscall_32+0x9a/0x2f0
             entry_SYSENTER_compat+0x84/0x96
      
      -> #3 (sb_internal#2){.+.+}:
             lock_acquire+0xbd/0x220
             __sb_start_write+0x14d/0x230
             start_transaction+0x3e6/0x590
             btrfs_evict_inode+0x475/0x640
             evict+0xbf/0x1b0
             btrfs_run_delayed_iputs+0x6c/0x90
             cleaner_kthread+0x124/0x1a0
             kthread+0x106/0x140
             ret_from_fork+0x3a/0x50
      
      -> #2 (&fs_info->cleaner_delayed_iput_mutex){+.+.}:
             lock_acquire+0xbd/0x220
             __mutex_lock+0x86/0xa10
             btrfs_alloc_data_chunk_ondemand+0x197/0x530
             btrfs_check_data_free_space+0x4c/0x90
             btrfs_delalloc_reserve_space+0x20/0x60
             btrfs_page_mkwrite+0x87/0x520
             do_page_mkwrite+0x31/0xa0
             __handle_mm_fault+0x799/0xb00
             handle_mm_fault+0x7c/0xe0
             __do_page_fault+0x1d3/0x4a0
             async_page_fault+0x1e/0x30
      
      -> #1 (sb_pagefaults){.+.+}:
             lock_acquire+0xbd/0x220
             __sb_start_write+0x14d/0x230
             btrfs_page_mkwrite+0x6a/0x520
             do_page_mkwrite+0x31/0xa0
             __handle_mm_fault+0x799/0xb00
             handle_mm_fault+0x7c/0xe0
             __do_page_fault+0x1d3/0x4a0
             async_page_fault+0x1e/0x30
      
      -> #0 (&mm->mmap_sem){++++}:
             __lock_acquire+0x42e/0x7a0
             lock_acquire+0xbd/0x220
             down_read+0x48/0xb0
             get_user_pages_unlocked+0x5a/0x1e0
             get_user_pages_fast+0xa4/0x150
             iov_iter_get_pages+0xc3/0x340
             do_direct_IO+0xf93/0x1d70
             __blockdev_direct_IO+0x32d/0x1c20
             btrfs_direct_IO+0x227/0x400
             generic_file_direct_write+0xcf/0x180
             btrfs_file_write_iter+0x308/0x58c
             aio_write+0xf8/0x1d0
             io_submit_one+0x3a9/0x620
             __ia32_compat_sys_io_submit+0xb2/0x270
             do_int80_syscall_32+0x5b/0x1a0
             entry_INT80_compat+0x88/0xa0
      
      other info that might help us debug this:
      
      Chain exists of:
        &mm->mmap_sem --> &ei->log_mutex --> &ei->dio_sem
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&ei->dio_sem);
                                     lock(&ei->log_mutex);
                                     lock(&ei->dio_sem);
        lock(&mm->mmap_sem);
      
       *** DEADLOCK ***
      
      1 lock held by aio-dio-invalid/30928:
       #0: 00000000cefe6b35 (&ei->dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400
      
      stack backtrace:
      CPU: 0 PID: 30928 Comm: aio-dio-invalid Not tainted 4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      Call Trace:
       dump_stack+0x7c/0xbb
       print_circular_bug.isra.37+0x297/0x2a4
       check_prev_add.constprop.45+0x781/0x7a0
       ? __lock_acquire+0x42e/0x7a0
       validate_chain.isra.41+0x7f0/0xb00
       __lock_acquire+0x42e/0x7a0
       lock_acquire+0xbd/0x220
       ? get_user_pages_unlocked+0x5a/0x1e0
       down_read+0x48/0xb0
       ? get_user_pages_unlocked+0x5a/0x1e0
       get_user_pages_unlocked+0x5a/0x1e0
       get_user_pages_fast+0xa4/0x150
       iov_iter_get_pages+0xc3/0x340
       do_direct_IO+0xf93/0x1d70
       ? __alloc_workqueue_key+0x358/0x490
       ? __blockdev_direct_IO+0x14b/0x1c20
       __blockdev_direct_IO+0x32d/0x1c20
       ? btrfs_run_delalloc_work+0x40/0x40
       ? can_nocow_extent+0x490/0x490
       ? kvm_clock_read+0x1f/0x30
       ? can_nocow_extent+0x490/0x490
       ? btrfs_run_delalloc_work+0x40/0x40
       btrfs_direct_IO+0x227/0x400
       ? btrfs_run_delalloc_work+0x40/0x40
       generic_file_direct_write+0xcf/0x180
       btrfs_file_write_iter+0x308/0x58c
       aio_write+0xf8/0x1d0
       ? kvm_clock_read+0x1f/0x30
       ? __might_fault+0x3e/0x90
       io_submit_one+0x3a9/0x620
       ? io_submit_one+0xe5/0x620
       __ia32_compat_sys_io_submit+0xb2/0x270
       do_int80_syscall_32+0x5b/0x1a0
       entry_INT80_compat+0x88/0xa0
      
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: don't clean dirty pages during buffered writes · 8f2ecee5
      Committed by Chris Mason
      commit 7703bdd8d23e6ef057af3253958a793ec6066b28 upstream.
      
      During buffered writes, we follow this basic series of steps:
      
      again:
      	lock all the pages
      	wait for writeback on all the pages
      	Take the extent range lock
      	wait for ordered extents on the whole range
      	clean all the pages
      
      	if (copy_from_user_in_atomic() hits a fault) {
      		drop our locks
      		goto again;
      	}
      
      	dirty all the pages
      	release all the locks
      
      The extra waiting, cleaning and locking are there to make sure we don't
      modify pages in flight to the drive, after they've been crc'd.
      
      If some of the pages in the range were already dirty when the write
      began, and we need to goto again, we create a window where a dirty page
      has been cleaned and unlocked.  It may be reclaimed before we're able to
      lock it again, which means we'll read the old contents off the drive and
      lose any modifications that had been pending writeback.
      
      We don't actually need to clean the pages.  All of the other locking in
      place makes sure we don't start IO on the pages, so we can just leave
      them dirty for the duration of the write.
      
      Fixes: 73d59314 (the original btrfs merge)
      CC: stable@vger.kernel.org # v4.4+
      Signed-off-by: Chris Mason <clm@fb.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  6. 06 Aug, 2018 6 commits
  7. 06 Jun, 2018 1 commit
    • vfs: change inode times to use struct timespec64 · 95582b00
      Committed by Deepa Dinamani
      struct timespec is not y2038 safe. Transition vfs to use
      y2038 safe struct timespec64 instead.
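The y2038 limit comes from storing seconds since the epoch in a signed 32-bit field, which is what struct timespec does on 32-bit architectures. A minimal sketch of that limit follows; `fits_in_32bit_seconds()` is a hypothetical helper used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a seconds-since-epoch value can be stored in a signed 32-bit
 * field. The largest such value, INT32_MAX, is 2038-01-19 03:14:07 UTC.
 */
bool fits_in_32bit_seconds(int64_t secs)
{
	return secs >= INT32_MIN && secs <= INT32_MAX;
}
```

One second past INT32_MAX no longer fits in 32 bits, while a 64-bit seconds field such as the one in struct timespec64 holds it without trouble.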
      
      The change was made with the help of the following Coccinelle
      script. This catches about 80% of the changes.
      All the header file and logic changes are included in the
      first 5 rules. The rest are trivial substitutions.
      I avoid changing any of the function signatures or any other
      filesystem specific data structures to keep the patch simple
      for review.
      
      The script can be a little shorter by combining different cases.
      But this version was sufficient for my use case.
      
      virtual patch
      
      @ depends on patch @
      identifier now;
      @@
      - struct timespec
      + struct timespec64
        current_time ( ... )
        {
      - struct timespec now = current_kernel_time();
      + struct timespec64 now = current_kernel_time64();
        ...
      - return timespec_trunc(
      + return timespec64_trunc(
        ... );
        }
      
      @ depends on patch @
      identifier xtime;
      @@
       struct \( iattr \| inode \| kstat \) {
       ...
      -       struct timespec xtime;
      +       struct timespec64 xtime;
       ...
       }
      
      @ depends on patch @
      identifier t;
      @@
       struct inode_operations {
       ...
      int (*update_time) (...,
      -       struct timespec t,
      +       struct timespec64 t,
      ...);
       ...
       }
      
      @ depends on patch @
      identifier t;
      identifier fn_update_time =~ "update_time$";
      @@
       fn_update_time (...,
      - struct timespec *t,
      + struct timespec64 *t,
       ...) { ... }
      
      @ depends on patch @
      identifier t;
      @@
      lease_get_mtime( ... ,
      - struct timespec *t
      + struct timespec64 *t
        ) { ... }
      
      @te depends on patch forall@
      identifier ts;
      local idexpression struct inode *inode_node;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn_update_time =~ "update_time$";
      identifier fn;
      expression e, E3;
      local idexpression struct inode *node1;
      local idexpression struct inode *node2;
      local idexpression struct iattr *attr1;
      local idexpression struct iattr *attr2;
      local idexpression struct iattr attr;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      @@
      (
      (
      - struct timespec ts;
      + struct timespec64 ts;
      |
      - struct timespec ts = current_time(inode_node);
      + struct timespec64 ts = current_time(inode_node);
      )
      
      <+... when != ts
      (
      - timespec_equal(&inode_node->i_xtime, &ts)
      + timespec64_equal(&inode_node->i_xtime, &ts)
      |
      - timespec_equal(&ts, &inode_node->i_xtime)
      + timespec64_equal(&ts, &inode_node->i_xtime)
      |
      - timespec_compare(&inode_node->i_xtime, &ts)
      + timespec64_compare(&inode_node->i_xtime, &ts)
      |
      - timespec_compare(&ts, &inode_node->i_xtime)
      + timespec64_compare(&ts, &inode_node->i_xtime)
      |
      ts = current_time(e)
      |
      fn_update_time(..., &ts,...)
      |
      inode_node->i_xtime = ts
      |
      node1->i_xtime = ts
      |
      ts = inode_node->i_xtime
      |
      <+... attr1->ia_xtime ...+> = ts
      |
      ts = attr1->ia_xtime
      |
      ts.tv_sec
      |
      ts.tv_nsec
      |
      btrfs_set_stack_timespec_sec(..., ts.tv_sec)
      |
      btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
      |
      - ts = timespec64_to_timespec(
      + ts =
      ...
      -)
      |
      - ts = ktime_to_timespec(
      + ts = ktime_to_timespec64(
      ...)
      |
      - ts = E3
      + ts = timespec_to_timespec64(E3)
      |
      - ktime_get_real_ts(&ts)
      + ktime_get_real_ts64(&ts)
      |
      fn(...,
      - ts
      + timespec64_to_timespec(ts)
      ,...)
      )
      ...+>
      (
      <... when != ts
      - return ts;
      + return timespec64_to_timespec(ts);
      ...>
      )
      |
      - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_equal(&node1->i_xtime1, &node2->i_xtime2)
      |
      - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
      + timespec64_equal(&node1->i_xtime1, &attr2->ia_xtime2)
      |
      - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
      |
      node1->i_xtime1 =
      - timespec_trunc(attr1->ia_xtime1,
      + timespec64_trunc(attr1->ia_xtime1,
      ...)
      |
      - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
      + attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
      ...)
      |
      - ktime_get_real_ts(&attr1->ia_xtime1)
      + ktime_get_real_ts64(&attr1->ia_xtime1)
      |
      - ktime_get_real_ts(&attr.ia_xtime1)
      + ktime_get_real_ts64(&attr.ia_xtime1)
      )
      
      @ depends on patch @
      struct inode *node;
      struct iattr *attr;
      identifier fn;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      expression e;
      @@
      (
      - fn(node->i_xtime);
      + fn(timespec64_to_timespec(node->i_xtime));
      |
       fn(...,
      - node->i_xtime);
      + timespec64_to_timespec(node->i_xtime));
      |
      - e = fn(attr->ia_xtime);
      + e = fn(timespec64_to_timespec(attr->ia_xtime));
      )
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      )
      ...+>
      }
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      struct kstat *stat;
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier i_xtime =~ "^i_[acm]time$";
      identifier xtime =~ "^[acm]time$";
      identifier fn, ret;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(stat->xtime);
      ret = fn (...,
      - &stat->xtime);
      + &ts);
      )
      ...+>
      }
      
      @ depends on patch @
      struct inode *node;
      struct inode *node2;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier i_xtime3 =~ "^i_[acm]time$";
      struct iattr *attrp;
      struct iattr *attrp2;
      struct iattr attr ;
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      struct kstat *stat;
      struct kstat stat1;
      struct timespec64 ts;
      identifier xtime =~ "^[acmb]time$";
      expression e;
      @@
      (
      \( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
      |
       node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       stat->xtime = node2->i_xtime1;
      |
       stat1.xtime = node2->i_xtime1;
      |
      \( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
      |
      \( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
      |
      - e = node->i_xtime1;
      + e = timespec64_to_timespec( node->i_xtime1 );
      |
      - e = attrp->ia_xtime1;
      + e = timespec64_to_timespec( attrp->ia_xtime1 );
      |
      node->i_xtime1 = current_time(...);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
       node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
      - node->i_xtime1 = e;
      + node->i_xtime1 = timespec_to_timespec64(e);
      )
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Cc: <anton@tuxera.com>
      Cc: <balbi@kernel.org>
      Cc: <bfields@fieldses.org>
      Cc: <darrick.wong@oracle.com>
      Cc: <dhowells@redhat.com>
      Cc: <dsterba@suse.com>
      Cc: <dwmw2@infradead.org>
      Cc: <hch@lst.de>
      Cc: <hirofumi@mail.parknet.co.jp>
      Cc: <hubcap@omnibond.com>
      Cc: <jack@suse.com>
      Cc: <jaegeuk@kernel.org>
      Cc: <jaharkes@cs.cmu.edu>
      Cc: <jslaby@suse.com>
      Cc: <keescook@chromium.org>
      Cc: <mark@fasheh.com>
      Cc: <miklos@szeredi.hu>
      Cc: <nico@linaro.org>
      Cc: <reiserfs-devel@vger.kernel.org>
      Cc: <richard@nod.at>
      Cc: <sage@redhat.com>
      Cc: <sfrench@samba.org>
      Cc: <swhiteho@redhat.com>
      Cc: <tj@kernel.org>
      Cc: <trond.myklebust@primarydata.com>
      Cc: <tytso@mit.edu>
      Cc: <viro@zeniv.linux.org.uk>
      95582b00
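The conversions that the semantic patch above inserts can be illustrated with a small userspace sketch. This is not kernel code: `struct ts64` and the helper names below are stand-ins assumed for illustration, loosely analogous to `struct timespec64`, `timespec64_to_timespec()`, `timespec_to_timespec64()` and `timespec64_equal()`. The last function mirrors the rule that adds a local `struct timespec ts` before a call that used to take `&node->i_xtime`.

```c
#include <stdint.h>
#include <time.h>

/* Userspace stand-in for struct timespec64 (assumption: seconds are
 * always 64 bits wide, whatever the platform's time_t is). */
struct ts64 {
	int64_t tv_sec;
	long    tv_nsec;
};

/* Narrowing conversion, analogous to timespec64_to_timespec(). */
struct timespec ts64_to_ts(struct ts64 in)
{
	struct timespec out;

	out.tv_sec = (time_t)in.tv_sec;
	out.tv_nsec = in.tv_nsec;
	return out;
}

/* Widening conversion, analogous to timespec_to_timespec64(). */
struct ts64 ts_to_ts64(struct timespec in)
{
	struct ts64 out;

	out.tv_sec = (int64_t)in.tv_sec;
	out.tv_nsec = in.tv_nsec;
	return out;
}

/* Field-wise equality, analogous to timespec64_equal(). */
int ts64_equal(const struct ts64 *a, const struct ts64 *b)
{
	return a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec;
}

/* Hypothetical legacy helper that still takes the old type by pointer. */
long legacy_nsec_of(const struct timespec *ts)
{
	return ts->tv_nsec;
}

/* The inode field is now 64-bit, so it is copied into a temporary of the
 * old type before being handed to the unconverted helper. */
long call_legacy_with_inode_time(struct ts64 i_mtime)
{
	struct timespec ts = ts64_to_ts(i_mtime);

	return legacy_nsec_of(&ts);
}
```

Callers that already work on the 64-bit type need no temporary; only the boundaries with unconverted code pay for the copy.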
  8. 18 Apr, 2018 1 commit
  9. 12 Apr, 2018 1 commit
  10. 31 Mar, 2018 3 commits
    • Q
      btrfs: qgroup: Use separate meta reservation type for delalloc · 43b18595
      Qu Wenruo committed
      Before this patch, btrfs qgroup mixed per-transaction meta rsv with
      preallocated meta rsv, making it quite easy to underflow the qgroup meta
      reservation.
      
      Since we have the new qgroup meta rsv types, apply it to delalloc
      reservation.
      
      Now for delalloc, most of its reserved space will use META_PREALLOC qgroup
      rsv type.
      
      And for callers reducing outstanding extent like btrfs_finish_ordered_io(),
      they will convert corresponding META_PREALLOC reservation to
      META_PERTRANS.
      
      This is mainly because the current qgroup numbers are only updated in
      btrfs_commit_transaction(). That is to say, if we don't keep such a
      placeholder reservation, we can exceed the qgroup limit.
      
      And for callers freeing outstanding extent in error handler, we will
      just free META_PREALLOC bytes.
      
      This behavior requires callers of btrfs_qgroup_release_meta() or
      btrfs_qgroup_convert_meta() to be aware of which type they are handling.
      So in this patch, btrfs_delalloc_release_metadata() and its callers get
      an extra parameter to inform qgroup to do the correct meta convert/release.
      
      The good news is that even if we use the wrong type (convert or free), it
      won't cause an obvious bug, as the prealloc type is always in good shape,
      and the type only affects whether per-trans meta is increased or not.
      
      So in the worst case the metadata limit can sometimes be exceeded
      (no convert at all) or the metadata limit is reached too soon
      (no free at all).
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      43b18595
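The convert-versus-free distinction described in the commit above can be pictured with a toy model. This is only a sketch under stated assumptions: the structure, the function names and the clamping behavior are hypothetical and are not the btrfs implementation. It tracks two counters and shows how a single extra flag on the release path selects between converting preallocated bytes to per-transaction bytes and simply freeing them.

```c
#include <stdint.h>

/* Hypothetical toy model of the two qgroup metadata reservation types. */
struct toy_qgroup_rsv {
	uint64_t meta_prealloc; /* reserved ahead of time, e.g. for delalloc */
	uint64_t meta_pertrans; /* held until the transaction commits */
};

/* Take up to `bytes` from a counter without underflowing it. */
static uint64_t toy_take(uint64_t *counter, uint64_t bytes)
{
	uint64_t taken = bytes < *counter ? bytes : *counter;

	*counter -= taken;
	return taken;
}

void toy_reserve_prealloc(struct toy_qgroup_rsv *r, uint64_t bytes)
{
	r->meta_prealloc += bytes;
}

/* Release path with the extra parameter: nonzero `convert` moves the bytes
 * to the per-transaction counter (success path), zero just frees them
 * (error path). */
void toy_release_metadata(struct toy_qgroup_rsv *r, uint64_t bytes, int convert)
{
	uint64_t taken = toy_take(&r->meta_prealloc, bytes);

	if (convert)
		r->meta_pertrans += taken;
}

/* Bytes that currently count against the limit in this toy model. */
uint64_t toy_reserved_total(const struct toy_qgroup_rsv *r)
{
	return r->meta_prealloc + r->meta_pertrans;
}
```

In this model, passing the wrong flag never corrupts the prealloc counter; it only changes whether the per-transaction counter grows, which matches the "worst case" reasoning in the message.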
    • N
      btrfs: Remove userspace transaction ioctls · 7a5a07a8
      Nikolay Borisov committed
      Commit 3558d4f8 ("btrfs: Deprecate userspace transaction ioctls")
      marked the beginning of the end of userspace transactions. This commit
      finishes the job! There are no known users and ceph does not use the
      ioctl anymore.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Acked-by: Sage Weil <sage@redhat.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      7a5a07a8
    • D
      btrfs: open code trivial helper btrfs_page_exists_in_range · 051c98eb
      David Sterba committed
      The called function name is self-explanatory.
      Signed-off-by: David Sterba <dsterba@suse.com>
      051c98eb
  11. 26 Mar, 2018 2 commits
  12. 29 Jan, 2018 1 commit
    • J
      fs: new API for handling inode->i_version · ae5e165d
      Jeff Layton committed
      Add a documentation blob that explains what the i_version field is, how
      it is expected to work, and how it is currently implemented by various
      filesystems.
      
      We already have inode_inc_iversion. Add several other functions for
      manipulating and accessing the i_version counter. For now, the
      implementation is trivial and basically works the way that all of the
      open-coded i_version accesses work today.
      
      Future patches will convert existing users of i_version to use the new
      API, and then convert the backend implementation to do things more
      efficiently.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      ae5e165d
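To picture what a "trivial" accessor layer over a version counter looks like, here is a minimal userspace sketch. All names are hypothetical and chosen for illustration; they are not the kernel's API, and the sketch ignores locking and atomicity. It only shows the idea of routing every read, write and increment of the counter through small functions, so that the backing representation can later change without touching callers.

```c
#include <stdint.h>

/* Hypothetical stand-in for an inode carrying a change counter. */
struct toy_inode {
	uint64_t i_version;
};

/* Set the counter, e.g. when loading an inode from disk. */
void toy_set_iversion(struct toy_inode *inode, uint64_t value)
{
	inode->i_version = value;
}

/* Bump the counter on a change to the inode. */
void toy_inc_iversion(struct toy_inode *inode)
{
	inode->i_version++;
}

/* Read the current counter value. */
uint64_t toy_query_iversion(const struct toy_inode *inode)
{
	return inode->i_version;
}

/* Nonzero if the inode changed since `old` was sampled. */
int toy_iversion_changed(const struct toy_inode *inode, uint64_t old)
{
	return inode->i_version != old;
}
```

A caller would sample the value with `toy_query_iversion()`, and later use `toy_iversion_changed()` to decide whether cached state is stale.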
  13. 22 Jan, 2018 10 commits
    • F
      Btrfs: fix space leak after fallocate and zero range operations · 81fdf638
      Filipe Manana committed
      If we do a buffered write after a zero range operation that has an
      unaligned (with the filesystem's sector size) end which also falls within
      an unwritten (prealloc) extent that is currently beyond the inode's
      i_size, and the zero range operation has the flag FALLOC_FL_KEEP_SIZE,
      we end up leaking data and metadata space. This happens because when
      zeroing a range we call btrfs_truncate_block(), which does delalloc
      (loads the page and partially zeroes its content), and in the buffered
      write path we only clear existing delalloc space reservation for the
      range we are writing into if that range starts at an offset smaller than
      the inode's i_size, which makes sense since we cannot have delalloc
      extents beyond the i_size; only unwritten extents are allowed there.
      
      Example reproducer:
      
       $ mkfs.btrfs -f /dev/sdb
       $ mount /dev/sdb /mnt
       $ xfs_io -f -c "falloc -k 428K 4K" /mnt/foobar
       $ xfs_io -c "fzero -k 0 430K" /mnt/foobar
       $ xfs_io -c "pwrite -S 0xaa 428K 4K" /mnt/foobar
       $ umount /mnt
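The offsets in the reproducer above can be checked with a little arithmetic. The sketch below assumes a sector size of 4096 bytes (an assumption; the message does not state it) and uses hypothetical helper names. It shows why the zero range end at 430K is unaligned and falls inside the 4K prealloc extent that starts at 428K.

```c
#include <stdint.h>

/* Round an offset down to the start of the sector that contains it. */
uint64_t sector_start(uint64_t offset, uint64_t sectorsize)
{
	return offset - (offset % sectorsize);
}

/* Nonzero if the offset sits exactly on a sector boundary. */
int is_sector_aligned(uint64_t offset, uint64_t sectorsize)
{
	return offset % sectorsize == 0;
}

/* Nonzero if offset lies inside the half-open range [start, start + len). */
int range_contains(uint64_t start, uint64_t len, uint64_t offset)
{
	return offset >= start && offset < start + len;
}
```

With a 4096-byte sector, 430K is byte 440320, which leaves a remainder of 2048 and is therefore unaligned. `sector_start(440320, 4096)` is 438272, which is exactly 428K, the start of the extent created by `falloc -k 428K 4K`. Because both `falloc` and `fzero` were given `-k`, the file's i_size is still 0, so the later write at 428K starts at an offset that is not smaller than i_size.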
      
      After the unmount we get the metadata and data space leaks reported in
      dmesg/syslog:
      
       [95794.602253] ------------[ cut here ]------------
       [95794.603322] WARNING: CPU: 0 PID: 31496 at fs/btrfs/inode.c:9561 btrfs_destroy_inode+0x4e/0x206 [btrfs]
       [95794.605167] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs]
       [95794.613000] CPU: 0 PID: 31496 Comm: umount Tainted: G        W       4.14.0-rc6-btrfs-next-54+ #1
       [95794.614448] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
       [95794.615972] task: ffff880075aa0240 task.stack: ffffc90001734000
       [95794.617114] RIP: 0010:btrfs_destroy_inode+0x4e/0x206 [btrfs]
       [95794.618001] RSP: 0018:ffffc90001737d00 EFLAGS: 00010202
       [95794.618721] RAX: 0000000000000000 RBX: ffff880070fa1418 RCX: ffffc90001737c7c
       [95794.619645] RDX: 0000000175aa0240 RSI: 0000000000000001 RDI: ffff880070fa1418
       [95794.620711] RBP: ffffc90001737d38 R08: 0000000000000000 R09: 0000000000000000
       [95794.621932] R10: ffffc90001737c48 R11: ffff88007123e158 R12: ffff880075b6a000
       [95794.623124] R13: ffff88006145c000 R14: ffff880070fa1418 R15: ffff880070c3b4a0
       [95794.624188] FS:  00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
       [95794.625578] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [95794.626522] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0
       [95794.627647] Call Trace:
       [95794.628128]  destroy_inode+0x3d/0x55
       [95794.628573]  evict+0x177/0x17e
       [95794.629010]  dispose_list+0x50/0x71
       [95794.629478]  evict_inodes+0x132/0x141
       [95794.630289]  generic_shutdown_super+0x3f/0x10b
       [95794.630864]  kill_anon_super+0x12/0x1c
       [95794.631383]  btrfs_kill_super+0x16/0x21 [btrfs]
       [95794.631930]  deactivate_locked_super+0x30/0x68
       [95794.632539]  deactivate_super+0x36/0x39
       [95794.633200]  cleanup_mnt+0x49/0x67
       [95794.633818]  __cleanup_mnt+0x12/0x14
       [95794.634416]  task_work_run+0x82/0xa6
       [95794.634902]  prepare_exit_to_usermode+0xe1/0x10c
       [95794.635525]  syscall_return_slowpath+0x18c/0x1af
       [95794.636122]  entry_SYSCALL_64_fastpath+0xab/0xad
       [95794.636834] RIP: 0033:0x7fa678cb99a7
       [95794.637370] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [95794.638672] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7
       [95794.639596] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90
       [95794.640703] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015
       [95794.641773] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64
       [95794.643150] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160
       [95794.644249] Code: ff 4c 8b a8 80 06 00 00 48 8b 87 c0 01 00 00 48 85 c0 74 02 0f ff 48 83 bb e0 02 00 00 00 74 02 0f ff 83 bb 3c ff ff ff 00 74 02 <0f> ff 83 bb 40 ff ff ff 00 74 02 0f ff 48 83 bb f8 fe ff ff 00
       [95794.646929] ---[ end trace e95877675c6ec007 ]---
       [95794.647751] ------------[ cut here ]------------
       [95794.648509] WARNING: CPU: 0 PID: 31496 at fs/btrfs/inode.c:9562 btrfs_destroy_inode+0x59/0x206 [btrfs]
       [95794.649842] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs]
       [95794.654659] CPU: 0 PID: 31496 Comm: umount Tainted: G        W       4.14.0-rc6-btrfs-next-54+ #1
       [95794.655894] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
       [95794.657546] task: ffff880075aa0240 task.stack: ffffc90001734000
       [95794.658433] RIP: 0010:btrfs_destroy_inode+0x59/0x206 [btrfs]
       [95794.659279] RSP: 0018:ffffc90001737d00 EFLAGS: 00010202
       [95794.660054] RAX: 0000000000000000 RBX: ffff880070fa1418 RCX: ffffc90001737c7c
       [95794.660753] RDX: 0000000175aa0240 RSI: 0000000000000001 RDI: ffff880070fa1418
       [95794.661513] RBP: ffffc90001737d38 R08: 0000000000000000 R09: 0000000000000000
       [95794.662289] R10: ffffc90001737c48 R11: ffff88007123e158 R12: ffff880075b6a000
       [95794.663393] R13: ffff88006145c000 R14: ffff880070fa1418 R15: ffff880070c3b4a0
       [95794.664342] FS:  00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
       [95794.665673] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [95794.666593] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0
       [95794.667629] Call Trace:
       [95794.668065]  destroy_inode+0x3d/0x55
       [95794.668637]  evict+0x177/0x17e
       [95794.669179]  dispose_list+0x50/0x71
       [95794.669830]  evict_inodes+0x132/0x141
       [95794.670416]  generic_shutdown_super+0x3f/0x10b
       [95794.671103]  kill_anon_super+0x12/0x1c
       [95794.671786]  btrfs_kill_super+0x16/0x21 [btrfs]
       [95794.672552]  deactivate_locked_super+0x30/0x68
       [95794.673393]  deactivate_super+0x36/0x39
       [95794.674107]  cleanup_mnt+0x49/0x67
       [95794.674706]  __cleanup_mnt+0x12/0x14
       [95794.675279]  task_work_run+0x82/0xa6
       [95794.675795]  prepare_exit_to_usermode+0xe1/0x10c
       [95794.676507]  syscall_return_slowpath+0x18c/0x1af
       [95794.677275]  entry_SYSCALL_64_fastpath+0xab/0xad
       [95794.678006] RIP: 0033:0x7fa678cb99a7
       [95794.678600] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [95794.679739] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7
       [95794.680779] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90
       [95794.681837] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015
       [95794.682867] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64
       [95794.683891] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160
       [95794.684843] Code: c0 01 00 00 48 85 c0 74 02 0f ff 48 83 bb e0 02 00 00 00 74 02 0f ff 83 bb 3c ff ff ff 00 74 02 0f ff 83 bb 40 ff ff ff 00 74 02 <0f> ff 48 83 bb f8 fe ff ff 00 74 02 0f ff 48 83 bb 00 ff ff ff
       [95794.687156] ---[ end trace e95877675c6ec008 ]---
       [95794.687876] ------------[ cut here ]------------
       [95794.688579] WARNING: CPU: 0 PID: 31496 at fs/btrfs/inode.c:9565 btrfs_destroy_inode+0x7d/0x206 [btrfs]
       [95794.689735] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs]
       [95794.695015] CPU: 0 PID: 31496 Comm: umount Tainted: G        W       4.14.0-rc6-btrfs-next-54+ #1
       [95794.696396] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
       [95794.697956] task: ffff880075aa0240 task.stack: ffffc90001734000
       [95794.698925] RIP: 0010:btrfs_destroy_inode+0x7d/0x206 [btrfs]
       [95794.699763] RSP: 0018:ffffc90001737d00 EFLAGS: 00010206
       [95794.700434] RAX: 0000000000000000 RBX: ffff880070fa1418 RCX: ffffc90001737c7c
       [95794.701445] RDX: 0000000175aa0240 RSI: 0000000000000001 RDI: ffff880070fa1418
       [95794.702448] RBP: ffffc90001737d38 R08: 0000000000000000 R09: 0000000000000000
       [95794.703557] R10: ffffc90001737c48 R11: ffff88007123e158 R12: ffff880075b6a000
       [95794.704441] R13: ffff88006145c000 R14: ffff880070fa1418 R15: ffff880070c3b4a0
       [95794.705270] FS:  00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
       [95794.706341] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [95794.707001] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0
       [95794.708030] Call Trace:
       [95794.708466]  destroy_inode+0x3d/0x55
       [95794.709071]  evict+0x177/0x17e
       [95794.709497]  dispose_list+0x50/0x71
       [95794.709973]  evict_inodes+0x132/0x141
       [95794.710564]  generic_shutdown_super+0x3f/0x10b
       [95794.711200]  kill_anon_super+0x12/0x1c
       [95794.711633]  btrfs_kill_super+0x16/0x21 [btrfs]
       [95794.712139]  deactivate_locked_super+0x30/0x68
       [95794.712608]  deactivate_super+0x36/0x39
       [95794.713093]  cleanup_mnt+0x49/0x67
       [95794.713514]  __cleanup_mnt+0x12/0x14
       [95794.713933]  task_work_run+0x82/0xa6
       [95794.714543]  prepare_exit_to_usermode+0xe1/0x10c
       [95794.715247]  syscall_return_slowpath+0x18c/0x1af
       [95794.715952]  entry_SYSCALL_64_fastpath+0xab/0xad
       [95794.716653] RIP: 0033:0x7fa678cb99a7
       [95794.721100] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [95794.722052] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7
       [95794.722856] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90
       [95794.723698] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015
       [95794.724736] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64
       [95794.725928] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160
       [95794.726728] Code: 40 ff ff ff 00 74 02 0f ff 48 83 bb f8 fe ff ff 00 74 02 0f ff 48 83 bb 00 ff ff ff 00 74 02 0f ff 48 83 bb 30 ff ff ff 00 74 02 <0f> ff 48 83 bb 08 ff ff ff 00 74 02 0f ff 4d 85 e4 0f 84 52 01
       [95794.729203] ---[ end trace e95877675c6ec009 ]---
       [95794.841054] ------------[ cut here ]------------
       [95794.841829] WARNING: CPU: 0 PID: 31496 at fs/btrfs/extent-tree.c:5831 btrfs_free_block_groups+0x235/0x36a [btrfs]
       [95794.843425] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs]
       [95794.850658] CPU: 0 PID: 31496 Comm: umount Tainted: G        W       4.14.0-rc6-btrfs-next-54+ #1
       [95794.852590] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
       [95794.854752] task: ffff880075aa0240 task.stack: ffffc90001734000
       [95794.855812] RIP: 0010:btrfs_free_block_groups+0x235/0x36a [btrfs]
       [95794.856811] RSP: 0018:ffffc90001737d70 EFLAGS: 00010206
       [95794.857805] RAX: 0000000080000000 RBX: ffff88006145c000 RCX: 0000000000000001
       [95794.859014] RDX: 00000001810af668 RSI: 0000000000000002 RDI: 00000000ffffffff
       [95794.860270] RBP: ffffc90001737d98 R08: 0000000000000000 R09: ffffffff817e22b9
       [95794.861525] R10: ffffc90001737c80 R11: 00000000000337fd R12: 0000000000000000
       [95794.862700] R13: ffff88006145c0c0 R14: ffff88021b61a800 R15: ffff88006145c100
       [95794.863810] FS:  00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
       [95794.865149] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [95794.866099] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0
       [95794.867198] Call Trace:
       [95794.867626]  close_ctree+0x1db/0x2b8 [btrfs]
       [95794.868188]  ? evict_inodes+0x132/0x141
       [95794.869037]  btrfs_put_super+0x15/0x17 [btrfs]
       [95794.870400]  generic_shutdown_super+0x6a/0x10b
       [95794.871262]  kill_anon_super+0x12/0x1c
       [95794.872046]  btrfs_kill_super+0x16/0x21 [btrfs]
       [95794.872746]  deactivate_locked_super+0x30/0x68
       [95794.873687]  deactivate_super+0x36/0x39
       [95794.874639]  cleanup_mnt+0x49/0x67
       [95794.875504]  __cleanup_mnt+0x12/0x14
       [95794.876126]  task_work_run+0x82/0xa6
       [95794.876788]  prepare_exit_to_usermode+0xe1/0x10c
       [95794.877777]  syscall_return_slowpath+0x18c/0x1af
       [95794.878381]  entry_SYSCALL_64_fastpath+0xab/0xad
       [95794.878888] RIP: 0033:0x7fa678cb99a7
       [95794.879307] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [95794.880204] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7
       [95794.881640] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90
       [95794.882690] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015
       [95794.883538] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64
       [95794.884562] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160
       [95794.885664] Code: 89 ef e8 07 ec 32 e1 e8 9d c0 ea e0 48 8d b3 28 02 00 00 48 83 c9 ff 31 d2 48 89 df e8 29 c5 ff ff 48 83 bb 80 02 00 00 00 74 02 <0f> ff 48 83 bb 88 02 00 00 00 74 02 0f ff 48 83 bb d8 02 00 00
       [95794.887980] ---[ end trace e95877675c6ec00a ]---
       [95794.888739] ------------[ cut here ]------------
       [95794.889405] WARNING: CPU: 0 PID: 31496 at fs/btrfs/extent-tree.c:5832 btrfs_free_block_groups+0x241/0x36a [btrfs]
       [95794.891020] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs]
       [95794.897551] CPU: 0 PID: 31496 Comm: umount Tainted: G        W       4.14.0-rc6-btrfs-next-54+ #1
       [95794.898509] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
       [95794.899685] task: ffff880075aa0240 task.stack: ffffc90001734000
       [95794.900592] RIP: 0010:btrfs_free_block_groups+0x241/0x36a [btrfs]
       [95794.901387] RSP: 0018:ffffc90001737d70 EFLAGS: 00010206
       [95794.902300] RAX: 0000000080000000 RBX: ffff88006145c000 RCX: 0000000000000001
       [95794.903260] RDX: 00000001810af668 RSI: 0000000000000002 RDI: 00000000ffffffff
       [95794.904332] RBP: ffffc90001737d98 R08: 0000000000000000 R09: ffffffff817e22b9
       [95794.905300] R10: ffffc90001737c80 R11: 00000000000337fd R12: 0000000000000000
       [95794.906439] R13: ffff88006145c0c0 R14: ffff88021b61a800 R15: ffff88006145c100
       [95794.907459] FS:  00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
       [95794.908625] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [95794.909511] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0
       [95794.910630] Call Trace:
       [95794.911153]  close_ctree+0x1db/0x2b8 [btrfs]
       [95794.911837]  ? evict_inodes+0x132/0x141
       [95794.912344]  btrfs_put_super+0x15/0x17 [btrfs]
       [95794.912975]  generic_shutdown_super+0x6a/0x10b
       [95794.913788]  kill_anon_super+0x12/0x1c
       [95794.914424]  btrfs_kill_super+0x16/0x21 [btrfs]
       [95794.915142]  deactivate_locked_super+0x30/0x68
       [95794.915831]  deactivate_super+0x36/0x39
       [95794.916433]  cleanup_mnt+0x49/0x67
       [95794.917045]  __cleanup_mnt+0x12/0x14
       [95794.917665]  task_work_run+0x82/0xa6
       [95794.918309]  prepare_exit_to_usermode+0xe1/0x10c
       [95794.919021]  syscall_return_slowpath+0x18c/0x1af
       [95794.919722]  entry_SYSCALL_64_fastpath+0xab/0xad
       [95794.920426] RIP: 0033:0x7fa678cb99a7
       [95794.921039] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [95794.922303] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7
       [95794.923335] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90
       [95794.924364] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015
       [95794.925435] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64
       [95794.926533] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160
       [95794.927557] Code: 48 8d b3 28 02 00 00 48 83 c9 ff 31 d2 48 89 df e8 29 c5 ff ff 48 83 bb 80 02 00 00 00 74 02 0f ff 48 83 bb 88 02 00 00 00 74 02 <0f> ff 48 83 bb d8 02 00 00 00 74 02 0f ff 48 83 bb e0 02 00 00
       [95794.930166] ---[ end trace e95877675c6ec00b ]---
       [95794.930961] ------------[ cut here ]------------
       [95794.931727] WARNING: CPU: 0 PID: 31496 at fs/btrfs/extent-tree.c:9953 btrfs_free_block_groups+0x2bc/0x36a [btrfs]
       [95794.932729] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs]
       [95794.938394] CPU: 0 PID: 31496 Comm: umount Tainted: G        W       4.14.0-rc6-btrfs-next-54+ #1
       [95794.939842] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
       [95794.941455] task: ffff880075aa0240 task.stack: ffffc90001734000
       [95794.942336] RIP: 0010:btrfs_free_block_groups+0x2bc/0x36a [btrfs]
       [95794.943268] RSP: 0018:ffffc90001737d70 EFLAGS: 00010206
       [95794.944127] RAX: ffff8802004fd0e8 RBX: ffff88006145c000 RCX: 0000000000000001
       [95794.945211] RDX: 00000001810af668 RSI: 0000000000000002 RDI: 00000000ffffffff
       [95794.946316] RBP: ffffc90001737d98 R08: 0000000000000000 R09: ffffffff817e22b9
       [95794.947271] R10: ffffc90001737c80 R11: 00000000000337fd R12: ffff8802004fd0e8
       [95794.948219] R13: ffff88006145c0c0 R14: ffff88006145e598 R15: ffff88006145c100
       [95794.949193] FS:  00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
       [95794.950495] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [95794.951338] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0
       [95794.952361] Call Trace:
       [95794.952811]  close_ctree+0x1db/0x2b8 [btrfs]
       [95794.953522]  ? evict_inodes+0x132/0x141
       [95794.954543]  btrfs_put_super+0x15/0x17 [btrfs]
       [95794.955231]  generic_shutdown_super+0x6a/0x10b
       [95794.955916]  kill_anon_super+0x12/0x1c
       [95794.956414]  btrfs_kill_super+0x16/0x21 [btrfs]
       [95794.956953]  deactivate_locked_super+0x30/0x68
       [95794.957635]  deactivate_super+0x36/0x39
       [95794.958256]  cleanup_mnt+0x49/0x67
       [95794.958701]  __cleanup_mnt+0x12/0x14
       [95794.959181]  task_work_run+0x82/0xa6
       [95794.959635]  prepare_exit_to_usermode+0xe1/0x10c
       [95794.960182]  syscall_return_slowpath+0x18c/0x1af
       [95794.960731]  entry_SYSCALL_64_fastpath+0xab/0xad
       [95794.961438] RIP: 0033:0x7fa678cb99a7
       [95794.961990] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [95794.963111] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7
       [95794.963975] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90
       [95794.964680] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015
       [95794.965763] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64
       [95794.966868] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160
       [95794.967800] Code: 00 00 00 4c 8b a3 98 25 00 00 49 83 bc 24 60 ff ff ff 00 75 16 49 83 bc 24 68 ff ff ff 00 75 0b 49 83 bc 24 70 ff ff ff 00 74 16 <0f> ff 49 8d b4 24 18 ff ff ff 31 c9 31 d2 48 89 df e8 93 7a ff
       [95794.970629] ---[ end trace e95877675c6ec00c ]---
       [95794.971451] BTRFS info (device sdi): space_info 1 has 7680000 free, is not full
       [95794.972351] BTRFS info (device sdi): space_info total=8388608, used=704512, pinned=0, reserved=0, may_use=4096, readonly=0
       [95794.973595] ------------[ cut here ]------------
       [95794.974353] WARNING: CPU: 0 PID: 31496 at fs/btrfs/extent-tree.c:9953 btrfs_free_block_groups+0x2bc/0x36a [btrfs]
       [95794.980163] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs]
       [95794.986461] CPU: 0 PID: 31496 Comm: umount Tainted: G        W       4.14.0-rc6-btrfs-next-54+ #1
       [95794.987591] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
       [95794.988929] task: ffff880075aa0240 task.stack: ffffc90001734000
       [95794.989922] RIP: 0010:btrfs_free_block_groups+0x2bc/0x36a [btrfs]
       [95794.990715] RSP: 0018:ffffc90001737d70 EFLAGS: 00010206
       [95794.991431] RAX: ffff88020f6e70e8 RBX: ffff88006145c000 RCX: ffffffff8115a906
       [95794.992455] RDX: ffffffff8115a902 RSI: ffff880075aa0b40 RDI: ffff880075aa0b40
       [95794.993535] RBP: ffffc90001737d98 R08: 0000000000000020 R09: fffffffffffffff7
       [95794.994573] R10: 00000000ffffffc4 R11: ffff8800633b1bc0 R12: ffff88020f6e70e8
       [95794.996250] R13: 0000000000000038 R14: ffff88006145e598 R15: 0000000000000000
       [95794.997233] FS:  00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
       [95794.998592] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [95794.999484] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0
       [95795.000542] Call Trace:
       [95795.001138]  close_ctree+0x1db/0x2b8 [btrfs]
       [95795.001885]  ? evict_inodes+0x132/0x141
       [95795.002407]  btrfs_put_super+0x15/0x17 [btrfs]
       [95795.003093]  generic_shutdown_super+0x6a/0x10b
       [95795.003720]  kill_anon_super+0x12/0x1c
       [95795.004353]  btrfs_kill_super+0x16/0x21 [btrfs]
       [95795.005095]  deactivate_locked_super+0x30/0x68
       [95795.005716]  deactivate_super+0x36/0x39
       [95795.006388]  cleanup_mnt+0x49/0x67
       [95795.006939]  __cleanup_mnt+0x12/0x14
       [95795.007512]  task_work_run+0x82/0xa6
       [95795.008124]  prepare_exit_to_usermode+0xe1/0x10c
       [95795.008994]  syscall_return_slowpath+0x18c/0x1af
       [95795.009831]  entry_SYSCALL_64_fastpath+0xab/0xad
       [95795.010610] RIP: 0033:0x7fa678cb99a7
       [95795.011193] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [95795.012327] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7
       [95795.013432] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90
       [95795.014558] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015
       [95795.015577] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64
       [95795.016569] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160
       [95795.017662] Code: 00 00 00 4c 8b a3 98 25 00 00 49 83 bc 24 60 ff ff ff 00 75 16 49 83 bc 24 68 ff ff ff 00 75 0b 49 83 bc 24 70 ff ff ff 00 74 16 <0f> ff 49 8d b4 24 18 ff ff ff 31 c9 31 d2 48 89 df e8 93 7a ff
       [95795.020538] ---[ end trace e95877675c6ec00d ]---
       [95795.021259] BTRFS info (device sdi): space_info 4 has 1072775168 free, is not full
       [95795.022390] BTRFS info (device sdi): space_info total=1073741824, used=114688, pinned=0, reserved=0, may_use=786432, readonly=65536
      
      Fix this by ensuring the zero range operation does not call
      btrfs_truncate_block() if the corresponding extent is an unwritten one
      (it's pointless anyway, since reading from an unwritten extent yields
      zeroes).
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Tested-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      81fdf638
    • F
      Btrfs: fix missing inode i_size update after zero range operation · 9f13ce74
      Filipe Manana committed
      For a fallocate's zero range operation that targets a range with an end
      that is not aligned to the sector size, we can end up not updating the
      inode's i_size. This happens when the last page of the range maps to an
      unwritten (prealloc) extent and before that last page we have either a
      hole or a written extent. This is because in this scenario we relied
      on a call to btrfs_prealloc_file_range() to update the inode's i_size,
      however it can only update the i_size to the "down aligned" end of the
      range.
      
      Example:
      
       $ mkfs.btrfs -f /dev/sdc
       $ mount /dev/sdc /mnt
       $ xfs_io -f -c "pwrite -S 0xff 0 428K" /mnt/foobar
       $ xfs_io -c "falloc -k 428K 4K" /mnt/foobar
       $ xfs_io -c "fzero 0 430K" /mnt/foobar
       $ du --bytes /mnt/foobar
       438272	/mnt/foobar
      
      The inode's i_size was left at 428KiB (438272 bytes) when it should have
      been updated to 430KiB (440320 bytes).

      Fix this by always updating the inode's i_size explicitly after zeroing
      the range.
      
      Fixes: ba6d5887946ff86d93dc ("Btrfs: add support for fallocate's zero range operation")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      9f13ce74
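The arithmetic behind the example above can be checked with a small sketch. It is a toy model rather than the kernel logic: `SECTOR_SIZE` and the helper `round_down` are assumptions introduced for illustration (the commit text does not state the sector size; 4096 bytes is assumed here), and taking the larger of the old size and the aligned end is a simplification of how the preallocation path updates i_size.

```python
# Toy model of the i_size arithmetic from the example; not kernel code.
SECTOR_SIZE = 4096  # assumption: sector size is not stated in the commit
KIB = 1024


def round_down(value, align):
    """Round value down to the nearest multiple of align."""
    return value - (value % align)


file_size = 428 * KIB   # size after the initial pwrite
range_end = 430 * KIB   # end of the "fzero 0 430K" range

# Before the fix: only the "down aligned" end of the range can be applied.
aligned_end = round_down(range_end, SECTOR_SIZE)
i_size_before_fix = max(file_size, aligned_end)

# After the fix: i_size is set explicitly to the end of the zeroed range.
i_size_after_fix = max(file_size, range_end)

print(i_size_before_fix, i_size_after_fix)
```

With the assumed 4096-byte sector size, the down-aligned end of the range equals the old file size, which is why the size reported by du did not change.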
    • F
      Btrfs: use cached state when dirtying pages during buffered write · 94f45071
      Filipe Manana committed
      During a buffered IO write, we can have an extent state that we got when
      we locked the range (if the range starts at an offset lower than eof), so
      always pass it to btrfs_dirty_pages() so that setting the delalloc bit
      in the range does not need to do a full search in the inode's io tree,
      saving time and reducing the amount of time we hold the io tree's lock.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      94f45071
    • F
      Btrfs: add support for fallocate's zero range operation · f27451f2
      Filipe Manana committed
      This implements support for the zero range operation of fallocate. For now
      at least it's as simple as possible while reusing most of the existing
      fallocate and hole punching infrastructure.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      f27451f2
    • D
      btrfs: sink unlock_extent parameter gfp_flags · e43bbe5e
      David Sterba committed
      All callers pass either GFP_NOFS or GFP_KERNEL now, so we can sink the
      parameter into the function. We lose some of the slightly better
      semantics of GFP_KERNEL in some places, but cleaning up the callchains
      is worth it.
      Signed-off-by: David Sterba <dsterba@suse.com>
      e43bbe5e
    • L
      Btrfs: set plug for fsync · 343e4fc1
      Liu Bo committed
      Setting a plug lets adjacent IOs be merged before they are dispatched to
      the disk driver.
      
      Without a plug this is not a problem for single-disk use cases, but with
      multiple disks using a RAID profile a large IO can be split into several
      IOs of stripe length. A plug helps bring them back together for each
      disk, which saves several disk accesses.
      
      Moreover, fsync issues synchronous writes, so a plug can really take
      effect.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      343e4fc1
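What "merging adjacent IOs" means can be illustrated with a toy sketch. This is not the block layer's plugging code: the function `merge_adjacent`, the `(offset, length)` request tuples and the 64KiB stripe length are all assumptions made for the illustration.

```python
# Toy illustration of merging adjacent IO requests for one disk.
# Requests are hypothetical (offset, length) tuples in bytes.

def merge_adjacent(requests):
    """Merge requests that touch or overlap, after sorting by offset."""
    merged = []
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # Extend the previous request to cover this one.
            prev_off, prev_len = merged[-1]
            merged[-1] = (prev_off, max(prev_len, offset + length - prev_off))
        else:
            merged.append((offset, length))
    return merged


stripe = 64 * 1024  # assumed stripe length
per_disk = [(0, stripe), (stripe, stripe), (4 * stripe, stripe)]
print(merge_adjacent(per_disk))
```

In this sketch the first two stripe-sized requests collapse into one larger request, while the third stays separate because it is not adjacent to the others.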
    • D
      btrfs: sink gfp parameter to clear_extent_bit · ae0f1625
      David Sterba committed
      All callers use GFP_NOFS, we don't have to pass it as an argument. The
      built-in tests pass GFP_KERNEL, but they run only at module load time
      and NOFS works there as well.
      Signed-off-by: David Sterba <dsterba@suse.com>
      ae0f1625
    • L
      Btrfs: add __init macro to btrfs init functions · f5c29bd9
      Liu Bo committed
      Adding the __init macro gives the kernel a hint that a function is only
      used during the initialization phase and that its memory can be freed
      afterwards.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      f5c29bd9
    • N
      btrfs: Use locked_end rather than open coding it · 96b09dde
      Nikolay Borisov committed
      Right before we go into this loop, locked_end is set to alloc_end - 1 and
      it is already used in nearby functions, so there is no need for an
      exception here. This just makes the code consistent, no functional changes.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      96b09dde
    • N
      btrfs: Move loop termination condition in while() · 6b7d6e93
      Nikolay Borisov committed
      Fallocating a file in btrfs goes through several stages. The one before
      actually inserting the fallocated extents is to create a qgroup
      reservation, covering the desired range. To this end there is a loop in
      btrfs_fallocate which checks to see if there are holes in the fallocated
      range or !PREALLOC extents past EOF and if so creates qgroup reservations
      for them. Unfortunately, the main condition of the loop is buried right
      at the end of its body rather than in the actual while statement, which
      makes it non-obvious. Fix this by moving the condition into the while
      statement where it belongs. No functional changes.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      6b7d6e93
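The shape of this refactoring can be sketched outside the kernel. The two functions below are hypothetical stand-ins, not the btrfs_fallocate loop: they walk a range in fixed steps, once with the exit test buried at the end of the body and once with the test in the while statement.

```python
# Toy comparison of the two loop shapes; names and step logic are invented.

def walk_condition_at_end(cur, end, step):
    """Exit test hidden at the bottom of an endless loop."""
    visited = []
    while True:
        visited.append(cur)
        cur += step
        if cur >= end:
            break
    return visited


def walk_condition_in_while(cur, end, step):
    """Same walk with the exit test in the while statement."""
    visited = []
    while cur < end:
        visited.append(cur)
        cur += step
    return visited


print(walk_condition_at_end(0, 10, 4))
print(walk_condition_in_while(0, 10, 4))
```

In this sketch the two forms agree whenever the loop is entered with `cur < end`; they differ only for an empty range, where the first form still runs its body once.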
  14. 28 Nov, 2017 1 commit
    • L
      Btrfs: fix list_add corruption and soft lockups in fsync · ebb70442
      Liu Bo committed
      Xfstests btrfs/146 revealed this corruption:
      
      [   58.138831] Buffer I/O error on dev dm-0, logical block 2621424, async page read
      [   58.151233] BTRFS error (device sdf): bdev /dev/mapper/error-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
      [   58.152403] list_add corruption. prev->next should be next (ffff88005e6775d8), but was ffffc9000189be88. (prev=ffffc9000189be88).
      [   58.153518] ------------[ cut here ]------------
      [   58.153892] WARNING: CPU: 1 PID: 1287 at lib/list_debug.c:31 __list_add_valid+0x169/0x1f0
      ...
      [   58.157379] RIP: 0010:__list_add_valid+0x169/0x1f0
      ...
      [   58.161956] Call Trace:
      [   58.162264]  btrfs_log_inode_parent+0x5bd/0xfb0 [btrfs]
      [   58.163583]  btrfs_log_dentry_safe+0x60/0x80 [btrfs]
      [   58.164003]  btrfs_sync_file+0x4c2/0x6f0 [btrfs]
      [   58.164393]  vfs_fsync_range+0x5f/0xd0
      [   58.164898]  do_fsync+0x5a/0x90
      [   58.165170]  SyS_fsync+0x10/0x20
      [   58.165395]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      ...
      
      It turns out that we could record btrfs_log_ctx:io_err in
      log_one_extents when IO fails, but make log_one_extents() return '0'
      instead of -EIO, so the IO error is not acknowledged by the callers,
      i.e.  btrfs_log_inode_parent(), which would otherwise remove
      btrfs_log_ctx:list from list head 'root->log_ctxs'.  Since btrfs_log_ctx
      is allocated from stack memory, it gets freed while the object is still
      on the list. A future list_add will then trigger the above warning.
      
      This returns the correct error in the above case.
      
      Jeff also reported this while testing against his fsync error
      patch set[1].
      
      [1]: https://www.spinics.net/lists/linux-btrfs/msg65308.html
      "btrfs list corruption and soft lockups while testing writeback error handling"
      
      Fixes: 8407f553 ("Btrfs: fix data corruption after fast fsync and writeback error")
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      ebb70442
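The failure mode described above can be mimicked with a small toy model. Everything here is hypothetical and heavily simplified: `LogCtx`, `log_extents` and `log_inode` are invented names, a Python list stands in for 'root->log_ctxs', and the control flow only mirrors the idea that the context is unlinked when an error is returned but left linked when the error is merely recorded in io_err.

```python
# Toy model of a context left on a shared list; not the btrfs code.
EIO = 5


class LogCtx:
    """Stands in for the stack-allocated log context."""

    def __init__(self):
        self.io_err = 0


def log_extents(ctx, io_fails, propagate):
    """Record an IO error in ctx; return it only if propagate is set."""
    if io_fails:
        ctx.io_err = -EIO
        return -EIO if propagate else 0
    return 0


def log_inode(log_ctxs, io_fails, propagate):
    ctx = LogCtx()
    log_ctxs.append(ctx)
    ret = log_extents(ctx, io_fails, propagate)
    if ret:
        log_ctxs.remove(ctx)   # error path unlinks the context
        return ret
    if ctx.io_err:
        return ctx.io_err      # early exit: ctx is still on the list
    log_ctxs.remove(ctx)       # normal path unlinks the context later
    return 0


buggy_list, fixed_list = [], []
print(log_inode(buggy_list, True, False), len(buggy_list))
print(log_inode(fixed_list, True, True), len(fixed_list))
```

In the buggy variant the list keeps a reference to a context whose owner has already returned; in C that would be a dangling pointer into a dead stack frame, which is the kind of state the list_add check complains about.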
  15. 16 Nov, 2017 2 commits
    • F
      Btrfs: fix reported number of inode blocks after buffered append writes · e3b8a485
      Filipe Manana committed
      The patch from commit a7e3b975 ("Btrfs: fix reported number of inode
      blocks") introduced a regression where if we do a buffered write starting
      at position equal to or greater than the file's size and then stat(2) the
      file before writeback is triggered, the number of used blocks does not
      change (unless there's a prealloc/unwritten extent). Example:
      
        $ xfs_io -f -c "pwrite -S 0xab 0 64K" foobar
        $ du -h foobar
        0	foobar
        $ sync
        $ du -h foobar
        64K	foobar
      
      The first version of that patch didn't have this regression and the second
      version, which was the one committed, was made only to address some
      performance regression detected by the Intel test robots using fs_mark.
      
      This fixes the regression by setting the new delalloc bit in the range, and
      doing it at btrfs_dirty_pages() while setting the regular delalloc bit as
      well, so that this way we set both bits at once avoiding navigation of the
      inode's io tree twice. Doing it at btrfs_dirty_pages() is also the most
      meaningful place, as we should set the new delalloc bit when we set the
      delalloc bit, which happens only if we copied bytes into the pages at
      __btrfs_buffered_write().
      
      This was making some of LTP's du tests fail, which can be quickly run
      using a command line like the following:
      
        $ ./runltp -q -p -l /ltp.log -f commands -s du -d /mnt
      
      Fixes: a7e3b975 ("Btrfs: fix reported number of inode blocks")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      e3b8a485
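The check from the example above (block count before and after writeback) can also be scripted. The sketch below is an illustration under assumptions: the helper name `blocks_before_and_after_sync` is invented, it writes to a temporary directory rather than a btrfs mount, and it uses fsync on the file instead of a global sync. The numbers it prints depend on the filesystem backing the temporary directory.

```python
# Sketch: compare st_blocks of a file before and after forcing writeback.
import os
import tempfile


def blocks_before_and_after_sync(size=64 * 1024):
    """Write size bytes, then read st_blocks before and after fsync."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foobar")
        with open(path, "wb") as f:
            f.write(b"\xab" * size)
            f.flush()  # push Python's buffer into the page cache
            before = os.stat(path).st_blocks
            os.fsync(f.fileno())  # force writeback of this file
            after = os.stat(path).st_blocks
    return before, after


before, after = blocks_before_and_after_sync()
print("st_blocks before writeback:", before)
print("st_blocks after writeback:", after)
```

On a kernel affected by the regression, running this against a file on btrfs would be expected to show a zero block count before writeback, matching the du output in the commit message.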
    • F
      Btrfs: move definition of the function btrfs_find_new_delalloc_bytes · f48bf66b
      Filipe Manana committed
      Move the definition of the function btrfs_find_new_delalloc_bytes() closer
      to the function btrfs_dirty_pages(), because in a future commit it will be
      used exclusively by btrfs_dirty_pages(). This just moves the function's
      definition, with no functional changes at all.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      f48bf66b
  16. 02 Nov, 2017 1 commit
    • J
      Btrfs: rework outstanding_extents · 8b62f87b
      Josef Bacik committed
      Right now we jump through a lot of weird hoops around outstanding_extents
      in order to keep the extent count consistent.  This is because we logically
      transfer the outstanding_extent count from the initial reservation
      through the set_delalloc_bits.  This makes it pretty difficult to get a
      handle on how and when we need to mess with outstanding_extents.
      
      Fix this by revamping the rules of how we deal with outstanding_extents.
      Now instead everybody that is holding on to a delalloc extent is
      required to increase the outstanding extents count for itself.  This
      means we'll have something like this
      
      btrfs_delalloc_reserve_metadata	- outstanding_extents = 1
       btrfs_set_extent_delalloc	- outstanding_extents = 2
      btrfs_delalloc_release_extents	- outstanding_extents = 1
      
      for an initial file write.  Now take the append write where we extend an
      existing delalloc range but still under the maximum extent size
      
      btrfs_delalloc_reserve_metadata - outstanding_extents = 2
        btrfs_set_extent_delalloc
          btrfs_set_bit_hook		- outstanding_extents = 3
          btrfs_merge_extent_hook	- outstanding_extents = 2
      btrfs_delalloc_release_extents	- outstanding_extents = 1
      
      In order to make the ordered extent transition we of course must now
      make ordered extents carry their own outstanding_extent reservation, so
      for cow_file_range we end up with
      
      btrfs_add_ordered_extent	- outstanding_extents = 2
      clear_extent_bit		- outstanding_extents = 1
      btrfs_remove_ordered_extent	- outstanding_extents = 0
      
      This makes all manipulations of outstanding_extents much more explicit.
      Every successful call to btrfs_delalloc_reserve_metadata _must_ now be
      combined with btrfs_delalloc_release_extents, even in the error case, as
      that is the only function that actually modifies the
      outstanding_extents counter.
      
      The drawback to this is now we are much more likely to have transient
      cases where outstanding_extents is much larger than it actually should
      be.  This could happen before as we manipulated the delalloc bits, but
      now it happens basically at every write.  This may put more pressure on
      the ENOSPC flushing code, but I think making this code simpler is worth
      the cost.  I have another change coming to mitigate this side-effect
      somewhat.
      
      I also added trace points for the counter manipulation.  These were used
      by a bpf script I wrote to help track down leak issues.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      8b62f87b
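The counter values listed in the message for an initial file write followed by cow_file_range can be replayed with a small sketch. This is only a bookkeeping model: the `replay` helper and the +1/-1 deltas are assumptions inferred from the numbers in the message, and the strings are used as labels, not as calls into the kernel.

```python
# Toy replay of the outstanding_extents values from the commit message.

STEPS = [
    ("btrfs_delalloc_reserve_metadata", +1),  # reservation holds one
    ("btrfs_set_extent_delalloc", +1),        # delalloc range holds one
    ("btrfs_delalloc_release_extents", -1),   # reservation dropped
    ("btrfs_add_ordered_extent", +1),         # ordered extent holds one
    ("clear_extent_bit", -1),                 # delalloc range dropped
    ("btrfs_remove_ordered_extent", -1),      # ordered extent dropped
]


def replay(steps):
    """Apply each delta and return the counter value after every step."""
    counter = 0
    values = []
    for name, delta in steps:
        counter += delta
        if counter < 0:
            raise ValueError(f"counter underflow after {name}")
        values.append(counter)
    return values


for (name, _), value in zip(STEPS, replay(STEPS)):
    print(f"{name:34s} outstanding_extents = {value}")
```

Each holder in this model takes and drops its own count, so the counter returns to zero only once the ordered extent is removed.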
  17. 30 Oct, 2017 3 commits