1. 30 11月, 2016 2 次提交
  2. 04 10月, 2016 2 次提交
  3. 27 9月, 2016 4 次提交
  4. 26 9月, 2016 3 次提交
  5. 25 8月, 2016 7 次提交
    • L
      Btrfs: detect corruption when non-root leaf has zero item · 1ba98d08
      Liu Bo 提交于
      Right now we treat leaf which has zero item as a valid one
      because we could have an empty tree, that is, a root that is
      also a leaf without any item, however, in the same case but
      when the leaf is not a root, we can end up with hitting the
      BUG_ON(1) in btrfs_extend_item() called by
      setup_inline_extent_backref().
      
      This makes us check the situation as a corruption if leaf is
      not its own root.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      1ba98d08
    • L
      Btrfs: check btree node's nritems · 053ab70f
      Liu Bo 提交于
      When btree node (level = 1) has nritems which equals to zero,
      we can end up with panic due to insert_ptr()'s
      
      BUG_ON(slot > nritems);
      
      where slot is 1 and nritems is 0, as copy_for_split() calls
      insert_ptr(.., path->slots[1] + 1, ...);
      
      A invalid value results in the whole mess, this adds the check
      for btree's node nritems so that we stop reading block when
      when something is wrong.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      053ab70f
    • J
      btrfs: don't create or leak aliased root while cleaning up orphans · 35bbb97f
      Jeff Mahoney 提交于
      commit 909c3a22 (Btrfs: fix loading of orphan roots leading to BUG_ON)
      avoids the BUG_ON but can add an aliased root to the dead_roots list or
      leak the root.
      
      Since we've already been loading roots into the radix tree, we should
      use it before looking the root up on disk.
      
      Cc: <stable@vger.kernel.org> # 4.5
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      35bbb97f
    • W
      btrfs: fix fsfreeze hang caused by delayed iputs deal · 9e7cc91a
      Wang Xiaoguang 提交于
      When running fstests generic/068, sometimes we got below deadlock:
        xfs_io          D ffff8800331dbb20     0  6697   6693 0x00000080
        ffff8800331dbb20 ffff88007acfc140 ffff880034d895c0 ffff8800331dc000
        ffff880032d243e8 fffffffeffffffff ffff880032d24400 0000000000000001
        ffff8800331dbb38 ffffffff816a9045 ffff880034d895c0 ffff8800331dbba8
        Call Trace:
        [<ffffffff816a9045>] schedule+0x35/0x80
        [<ffffffff816abab2>] rwsem_down_read_failed+0xf2/0x140
        [<ffffffff8118f5e1>] ? __filemap_fdatawrite_range+0xd1/0x100
        [<ffffffff8134f978>] call_rwsem_down_read_failed+0x18/0x30
        [<ffffffffa06631fc>] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs]
        [<ffffffff810d32b5>] percpu_down_read+0x35/0x50
        [<ffffffff81217dfc>] __sb_start_write+0x2c/0x40
        [<ffffffffa067f5d5>] start_transaction+0x2a5/0x4d0 [btrfs]
        [<ffffffffa067f857>] btrfs_join_transaction+0x17/0x20 [btrfs]
        [<ffffffffa068ba34>] btrfs_evict_inode+0x3c4/0x5d0 [btrfs]
        [<ffffffff81230a1a>] evict+0xba/0x1a0
        [<ffffffff812316b6>] iput+0x196/0x200
        [<ffffffffa06851d0>] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs]
        [<ffffffffa067f1d8>] btrfs_commit_transaction+0x928/0xa80 [btrfs]
        [<ffffffffa0646df0>] btrfs_freeze+0x30/0x40 [btrfs]
        [<ffffffff81218040>] freeze_super+0xf0/0x190
        [<ffffffff81229275>] do_vfs_ioctl+0x4a5/0x5c0
        [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
        [<ffffffff810038cf>] ? syscall_trace_enter_phase1+0x11f/0x140
        [<ffffffff81229409>] SyS_ioctl+0x79/0x90
        [<ffffffff81003c12>] do_syscall_64+0x62/0x110
        [<ffffffff816acbe1>] entry_SYSCALL64_slow_path+0x25/0x25
      
      >From this warning, freeze_super() already holds SB_FREEZE_FS, but
      btrfs_freeze() will call btrfs_commit_transaction() again, if
      btrfs_commit_transaction() finds that it has delayed iputs to handle,
      it'll start_transaction(), which will try to get SB_FREEZE_FS lock
      again, then deadlock occurs.
      
      The root cause is that in btrfs, sync_filesystem(sb) does not make
      sure all metadata is updated. There still maybe some codes adding
      delayed iputs, see below sample race window:
      
               CPU1                                  |         CPU2
      |-> freeze_super()                             |
          |-> sync_filesystem(sb);                   |
          |                                          |-> cleaner_kthread()
          |                                          |   |-> btrfs_delete_unused_bgs()
          |                                          |       |-> btrfs_remove_chunk()
          |                                          |           |-> btrfs_remove_block_group()
          |                                          |               |-> btrfs_add_delayed_iput()
          |                                          |
          |-> sb->s_writers.frozen = SB_FREEZE_FS;   |
          |-> sb_wait_write(sb, SB_FREEZE_FS);       |
          |   acquire SB_FREEZE_FS lock.             |
          |                                          |
          |-> btrfs_freeze()                         |
              |-> btrfs_commit_transaction()         |
                  |-> btrfs_run_delayed_iputs()      |
                  |   will handle delayed iputs,     |
                  |   that means start_transaction() |
                  |   will be called, which will try |
                  |   to get SB_FREEZE_FS lock.      |
      
      To fix this issue, introduce a "int fs_frozen" to record internally whether
      fs has been frozen. If fs has been frozen, we can not handle delayed iputs.
      Signed-off-by: NWang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ add comment to btrfs_freeze ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      9e7cc91a
    • J
      btrfs: waiting on qgroup rescan should not always be interruptible · d06f23d6
      Jeff Mahoney 提交于
      We wait on qgroup rescan completion in three places: file system
      shutdown, the quota disable ioctl, and the rescan wait ioctl.  If the
      user sends a signal while we're waiting, we continue happily along.  This
      is expected behavior for the rescan wait ioctl.  It's racy in the shutdown
      path but mostly works due to other unrelated synchronization points.
      In the quota disable path, it Oopses the kernel pretty much immediately.
      
      Cc: <stable@vger.kernel.org> # v4.4+
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d06f23d6
    • J
      btrfs: properly track when rescan worker is running · d2c609b8
      Jeff Mahoney 提交于
      The qgroup_flags field is overloaded such that it reflects the on-disk
      status of qgroups and the runtime state.  The BTRFS_QGROUP_STATUS_FLAG_RESCAN
      flag is used to indicate that a rescan operation is in progress, but if
      the file system is unmounted while a rescan is running, the rescan
      operation is paused.  If the file system is then mounted read-only,
      the flag will still be present but the rescan operation will not have
      been resumed.  When we go to umount, btrfs_qgroup_wait_for_completion
      will see the flag and interpret it to mean that the rescan worker is
      still running and will wait for a completion that will never come.
      
      This patch uses a separate flag to indicate when the worker is
      running.  The locking and state surrounding the qgroup rescan worker
      needs a lot of attention beyond this patch but this is enough to
      avoid a hung umount.
      
      Cc: <stable@vger.kernel.org> # v4.4+
      Signed-off-by; Jeff Mahoney <jeffm@suse.com>
      Reviewed-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d2c609b8
    • L
      Btrfs: fix memory leak of reloc_root · 1c1ea4f7
      Liu Bo 提交于
      When some critical errors occur and FS would be flipped into RO,
      if we have an on-going balance, we can end up with a memory leak
      of root->reloc_root since btrfs_drop_snapshots() bails out
      without freeing reloc_root at the very early start.
      
      However, we're not able to free reloc_root in btrfs_drop_snapshots()
      because its caller, merge_reloc_roots(), still needs to access it to
      cleanup reloc_root's rbtree.
      
      This makes us free reloc_root when we're going to free fs/file roots.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      1c1ea4f7
  6. 08 8月, 2016 1 次提交
    • J
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe 提交于
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokeness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      1eff9d32
  7. 26 7月, 2016 6 次提交
    • J
      btrfs: tests, use BTRFS_FS_STATE_DUMMY_FS_INFO instead of dummy root · f5ee5c9a
      Jeff Mahoney 提交于
      Now that we have a dummy fs_info associated with each test that
      uses a root, we don't need the DUMMY_ROOT bit anymore.  This lets
      us make choices without needing an actual root like in e.g.
      btrfs_find_create_tree_block.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      f5ee5c9a
    • J
      btrfs: tests, require fs_info for root · 7c0260ee
      Jeff Mahoney 提交于
      This allows the upcoming patchset to push nodesize and sectorsize into
      fs_info.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      7c0260ee
    • J
      btrfs: btrfs_test_opt and friends should take a btrfs_fs_info · 3cdde224
      Jeff Mahoney 提交于
      btrfs_test_opt and friends only use the root pointer to access
      the fs_info.  Let's pass the fs_info directly in preparation to
      eliminate similar patterns all over btrfs.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      3cdde224
    • J
      btrfs: plumb fs_info into btrfs_work · cb001095
      Jeff Mahoney 提交于
      In order to provide an fsid for trace events, we'll need a btrfs_fs_info
      pointer.  The most lightweight way to do that for btrfs_work structures
      is to associate it with the __btrfs_workqueue structure.  Each queued
      btrfs_work structure has a workqueue associated with it, so that's
      a natural fit.  It's a privately defined structures, so we add accessors
      to retrieve the fs_info pointer.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      cb001095
    • N
      btrfs: Fix slab accounting flags · fba4b697
      Nikolay Borisov 提交于
      BTRFS is using a variety of slab caches to satisfy internal needs.
      Those slab caches are always allocated with the SLAB_RECLAIM_ACCOUNT,
      meaning allocations from the caches are going to be accounted as
      SReclaimable. At the same time btrfs is not registering any shrinkers
      whatsoever, thus preventing memory from the slabs to be shrunk. This
      means those caches are not in fact reclaimable.
      
      To fix this remove the SLAB_RECLAIM_ACCOUNT on all caches apart from the
      inode cache, since this one is being freed by the generic VFS super_block
      shrinker. Also set the transaction related caches as SLAB_TEMPORARY,
      to better document the lifetime of the objects (it just translates
      to SLAB_RECLAIM_ACCOUNT).
      Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      fba4b697
    • L
      Btrfs: fix double free of fs root · 876d2cf1
      Liu Bo 提交于
      I got this warning while mounting a btrfs image,
      
      [ 3020.509606] ------------[ cut here ]------------
      [ 3020.510107] WARNING: CPU: 3 PID: 5581 at lib/idr.c:1051 ida_remove+0xca/0x190
      [ 3020.510853] ida_remove called for id=42 which is not allocated.
      [ 3020.511466] Modules linked in:
      [ 3020.511802] CPU: 3 PID: 5581 Comm: mount Not tainted 4.7.0-rc5+ #274
      [ 3020.512438] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
      [ 3020.513385]  0000000000000286 0000000021295d86 ffff88006c66b8f0 ffffffff8182ba5a
      [ 3020.514153]  0000000000000000 0000000000000009 ffff88006c66b930 ffffffff810e0ed7
      [ 3020.514928]  0000041b00000000 ffffffff8289a8c0 ffff88007f437880 0000000000000000
      [ 3020.515717] Call Trace:
      [ 3020.515965]  [<ffffffff8182ba5a>] dump_stack+0xc9/0x13f
      [ 3020.516487]  [<ffffffff810e0ed7>] __warn+0x147/0x160
      [ 3020.517005]  [<ffffffff810e0f4f>] warn_slowpath_fmt+0x5f/0x80
      [ 3020.517572]  [<ffffffff8182e6ca>] ida_remove+0xca/0x190
      [ 3020.518075]  [<ffffffff813a2bcc>] free_anon_bdev+0x2c/0x60
      [ 3020.518609]  [<ffffffff81657a9f>] free_fs_root+0x13f/0x160
      [ 3020.519138]  [<ffffffff8165c679>] btrfs_get_fs_root+0x379/0x3d0
      [ 3020.519710]  [<ffffffff81e6e975>] ? __mutex_unlock_slowpath+0x155/0x2c0
      [ 3020.520366]  [<ffffffff816615b1>] open_ctree+0x2e91/0x3200
      [ 3020.520965]  [<ffffffff8161ede2>] btrfs_mount+0x1322/0x15b0
      [ 3020.521536]  [<ffffffff81e60e74>] ? kmemleak_alloc_percpu+0x44/0x170
      [ 3020.522167]  [<ffffffff8115f5e1>] ? lockdep_init_map+0x61/0x210
      [ 3020.522780]  [<ffffffff813a4f59>] mount_fs+0x49/0x2c0
      [ 3020.523305]  [<ffffffff813d840c>] vfs_kern_mount+0xac/0x1b0
      [ 3020.523872]  [<ffffffff8161dee1>] btrfs_mount+0x421/0x15b0
      [ 3020.524402]  [<ffffffff81e60e74>] ? kmemleak_alloc_percpu+0x44/0x170
      [ 3020.525045]  [<ffffffff8115f5e1>] ? lockdep_init_map+0x61/0x210
      [ 3020.525657]  [<ffffffff8115f5e1>] ? lockdep_init_map+0x61/0x210
      [ 3020.526289]  [<ffffffff813a4f59>] mount_fs+0x49/0x2c0
      [ 3020.526803]  [<ffffffff813d840c>] vfs_kern_mount+0xac/0x1b0
      [ 3020.527365]  [<ffffffff813dc27a>] do_mount+0x41a/0x1770
      [ 3020.527899]  [<ffffffff812e800d>] ? strndup_user+0x6d/0xc0
      [ 3020.528447]  [<ffffffff812e7f68>] ? memdup_user+0x78/0xb0
      [ 3020.528987]  [<ffffffff813ddad0>] SyS_mount+0x150/0x160
      [ 3020.529493]  [<ffffffff81e72b7c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
      
      It turns out that we free fs root twice, btrfs_init_fs_root() calls
      free_anon_bdev(root->anon_dev) and later then btrfs_get_fs_root() cals
      free_fs_root which does another free_anon_bdev() and it ends up with the
      above warning.
      
      Instead of reset root->anon_dev to 0 after free_anon_bdev(), we can let
      btrfs_init_fs_root() return directly since its callers have already done
      the free job by calling free_fs_root().
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Reviewed-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      876d2cf1
  8. 24 6月, 2016 1 次提交
  9. 18 6月, 2016 3 次提交
    • C
      Btrfs: btrfs_check_super_valid: Allow 4096 as stripesize · dd5c9311
      Chandan Rajendra 提交于
      Older btrfs-progs/mkfs.btrfs sets 4096 as the stripesize. Hence
      restricting stripesize to be equal to sectorsize would cause super block
      validation to return an error on architectures where PAGE_SIZE is not
      equal to 4096.
      
      Hence as a workaround, this commit allows stripesize to be set to 4096
      bytes.
      Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      dd5c9311
    • Z
      btrfs: avoid blocking open_ctree from cleaner_kthread · 90c711ab
      Zygo Blaxell 提交于
      This fixes a problem introduced in commit 2f3165ec
      "btrfs: don't force mounts to wait for cleaner_kthread to delete one or more subvolumes".
      
      open_ctree eventually calls btrfs_replay_log which in turn calls
      btrfs_commit_super which tries to lock the cleaner_mutex, causing a
      recursive mutex deadlock during mount.
      
      Instead of playing whack-a-mole trying to keep up with all the
      functions that may want to lock cleaner_mutex, put all the cleaner_mutex
      lockers back where they were, and attack the problem more directly:
      keep cleaner_kthread asleep until the filesystem is mounted.
      
      When filesystems are mounted read-only and later remounted read-write,
      open_ctree did not set fs_info->open and neither does anything else.
      Set this flag in btrfs_remount so that neither btrfs_delete_unused_bgs
      nor cleaner_kthread get confused by the common case of "/" filesystem
      read-only mount followed by read-write remount.
      Signed-off-by: NZygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      90c711ab
    • L
      Btrfs: check if extent buffer is aligned to sectorsize · c871b0f2
      Liu Bo 提交于
      Thanks to fuzz testing, we can pass an invalid bytenr to extent buffer
      via alloc_extent_buffer().  An unaligned eb can have more pages than it
      should have, which ends up extent buffer's leak or some corrupted content
      in extent buffer.
      
      This adds a warning to let us quickly know what was happening.
      
      Now that alloc_extent_buffer() no more returns NULL, this changes its
      caller and callers of its caller to match with the new error
      handling.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c871b0f2
  10. 08 6月, 2016 4 次提交
  11. 06 6月, 2016 2 次提交
  12. 03 6月, 2016 1 次提交
  13. 26 5月, 2016 1 次提交
  14. 10 5月, 2016 2 次提交
  15. 06 5月, 2016 1 次提交
    • Z
      btrfs: don't force mounts to wait for cleaner_kthread to delete one or more subvolumes · 2f3165ec
      Zygo Blaxell 提交于
      During a mount, we start the cleaner kthread first because the transaction
      kthread wants to wake up the cleaner kthread.  We start the transaction
      kthread next because everything in btrfs wants transactions.  We do reloc
      recovery in the thread that was doing the original mount call once the
      transaction kthread is running.  This means that the cleaner kthread
      could already be running when reloc recovery happens (e.g. if a snapshot
      delete was started before a crash).
      
      Relocation does not play well with the cleaner kthread, so a mutex was
      added in commit 5f316481 "Btrfs: fix
      race between balance recovery and root deletion" to prevent both from
      being active at the same time.
      
      If the cleaner kthread is already holding the mutex by the time we get
      to btrfs_recover_relocation, the mount will be blocked until at least
      one deleted subvolume is cleaned (possibly more if the mount process
      doesn't get the lock right away).  During this time (which could be an
      arbitrarily long time on a large/slow filesystem), the mount process is
      stuck and the filesystem is unnecessarily inaccessible.
      
      Fix this by locking cleaner_mutex before we start cleaner_kthread, and
      unlocking the mutex after mount no longer requires it.  This ensures
      that the mounting process will not be blocked by the cleaner kthread.
      The cleaner kthread is already prepared for mutex contention and will
      just go to sleep until the mutex is available.
      Signed-off-by: NZygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      2f3165ec