1. 27 9月, 2016 18 次提交
    • G
      btrfs: parent_start initialization cleanup · 0f5053eb
      Goldwyn Rodrigues 提交于
      Code cleanup. parent_start is initialized multiple times when it is
      not necessary to do so.
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      0f5053eb
    • G
      btrfs: Remove already completed TODO comment · 6cea66e5
      Goldwyn Rodrigues 提交于
      Fixes: 7cf5b976 ("btrfs: qgroup: Cleanup old inaccurate facilities")
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      6cea66e5
    • G
      btrfs: Do not reassign count in btrfs_run_delayed_refs · dd12d5b8
      Goldwyn Rodrigues 提交于
      Code cleanup. count is already (unsgined long)-1. That is the reason
      run_all was set. Do not reassign it (unsigned long)-1.
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      dd12d5b8
    • A
      btrfs: fix a possible umount deadlock · 0ccd0528
      Anand Jain 提交于
      btrfs_show_devname() is using the device_list_mutex, sometimes
      a call to blkdev_put() leads vfs calling into this func. So
      call blkdev_put() outside of device_list_mutex, as of now.
      
      [  983.284212] ======================================================
      [  983.290401] [ INFO: possible circular locking dependency detected ]
      [  983.296677] 4.8.0-rc5-ceph-00023-g1b39cec2 #1 Not tainted
      [  983.302081] -------------------------------------------------------
      [  983.308357] umount/21720 is trying to acquire lock:
      [  983.313243]  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff9128ec51>] blkdev_put+0x31/0x150
      [  983.321264]
      [  983.321264] but task is already holding lock:
      [  983.327101]  (&fs_devs->device_list_mutex){+.+...}, at: [<ffffffffc033d6f6>] __btrfs_close_devices+0x46/0x200 [btrfs]
      [  983.337839]
      [  983.337839] which lock already depends on the new lock.
      [  983.337839]
      [  983.346024]
      [  983.346024] the existing dependency chain (in reverse order) is:
      [  983.353512]
      -> #4 (&fs_devs->device_list_mutex){+.+...}:
      [  983.359096]        [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0
      [  983.365143]        [<ffffffff91823125>] mutex_lock_nested+0x65/0x350
      [  983.371521]        [<ffffffffc02d8116>] btrfs_show_devname+0x36/0x1f0 [btrfs]
      [  983.378710]        [<ffffffff9129523e>] show_vfsmnt+0x4e/0x150
      [  983.384593]        [<ffffffff9126ffc7>] m_show+0x17/0x20
      [  983.389957]        [<ffffffff91276405>] seq_read+0x2b5/0x3b0
      [  983.395669]        [<ffffffff9124c808>] __vfs_read+0x28/0x100
      [  983.401464]        [<ffffffff9124eb3b>] vfs_read+0xab/0x150
      [  983.407080]        [<ffffffff9124ec32>] SyS_read+0x52/0xb0
      [  983.412609]        [<ffffffff91825fc0>] entry_SYSCALL_64_fastpath+0x23/0xc1
      [  983.419617]
      -> #3 (namespace_sem){++++++}:
      [  983.424024]        [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0
      [  983.430074]        [<ffffffff918239e9>] down_write+0x49/0x80
      [  983.435785]        [<ffffffff91272457>] lock_mount+0x67/0x1c0
      [  983.441582]        [<ffffffff91272ab2>] do_add_mount+0x32/0xf0
      [  983.447458]        [<ffffffff9127363a>] finish_automount+0x5a/0xc0
      [  983.453682]        [<ffffffff91259513>] follow_managed+0x1b3/0x2a0
      [  983.459912]        [<ffffffff9125b750>] lookup_fast+0x300/0x350
      [  983.465875]        [<ffffffff9125d6e7>] path_openat+0x3a7/0xaa0
      [  983.471846]        [<ffffffff9125ef75>] do_filp_open+0x85/0xe0
      [  983.477731]        [<ffffffff9124c41c>] do_sys_open+0x14c/0x1f0
      [  983.483702]        [<ffffffff9124c4de>] SyS_open+0x1e/0x20
      [  983.489240]        [<ffffffff91825fc0>] entry_SYSCALL_64_fastpath+0x23/0xc1
      [  983.496254]
      -> #2 (&sb->s_type->i_mutex_key#3){+.+.+.}:
      [  983.501798]        [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0
      [  983.507855]        [<ffffffff918239e9>] down_write+0x49/0x80
      [  983.513558]        [<ffffffff91366237>] start_creating+0x87/0x100
      [  983.519703]        [<ffffffff91366647>] debugfs_create_dir+0x17/0x100
      [  983.526195]        [<ffffffff911df153>] bdi_register+0x93/0x210
      [  983.532165]        [<ffffffff911df313>] bdi_register_owner+0x43/0x70
      [  983.538570]        [<ffffffff914080fb>] device_add_disk+0x1fb/0x450
      [  983.544888]        [<ffffffff91580226>] loop_add+0x1e6/0x290
      [  983.550596]        [<ffffffff91fec358>] loop_init+0x10b/0x14f
      [  983.556394]        [<ffffffff91002207>] do_one_initcall+0xa7/0x180
      [  983.562618]        [<ffffffff91f932e0>] kernel_init_freeable+0x1cc/0x266
      [  983.569370]        [<ffffffff918174be>] kernel_init+0xe/0x100
      [  983.575166]        [<ffffffff9182620f>] ret_from_fork+0x1f/0x40
      [  983.581131]
      -> #1 (loop_index_mutex){+.+.+.}:
      [  983.585801]        [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0
      [  983.591858]        [<ffffffff91823125>] mutex_lock_nested+0x65/0x350
      [  983.598256]        [<ffffffff9157ed3f>] lo_open+0x1f/0x60
      [  983.603704]        [<ffffffff9128eec3>] __blkdev_get+0x123/0x400
      [  983.609757]        [<ffffffff9128f4ea>] blkdev_get+0x34a/0x350
      [  983.615639]        [<ffffffff9128f554>] blkdev_open+0x64/0x80
      [  983.621428]        [<ffffffff9124aff6>] do_dentry_open+0x1c6/0x2d0
      [  983.627651]        [<ffffffff9124c029>] vfs_open+0x69/0x80
      [  983.633181]        [<ffffffff9125db74>] path_openat+0x834/0xaa0
      [  983.639152]        [<ffffffff9125ef75>] do_filp_open+0x85/0xe0
      [  983.645035]        [<ffffffff9124c41c>] do_sys_open+0x14c/0x1f0
      [  983.650999]        [<ffffffff9124c4de>] SyS_open+0x1e/0x20
      [  983.656535]        [<ffffffff91825fc0>] entry_SYSCALL_64_fastpath+0x23/0xc1
      [  983.663541]
      -> #0 (&bdev->bd_mutex){+.+.+.}:
      [  983.668107]        [<ffffffff910def43>] __lock_acquire+0x1003/0x17b0
      [  983.674510]        [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0
      [  983.680561]        [<ffffffff91823125>] mutex_lock_nested+0x65/0x350
      [  983.686967]        [<ffffffff9128ec51>] blkdev_put+0x31/0x150
      [  983.692761]        [<ffffffffc033481f>] btrfs_close_bdev+0x4f/0x60 [btrfs]
      [  983.699699]        [<ffffffffc033d77b>] __btrfs_close_devices+0xcb/0x200 [btrfs]
      [  983.707178]        [<ffffffffc033d8db>] btrfs_close_devices+0x2b/0xa0 [btrfs]
      [  983.714380]        [<ffffffffc03081c5>] close_ctree+0x265/0x340 [btrfs]
      [  983.721061]        [<ffffffffc02d7959>] btrfs_put_super+0x19/0x20 [btrfs]
      [  983.727908]        [<ffffffff91250e2f>] generic_shutdown_super+0x6f/0x100
      [  983.734744]        [<ffffffff91250f56>] kill_anon_super+0x16/0x30
      [  983.740888]        [<ffffffffc02da97e>] btrfs_kill_super+0x1e/0x130 [btrfs]
      [  983.747909]        [<ffffffff91250fe9>] deactivate_locked_super+0x49/0x80
      [  983.754745]        [<ffffffff912515fd>] deactivate_super+0x5d/0x70
      [  983.760977]        [<ffffffff91270a1c>] cleanup_mnt+0x5c/0x80
      [  983.766773]        [<ffffffff91270a92>] __cleanup_mnt+0x12/0x20
      [  983.772738]        [<ffffffff910aa2fe>] task_work_run+0x7e/0xc0
      [  983.778708]        [<ffffffff91081b5a>] exit_to_usermode_loop+0x7e/0xb4
      [  983.785373]        [<ffffffff910039eb>] syscall_return_slowpath+0xbb/0xd0
      [  983.792212]        [<ffffffff9182605c>] entry_SYSCALL_64_fastpath+0xbf/0xc1
      [  983.799225]
      [  983.799225] other info that might help us debug this:
      [  983.799225]
      [  983.807291] Chain exists of:
        &bdev->bd_mutex --> namespace_sem --> &fs_devs->device_list_mutex
      
      [  983.816521]  Possible unsafe locking scenario:
      [  983.816521]
      [  983.822489]        CPU0                    CPU1
      [  983.827043]        ----                    ----
      [  983.831599]   lock(&fs_devs->device_list_mutex);
      [  983.836289]                                lock(namespace_sem);
      [  983.842268]                                lock(&fs_devs->device_list_mutex);
      [  983.849478]   lock(&bdev->bd_mutex);
      [  983.853127]
      [  983.853127]  *** DEADLOCK ***
      [  983.853127]
      [  983.859113] 3 locks held by umount/21720:
      [  983.863145]  #0:  (&type->s_umount_key#35){++++..}, at: [<ffffffff912515f5>] deactivate_super+0x55/0x70
      [  983.872713]  #1:  (uuid_mutex){+.+.+.}, at: [<ffffffffc033d8d3>] btrfs_close_devices+0x23/0xa0 [btrfs]
      [  983.882206]  #2:  (&fs_devs->device_list_mutex){+.+...}, at: [<ffffffffc033d6f6>] __btrfs_close_devices+0x46/0x200 [btrfs]
      [  983.893422]
      [  983.893422] stack backtrace:
      [  983.897824] CPU: 6 PID: 21720 Comm: umount Not tainted 4.8.0-rc5-ceph-00023-g1b39cec2 #1
      [  983.905958] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015
      [  983.913492]  0000000000000000 ffff8c8a53c17a38 ffffffff91429521 ffffffff9260f4f0
      [  983.921018]  ffffffff92642760 ffff8c8a53c17a88 ffffffff911b2b04 0000000000000050
      [  983.928542]  ffffffff9237d620 ffff8c8a5294aee0 ffff8c8a5294aeb8 ffff8c8a5294aee0
      [  983.936072] Call Trace:
      [  983.938545]  [<ffffffff91429521>] dump_stack+0x85/0xc4
      [  983.943715]  [<ffffffff911b2b04>] print_circular_bug+0x1fb/0x20c
      [  983.949748]  [<ffffffff910def43>] __lock_acquire+0x1003/0x17b0
      [  983.955613]  [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0
      [  983.961123]  [<ffffffff9128ec51>] ? blkdev_put+0x31/0x150
      [  983.966550]  [<ffffffff91823125>] mutex_lock_nested+0x65/0x350
      [  983.972407]  [<ffffffff9128ec51>] ? blkdev_put+0x31/0x150
      [  983.977832]  [<ffffffff9128ec51>] blkdev_put+0x31/0x150
      [  983.983101]  [<ffffffffc033481f>] btrfs_close_bdev+0x4f/0x60 [btrfs]
      [  983.989500]  [<ffffffffc033d77b>] __btrfs_close_devices+0xcb/0x200 [btrfs]
      [  983.996415]  [<ffffffffc033d8db>] btrfs_close_devices+0x2b/0xa0 [btrfs]
      [  984.003068]  [<ffffffffc03081c5>] close_ctree+0x265/0x340 [btrfs]
      [  984.009189]  [<ffffffff9126cc5e>] ? evict_inodes+0x15e/0x170
      [  984.014881]  [<ffffffffc02d7959>] btrfs_put_super+0x19/0x20 [btrfs]
      [  984.021176]  [<ffffffff91250e2f>] generic_shutdown_super+0x6f/0x100
      [  984.027476]  [<ffffffff91250f56>] kill_anon_super+0x16/0x30
      [  984.033082]  [<ffffffffc02da97e>] btrfs_kill_super+0x1e/0x130 [btrfs]
      [  984.039548]  [<ffffffff91250fe9>] deactivate_locked_super+0x49/0x80
      [  984.045839]  [<ffffffff912515fd>] deactivate_super+0x5d/0x70
      [  984.051525]  [<ffffffff91270a1c>] cleanup_mnt+0x5c/0x80
      [  984.056774]  [<ffffffff91270a92>] __cleanup_mnt+0x12/0x20
      [  984.062201]  [<ffffffff910aa2fe>] task_work_run+0x7e/0xc0
      [  984.067625]  [<ffffffff91081b5a>] exit_to_usermode_loop+0x7e/0xb4
      [  984.073747]  [<ffffffff910039eb>] syscall_return_slowpath+0xbb/0xd0
      [  984.080038]  [<ffffffff9182605c>] entry_SYSCALL_64_fastpath+0xbf/0xc1
      Reported-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      0ccd0528
    • L
      Btrfs: fix memory leak in do_walk_down · a958eab0
      Liu Bo 提交于
      The extent buffer 'next' needs to be free'd conditionally.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a958eab0
    • J
      btrfs: btrfs_debug should consume fs_info when DEBUG is not defined · c01f5f96
      Jeff Mahoney 提交于
      We can hit unused variable warnings when btrfs_debug and friends are
      just aliases for no_printk.  This is due to the fs_info not getting
      consumed by the function call, which can happen if convenenience
      variables are used.  This patch adds a new btrfs_no_printk static inline
      that consumes the convenience variable and does nothing else.  It
      silences the unused variable warning and has no impact on the generated
      code:
      
      $ size fs/btrfs/extent_io.o*
         text	   data	    bss	    dec	    hex	filename
        44072	    152	     32	  44256	   ace0	fs/btrfs/extent_io.o.btrfs_no_printk
        44072	    152	     32	  44256	   ace0	fs/btrfs/extent_io.o.no_printk
      
      Fixes: 27a0dd61 (Btrfs: make btrfs_debug match pr_debug handling related to DEBUG)
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c01f5f96
    • J
      btrfs: convert send's verbose_printk to btrfs_debug · 04ab956e
      Jeff Mahoney 提交于
      This was basically an open-coded, less flexible dynamic printk.  We can
      just use btrfs_debug instead.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      04ab956e
    • J
      btrfs: convert pr_* to btrfs_* where possible · ab8d0fc4
      Jeff Mahoney 提交于
      For many printks, we want to know which file system issued the message.
      
      This patch converts most pr_* calls to use the btrfs_* versions instead.
      In some cases, this means adding plumbing to allow call sites access to
      an fs_info pointer.
      
      fs/btrfs/check-integrity.c is left alone for another day.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ab8d0fc4
    • J
      btrfs: convert printk(KERN_* to use pr_* calls · 62e85577
      Jeff Mahoney 提交于
      This patch converts printk(KERN_* style messages to use the pr_* versions.
      
      One side effect is that anything that was KERN_DEBUG is now automatically
      a dynamic debug message.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      62e85577
    • J
      btrfs: unsplit printed strings · 5d163e0e
      Jeff Mahoney 提交于
      CodingStyle chapter 2:
      "[...] never break user-visible strings such as printk messages,
      because that breaks the ability to grep for them."
      
      This patch unsplits user-visible strings.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      5d163e0e
    • J
      btrfs: clean the old superblocks before freeing the device · cea67ab9
      Jeff Mahoney 提交于
      btrfs_rm_device frees the block device but then re-opens it using
      the saved device name.  A race exists between the close and the
      re-open that allows the block size to be changed.  The result
      is getting stuck forever in the reclaim loop in __getblk_slow.
      
      This patch moves the superblock cleanup before closing the block
      device, which is also consistent with other callers.  We also don't
      need a private copy of dev_name as the whole routine operates under
      the uuid_mutex.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      cea67ab9
    • L
      Btrfs: kill BUG_ON in run_delayed_tree_ref · 02794222
      Liu Bo 提交于
      In a corrupted btrfs image, we can come across this BUG_ON and
      get an unreponsive system, but if we return errors instead,
      its caller can handle everything gracefully by aborting the current
      transaction.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      02794222
    • J
      Btrfs: don't leak reloc root nodes on error · 6bdf131f
      Josef Bacik 提交于
      We don't track the reloc roots in any sort of normal way, so the only way the
      root/commit_root nodes get free'd is if the relocation finishes successfully and
      the reloc root is deleted.  Fix this by free'ing them in free_reloc_roots.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      6bdf131f
    • M
      btrfs: squash lines for simple wrapper functions · e2c89907
      Masahiro Yamada 提交于
      Remove unneeded variables and assignments.
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e2c89907
    • L
      Btrfs: improve check_node to avoid reading corrupted nodes · 6b722c17
      Liu Bo 提交于
      We need to check items in a node to make sure that we're reading
      a valid one, otherwise we could get various crashes while processing
      delayed_refs.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      6b722c17
    • L
      Btrfs: add error handling for extent buffer in print tree · a42cbec9
      Liu Bo 提交于
      Somehow we missed btrfs_print_tree when last time we
      updated error handling for read_extent_block().
      
      This keeps us from getting a NULL pointer panic when
      btrfs_print_tree's read_extent_block() fails.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a42cbec9
    • L
      Btrfs: remove BUG_ON in start_transaction · a43f7f82
      Liu Bo 提交于
      Since we could get errors from the concurrent aborted transaction,
      the check of this BUG_ON in start_transaction is not true any more.
      
      Say, while flushing free space cache inode's dirty pages,
      btrfs_finish_ordered_io
       -> btrfs_join_transaction_nolock
            (the transaction has been aborted.)
            -> BUG_ON(type == TRANS_JOIN_NOLOCK);
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a43f7f82
    • L
      Btrfs: memset to avoid stale content in btree node block · 3eb548ee
      Liu Bo 提交于
      During updating btree, we could push items between sibling
      nodes/leaves, for leaves data sections starts reversely from
      the end of the block while for nodes we only have key pairs
      which are stored one by one from the start of the block.
      
      So we could do try to push key pairs from one node to the next
      node right in the tree, and after that, we update the node's
      nritems to reflect the correct end while leaving the stale
      content in the node.  One may intentionally corrupt the fs
      image and access the stale content by bumping the nritems and
      causes various crashes.
      
      This takes the in-memory @nritems as the correct one and
      gets to memset the unused part of a btree node.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      3eb548ee
  2. 26 9月, 2016 22 次提交
    • L
      Btrfs: return gracefully from balance if fs tree is corrupted · 3561b9db
      Liu Bo 提交于
      When relocating tree blocks, we firstly get block information from
      back references in the extent tree, we then search fs tree to try to
      find all parents of a block.
      
      However, if fs tree is corrupted, eg. if there're some missing
      items, we could come across these WARN_ONs and BUG_ONs.
      
      This makes us print some error messages and return gracefully
      from balance.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      3561b9db
    • J
      Btrfs: kill BUG_ON()'s in btrfs_mark_extent_written · 9c8e63db
      Josef Bacik 提交于
      No reason to bug on in here, fs corruption could easily cause these things to
      happen.
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      9c8e63db
    • J
      Btrfs: kill the start argument to read_extent_buffer_pages · 8436ea91
      Josef Bacik 提交于
      Nobody uses this, it makes no sense to do partial reads of extent buffers.
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      8436ea91
    • J
      Btrfs: add a flags field to btrfs_fs_info · afcdd129
      Josef Bacik 提交于
      We have a lot of random ints in btrfs_fs_info that can be put into flags.  This
      is mostly equivalent with the exception of how we deal with quota going on or
      off, now instead we set a flag when we are turning it on or off and deal with
      that appropriately, rather than just having a pending state that the current
      quota_enabled gets set to.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      afcdd129
    • Q
      btrfs: extend btrfs_set_extent_delalloc and its friends to support in-band... · ba8b04c1
      Qu Wenruo 提交于
      btrfs: extend btrfs_set_extent_delalloc and its friends to support in-band dedupe and subpage size patchset
      
      Extend btrfs_set_extent_delalloc() and extent_clear_unlock_delalloc()
      parameters for both in-band dedupe and subpage sector size patchset.
      
      This should reduce conflict of both patchset and the effort to rebase
      them.
      
      Cc: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Cc: David Sterba <dsterba@suse.cz>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ba8b04c1
    • J
      btrfs: add dynamic debug support · 897a41b1
      Jeff Mahoney 提交于
      We can re-use the dynamic debugging descriptor to make use of the dynamic
      debugging mechanism but still use our own printk interface.
      
      Defining the DEBUG macro works as it did before.  When it's defined,
      all of the messages default to print.  We can also enable all debug
      messages at boot or module-load time using the 'dyndbg' and
      'btrfs.dyndbg' options.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      897a41b1
    • L
      btrfs: Fix warning "variable ‘gen’ set but not used" · 2309e796
      Luis Henriques 提交于
      Variable 'gen' in reada_for_search() is not used since commit 58dc4ce4
      ("btrfs: remove unused parameter from readahead_tree_block").  This patch
      simply removes this variable.
      Signed-off-by: NLuis Henriques <luis.henriques@canonical.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      2309e796
    • L
      btrfs: Fix warning "variable ‘blocksize’ set but not used" · 1f079fa2
      Luis Henriques 提交于
      Variable 'blocksize' in reada_walk_down() is not used since commit
      d3e46fea ("btrfs: sink blocksize parameter to readahead_tree_block").
      This patch simply removes this variable.
      Signed-off-by: NLuis Henriques <luis.henriques@canonical.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      1f079fa2
    • N
      btrfs: let btrfs_delete_unused_bgs() to clean relocated bgs · 5d8eb6fe
      Naohiro Aota 提交于
      Currently, btrfs_relocate_chunk() is removing relocated BG by itself. But
      the work can be done by btrfs_delete_unused_bgs() (and it's better since it
      trim the BG). Let's dedupe the code.
      
      While btrfs_delete_unused_bgs() is already hitting the relocated BG, it
      skip the BG since the BG has "ro" flag set (to keep balancing BG intact).
      On the other hand, btrfs cannot drop "ro" flag here to prevent additional
      writes. So this patch make use of "removed" flag.
      btrfs_delete_unused_bgs() now detect the flag to distinguish whether a
      read-only BG is relocating or not.
      Signed-off-by: NNaohiro Aota <naohiro.aota@hgst.com>
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      5d8eb6fe
    • L
      Btrfs: bail out if block group has different mixed flag · 49303381
      Liu Bo 提交于
      Currently we allow inconsistence about mixed flag
       (BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA).
      
      We'd get ENOSPC if block group has mixed flag and btrfs doesn't.
      If that happens, we have one space_info with mixed flag and another
      space_info only with BTRFS_BLOCK_GROUP_METADATA, and
      global_block_rsv.space_info points to the latter one, but all bytes
      from block_group contributes to the mixed space_info, thus all the
      allocation will fail with ENOSPC.
      
      This adds a check for the above case.
      Reported-by: NVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      [ updated message ]
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      49303381
    • L
      Btrfs: fix memory leak in reading btree blocks · 2571e739
      Liu Bo 提交于
      So we can read a btree block via readahead or intentional read,
      and we can end up with a memory leak when something happens as
      follows,
      1) readahead starts to read block A but does not wait for read
         completion,
      2) btree_readpage_end_io_hook finds that block A is corrupted,
         and it needs to clear all block A's pages' uptodate bit.
      3) meanwhile an intentional read kicks in and checks block A's
         pages' uptodate to decide which page needs to be read.
      4) when some pages have the uptodate bit during 3)'s check so
         3) doesn't count them for eb->io_pages, but they are later
         cleared by 2) so we has to readpage on the page, we get
         the wrong eb->io_pages which results in a memory leak of
         this block.
      
      This fixes the problem by firstly getting all pages's locking and
      then checking pages' uptodate bit.
      
         t1(readahead)                              t2(readahead endio)                                       t3(the following read)
      read_extent_buffer_pages                    end_bio_extent_readpage
        for pg in eb:                                for page 0,1,2 in eb:
            if pg is uptodate:                           btree_readpage_end_io_hook(pg)
                num_reads++                              if uptodate:
        eb->io_pages = num_reads                             SetPageUptodate(pg)              _______________
        for pg in eb:                                for page 3 in eb:                                     read_extent_buffer_pages
             if pg is NOT uptodate:                      btree_readpage_end_io_hook(pg)                       for pg in eb:
                 __extent_read_full_page(pg)                 sanity check reports something wrong                 if pg is uptodate:
                                                             clear_extent_buffer_uptodate(eb)                         num_reads++
                                                                 for pg in eb:                                eb->io_pages = num_reads
                                                                     ClearPageUptodate(page)  _______________
                                                                                                              for pg in eb:
                                                                                                                  if pg is NOT uptodate:
                                                                                                                      __extent_read_full_page(pg)
      
      So t3's eb->io_pages is not consistent with the number of pages it's reading,
      and during endio(), atomic_dec_and_test(&eb->io_pages) will get a negative
      number so that we're not able to free the eb.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      2571e739
    • L
      Btrfs: remove BUG() in raid56 · e46a28ca
      Liu Bo 提交于
      This BUG() has been triggered by a fuzz testing image, which contains
      an invalid chunk type, ie. a single stripe chunk has the raid6 type.
      
      Btrfs can handle this gracefully by returning -EIO, so besides using
      btrfs_warn to give us more debugging information rather than a single
      BUG(), we can return error properly.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e46a28ca
    • L
      btrfs: fix check_shared for fiemap ioctl · afce772e
      Lu Fengqi 提交于
      Only in the case of different root_id or different object_id, check_shared
      identified extent as the shared. However, If a extent was referred by
      different offset of same file, it should also be identified as shared.
      In addition, check_shared's loop scale is at least n^3, so if a extent
      has too many references, even causes soft hang up.
      
      First, add all delayed_ref to the ref_tree and calculate the unqiue_refs,
      if the unique_refs is greater than one, return BACKREF_FOUND_SHARED.
      Then individually add the on-disk reference(inline/keyed) to the ref_tree
      and calculate the unique_refs of the ref_tree to check if the unique_refs
      is greater than one.Because once there are two references to return
      SHARED, so the time complexity is close to the constant.
      Reported-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: NLu Fengqi <lufq.fnst@cn.fujitsu.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      afce772e
    • D
      b0de6c4c
    • E
      btrfs: fix perms on demonstration debugfs interface · 07f6a480
      Eric Sandeen 提交于
      btrfs provides a helpful demonstration of how to export
      a global variable via debugfs; however, it is unique among
      other debugfs files in that it is world-writable, which causes
      some concern to people who are not familiar with its purpose.
      
      Fix it so that it is only user-writable.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      07f6a480
    • L
      Btrfs: fix memory leak of block group cache · c79a1751
      Liu Bo 提交于
      While processing delayed refs, we may update block group's statistics
      and attach it to cur_trans->dirty_bgs, and later writing dirty block
      groups will process the list, which happens during
      btrfs_commit_transaction().
      
      For whatever reason, the transaction is aborted and dirty_bgs
      is not processed in cleanup_transaction(), we end up with memory leak
      of these dirty block group cache.
      
      Since btrfs_start_dirty_block_groups() doesn't make it go to the commit
      critical section, this also adds the cleanup work inside it.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c79a1751
    • L
      Linux 4.8-rc8 · 08895a8b
      Linus Torvalds 提交于
      08895a8b
    • L
      Merge tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 4c04b4b5
      Linus Torvalds 提交于
      Pull tracefs fixes from Steven Rostedt:
       "Al Viro has been looking at the tracefs code, and has pointed out some
        issues.  This contains one fix by me and one by Al.  I'm sure that
        he'll come up with more but for now I tested these patches and they
        don't appear to have any negative impact on tracing"
      
      * tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        fix memory leaks in tracing_buffers_splice_read()
        tracing: Move mutex to protect against resetting of seq data
      4c04b4b5
    • D
      fault_in_multipages_readable() throws set-but-unused error · 90b75db6
      Dave Chinner 提交于
      When building XFS with -Werror, it now fails with:
      
        include/linux/pagemap.h: In function 'fault_in_multipages_readable':
        include/linux/pagemap.h:602:16: error: variable 'c' set but not used [-Werror=unused-but-set-variable]
          volatile char c;
                        ^
      
      This is a regression caused by commit e23d4159 ("fix
      fault_in_multipages_...() on architectures with no-op access_ok()").
      Fix it by re-adding the "(void)c" trick taht was previously used to make
      the compiler think the variable is used.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      90b75db6
    • L
      mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing · 38e08854
      Lorenzo Stoakes 提交于
      The NUMA balancing logic uses an arch-specific PROT_NONE page table flag
      defined by pte_protnone() or pmd_protnone() to mark PTEs or huge page
      PMDs respectively as requiring balancing upon a subsequent page fault.
      User-defined PROT_NONE memory regions which also have this flag set will
      not normally invoke the NUMA balancing code as do_page_fault() will send
      a segfault to the process before handle_mm_fault() is even called.
      
      However if access_remote_vm() is invoked to access a PROT_NONE region of
      memory, handle_mm_fault() is called via faultin_page() and
      __get_user_pages() without any access checks being performed, meaning
      the NUMA balancing logic is incorrectly invoked on a non-NUMA memory
      region.
      
      A simple means of triggering this problem is to access PROT_NONE mmap'd
      memory using /proc/self/mem which reliably results in the NUMA handling
      functions being invoked when CONFIG_NUMA_BALANCING is set.
      
      This issue was reported in bugzilla (issue 99101) which includes some
      simple repro code.
      
      There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page()
      added at commit c0e7cad9 to avoid accidentally provoking strange
      behaviour by attempting to apply NUMA balancing to pages that are in
      fact PROT_NONE.  The BUG_ON()'s are consistently triggered by the repro.
      
      This patch moves the PROT_NONE check into mm/memory.c rather than
      invoking BUG_ON() as faulting in these pages via faultin_page() is a
      valid reason for reaching the NUMA check with the PROT_NONE page table
      flag set and is therefore not always a bug.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101Reported-by: NTrevor Saunders <tbsaunde@tbsaunde.org>
      Signed-off-by: NLorenzo Stoakes <lstoakes@gmail.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      38e08854
    • L
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 831e45d8
      Linus Torvalds 提交于
      Pull MIPS fixes from Ralf Baechle:
       "A round of 4.8 fixes:
      
        MIPS generic code:
         - Add a missing ".set pop" in an early commit
         - Fix memory regions reaching top of physical
         - MAAR: Fix address alignment
         - vDSO: Fix Malta EVA mapping to vDSO page structs
         - uprobes: fix incorrect uprobe brk handling
         - uprobes: select HAVE_REGS_AND_STACK_ACCESS_API
         - Avoid a BUG warning during PR_SET_FP_MODE prctl
         - SMP: Fix possibility of deadlock when bringing CPUs online
         - R6: Remove compact branch policy Kconfig entries
         - Fix size calc when avoiding IPIs for small icache flushes
         - Fix pre-r6 emulation FPU initialisation
         - Fix delay slot emulation count in debugfs
      
        ATH79:
         - Fix test for error return of clk_register_fixed_factor.
      
        Octeon:
         - Fix kernel header to work for VDSO build.
         - Fix initialization of platform device probing.
      
        paravirt:
         - Fix undefined reference to smp_bootstrap"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: Fix delay slot emulation count in debugfs
        MIPS: SMP: Fix possibility of deadlock when bringing CPUs online
        MIPS: Fix pre-r6 emulation FPU initialisation
        MIPS: vDSO: Fix Malta EVA mapping to vDSO page structs
        MIPS: Select HAVE_REGS_AND_STACK_ACCESS_API
        MIPS: Octeon: Fix platform bus probing
        MIPS: Octeon: mangle-port: fix build failure with VDSO code
        MIPS: Avoid a BUG warning during prctl(PR_SET_FP_MODE, ...)
        MIPS: c-r4k: Fix size calc when avoiding IPIs for small icache flushes
        MIPS: Add a missing ".set pop" in an early commit
        MIPS: paravirt: Fix undefined reference to smp_bootstrap
        MIPS: Remove compact branch policy Kconfig entries
        MIPS: MAAR: Fix address alignment
        MIPS: Fix memory regions reaching top of physical
        MIPS: uprobes: fix incorrect uprobe brk handling
        MIPS: ath79: Fix test for error return of clk_register_fixed_factor().
      831e45d8
    • L
      Merge tag 'powerpc-4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 751b9a5d
      Linus Torvalds 提交于
      Pull one more powerpc fix from Michael Ellerman:
       "powernv/pci: Fix m64 checks for SR-IOV and window alignment from
        Russell Currey"
      
      * tag 'powerpc-4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment
      751b9a5d