1. 21 5月, 2015 1 次提交
  2. 20 5月, 2015 1 次提交
    • F
      Btrfs: fix racy system chunk allocation when setting block group ro · a9629596
      Filipe Manana 提交于
      If while setting a block group read-only we end up allocating a system
      chunk, through check_system_chunk(), we were not doing it while holding
      the chunk mutex which is a problem if a concurrent chunk allocation is
      happening, through do_chunk_alloc(), as it means both block groups can
      end up using the same logical addresses and physical regions in the
      device(s). So make sure we hold the chunk mutex.
      
      Cc: stable@vger.kernel.org  # 4.0+
      Fixes: 2f081088 ("btrfs: delete chunk allocation attemp when
                            setting block group ro")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      a9629596
  3. 11 5月, 2015 1 次提交
    • F
      Btrfs: fix race between block group creation and their cache writeout · ff1f8250
      Filipe Manana 提交于
      So creating a block group has 2 distinct phases:
      
      Phase 1 - creates the btrfs_block_group_cache item and adds it to the
      rbtree fs_info->block_group_cache_tree and to the corresponding list
      space_info->block_groups[];
      
      Phase 2 - adds the block group item to the extent tree and corresponding
      items to the chunk tree.
      
      The first phase adds the block_group_cache_item to a list of pending block
      groups in the transaction handle, and phase 2 happens when
      btrfs_end_transaction() is called against the transaction handle.
      
      It happens that once phase 1 completes, other concurrent tasks that use
      their own transaction handle, but points to the same running transaction
      (struct btrfs_trans_handle->transaction), can use this block group for
      space allocations and therefore mark it dirty. Dirty block groups are
      tracked in a list belonging to the currently running transaction (struct
      btrfs_transaction) and not in the transaction handle (btrfs_trans_handle).
      
      This is a problem because once a task calls btrfs_commit_transaction(),
      it calls btrfs_start_dirty_block_groups() which will see all dirty block
      groups and attempt to start their writeout, including those that are
      still attached to the transaction handle of some concurrent task that
      hasn't called btrfs_end_transaction() yet - which means those block
      groups haven't gone through phase 2 yet and therefore when
      write_one_cache_group() is called, it won't find the block group items
      in the extent tree and abort the current transaction with -ENOENT,
      turning the fs into readonly mode and require a remount.
      
      Fix this by ignoring -ENOENT when looking for block group items in the
      extent tree when we attempt to start the writeout of the block group
      caches outside the critical section of the transaction commit. We will
      try again later during the critical section and if there we still don't
      find the block group item in the extent tree, we then abort the current
      transaction.
      
      This issue happened twice, once while running fstests btrfs/067 and once
      for btrfs/078, which produced the following trace:
      
      [ 3278.703014] WARNING: CPU: 7 PID: 18499 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
      [ 3278.707329] BTRFS: Transaction aborted (error -2)
      (...)
      [ 3278.731555] Call Trace:
      [ 3278.732396]  [<ffffffff8142fa46>] dump_stack+0x4f/0x7b
      [ 3278.733860]  [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad
      [ 3278.735312]  [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb
      [ 3278.736874]  [<ffffffffa03ada6d>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
      [ 3278.738302]  [<ffffffff81045f05>] warn_slowpath_fmt+0x46/0x48
      [ 3278.739520]  [<ffffffffa03ada6d>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
      [ 3278.741222]  [<ffffffffa03b9e56>] write_one_cache_group+0xae/0xbf [btrfs]
      [ 3278.742797]  [<ffffffffa03c487b>] btrfs_start_dirty_block_groups+0x170/0x2b2 [btrfs]
      [ 3278.744492]  [<ffffffffa03d309c>] btrfs_commit_transaction+0x130/0x9c9 [btrfs]
      [ 3278.746084]  [<ffffffff8107d33d>] ? trace_hardirqs_on+0xd/0xf
      [ 3278.747249]  [<ffffffffa03e5660>] btrfs_sync_file+0x313/0x387 [btrfs]
      [ 3278.748744]  [<ffffffff8117acad>] vfs_fsync_range+0x95/0xa4
      [ 3278.749958]  [<ffffffff81435b54>] ? ret_from_sys_call+0x1d/0x58
      [ 3278.751218]  [<ffffffff8117acd8>] vfs_fsync+0x1c/0x1e
      [ 3278.754197]  [<ffffffff8117ae54>] do_fsync+0x34/0x4e
      [ 3278.755192]  [<ffffffff8117b07c>] SyS_fsync+0x10/0x14
      [ 3278.756236]  [<ffffffff81435b32>] system_call_fastpath+0x12/0x17
      [ 3278.757366] ---[ end trace 9a4d4df4969709aa ]---
      
      Fixes: 1bbc621e ("Btrfs: allow block group cache writeout
                            outside critical section in commit")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      ff1f8250
  4. 26 4月, 2015 4 次提交
    • O
      btrfs: handle ENOMEM in btrfs_alloc_tree_block · 67b7859e
      Omar Sandoval 提交于
      This is one of the first places to give out when memory is tight. Handle
      it properly rather than with a BUG_ON.
      
      Also fix the comment about the return value, which is an ERR_PTR, not
      NULL, on error.
      Signed-off-by: NOmar Sandoval <osandov@osandov.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      67b7859e
    • C
      Btrfs: don't check for delalloc_bytes in cache_save_setup · e4c88f00
      Chris Mason 提交于
      Now that we're doing free space cache writeback outside the critical
      section in the commit, there is a bigger window for delalloc_bytes to
      be added after a cache has been written.  find_free_extent may do this
      without putting the block group back into the dirty list, and also
      without a transaction running.
      
      Checking for delalloc_bytes in cache_save_setup means we might leave the
      cache marked as written without invalidating it.  Consistency checks
      during mount will toss the cache, but it's better to get rid of the
      check in cache_save_setup and let it get invalidated by the checks
      already done during cache write out.
      Signed-off-by: NChris Mason <clm@fb.com>
      e4c88f00
    • F
      Btrfs: fix deadlock when starting writeback of bg caches · 24b89d08
      Filipe Manana 提交于
      While starting the writes of the dirty block group caches, if we don't
      find a block group item in the extent tree we were leaving without
      releasing our path, running delayed references and then looping again to
      process any new dirty block groups. However this second iteration of the
      loop could cause a deadlock because it tries to lock some other extent
      tree node/leaf which another task already locked and it's blocked because
      it's waiting for a lock on some node/leaf that is in our path that was not
      released before.
      We could also deadlock when running the delayed references - as we could
      end up trying to lock the same nodes/leafs that we have in our local path
      (with a different lock type).
      
      Got into such case when running xfstests:
      
      [20892.242791] ------------[ cut here ]------------
      [20892.243776] WARNING: CPU: 0 PID: 13299 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
      [20892.245874] BTRFS: Transaction aborted (error -2)
      (...)
      [20892.269378] Call Trace:
      [20892.269915]  [<ffffffff8142fa46>] dump_stack+0x4f/0x7b
      [20892.271097]  [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad
      [20892.272173]  [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb
      [20892.273386]  [<ffffffffa0509a6d>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
      [20892.274857]  [<ffffffff81045f05>] warn_slowpath_fmt+0x46/0x48
      [20892.275851]  [<ffffffffa0509a6d>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
      [20892.277341]  [<ffffffffa0515e10>] write_one_cache_group+0x68/0xaf [btrfs]
      [20892.278628]  [<ffffffffa052088a>] btrfs_start_dirty_block_groups+0x18d/0x29b [btrfs]
      [20892.280191]  [<ffffffffa052f077>] btrfs_commit_transaction+0x130/0x9c9 [btrfs]
      (...)
      [20892.291316] ---[ end trace 597f77e664245373 ]---
      [20892.293955] BTRFS: error (device sdg) in write_one_cache_group:3184: errno=-2 No such entry
      [20892.297390] BTRFS info (device sdg): forced readonly
      [20892.298222] ------------[ cut here ]------------
      [20892.299190] WARNING: CPU: 0 PID: 13299 at fs/btrfs/ctree.c:2683 btrfs_search_slot+0x7e/0x7d2 [btrfs]()
      (...)
      [20892.326253] Call Trace:
      [20892.326904]  [<ffffffff8142fa46>] dump_stack+0x4f/0x7b
      [20892.329503]  [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad
      [20892.330815]  [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb
      [20892.332556]  [<ffffffffa0510b73>] ? btrfs_search_slot+0x7e/0x7d2 [btrfs]
      [20892.333955]  [<ffffffff81045f62>] warn_slowpath_null+0x1a/0x1c
      [20892.335562]  [<ffffffffa0510b73>] btrfs_search_slot+0x7e/0x7d2 [btrfs]
      [20892.336849]  [<ffffffff8107b024>] ? arch_local_irq_save+0x9/0xc
      [20892.338222]  [<ffffffffa051ad52>] ? cache_save_setup+0x43/0x2a5 [btrfs]
      [20892.339823]  [<ffffffffa051ad66>] ? cache_save_setup+0x57/0x2a5 [btrfs]
      [20892.341275]  [<ffffffff814351a4>] ? _raw_spin_unlock+0x32/0x46
      [20892.342810]  [<ffffffffa0515de7>] write_one_cache_group+0x3f/0xaf [btrfs]
      [20892.344184]  [<ffffffffa052088a>] btrfs_start_dirty_block_groups+0x18d/0x29b [btrfs]
      [20892.347162]  [<ffffffffa052f077>] btrfs_commit_transaction+0x130/0x9c9 [btrfs]
      (...)
      [20892.361015] ---[ end trace 597f77e664245374 ]---
      [21120.688097] INFO: task kworker/u8:17:29854 blocked for more than 120 seconds.
      [21120.689881]       Tainted: G        W       4.0.0-rc5-btrfs-next-9+ #2
      [21120.691384] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      (...)
      [21120.703696] Call Trace:
      [21120.704310]  [<ffffffff8143107e>] schedule+0x74/0x83
      [21120.705490]  [<ffffffffa055f025>] btrfs_tree_lock+0xd7/0x236 [btrfs]
      [21120.706757]  [<ffffffff81075cd6>] ? signal_pending_state+0x31/0x31
      [21120.708156]  [<ffffffffa054ac1e>] lock_extent_buffer_for_io+0x3e/0x194 [btrfs]
      [21120.709892]  [<ffffffffa054bb86>] ? btree_write_cache_pages+0x273/0x385 [btrfs]
      [21120.711605]  [<ffffffffa054bc42>] btree_write_cache_pages+0x32f/0x385 [btrfs]
      [21120.723440]  [<ffffffffa0527552>] btree_writepages+0x23/0x5c [btrfs]
      [21120.724943]  [<ffffffff8110c4c8>] do_writepages+0x23/0x2c
      [21120.726008]  [<ffffffff81176dde>] __writeback_single_inode+0x73/0x2fa
      [21120.727230]  [<ffffffff8117714a>] ? writeback_sb_inodes+0xe5/0x38b
      [21120.728526]  [<ffffffff811771fb>] ? writeback_sb_inodes+0x196/0x38b
      [21120.729701]  [<ffffffff8117726a>] writeback_sb_inodes+0x205/0x38b
      (...)
      [21120.747853] INFO: task btrfs:13282 blocked for more than 120 seconds.
      [21120.749459]       Tainted: G        W       4.0.0-rc5-btrfs-next-9+ #2
      [21120.751137] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      (...)
      [21120.768457] Call Trace:
      [21120.769039]  [<ffffffff8143107e>] schedule+0x74/0x83
      [21120.770107]  [<ffffffffa052f25c>] btrfs_commit_transaction+0x315/0x9c9 [btrfs]
      [21120.771558]  [<ffffffff81075cd6>] ? signal_pending_state+0x31/0x31
      [21120.773659]  [<ffffffffa056fd8c>] prepare_to_relocate+0xcb/0xd2 [btrfs]
      [21120.776257]  [<ffffffffa05741da>] relocate_block_group+0x44/0x4a9 [btrfs]
      [21120.777755]  [<ffffffffa05747a0>] ? btrfs_relocate_block_group+0x161/0x288 [btrfs]
      [21120.779459]  [<ffffffffa05747a8>] btrfs_relocate_block_group+0x169/0x288 [btrfs]
      [21120.781153]  [<ffffffffa0550403>] btrfs_relocate_chunk.isra.29+0x3e/0xa7 [btrfs]
      [21120.783918]  [<ffffffffa05518fd>] btrfs_balance+0xaa4/0xc52 [btrfs]
      [21120.785436]  [<ffffffff8114306e>] ? cpu_cache_get.isra.39+0xe/0x1f
      [21120.786434]  [<ffffffffa0559252>] btrfs_ioctl_balance+0x23f/0x2b0 [btrfs]
      (...)
      [21120.889251] INFO: task fsstress:13288 blocked for more than 120 seconds.
      [21120.890526]       Tainted: G        W       4.0.0-rc5-btrfs-next-9+ #2
      [21120.891773] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      (...)
      [21120.899960] Call Trace:
      [21120.900743]  [<ffffffff8143107e>] schedule+0x74/0x83
      [21120.903004]  [<ffffffffa055f025>] btrfs_tree_lock+0xd7/0x236 [btrfs]
      [21120.904383]  [<ffffffff81075cd6>] ? signal_pending_state+0x31/0x31
      [21120.905608]  [<ffffffffa051125b>] btrfs_search_slot+0x766/0x7d2 [btrfs]
      [21120.906812]  [<ffffffff8114290e>] ? virt_to_head_page+0x9/0x2c
      [21120.907874]  [<ffffffff81144b7f>] ? cache_alloc_debugcheck_after.isra.42+0x16c/0x1cb
      [21120.909551]  [<ffffffffa05124e0>] btrfs_insert_empty_items+0x5d/0xa8 [btrfs]
      [21120.910914]  [<ffffffffa0512585>] btrfs_insert_item+0x5a/0xa5 [btrfs]
      [21120.912181]  [<ffffffffa0520271>] ? btrfs_create_pending_block_groups+0x96/0x130 [btrfs]
      [21120.913784]  [<ffffffffa052028a>] btrfs_create_pending_block_groups+0xaf/0x130 [btrfs]
      [21120.915374]  [<ffffffffa052ffc2>] __btrfs_end_transaction+0x84/0x366 [btrfs]
      [21120.916735]  [<ffffffffa05302b4>] btrfs_end_transaction+0x10/0x12 [btrfs]
      [21120.917996]  [<ffffffffa051ab26>] btrfs_check_data_free_space+0x11f/0x27c [btrfs]
      [21120.919478]  [<ffffffffa051ba25>] btrfs_delalloc_reserve_space+0x1e/0x51 [btrfs]
      [21120.921226]  [<ffffffffa05382f2>] btrfs_truncate_page+0x85/0x2c4 [btrfs]
      [21120.923121]  [<ffffffffa0538572>] btrfs_cont_expand+0x41/0x3ef [btrfs]
      [21120.924449]  [<ffffffffa0541091>] ? btrfs_file_write_iter+0x19a/0x431 [btrfs]
      [21120.926602]  [<ffffffff8107b024>] ? arch_local_irq_save+0x9/0xc
      [21120.927769]  [<ffffffffa0541091>] ? btrfs_file_write_iter+0x19a/0x431 [btrfs]
      [21120.929324]  [<ffffffffa05410a0>] ? btrfs_file_write_iter+0x1a9/0x431 [btrfs]
      [21120.930723]  [<ffffffffa05410d9>] btrfs_file_write_iter+0x1e2/0x431 [btrfs]
      [21120.931897]  [<ffffffff81067d85>] ? get_parent_ip+0xe/0x3e
      [21120.934446]  [<ffffffff811534c3>] new_sync_write+0x7c/0xa0
      [21120.935528]  [<ffffffff81153b58>] vfs_write+0xb2/0x117
      (...)
      
      Fixes: 1bbc621e ("Btrfs: allow block group cache writeout
                            outside critical section in commit")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      24b89d08
    • F
      Btrfs: fix race between start dirty bg cache writeout and bg deletion · b58d1a9e
      Filipe Manana 提交于
      While running xfstests I ran into the following:
      
      [20892.242791] ------------[ cut here ]------------
      [20892.243776] WARNING: CPU: 0 PID: 13299 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
      [20892.245874] BTRFS: Transaction aborted (error -2)
      [20892.247329] Modules linked in: btrfs dm_snapshot dm_bufio dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse$
      [20892.258488] CPU: 0 PID: 13299 Comm: fsstress Tainted: G        W       4.0.0-rc5-btrfs-next-9+ #2
      [20892.262011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [20892.264738]  0000000000000009 ffff880427f8bc18 ffffffff8142fa46 ffffffff8108b6a2
      [20892.266244]  ffff880427f8bc68 ffff880427f8bc58 ffffffff81045ea5 ffff880427f8bc48
      [20892.267761]  ffffffffa0509a6d 00000000fffffffe ffff8803545d6f40 ffffffffa05a15a0
      [20892.269378] Call Trace:
      [20892.269915]  [<ffffffff8142fa46>] dump_stack+0x4f/0x7b
      [20892.271097]  [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad
      [20892.272173]  [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb
      [20892.273386]  [<ffffffffa0509a6d>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
      [20892.274857]  [<ffffffff81045f05>] warn_slowpath_fmt+0x46/0x48
      [20892.275851]  [<ffffffffa0509a6d>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
      [20892.277341]  [<ffffffffa0515e10>] write_one_cache_group+0x68/0xaf [btrfs]
      [20892.278628]  [<ffffffffa052088a>] btrfs_start_dirty_block_groups+0x18d/0x29b [btrfs]
      [20892.280191]  [<ffffffffa052f077>] btrfs_commit_transaction+0x130/0x9c9 [btrfs]
      [20892.281781]  [<ffffffff8107d33d>] ? trace_hardirqs_on+0xd/0xf
      [20892.282873]  [<ffffffffa054163b>] btrfs_sync_file+0x313/0x387 [btrfs]
      [20892.284111]  [<ffffffff8117acad>] vfs_fsync_range+0x95/0xa4
      [20892.285203]  [<ffffffff810e603f>] ? time_hardirqs_on+0x15/0x28
      [20892.286290]  [<ffffffff8123960b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [20892.287469]  [<ffffffff8117acd8>] vfs_fsync+0x1c/0x1e
      [20892.288412]  [<ffffffff8117ae54>] do_fsync+0x34/0x4e
      [20892.289348]  [<ffffffff8117b07c>] SyS_fsync+0x10/0x14
      [20892.290255]  [<ffffffff81435b32>] system_call_fastpath+0x12/0x17
      [20892.291316] ---[ end trace 597f77e664245373 ]---
      [20892.293955] BTRFS: error (device sdg) in write_one_cache_group:3184: errno=-2 No such entry
      [20892.297390] BTRFS info (device sdg): forced readonly
      
      This happens because in btrfs_start_dirty_block_groups() we splice the
      transaction's list of dirty block groups into a local list and then we
      keep extracting the first element of the list without holding the
      cache_write_mutex mutex. This means that before we acquire that mutex
      the first block group on the list might be removed by a conurrent task
      running btrfs_remove_block_group(). So make sure we extract the first
      element (and test the list emptyness) while holding that mutex.
      
      Fixes: 1bbc621e ("Btrfs: allow block group cache writeout
                            outside critical section in commit")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      b58d1a9e
  5. 13 4月, 2015 10 次提交
    • D
      btrfs: qgroup: do a reservation in a higher level. · e2d1f923
      Dongsheng Yang 提交于
      There are two problems in qgroup:
      
      a). The PAGE_CACHE is 4K, even when we are writing a data of 1K,
      qgroup will reserve a 4K size. It will cause the last 3K in a qgroup
      is not available to user.
      
      b). When user is writing a inline data, qgroup will not reserve it,
      it means this is a window we can exceed the limit of a qgroup.
      
      The main idea of this patch is reserving the data size of write_bytes
      rather than the reserve_bytes. It means qgroup will not care about
      the data size btrfs will reserve for user, but only care about the
      data size user is going to write. Then reserve it when user want to
      write and release it in transaction committed.
      
      In this way, qgroup can be released from the complex procedure in
      btrfs and only do the reserve when user want to write and account
      when the data is written in commit_transaction().
      Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      e2d1f923
    • D
      Btrfs: qgroup, Account data space in more proper timings. · 237c0e9f
      Dongsheng Yang 提交于
      Currenly, in data writing, ->reserved is accounted in
      fill_delalloc(), but ->may_use is released in clear_bit_hook()
      which is called by btrfs_finish_ordered_io(). That's too late,
      that said, between fill_delalloc() and btrfs_finish_ordered_io(),
      the data is doublely accounted by qgroup. It will cause some
      unexpected -EDQUOT.
      
      Example:
      	# btrfs quota enable /root/btrfs-auto-test/
      	# btrfs subvolume create /root/btrfs-auto-test//sub
      	Create subvolume '/root/btrfs-auto-test/sub'
      	# btrfs qgroup limit 1G /root/btrfs-auto-test//sub
      	dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
      	dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
      	681353+0 records in
      	681352+0 records out
      	697704448 bytes (698 MB) copied, 8.15563 s, 85.5 MB/s
      It's (698 MB) when we got an -EDQUOT, but we limit it by 1G.
      
      This patch move the btrfs_qgroup_reserve/free() for data from
      btrfs_delalloc_reserve/release_metadata() to btrfs_check_data_free_space()
      and btrfs_free_reserved_data_space(). Then the accounter in qgroup
      will be updated at the same time with the accounter in space_info updated.
      In this way, the unexpected -EDQUOT will be killed.
      Reported-by: NSatoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
      Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      237c0e9f
    • D
      Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use. · 31193213
      Dongsheng Yang 提交于
      Currently, for pre_alloc or delay_alloc, the bytes will be accounted
      in space_info by the three guys.
      space_info->bytes_may_use --- space_info->reserved --- space_info->used.
      But on the other hand, in qgroup, there are only two counters to account the
      bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
      bytes in space_info->bytes_may_use and qg->excl accounts bytes in
      space_info->used. So the bytes in space_info->reserved is not accounted
      in qgroup. If so, there is a window we can exceed the quota limit when
      bytes is in space_info->reserved.
      
      Example:
      	# btrfs quota enable /mnt
      	# btrfs qgroup limit -e 10M /mnt
      	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
      	# sync
      	# btrfs qgroup show -pcre /mnt
      qgroupid rfer     excl     max_rfer max_excl parent  child
      -------- ----     ----     -------- -------- ------  -----
      0/5      20987904 20987904 0        10485760 ---     ---
      
      qg->excl is 20987904 larger than max_excl 10485760.
      
      This patch introduce a new counter named may_use to qgroup, then
      there are three counters in qgroup to account bytes in space_info
      as below.
      space_info->bytes_may_use --- space_info->reserved --- space_info->used.
      qgroup->may_use           --- qgroup->reserved     --- qgroup->excl
      
      With this patch applied:
      	# btrfs quota enable /mnt
      	# btrfs qgroup limit -e 10M /mnt
      	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
      fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
      fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
      fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
      fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
      fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
      fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
      fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
      fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
      fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
      fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
      fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
      	# sync
      	# btrfs qgroup show -pcre /mnt
      qgroupid rfer    excl    max_rfer max_excl parent  child
      -------- ----    ----    -------- -------- ------  -----
      0/5      9453568 9453568 0        10485760 ---     ---
      Reported-by: NCyril SCETBON <cyril.scetbon@free.fr>
      Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      31193213
    • D
      Btrfs: qgroup: free reserved in exceeding quota. · 804ca127
      Dongsheng Yang 提交于
      When we exceed quota limit in writing, we will free
      some reserved extent when we need to drop but not free
      account in qgroup. It means, each time we exceed quota
      in writing, there will be some remain space in qg->reserved
      we can not use any more. If things go on like this, the
      all space will be ate up.
      Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      804ca127
    • Z
      btrfs: Support busy loop of write and delete · c99f1b0c
      Zhao Lei 提交于
      Reproduce:
       while true; do
         dd if=/dev/zero of=/mnt/btrfs/file count=[75% fs_size]
         rm /mnt/btrfs/file
       done
       Then we can see above loop failed on NO_SPACE.
      
      It it long-term problem since very beginning, because delayed-iput
      after rm are not run.
      
      We already have commit_transaction() in alloc_space code, but it is
      not triggered in above case.
      This patch trigger commit_transaction() to run delayed-iput and
      reflash pinned-space to to make write success.
      
      It is based on previous fix of delayed-iput in commit_transaction(),
      need to be applied on top of:
      btrfs: Fix NO_SPACE bug caused by delayed-iput
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c99f1b0c
    • Z
      btrfs: Fix NO_SPACE bug caused by delayed-iput · d7c15171
      Zhao Lei 提交于
      Steps to reproduce:
        while true; do
          dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%]
          rm /btrfs_dir/file
          sync
        done
      
        And we'll see dd failed because btrfs return NO_SPACE.
      
      Reason:
        Normally, btrfs_commit_transaction() call btrfs_run_delayed_iputs()
        in end to free fs space for next write, but sometimes it hadn't
        done work on time, because btrfs-cleaner thread get delayed-iputs
        from list before, but do iput() after next write.
      
        This is log:
        [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin
      
        [ 2569.084280] comm=sync func=btrfs_commit_transaction() call btrfs_run_delayed_iputs()
        [ 2569.085418] comm=sync func=btrfs_commit_transaction() done btrfs_run_delayed_iputs()
        [ 2569.087554] comm=sync func=btrfs_commit_transaction() end
      
        [ 2569.191081] comm=dd begin
        [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28
      
        [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 + 32677888 = 32677888
        [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 + 23834624 = 56512512
        ...
        [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448 + 21762048 = 965738496
        [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end
      
      Fix:
        Make btrfs_commit_transaction() wait current running btrfs-cleaner's
        delayed-iputs() done in end.
      
      Test:
        Use script similar to above(more complex),
        before patch:
          7 failed in 100 * 20 loop.
        after patch:
          0 failed in 100 * 20 loop.
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d7c15171
    • Z
      btrfs: add WARN_ON() to check is space_info op current · 18d018ad
      Zhao Lei 提交于
      space_info's value calculation is some complex and easy to cause
      bug, add WARN_ON() to help debug.
      
      Changelog v1->v2:
       Put WARN_ON()s under the ENOSPC_DEBUG mount option.
       Suggested by: David Sterba <dsterba@suse.cz>
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      18d018ad
    • Z
      btrfs: Set relative data on clear btrfs_block_group_cache->pinned · c30666d4
      Zhao Lei 提交于
      Bug1:
        space_info->bytes_readonly was set to very large(negative) value in
        btrfs_remove_block_group().
      
      Reason:
        Current code set block_group_cache->pinned = 0 in btrfs_delete_unused_bgs(),
        but above space was not counted to space_info->bytes_readonly.
      
        Then in btrfs_remove_block_group():
          block_group->space_info->bytes_readonly -= block_group->key.offset;
        We can see following value in trace:
          btrfs_remove_block_group: pid=2677 comm=btrfs-cleaner WARNING: bytes_readonly=12582912, key.offset=134217728
      
      Bug2:
        space_info->total_bytes_pinned grow to value larger than fs size.
        In a 1.2G fs, we can get following trace log:
        at first:
          ZL_DEBUG: add_pinned_bytes: pid=2710 comm=sync change total_bytes_pinned flags=1 869793792 + 95944704 = 965738496
        after some op:
          ZL_DEBUG: add_pinned_bytes: pid=2770 comm=sync change total_bytes_pinned flags=1 1780178944 + 95944704 = 1876123648
        after some op:
          ZL_DEBUG: add_pinned_bytes: pid=3193 comm=sync change total_bytes_pinned flags=1 2924568576 + 95551488 = 3020120064
        ...
      
      Reason:
        Similar to bug1, we also need to adjust space_info->total_bytes_pinned
        in above code block.
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c30666d4
    • Z
      btrfs: Adjust commit-transaction condition to avoid NO_SPACE more · 264ca0f6
      Zhao Lei 提交于
      If we have any chance to make a successful write, we should not give up.
      
      This patch adjust commit-transaction condition from:
        pinned >= wanted
      to
        left + pinned >= wanted
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      264ca0f6
    • Z
      btrfs: fix condition of commit transaction · 94b947b2
      Zhao Lei 提交于
      Old code bypass commit transaction when we don't have enough
      pinned space, but another case is there exist freed bgs in current
      transction, it have possibility to make alloc_chunk success.
      
      This patch modify the condition to:
      if (have_free_bg || have_pinned_space) commit_transaction()
      
      Confirmed above action by printk before and after patch.
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      94b947b2
  6. 11 4月, 2015 7 次提交
    • C
      Btrfs: fix use after free when close_ctree frees the orphan_rsv · cdfb080e
      Chris Mason 提交于
      Near the end of close_ctree, we're calling btrfs_free_block_rsv
      to free up the orphan rsv.  The problem is this call updates the
      space_info, which has already been freed.
      
      This adds a new __ function that directly calls kfree instead of trying
      to update the space infos.
      Signed-off-by: NChris Mason <clm@fb.com>
      cdfb080e
    • C
      Btrfs: allow block group cache writeout outside critical section in commit · 1bbc621e
      Chris Mason 提交于
      We loop through all of the dirty block groups during commit and write
      the free space cache.  In order to make sure the cache is currect, we do
      this while no other writers are allowed in the commit.
      
      If a large number of block groups are dirty, this can introduce long
      stalls during the final stages of the commit, which can block new procs
      trying to change the filesystem.
      
      This commit changes the block group cache writeout to take appropriate
      locks and allow it to run earlier in the commit.  We'll still have to
      redo some of the block groups, but it means we can get most of the work
      out of the way without blocking the entire FS.
      Signed-off-by: NChris Mason <clm@fb.com>
      1bbc621e
    • C
      Btrfs: two stage dirty block group writeout · c9dc4c65
      Chris Mason 提交于
      Block group cache writeout is currently waiting on the pages for each
      block group cache before moving on to writing the next one.  This commit
      switches things around to send down all the caches and then wait on them
      in batches.
      
      The end result is much faster, since we're keeping the disk pipeline
      full.
      Signed-off-by: NChris Mason <clm@fb.com>
      c9dc4c65
    • J
      Btrfs: don't commit the transaction in the async space flushing · 365c5313
      Josef Bacik 提交于
      We're triggering a huge number of commits from
      btrfs_async_reclaim_metadata_space.  These aren't really requried,
      because everyone calling the async reclaim code is going to end up
      triggering a commit on their own.
      Signed-off-by: NChris Mason <clm@fb.com>
      365c5313
    • J
      Btrfs: reserve space for block groups · cb723e49
      Josef Bacik 提交于
      This changes our delayed refs calculations to include the space needed
      to write back dirty block groups.
      Signed-off-by: NChris Mason <clm@fb.com>
      cb723e49
    • C
      Btrfs: refill block reserves during truncate · 28f75a0e
      Chris Mason 提交于
      When truncate starts, it allocates some space in the block reserves so
      that we'll have enough to update metadata along the way.
      
      For very large files, we can easily go through all of that space as we
      loop through the extents.  This changes truncate to refill the space
      reservation as it progresses through the file.
      Signed-off-by: NChris Mason <clm@fb.com>
      28f75a0e
    • J
      Btrfs: account for crcs in delayed ref processing · 1262133b
      Josef Bacik 提交于
      As we delete large extents, we end up doing huge amounts of COW in order
      to delete the corresponding crcs.  This adds accounting so that we keep
      track of that space and flushing of delayed refs so that we don't build
      up too much delayed crc work.
      
      This helps limit the delayed work that must be done at commit time and
      tries to avoid ENOSPC aborts because the crcs eat all the global
      reserves.
      Signed-off-by: NChris Mason <clm@fb.com>
      1262133b
  7. 01 4月, 2015 1 次提交
    • C
      Btrfs: free and unlock our path before btrfs_free_and_pin_reserved_extent() · dd825259
      Chris Mason 提交于
      The error handling path for alloc_reserved_tree_block is calling
      btrfs_free_and_pin_reserved_extent with a spinning tree lock held.  This
      might sleep as we allocate extent_state objects:
      
       BUG: sleeping function called from invalid context at mm/slub.c:1268
       in_atomic(): 1, irqs_disabled(): 0, pid: 11093, name: kworker/u4:7
       5 locks held by kworker/u4:7/11093:
        #0:  ("%s-%s""btrfs", name){++++.+}, at: [<ffffffff81091d51>] process_one_work+0x151/0x520
        #1:  ((&work->normal_work)){+.+.+.}, at: [<ffffffff81091d51>] process_one_work+0x151/0x520
        #2:  (sb_internal){++++.+}, at: [<ffffffffa003a70e>] start_transaction+0x43e/0x590 [btrfs]
        #3:  (&head_ref->mutex){+.+...}, at: [<ffffffffa0089f8c>] btrfs_delayed_ref_lock+0x4c/0x240 [btrfs]
        #4:  (btrfs-extent-00){++++..}, at: [<ffffffffa007697b>] btrfs_clear_lock_blocking_rw+0x9b/0x150 [btrfs]
       CPU: 0 PID: 11093 Comm: kworker/u4:7 Tainted: G        W 4.0.0-rc6-default+ #246
       Hardware name: Intel Corporation Santa Rosa platform/Matanzas, BIOS TSRSCRB1.86C.0047.B00.0610170821 10/17/06
       Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
        00000000000004f4 ffff88006dd17848 ffffffff81ab0e3b ffff88006dd17848
        ffff88007a944760 ffff88006dd17868 ffffffff8109d516 ffff88006dd17898
        0000000000000000 ffff88006dd17898 ffffffff8109d5b2 ffffffff81aba2bb
       Call Trace:
        [<ffffffff81ab0e3b>] dump_stack+0x4f/0x6c
        [<ffffffff8109d516>] ___might_sleep+0xf6/0x140
        [<ffffffff8109d5b2>] __might_sleep+0x52/0x90
        [<ffffffff81aba2bb>] ? ftrace_call+0x5/0x34
        [<ffffffff81196363>] kmem_cache_alloc+0x163/0x1b0
        [<ffffffffa0056f31>] ? alloc_extent_state+0x31/0x150 [btrfs]
        [<ffffffffa0056f20>] ? alloc_extent_state+0x20/0x150 [btrfs]
        [<ffffffffa0056f31>] alloc_extent_state+0x31/0x150 [btrfs]
        [<ffffffffa005805b>] __set_extent_bit+0x37b/0x5d0 [btrfs]
        [<ffffffff81aba2bb>] ? ftrace_call+0x5/0x34
        [<ffffffffa005888d>] ? set_extent_bit+0xd/0x30 [btrfs]
        [<ffffffffa00588a3>] set_extent_bit+0x23/0x30 [btrfs]
        [<ffffffffa0058e80>] set_extent_dirty+0x20/0x30 [btrfs]
        [<ffffffffa00195ba>] pin_down_extent+0xaa/0x170 [btrfs]
        [<ffffffffa001d8ef>] __btrfs_free_reserved_extent+0xcf/0x160 [btrfs]
        [<ffffffffa0023856>] btrfs_free_and_pin_reserved_extent+0x16/0x20 [btrfs]
        [<ffffffffa002482a>] __btrfs_run_delayed_refs+0xfca/0x1290 [btrfs]
        [<ffffffffa0026eae>] btrfs_run_delayed_refs+0x6e/0x2e0 [btrfs]
        [<ffffffffa0027378>] delayed_ref_async_start+0x48/0xb0 [btrfs]
        [<ffffffffa006c883>] normal_work_helper+0x83/0x350 [btrfs]
        [<ffffffffa006cd79>] ? btrfs_extent_refs_helper+0x9/0x20 [btrfs]
        [<ffffffffa006cd82>] btrfs_extent_refs_helper+0x12/0x20 [btrfs]
        [<ffffffff81091dcb>] process_one_work+0x1cb/0x520
        [<ffffffff81091d51>] ? process_one_work+0x151/0x520
        [<ffffffff811c7abf>] ? seq_read+0x3f/0x400
        [<ffffffff8109260b>] worker_thread+0x5b/0x4e0
        [<ffffffff81097be2>] ? __kthread_parkme+0x12/0xa0
        [<ffffffff810925b0>] ? rescuer_thread+0x450/0x450
        [<ffffffff81098686>] kthread+0xf6/0x120
        [<ffffffff81098590>] ? flush_kthread_worker+0x1b0/0x1b0
        [<ffffffff81ab8088>] ret_from_fork+0x58/0x90
        [<ffffffff81098590>] ? flush_kthread_worker+0x1b0/0x1b0
       ------------[ cut here ]------------
      
      This changes things to free the path first, which will also unlock the
      extent buffer.
      Signed-off-by: NChris Mason <clm@fb.com>
      Reported-by: NDave Sterba <dsterba@suse.cz>
      Tested-by: NDave Sterba <dsterba@suse.cz>
      dd825259
  8. 27 3月, 2015 1 次提交
    • F
      Btrfs: fix log tree corruption when fs mounted with -o discard · dcc82f47
      Filipe Manana 提交于
      While committing a transaction we free the log roots before we write the
      new super block. Freeing the log roots implies marking the disk location
      of every node/leaf (metadata extent) as pinned before the new super block
      is written. This is to prevent the disk location of log metadata extents
      from being reused before the new super block is written, otherwise we
      would have a corrupted log tree if before the new super block is written
      a crash/reboot happens and the location of any log tree metadata extent
      ended up being reused and rewritten.
      
      Even though we pinned the log tree's metadata extents, we were issuing a
      discard against them if the fs was mounted with the -o discard option,
      resulting in corruption of the log tree if a crash/reboot happened before
      writing the new super block - the next time the fs was mounted, during
      the log replay process we would find nodes/leafs of the log btree with
      a content full of zeroes, causing the process to fail and require the
      use of the tool btrfs-zero-log to wipeout the log tree (and all data
      previously fsynced becoming lost forever).
      
      Fix this by not doing a discard when pinning an extent. The discard will
      be done later when it's safe (after the new super block is committed) at
      extent-tree.c:btrfs_finish_extent_commit().
      
      Fixes: e688b725 (Btrfs: fix extent pinning bugs in the tree log)
      CC: <stable@vger.kernel.org>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      dcc82f47
  9. 18 3月, 2015 1 次提交
    • J
      Btrfs: add sanity test for outstanding_extents accounting · 6a3891c5
      Josef Bacik 提交于
      I introduced a regression wrt outstanding_extents accounting.  These are tricky
      areas that aren't easily covered by xfstests as we could change MAX_EXTENT_SIZE
      at any time.  So add sanity tests to cover the various conditions that are
      tricky in order to make sure we don't introduce regressions in the future.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      6a3891c5
  10. 17 3月, 2015 1 次提交
    • J
      Btrfs: prepare block group cache before writing · dcdf7f6d
      Josef Bacik 提交于
      Writing the block group cache will modify the extent tree quite a bit because it
      truncates the old space cache and pre-allocates new stuff.  To try and cut down
      on the churn lets do the setup dance first, then later on hopefully we can avoid
      looping with newly dirtied roots.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      dcdf7f6d
  11. 14 3月, 2015 1 次提交
  12. 04 3月, 2015 5 次提交
  13. 03 3月, 2015 1 次提交
  14. 21 2月, 2015 1 次提交
  15. 17 2月, 2015 2 次提交
  16. 15 2月, 2015 2 次提交
    • F
      Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group · 3d84be79
      Forrest Liu 提交于
      Removing large amount of block group in a transaction may encounters
      BUG_ON() in btrfs_orphan_add(). That is because btrfs_orphan_reserve_metadata()
      will grab metadata reservation from transaction handle, and
      btrfs_delete_unused_bgs() didn't reserve metadata for trnasaction handle when
      delete unused block group.
      
      The problem can be reproduce by following script
      
          mntpath=/btrfs
          loopdev=/dev/loop0
          filepath=/home/forrest/image
      
          umount $mntpath
          losetup -d $loopdev
          truncate --size 1000g $filepath
          losetup $loopdev $filepath
          mkfs.btrfs -f $loopdev
          mount $loopdev $mntpath
      
          for j in `seq 1 1 1000`; do
              fallocate -l 1g $mntpath/$j
          done
          # wait cleaner thread remove unused block group
          sleep 300
      
      The call trace that results from the BUG_ON() is:
      
      [  613.093084] ------------[ cut here ]------------
      [  613.097928] kernel BUG at fs/btrfs/inode.c:3142!
      [  613.105855] invalid opcode: 0000 [#1] SMP
      [  613.112702] Modules linked in: coretemp(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) snd_ens1371(E) snd_ac97_codec(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ppdev(E) ac97_bus(E) ablk_helper(E) gameport(E) cryptd(E) snd_rawmidi(E) snd_seq_device(E) snd_pcm(E) vmw_balloon(E) snd_timer(E) snd(E) soundcore(E) serio_raw(E) vmwgfx(E) ttm(E) drm_kms_helper(E) drm(E) vmw_vmci(E) parport_pc(E) shpchp(E) i2c_piix4(E) mac_hid(E) lp(E) parport(E) btrfs(E) xor(E) raid6_pq(E) hid_generic(E) usbhid(E) hid(E) psmouse(E) ahci(E) libahci(E) e1000(E) mptspi(E) mptscsih(E) mptbase(E) floppy(E) vmw_pvscsi(E) vmxnet3(E)
      [  613.144196] CPU: 0 PID: 1480 Comm: btrfs-cleaner Tainted: G            E  3.19.0-rc7-custom #2
      [  613.148501] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
      [  613.152694] task: ffff880035cdb1a0 ti: ffff880039cf4000 task.ti: ffff880039cf4000
      [  613.154969] RIP: 0010:[<ffffffffa01441c2>]  [<ffffffffa01441c2>] btrfs_orphan_add+0x1d2/0x1e0 [btrfs]
      [  613.157780] RSP: 0018:ffff880039cf7c48  EFLAGS: 00010286
      [  613.159560] RAX: 00000000ffffffe4 RBX: ffff88003bd981a0 RCX: ffff88003c9e4000
      [  613.161904] RDX: 0000000000002244 RSI: 0000000000040000 RDI: ffff88003c9e4138
      [  613.164264] RBP: ffff880039cf7c88 R08: 000060ffc0000850 R09: 0000000000000000
      [  613.166507] R10: ffff88003bc4b7a0 R11: ffffea0000eb6740 R12: ffff88003c9c0000
      [  613.168681] R13: ffff88003c102160 R14: ffff88003c9c0458 R15: 0000000000000001
      [  613.170932] FS:  0000000000000000(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
      [  613.173316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  613.175227] CR2: 00007f6343537000 CR3: 0000000036329000 CR4: 00000000000407f0
      [  613.177554] Stack:
      [  613.178712]  ffff880039cf7c88 ffffffffa0182a54 ffff88003c9e4b04 ffff88003c9c7800
      [  613.181297]  ffff88003bc4b7a0 ffff88003bd981a0 ffff88003c8db200 ffff88003c2fcc60
      [  613.183782]  ffff880039cf7d18 ffffffffa012da97 ffff88003bc4b7a4 ffff88003bc4b7a0
      [  613.186171] Call Trace:
      [  613.187493]  [<ffffffffa0182a54>] ? lookup_free_space_inode+0x44/0x100 [btrfs]
      [  613.189801]  [<ffffffffa012da97>] btrfs_remove_block_group+0x137/0x740 [btrfs]
      [  613.192126]  [<ffffffffa0166912>] btrfs_remove_chunk+0x672/0x780 [btrfs]
      [  613.194267]  [<ffffffffa012e2ff>] btrfs_delete_unused_bgs+0x25f/0x280 [btrfs]
      [  613.196567]  [<ffffffffa0135e4c>] cleaner_kthread+0x12c/0x190 [btrfs]
      [  613.198687]  [<ffffffffa0135d20>] ? check_leaf+0x350/0x350 [btrfs]
      [  613.200758]  [<ffffffff8108f232>] kthread+0xd2/0xf0
      [  613.202616]  [<ffffffff8108f160>] ? kthread_create_on_node+0x180/0x180
      [  613.204738]  [<ffffffff8175dabc>] ret_from_fork+0x7c/0xb0
      [  613.206652]  [<ffffffff8108f160>] ? kthread_create_on_node+0x180/0x180
      [  613.208741] Code: ff ff 0f 1f 80 00 00 00 00 89 45 c8 3e 80 63 80 fd 48 89 df e8 d0 23 fe ff 8b 45 c8 e9 14 ff ff ff b8 f4 ff ff ff e9 12 ff ff ff <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
      [  613.216562] RIP  [<ffffffffa01441c2>] btrfs_orphan_add+0x1d2/0x1e0 [btrfs]
      [  613.218828]  RSP <ffff880039cf7c48>
      [  613.220382] ---[ end trace 71073106deb8a457 ]---
      
      This patch replace btrfs_join_transaction() with btrfs_start_transaction() in
      btrfs_delete_unused_bgs() to revent BUG_ON() in btrfs_orphan_add()
      Signed-off-by: NForrest Liu <forrestl@synology.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      3d84be79
    • J
      Btrfs: account for large extents with enospc · dcab6a3b
      Josef Bacik 提交于
      On our gluster boxes we stream large tar balls of backups onto our fses.  With
      160gb of ram this means we get really large contiguous ranges of dirty data, but
      the way our ENOSPC stuff works is that as long as it's contiguous we only hold
      metadata reservation for one extent.  The problem is we limit our extents to
      128mb, so we'll end up with at least 800 extents so our enospc accounting is
      quite a bit lower than what we need.  To keep track of this make sure we
      increase outstanding_extents for every multiple of the max extent size so we can
      be sure to have enough reserved metadata space.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      dcab6a3b