1. 27 10月, 2015 1 次提交
    • D
      btrfs: extend balance filter limit to take minimum and maximum · 12907fc7
      David Sterba 提交于
      The 'limit' filter is underdesigned, it should have been a range for
      [min,max], with some relaxed semantics when one of the bounds is
      missing. Besides that, using a full u64 for a single value is a waste of
      bytes.
      
      Let's fix both by extending the use of the u64 bytes for the [min,max]
      range. This can be done in a backward compatible way, the range will be
      interpreted only if the appropriate flag is set
      (BTRFS_BALANCE_ARGS_LIMIT_RANGE).
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      12907fc7
  2. 22 10月, 2015 4 次提交
  3. 11 10月, 2015 1 次提交
  4. 08 10月, 2015 3 次提交
  5. 02 10月, 2015 1 次提交
  6. 01 10月, 2015 3 次提交
  7. 29 9月, 2015 5 次提交
  8. 01 9月, 2015 1 次提交
    • Z
      btrfs: Add raid56 support for updating · 943c6e99
      Zhao Lei 提交于
       num_tolerated_disk_barrier_failures in btrfs_balance
      
      Code for updating fs_info->num_tolerated_disk_barrier_failures in
      btrfs_balance() lacks raid56 support.
      
      Reason:
       Above code was wroten in 2012-08-01, together with
       btrfs_calc_num_tolerated_disk_barrier_failures()'s first version.
      
       Then, btrfs_calc_num_tolerated_disk_barrier_failures() got updated
       later to support raid56, but code in btrfs_balance() was not
       updated together.
      
      Fix:
       Merge above similar code to a common function:
       btrfs_get_num_tolerated_disk_barrier_failures()
       and make it support both case.
      
       It can fix this bug with a bonus of cleanup, and make these code
       never in above no-sync state from now on.
      Suggested-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      943c6e99
  9. 22 8月, 2015 1 次提交
    • C
      btrfs: fix compile when block cgroups are not enabled · 3a9508b0
      Chris Mason 提交于
      bio->bi_css and bio->bi_ioc don't exist when block cgroups are not on.
      This adds an ifdef around them.  It's not perfect, but our
      use of bi_ioc is being removed in the 4.3 merge window.
      
      The bi_css usage really should go into bio_clone, but I want to make
      sure that doesn't introduce problems for other bio_clone use cases.
      Signed-off-by: NChris Mason <clm@fb.com>
      3a9508b0
  10. 20 8月, 2015 1 次提交
    • M
      btrfs: use __GFP_NOFAIL in alloc_btrfs_bio · 277fb5fc
      Michal Hocko 提交于
      alloc_btrfs_bio relies on GFP_NOFS allocation when committing the
      transaction but this allocation context is rather weak wrt. reclaim
      capabilities. The page allocator currently tries hard to not fail these
      allocations if they are small (<=PAGE_ALLOC_COSTLY_ORDER) but it can
      still fail if the _current_ process is the OOM killer victim. Moreover
      there is an attempt to move away from the default no-fail behavior and
      allow these allocation to fail more eagerly. This would lead to:
      
      [   37.928625] kernel BUG at fs/btrfs/extent_io.c:4045
      
      which is clearly undesirable and the nofail behavior should be explicit
      if the allocation failure cannot be tolerated.
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      277fb5fc
  11. 14 8月, 2015 1 次提交
  12. 09 8月, 2015 3 次提交
    • C
      Btrfs: add support for blkio controllers · da2f0f74
      Chris Mason 提交于
      This attaches accounting information to bios as we submit them so the
      new blkio controllers can throttle on btrfs filesystems.
      
      Not much is required, we're just associating bios with blkcgs during clone,
      calling wbc_init_bio()/wbc_account_io() during writepages submission,
      and attaching the bios to the current context during direct IO.
      
      Finally if we are splitting bios during btrfs_map_bio, this attaches
      accounting information to the split.
      
      The end result is able to throttle nicely on single disk filesystems.  A
      little more work is required for multi-device filesystems.
      Signed-off-by: NChris Mason <clm@fb.com>
      da2f0f74
    • Z
      btrfs: Cleanup: Remove chunk_objectid argument from btrfs_relocate_chunk() · dc2ee4e2
      Zhaolei 提交于
      Remove chunk_objectid argument from btrfs_relocate_chunk() because
      it is not necessary, it can also cleanup some code in caller for
      prepare its value.
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      dc2ee4e2
    • Z
      btrfs: Fix data checksum error cause by replace with io-load. · 55e3a601
      Zhaolei 提交于
      xfstests btrfs/070 sometimes failed.
      In my test machine, its fail rate is about 30%.
      In another vm(vmware), its fail rate is about 50%.
      
      Reason:
        btrfs/070 do replace and defrag with fsstress simultaneously,
        after above operation, checksum error is found by scrub.
      
        Actually, it have no relationship with defrag operation, only
        replace with fsstress can trigger this bug.
      
        New data writen to target device have possibility rewrited by
        old data from source device by replace code in debug, to avoid
        above problem, we can set target block group to readonly in
        replace period, so new data requested by other operation will
        not write to same place with replace code.
      
        Before patch(4.1-rc3):
          30% failed in 100 xfstests.
        After patch:
          0% failed in 300 xfstests.
      
      It also happened in btrfs/071 as it's another scrub with IO load tests.
      Reported-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      55e3a601
  13. 29 7月, 2015 2 次提交
    • J
      btrfs: iterate over unused chunk space in FITRIM · 499f377f
      Jeff Mahoney 提交于
      Since we now clean up block groups automatically as they become
      empty, iterating over block groups is no longer sufficient to discard
      unused space.
      
      This patch iterates over the unused chunk space and discards any regions
      that are unallocated, regardless of whether they were ever used.  This is
      a change for btrfs but is consistent with other file systems.
      
      We do this in a transactionless manner since the discard process can take
      a substantial amount of time and a transaction would need to be started
      before the acquisition of the device list lock.  That would mean a
      transaction would be held open across /all/ of the discards collectively.
      In order to prevent other threads from allocating or freeing chunks, we
      hold the chunks lock across the search and discard calls.  We release it
      between searches to allow the file system to perform more-or-less
      normally.  Since the running transaction can commit and disappear while
      we're using the transaction pointer, we take a reference to it and
      release it after the search.  This is safe since it would happen normally
      at the end of the transaction commit after any locks are released anyway.
      We also take the commit_root_sem to protect against a transaction starting
      and committing while we're running.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Tested-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      499f377f
    • C
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig 提交于
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not beeing persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      4246a0b6
  14. 01 7月, 2015 2 次提交
    • F
      Btrfs: fix race between balance and unused block group deletion · 67c5e7d4
      Filipe Manana 提交于
      We have a race between deleting an unused block group and balancing the
      same block group that leads to an assertion failure/BUG(), producing the
      following trace:
      
      [181631.208236] BTRFS: assertion failed: 0, file: fs/btrfs/volumes.c, line: 2622
      [181631.220591] ------------[ cut here ]------------
      [181631.222959] kernel BUG at fs/btrfs/ctree.h:4062!
      [181631.223932] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [181631.224566] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse acpi_cpufreq parpor$
      [181631.224566] CPU: 8 PID: 17451 Comm: btrfs Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
      [181631.224566] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
      [181631.224566] task: ffff880127e09590 ti: ffff8800b5824000 task.ti: ffff8800b5824000
      [181631.224566] RIP: 0010:[<ffffffffa03f19f6>]  [<ffffffffa03f19f6>] assfail.constprop.50+0x1e/0x20 [btrfs]
      [181631.224566] RSP: 0018:ffff8800b5827ae8  EFLAGS: 00010246
      [181631.224566] RAX: 0000000000000040 RBX: ffff8800109fc218 RCX: ffffffff81095dce
      [181631.224566] RDX: 0000000000005124 RSI: ffffffff81464819 RDI: 00000000ffffffff
      [181631.224566] RBP: ffff8800b5827ae8 R08: 0000000000000001 R09: 0000000000000000
      [181631.224566] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800109fc200
      [181631.224566] R13: ffff880020095000 R14: ffff8800b1a13f38 R15: ffff880020095000
      [181631.224566] FS:  00007f70ca0b0c80(0000) GS:ffff88013ec00000(0000) knlGS:0000000000000000
      [181631.224566] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [181631.224566] CR2: 00007f2872ab6e68 CR3: 00000000a717c000 CR4: 00000000000006e0
      [181631.224566] Stack:
      [181631.224566]  ffff8800b5827ba8 ffffffffa03f3916 ffff8800b5827b38 ffffffffa03d080e
      [181631.224566]  ffffffffa03d1423 ffff880020095000 ffff88001233c000 0000000000000001
      [181631.224566]  ffff880020095000 ffff8800b1a13f38 0000000a69c00000 0000000000000000
      [181631.224566] Call Trace:
      [181631.224566]  [<ffffffffa03f3916>] btrfs_remove_chunk+0xa4/0x6bb [btrfs]
      [181631.224566]  [<ffffffffa03d080e>] ? join_transaction.isra.8+0xb9/0x3ba [btrfs]
      [181631.224566]  [<ffffffffa03d1423>] ? wait_current_trans.isra.13+0x22/0xfc [btrfs]
      [181631.224566]  [<ffffffffa03f3fbc>] btrfs_relocate_chunk.isra.29+0x8f/0xa7 [btrfs]
      [181631.224566]  [<ffffffffa03f54df>] btrfs_balance+0xaa4/0xc52 [btrfs]
      [181631.224566]  [<ffffffffa03fd388>] btrfs_ioctl_balance+0x23f/0x2b0 [btrfs]
      [181631.224566]  [<ffffffff810872f9>] ? trace_hardirqs_on+0xd/0xf
      [181631.224566]  [<ffffffffa04019a3>] btrfs_ioctl+0xfe2/0x2220 [btrfs]
      [181631.224566]  [<ffffffff812603ed>] ? __this_cpu_preempt_check+0x13/0x15
      [181631.224566]  [<ffffffff81084669>] ? arch_local_irq_save+0x9/0xc
      [181631.224566]  [<ffffffff81138def>] ? handle_mm_fault+0x834/0xcd2
      [181631.224566]  [<ffffffff81138def>] ? handle_mm_fault+0x834/0xcd2
      [181631.224566]  [<ffffffff8103e48c>] ? __do_page_fault+0x211/0x424
      [181631.224566]  [<ffffffff811755e6>] do_vfs_ioctl+0x3c6/0x479
      (...)
      
      The sequence of steps leading to this are:
      
                 CPU 0                                         CPU 1
      
        btrfs_balance()
          btrfs_relocate_chunk()
      
            btrfs_relocate_block_group(bg X)
              btrfs_lookup_block_group(bg X)
      
                                                     cleaner_kthread
                                                        locks fs_info->cleaner_mutex
      
                                                        btrfs_delete_unused_bgs()
                                                          finds bg X, which became
                                                          unused in the previous
                                                          transaction
      
                                                          checks bg X ->ro == 0,
                                                          so it proceeds
              sets bg X ->ro to 1
              (btrfs_set_block_group_ro(bg X))
      
              blocks on fs_info->cleaner_mutex
                                                          btrfs_remove_chunk(bg X)
                                                        unlocks fs_info->cleaner_mutex
      
              acquires fs_info->cleaner_mutex
              relocate_block_group()
                --> does nothing, no extents found in
                    the extent tree from bg X
              unlocks fs_info->cleaner_mutex
      
            btrfs_relocate_block_group(bg X) returns
      
          btrfs_remove_chunk(bg X)
             extent map not found
                --> ASSERT(0)
      
      Fix this by using a new mutex to make sure these 2 operations, block
      group relocation and removal, are serialized.
      
      This issue is reproducible by running fstests generic/038 (which stresses
      chunk allocation and automatic removal of unused block groups) together
      with the following balance loop:
      
          while true; do btrfs balance start -dusage=0 <mountpoint> ; done
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      67c5e7d4
    • Z
      btrfs: cleanup noused initialization of dev in btrfs_end_bio() · 65f53338
      Zhao Lei 提交于
      It is introduced by:
       c404e0dc
       Btrfs: fix use-after-free in the finishing procedure of the device replace
      
      But seems no relationship with that bug, this patch revirt these
      code block for cleanup.
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      65f53338
  15. 19 6月, 2015 2 次提交
  16. 10 6月, 2015 1 次提交
    • F
      Btrfs: fix necessary chunk tree space calculation when allocating a chunk · 4617ea3a
      Filipe Manana 提交于
      When allocating a new chunk or removing one we need to update num_devs
      device items and insert or remove a chunk item in the chunk tree, so
      in the worst case the space needed in the chunk space_info is:
      
        btrfs_calc_trunc_metadata_size(chunk_root, num_devs) +
           btrfs_calc_trans_metadata_size(chunk_root, 1)
      
      That is, in the worst case we need to cow num_devs paths and cow 1 other
      path that can result in splitting every node and leaf, and each path
      consisting of BTRFS_MAX_LEVEL - 1 nodes and 1 leaf. We were requiring
      some additional chunk_root->nodesize * BTRFS_MAX_LEVEL * num_devs bytes,
      which were unnecessary since updating the existing device items does
      not result in splitting the nodes and leaf since after updating them
      they remain with the same size.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      4617ea3a
  17. 03 6月, 2015 7 次提交
    • F
      Btrfs: fix -ENOSPC on block group removal · 39c2d7fa
      Filipe Manana 提交于
      Unlike when attempting to allocate a new block group, where we check
      that we have enough space in the system space_info to update the device
      items and insert a new chunk item in the chunk tree, we were not checking
      if the system space_info had enough space for updating the device items
      and deleting the chunk item in the chunk tree. This often lead to -ENOSPC
      error when attempting to allocate blocks for the chunk tree (during btree
      node/leaf COW operations) while updating the device items or deleting the
      chunk item, which resulted in the current transaction being aborted and
      turning the filesystem into read-only mode.
      
      While running fstests generic/038, which stresses allocation of block
      groups and removal of unused block groups, with a large scratch device
      (750Gb) this happened often, despite more than enough unallocated space,
      and resulted in the following trace:
      
      [68663.586604] WARNING: CPU: 3 PID: 1521 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
      [68663.600407] BTRFS: Transaction aborted (error -28)
      (...)
      [68663.730829] Call Trace:
      [68663.732585]  [<ffffffff8142fa46>] dump_stack+0x4f/0x7b
      [68663.734334]  [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad
      [68663.739980]  [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb
      [68663.757153]  [<ffffffffa036ca6d>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
      [68663.760925]  [<ffffffff81045f05>] warn_slowpath_fmt+0x46/0x48
      [68663.762854]  [<ffffffffa03b159d>] ? btrfs_update_device+0x15a/0x16c [btrfs]
      [68663.764073]  [<ffffffffa036ca6d>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
      [68663.765130]  [<ffffffffa03b3638>] btrfs_remove_chunk+0x597/0x5ee [btrfs]
      [68663.765998]  [<ffffffffa0384663>] ? btrfs_delete_unused_bgs+0x245/0x296 [btrfs]
      [68663.767068]  [<ffffffffa0384676>] btrfs_delete_unused_bgs+0x258/0x296 [btrfs]
      [68663.768227]  [<ffffffff8143527f>] ? _raw_spin_unlock_irq+0x2d/0x4c
      [68663.769081]  [<ffffffffa038b109>] cleaner_kthread+0x13d/0x16c [btrfs]
      [68663.799485]  [<ffffffffa038afcc>] ? btrfs_alloc_root+0x28/0x28 [btrfs]
      [68663.809208]  [<ffffffff8105f367>] kthread+0xef/0xf7
      [68663.828795]  [<ffffffff810e603f>] ? time_hardirqs_on+0x15/0x28
      [68663.844942]  [<ffffffff8105f278>] ? __kthread_parkme+0xad/0xad
      [68663.846486]  [<ffffffff81435a88>] ret_from_fork+0x58/0x90
      [68663.847760]  [<ffffffff8105f278>] ? __kthread_parkme+0xad/0xad
      [68663.849503] ---[ end trace 798477c6d6dbaad6 ]---
      [68663.850525] BTRFS: error (device sdc) in btrfs_remove_chunk:2652: errno=-28 No space left
      
      So fix this by verifying that enough space exists in system space_info,
      and reserving the space in the chunk block reserve, before attempting to
      delete the block group and allocate a new system chunk if we don't have
      enough space to perform the necessary updates and delete in the chunk
      tree. Like for the block group creation case, we don't error our if we
      fail to allocate a new system chunk, since we might end up not needing
      it (no node/leaf splits happen during the COW operations and/or we end
      up not needing to COW any btree nodes or leafs because they were already
      COWed in the current transaction and their writeback didn't start yet).
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      39c2d7fa
    • F
      Btrfs: fix chunk allocation regression leading to transaction abort · c152b63e
      Filipe Manana 提交于
      With commit 1b984508 ("Btrfs: fix find_free_dev_extent() malfunction
      in case device tree has hole") introduced in the kernel 4.1 merge window,
      we end up using part of a device hole for which there are already pending
      chunks or pinned chunks. Before that commit we didn't use the hole and
      would just move on to the next hole in the device.
      
      However when we adjust the start offset for the chunk allocation and we
      have pinned chunks, we set it blindly to the end offset of the pinned
      chunk we are currently processing, which is dangerous because we can
      have a pending chunk that has a start offset that matches the end offset
      of our pinned chunk - leading us to a case where we end up getting two
      pending chunks that start at the same physical device offset, which makes
      us later abort the current transaction with -EEXIST when finishing the
      chunk allocation at btrfs_create_pending_block_groups():
      
      [194737.659017] ------------[ cut here ]------------
      [194737.660192] WARNING: CPU: 15 PID: 31111 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x106 [btrfs]()
      [194737.662209] BTRFS: Transaction aborted (error -17)
      [194737.663175] Modules linked in: btrfs dm_snapshot dm_bufio dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse
      [194737.674015] CPU: 15 PID: 31111 Comm: xfs_io Tainted: G        W       4.0.0-rc5-btrfs-next-9+ #2
      [194737.675986] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [194737.682999]  0000000000000009 ffff8800564c7a98 ffffffff8142fa46 ffffffff8108b6a2
      [194737.684540]  ffff8800564c7ae8 ffff8800564c7ad8 ffffffff81045ea5 ffff8800564c7b78
      [194737.686017]  ffffffffa0383aa7 00000000ffffffef ffff88000c7ba000 ffff8801a1f66f40
      [194737.687509] Call Trace:
      [194737.688068]  [<ffffffff8142fa46>] dump_stack+0x4f/0x7b
      [194737.689027]  [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad
      [194737.690095]  [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb
      [194737.691198]  [<ffffffffa0383aa7>] ? __btrfs_abort_transaction+0x52/0x106 [btrfs]
      [194737.693789]  [<ffffffff81045f05>] warn_slowpath_fmt+0x46/0x48
      [194737.695065]  [<ffffffffa0383aa7>] __btrfs_abort_transaction+0x52/0x106 [btrfs]
      [194737.696806]  [<ffffffffa039a3bd>] btrfs_create_pending_block_groups+0x101/0x130 [btrfs]
      [194737.698683]  [<ffffffffa03aa433>] __btrfs_end_transaction+0x84/0x366 [btrfs]
      [194737.700329]  [<ffffffffa03aa725>] btrfs_end_transaction+0x10/0x12 [btrfs]
      [194737.701924]  [<ffffffffa0394b51>] btrfs_check_data_free_space+0x11f/0x27c [btrfs]
      [194737.703675]  [<ffffffffa03b8ba4>] __btrfs_buffered_write+0x16a/0x4c8 [btrfs]
      [194737.705417]  [<ffffffffa03bb502>] ? btrfs_file_write_iter+0x19a/0x431 [btrfs]
      [194737.707058]  [<ffffffffa03bb511>] ? btrfs_file_write_iter+0x1a9/0x431 [btrfs]
      [194737.708560]  [<ffffffffa03bb68d>] btrfs_file_write_iter+0x325/0x431 [btrfs]
      [194737.710673]  [<ffffffff81067d85>] ? get_parent_ip+0xe/0x3e
      [194737.712076]  [<ffffffff811534c3>] new_sync_write+0x7c/0xa0
      [194737.713293]  [<ffffffff81153b58>] vfs_write+0xb2/0x117
      [194737.714443]  [<ffffffff81154424>] SyS_pwrite64+0x64/0x82
      [194737.715646]  [<ffffffff81435b32>] system_call_fastpath+0x12/0x17
      [194737.717175] ---[ end trace f2d5dc04e56d7e48 ]---
      [194737.718170] BTRFS: error (device sdc) in btrfs_create_pending_block_groups:9524: errno=-17 Object already exists
      
      The -EEXIST failure comes from btrfs_finish_chunk_alloc(), called by
      btrfs_create_pending_block_groups(), when it attempts to insert a
      duplicated device extent item via btrfs_alloc_dev_extent().
      
      This issue was reproducible with fstests generic/038 running in a loop for
      several hours (it's very hard to hit) and using MOUNT_OPTIONS="-o discard".
      Applying Jeff's recent patch titled "btrfs: add missing discards when
      unpinning extents with -o discard" makes the issue much easier to reproduce
      (usually within 4 to 5 hours), since it pins chunks for longer periods of
      time when an unused block group is deleted by the cleaner kthread.
      
      Fix this by making sure that we never adjust the start offset to a lower
      value than it currently has.
      
      Fixes: 1b984508 ("Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole"
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c152b63e
    • S
      btrfs: use after free when closing devices · 2037a093
      Sasha Levin 提交于
      __btrfs_close_devices() would call_rcu to free the device, which is racy with
      list_for_each_entry() accessing the memory to retrieve the next device on the
      list.
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      2037a093
    • A
      Btrfs: check error before reporting missing device and add uuid · 33b97e43
      Anand Jain 提交于
      Report missing device when add is successful,
      otherwise it would exit as ENOMEM. And add uuid
      to the report.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      33b97e43
    • A
      Btrfs: log when missing device is created · 816fcebe
      Anand Jain 提交于
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      816fcebe
    • D
      btrfs: fix warnings after changes in btrfs_abort_transaction · 6d13f549
      David Sterba 提交于
      fs/btrfs/volumes.c: In function ‘btrfs_create_uuid_tree’:
      fs/btrfs/volumes.c:3909:3: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has type ‘long int’ [-Wformat=]
         btrfs_abort_transaction(trans, tree_root,
         ^
        CC [M]  fs/btrfs/ioctl.o
      fs/btrfs/ioctl.c: In function ‘create_subvol’:
      fs/btrfs/ioctl.c:549:3: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has type ‘long int’ [-Wformat=]
         btrfs_abort_transaction(trans, root, PTR_ERR(new_root));
      
      PTR_ERR returns long, but we're really using 'int' for the error codes
      everywhere so just set and use the local variable.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      6d13f549
    • F
      Btrfs: check pending chunks when shrinking fs to avoid corruption · 53e489bc
      Filipe Manana 提交于
      When we shrink the usable size of a device (its total_bytes), we go over
      all the device extent items in the device tree and attempt to relocate
      the chunk of any device extent that goes beyond the new usable size for
      the device. We do that after setting the new usable size (total_bytes) in
      the device object, so that all new allocations (and reallocations) don't
      use areas of the device that go beyond the new (shorter) size. However we
      were not considering that before setting the new size in the device,
      pending chunks might have been created that use device extents that go
      beyond the new size, and those device extents are not yet in the device
      tree after we search the device tree - they are still attached to the
      list of new block group for some ongoing transaction handle, and they are
      only added to the device tree when the transaction handle is ended (via
      btrfs_create_pending_block_groups()).
      
      So check for pending chunks with device extents that go beyond the new
      size and if any exists, commit the current transaction and repeat the
      search in the device tree.
      
      Not doing this it would mean we would return success to user space while
      still having extents that go beyond the new size, and later user space
      could override those locations on the device while the fs still references
      them, causing all sorts of corruption and unexpected events.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      53e489bc
  18. 27 5月, 2015 1 次提交