1. 09 9月, 2019 16 次提交
  2. 07 8月, 2019 1 次提交
    • F
      Btrfs: fix sysfs warning and missing raid sysfs directories · d7cd4dd9
      Filipe Manana 提交于
      In the 5.3 merge window, commit 7c7e3014 ("btrfs: sysfs: Replace
      default_attrs in ktypes with groups"), we started using the member
      "defaults_groups" for the kobject type "btrfs_raid_ktype". That leads
      to a series of warnings when running some test cases of fstests, such
      as btrfs/027, btrfs/124 and btrfs/176. The traces produced by those
      warnings are like the following:
      
        [116648.059212] kernfs: can not remove 'total_bytes', no directory
        [116648.060112] WARNING: CPU: 3 PID: 28500 at fs/kernfs/dir.c:1504 kernfs_remove_by_name_ns+0x75/0x80
        (...)
        [116648.066482] CPU: 3 PID: 28500 Comm: umount Tainted: G        W         5.3.0-rc3-btrfs-next-54 #1
        (...)
        [116648.069376] RIP: 0010:kernfs_remove_by_name_ns+0x75/0x80
        (...)
        [116648.072385] RSP: 0018:ffffabfd0090bd08 EFLAGS: 00010282
        [116648.073437] RAX: 0000000000000000 RBX: ffffffffc0c11998 RCX: 0000000000000000
        [116648.074201] RDX: ffff9fff603a7a00 RSI: ffff9fff603978a8 RDI: ffff9fff603978a8
        [116648.074956] RBP: ffffffffc0b9ca2f R08: 0000000000000000 R09: 0000000000000001
        [116648.075708] R10: ffff9ffe1f72e1c0 R11: 0000000000000000 R12: ffffffffc0b94120
        [116648.076434] R13: ffffffffb3d9b4e0 R14: 0000000000000000 R15: dead000000000100
        [116648.077143] FS:  00007f9cdc78a2c0(0000) GS:ffff9fff60380000(0000) knlGS:0000000000000000
        [116648.077852] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [116648.078546] CR2: 00007f9fc4747ab4 CR3: 00000005c7832003 CR4: 00000000003606e0
        [116648.079235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [116648.079907] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [116648.080585] Call Trace:
        [116648.081262]  remove_files+0x31/0x70
        [116648.081929]  sysfs_remove_group+0x38/0x80
        [116648.082596]  sysfs_remove_groups+0x34/0x70
        [116648.083258]  kobject_del+0x20/0x60
        [116648.083933]  btrfs_free_block_groups+0x405/0x430 [btrfs]
        [116648.084608]  close_ctree+0x19a/0x380 [btrfs]
        [116648.085278]  generic_shutdown_super+0x6c/0x110
        [116648.085951]  kill_anon_super+0xe/0x30
        [116648.086621]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [116648.087289]  deactivate_locked_super+0x3a/0x70
        [116648.087956]  cleanup_mnt+0xb4/0x160
        [116648.088620]  task_work_run+0x7e/0xc0
        [116648.089285]  exit_to_usermode_loop+0xfa/0x100
        [116648.089933]  do_syscall_64+0x1cb/0x220
        [116648.090567]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [116648.091197] RIP: 0033:0x7f9cdc073b37
        (...)
        [116648.100046] ---[ end trace 22e24db328ccadf8 ]---
        [116648.100618] ------------[ cut here ]------------
        [116648.101175] kernfs: can not remove 'used_bytes', no directory
        [116648.101731] WARNING: CPU: 3 PID: 28500 at fs/kernfs/dir.c:1504 kernfs_remove_by_name_ns+0x75/0x80
        (...)
        [116648.105649] CPU: 3 PID: 28500 Comm: umount Tainted: G        W         5.3.0-rc3-btrfs-next-54 #1
        (...)
        [116648.107461] RIP: 0010:kernfs_remove_by_name_ns+0x75/0x80
        (...)
        [116648.109336] RSP: 0018:ffffabfd0090bd08 EFLAGS: 00010282
        [116648.109979] RAX: 0000000000000000 RBX: ffffffffc0c119a0 RCX: 0000000000000000
        [116648.110625] RDX: ffff9fff603a7a00 RSI: ffff9fff603978a8 RDI: ffff9fff603978a8
        [116648.111283] RBP: ffffffffc0b9ca41 R08: 0000000000000000 R09: 0000000000000001
        [116648.111940] R10: ffff9ffe1f72e1c0 R11: 0000000000000000 R12: ffffffffc0b94120
        [116648.112603] R13: ffffffffb3d9b4e0 R14: 0000000000000000 R15: dead000000000100
        [116648.113268] FS:  00007f9cdc78a2c0(0000) GS:ffff9fff60380000(0000) knlGS:0000000000000000
        [116648.113939] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [116648.114607] CR2: 00007f9fc4747ab4 CR3: 00000005c7832003 CR4: 00000000003606e0
        [116648.115286] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [116648.115966] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [116648.116649] Call Trace:
        [116648.117326]  remove_files+0x31/0x70
        [116648.117997]  sysfs_remove_group+0x38/0x80
        [116648.118671]  sysfs_remove_groups+0x34/0x70
        [116648.119342]  kobject_del+0x20/0x60
        [116648.120022]  btrfs_free_block_groups+0x405/0x430 [btrfs]
        [116648.120707]  close_ctree+0x19a/0x380 [btrfs]
        [116648.121396]  generic_shutdown_super+0x6c/0x110
        [116648.122057]  kill_anon_super+0xe/0x30
        [116648.122702]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [116648.123335]  deactivate_locked_super+0x3a/0x70
        [116648.123961]  cleanup_mnt+0xb4/0x160
        [116648.124586]  task_work_run+0x7e/0xc0
        [116648.125210]  exit_to_usermode_loop+0xfa/0x100
        [116648.125830]  do_syscall_64+0x1cb/0x220
        [116648.126463]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [116648.127080] RIP: 0033:0x7f9cdc073b37
        (...)
        [116648.135923] ---[ end trace 22e24db328ccadf9 ]---
      
      These happen because, during the unmount path, we call kobject_del() for
      raid kobjects that are not fully initialized, meaning that we set their
      ktype (as btrfs_raid_ktype) through link_block_group() but we didn't set
      their parent kobject, which is done through btrfs_add_raid_kobjects().
      
      We have this split raid kobject setup since commit 75cb379d
      ("btrfs: defer adding raid type kobject until after chunk relocation") in
      order to avoid triggering reclaim during contextes where we can not
      (either we are holding a transaction handle or some lock required by
      the transaction commit path), so that we do the calls to kobject_add(),
      which triggers GFP_KERNEL allocations, through btrfs_add_raid_kobjects()
      in contextes where it is safe to trigger reclaim. That change expected
      that a new raid kobject can only be created either when mounting the
      filesystem or after raid profile conversion through the relocation path.
      However, we can have new raid kobject created in other two cases at least:
      
      1) During device replace (or scrub) after adding a device a to the
         filesystem. The replace procedure (and scrub) do calls to
         btrfs_inc_block_group_ro() which can allocate a new block group
         with a new raid profile (because we now have more devices). This
         can be triggered by test cases btrfs/027 and btrfs/176.
      
      2) During a degraded mount trough any write path. This can be triggered
         by test case btrfs/124.
      
      Fixing this by adding extra calls to btrfs_add_raid_kobjects(), not only
      makes things more complex and fragile, can also introduce deadlocks with
      reclaim the following way:
      
      1) Calling btrfs_add_raid_kobjects() at btrfs_inc_block_group_ro() or
         anywhere in the replace/scrub path will cause a deadlock with reclaim
         because if reclaim happens and a transaction commit is triggered,
         the transaction commit path will block at btrfs_scrub_pause().
      
      2) During degraded mounts it is essentially impossible to figure out where
         to add extra calls to btrfs_add_raid_kobjects(), because allocation of
         a block group with a new raid profile can happen anywhere, which means
         we can't safely figure out which contextes are safe for reclaim, as
         we can either hold a transaction handle or some lock needed by the
         transaction commit path.
      
      So it is too complex and error prone to have this split setup of raid
      kobjects. So fix the issue by consolidating the setup of the kobjects in a
      single place, at link_block_group(), and setup a nofs context there in
      order to prevent reclaim being triggered by the memory allocations done
      through the call chain of kobject_add().
      
      Besides fixing the sysfs warnings during kobject_del(), this also ensures
      the sysfs directories for the new raid profiles end up created and visible
      to users (a bug that existed before the 5.3 commit 7c7e3014
      ("btrfs: sysfs: Replace default_attrs in ktypes with groups")).
      
      Fixes: 75cb379d ("btrfs: defer adding raid type kobject until after chunk relocation")
      Fixes: 7c7e3014 ("btrfs: sysfs: Replace default_attrs in ktypes with groups")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      d7cd4dd9
  3. 17 7月, 2019 1 次提交
  4. 02 7月, 2019 5 次提交
    • J
      btrfs: move space_info to space-info.h · 8719aaae
      Josef Bacik 提交于
      Migrate the struct definition and the one helper that's in ctree.h into
      space-info.h
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      8719aaae
    • N
      btrfs: Use btrfs_get_io_geometry appropriately · 89b798ad
      Nikolay Borisov 提交于
      Presently btrfs_map_block is used not only to do everything necessary to
      map a bio to the underlying allocation profile but it's also used to
      identify how much data could be written based on btrfs' stripe logic
      without actually submitting anything. This is achieved by passing NULL
      for 'bbio_ret' parameter.
      
      This patch refactors all callers that require just the mapping length
      by switching them to using btrfs_io_geometry instead of calling
      btrfs_map_block with a special NULL value for 'bbio_ret'. No functional
      change.
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      89b798ad
    • N
      btrfs: Introduce btrfs_io_geometry infrastructure · 5f141126
      Nikolay Borisov 提交于
      Add a structure that holds various parameters for IO calculations and a
      helper that fills the values. This will help further refactoring and
      reduction of functions that in some way open-coded the calculations.
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      5f141126
    • F
      Btrfs: prevent send failures and crashes due to concurrent relocation · 9e967495
      Filipe Manana 提交于
      Send always operates on read-only trees and always expected that while it
      is in progress, nothing changes in those trees. Due to that expectation
      and the fact that send is a read-only operation, it operates on commit
      roots and does not hold transaction handles. However relocation can COW
      nodes and leafs from read-only trees, which can cause unexpected failures
      and crashes (hitting BUG_ONs). while send using a node/leaf, it gets
      COWed, the transaction used to COW it is committed, a new transaction
      starts, the extent previously used for that node/leaf gets allocated,
      possibly for another tree, and the respective extent buffer' content
      changes while send is still using it. When this happens send normally
      fails with EIO being returned to user space and messages like the
      following are found in dmesg/syslog:
      
        [ 3408.699121] BTRFS error (device sdc): parent transid verify failed on 58703872 wanted 250 found 253
        [ 3441.523123] BTRFS error (device sdc): did not find backref in send_root. inode=63211, offset=0, disk_byte=5222825984 found extent=5222825984
      
      Other times, less often, we hit a BUG_ON() because an extent buffer that
      send is using used to be a node, and while send is still using it, it
      got COWed and got reused as a leaf while send is still using, producing
      the following trace:
      
       [ 3478.466280] ------------[ cut here ]------------
       [ 3478.466282] kernel BUG at fs/btrfs/ctree.c:1806!
       [ 3478.466965] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
       [ 3478.467635] CPU: 0 PID: 2165 Comm: btrfs Not tainted 5.0.0-btrfs-next-46 #1
       [ 3478.468311] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
       [ 3478.469681] RIP: 0010:read_node_slot+0x122/0x130 [btrfs]
       (...)
       [ 3478.471758] RSP: 0018:ffffa437826bfaa0 EFLAGS: 00010246
       [ 3478.472457] RAX: ffff961416ed7000 RBX: 000000000000003d RCX: 0000000000000002
       [ 3478.473151] RDX: 000000000000003d RSI: ffff96141e387408 RDI: ffff961599b30000
       [ 3478.473837] RBP: ffffa437826bfb8e R08: 0000000000000001 R09: ffffa437826bfb8e
       [ 3478.474515] R10: ffffa437826bfa70 R11: 0000000000000000 R12: ffff9614385c8708
       [ 3478.475186] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       [ 3478.475840] FS:  00007f8e0e9cc8c0(0000) GS:ffff9615b6a00000(0000) knlGS:0000000000000000
       [ 3478.476489] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [ 3478.477127] CR2: 00007f98b67a056e CR3: 0000000005df6005 CR4: 00000000003606f0
       [ 3478.477762] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [ 3478.478385] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       [ 3478.479003] Call Trace:
       [ 3478.479600]  ? do_raw_spin_unlock+0x49/0xc0
       [ 3478.480202]  tree_advance+0x173/0x1d0 [btrfs]
       [ 3478.480810]  btrfs_compare_trees+0x30c/0x690 [btrfs]
       [ 3478.481388]  ? process_extent+0x1280/0x1280 [btrfs]
       [ 3478.481954]  btrfs_ioctl_send+0x1037/0x1270 [btrfs]
       [ 3478.482510]  _btrfs_ioctl_send+0x80/0x110 [btrfs]
       [ 3478.483062]  btrfs_ioctl+0x13fe/0x3120 [btrfs]
       [ 3478.483581]  ? rq_clock_task+0x2e/0x60
       [ 3478.484086]  ? wake_up_new_task+0x1f3/0x370
       [ 3478.484582]  ? do_vfs_ioctl+0xa2/0x6f0
       [ 3478.485075]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
       [ 3478.485552]  do_vfs_ioctl+0xa2/0x6f0
       [ 3478.486016]  ? __fget+0x113/0x200
       [ 3478.486467]  ksys_ioctl+0x70/0x80
       [ 3478.486911]  __x64_sys_ioctl+0x16/0x20
       [ 3478.487337]  do_syscall_64+0x60/0x1b0
       [ 3478.487751]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
       [ 3478.488159] RIP: 0033:0x7f8e0d7d4dd7
       (...)
       [ 3478.489349] RSP: 002b:00007ffcf6fb4908 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
       [ 3478.489742] RAX: ffffffffffffffda RBX: 0000000000000105 RCX: 00007f8e0d7d4dd7
       [ 3478.490142] RDX: 00007ffcf6fb4990 RSI: 0000000040489426 RDI: 0000000000000005
       [ 3478.490548] RBP: 0000000000000005 R08: 00007f8e0d6f3700 R09: 00007f8e0d6f3700
       [ 3478.490953] R10: 00007f8e0d6f39d0 R11: 0000000000000202 R12: 0000000000000005
       [ 3478.491343] R13: 00005624e0780020 R14: 0000000000000000 R15: 0000000000000001
       (...)
       [ 3478.493352] ---[ end trace d5f537302be4f8c8 ]---
      
      Another possibility, much less likely to happen, is that send will not
      fail but the contents of the stream it produces may not be correct.
      
      To avoid this, do not allow send and relocation (balance) to run in
      parallel. In the long term the goal is to allow for both to be able to
      run concurrently without any problems, but that will take a significant
      effort in development and testing.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      9e967495
    • D
      btrfs: add mask for all RAID1 types · c7369b3f
      David Sterba 提交于
      Preparatory patch for additional RAID1 profiles with more copies. The
      mask will contain 3-copy and 4-copy, most of the checks for plain RAID1
      work the same for the other profiles.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c7369b3f
  5. 01 7月, 2019 12 次提交
  6. 30 4月, 2019 5 次提交