1. 11 10月, 2015 1 次提交
  2. 10 9月, 2015 1 次提交
    • F
      Btrfs: remove unnecessary locking of cleaner_mutex to avoid deadlock · 85e0a0f2
      Filipe Manana 提交于
      After commmit e44163e1 ("btrfs: explictly delete unused block groups
      in close_ctree and ro-remount"), added in the 4.3 merge window, we have
      calls to btrfs_delete_unused_bgs() while holding the cleaner_mutex.
      This can cause a deadlock with a concurrent block group relocation (when
      a filesystem balance or shrink operation is in progress for example)
      because btrfs_delete_unused_bgs() locks delete_unused_bgs_mutex and the
      relocation path locks first delete_unused_bgs_mutex and then it locks
      cleaner_mutex, resulting in a classic ABBA deadlock:
      
               CPU 0                                        CPU 1
      
      lock fs_info->cleaner_mutex
      
                                                 __btrfs_balance() || btrfs_shrink_device()
                                                   lock fs_info->delete_unused_bgs_mutex
                                                   btrfs_relocate_chunk()
                                                     btrfs_relocate_block_group()
                                                       lock fs_info->cleaner_mutex
      btrfs_delete_unused_bgs()
        lock fs_info->delete_unused_bgs_mutex
      
      Fix this by not taking the cleaner_mutex before calling
      btrfs_delete_unused_bgs() because it's no longer needed after
      commit 67c5e7d4 ("Btrfs: fix race between balance and unused block
      group deletion"). The mutex fs_info->delete_unused_bgs_mutex, the
      spinlock fs_info->unused_bgs_lock and a block group's spinlock are
      enough to get correct serialization between tasks running relocation
      and unused block group deletion (as well as between multiple tasks
      concurrently calling btrfs_delete_unused_bgs()).
      
      This issue was discussed (in the mailing list) during the review of
      the patch titled "btrfs: explictly delete unused block groups in
      close_ctree and ro-remount" and it was agreed that acquiring the
      cleaner mutex had to be dropped after the patch titled
      "Btrfs: fix race between balance and unused block group deletion"
      got merged (both patches were submitted at about the same time, but
      one landed in kernel 4.2 and the other in the 4.3 merge window).
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      85e0a0f2
  3. 01 9月, 2015 2 次提交
    • Z
      btrfs: Add raid56 support for updating · 943c6e99
      Zhao Lei 提交于
       num_tolerated_disk_barrier_failures in btrfs_balance
      
      Code for updating fs_info->num_tolerated_disk_barrier_failures in
      btrfs_balance() lacks raid56 support.
      
      Reason:
       Above code was wroten in 2012-08-01, together with
       btrfs_calc_num_tolerated_disk_barrier_failures()'s first version.
      
       Then, btrfs_calc_num_tolerated_disk_barrier_failures() got updated
       later to support raid56, but code in btrfs_balance() was not
       updated together.
      
      Fix:
       Merge above similar code to a common function:
       btrfs_get_num_tolerated_disk_barrier_failures()
       and make it support both case.
      
       It can fix this bug with a bonus of cleanup, and make these code
       never in above no-sync state from now on.
      Suggested-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      943c6e99
    • Z
      btrfs: Cleanup for btrfs_calc_num_tolerated_disk_barrier_failures · 2c458045
      Zhao Lei 提交于
      1: Use ARRAY_SIZE(types) to replace a static-value variant:
         int num_types = 4;
      
      2: Use 'continue' on condition to reduce one level tab
         if (!XXX) {
             code;
             ...
         }
         ->
         if (XXX)
             continue;
         code;
         ...
      
      3: Put setting 'num_tolerated_disk_barrier_failures = 2' to
         (num_tolerated_disk_barrier_failures > 2) condition to make
         make logic neat.
         if (num_tolerated_disk_barrier_failures > 0 && XXX)
             num_tolerated_disk_barrier_failures = 0;
         else if (num_tolerated_disk_barrier_failures > 1) {
             if (XXX)
                 num_tolerated_disk_barrier_failures = 1;
             else if (XXX)
                 num_tolerated_disk_barrier_failures = 2;
         ->
         if (num_tolerated_disk_barrier_failures > 0 && XXX)
             num_tolerated_disk_barrier_failures = 0;
         if (num_tolerated_disk_barrier_failures > 1 && XXX)
             num_tolerated_disk_barrier_failures = ;
         if (num_tolerated_disk_barrier_failures > 2 && XXX)
             num_tolerated_disk_barrier_failures = 2;
      
      4: Remove comment of:
         num_mirrors - 1: if RAID1 or RAID10 is configured and more
         than 2 mirrors are used.
         which is not fit with code.
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      2c458045
  4. 09 8月, 2015 3 次提交
    • C
      Btrfs: add support for blkio controllers · da2f0f74
      Chris Mason 提交于
      This attaches accounting information to bios as we submit them so the
      new blkio controllers can throttle on btrfs filesystems.
      
      Not much is required, we're just associating bios with blkcgs during clone,
      calling wbc_init_bio()/wbc_account_io() during writepages submission,
      and attaching the bios to the current context during direct IO.
      
      Finally if we are splitting bios during btrfs_map_bio, this attaches
      accounting information to the split.
      
      The end result is able to throttle nicely on single disk filesystems.  A
      little more work is required for multi-device filesystems.
      Signed-off-by: NChris Mason <clm@fb.com>
      da2f0f74
    • B
      Btrfs: remove unused mutex from struct 'btrfs_fs_info' · a4027a20
      Byongho Lee 提交于
      The code using 'ordered_extent_flush_mutex' mutex has removed by below
      commit.
       - 8d875f95
         btrfs: disable strict file flushes for renames and truncates
      But the mutex still lives in struct 'btrfs_fs_info'.
      
      So, this patch removes the mutex from struct 'btrfs_fs_info' and its
      initialization code.
      Signed-off-by: NByongho Lee <bhlee.kernel@gmail.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      a4027a20
    • Z
      btrfs: Show detail information when mount failed on missing devices · 78fa1770
      Zhao Lei 提交于
      When mount failed because missing device, we can see following
      dmesg:
       [ 1060.267743] BTRFS: too many missing devices, writeable mount is not allowed
       [ 1060.273158] BTRFS: open_ctree failed
      
      This patch add missing_device_number and tolerated_missing_device_number
      to above output, to let user know what really happened, and helps
      bug-report and debug.
      
      dmesg after patch:
       [  127.050367] BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed
       [  127.056099] BTRFS: open_ctree failed
      
      Changelog v1->v2:
      1: Changed to more clear description, suggested-by:
         Anand Jain <anand.jain@oracle.com>
      Suggested-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Reviewed-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      78fa1770
  5. 29 7月, 2015 2 次提交
    • J
      btrfs: explictly delete unused block groups in close_ctree and ro-remount · e44163e1
      Jeff Mahoney 提交于
      The cleaner thread may already be sleeping by the time we enter
      close_ctree.  If that's the case, we'll skip removing any unused
      block groups queued for removal, even during a normal umount.
      They'll be cleaned up automatically at next mount, but users
      expect a umount to be a clean synchronization point, especially
      when used on thin-provisioned storage with -odiscard.  We also
      explicitly remove unused block groups in the ro-remount path
      for the same reason.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Tested-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      e44163e1
    • C
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig 提交于
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not beeing persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      4246a0b6
  6. 23 7月, 2015 1 次提交
    • Z
      btrfs: Avoid NULL pointer dereference of free_extent_buffer when read_tree_block() fail · 95ab1f64
      Zhao Lei 提交于
      When read_tree_block() failed, we can see following dmesg:
       [  134.371389] BUG: unable to handle kernel NULL pointer dereference at 0000000000000063
       [  134.372236] IP: [<ffffffff813a4a51>] free_extent_buffer+0x21/0x90
       [  134.372236] PGD 0
       [  134.372236] Oops: 0000 [#1] SMP
       [  134.372236] Modules linked in:
       [  134.372236] CPU: 0 PID: 2289 Comm: mount Not tainted 4.2.0-rc1_HEAD_c65b99f0_+ #115
       [  134.372236] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
       [  134.372236] task: ffff88003b6e1a00 ti: ffff880011e60000 task.ti: ffff880011e60000
       [  134.372236] RIP: 0010:[<ffffffff813a4a51>]  [<ffffffff813a4a51>] free_extent_buffer+0x21/0x90
       ...
       [  134.372236] Call Trace:
       [  134.372236]  [<ffffffff81379aa1>] free_root_extent_buffers+0x91/0xb0
       [  134.372236]  [<ffffffff81379c3d>] free_root_pointers+0x17d/0x190
       [  134.372236]  [<ffffffff813801b0>] open_ctree+0x1ca0/0x25b0
       [  134.372236]  [<ffffffff8144d017>] ? disk_name+0x97/0xb0
       [  134.372236]  [<ffffffff813558aa>] btrfs_mount+0x8fa/0xab0
       ...
      
      Reason:
       read_tree_block() changed to return error number on fail,
       and this value(not NULL) is set to tree_root->node, then subsequent
       code will run to:
        free_root_pointers()
        ->free_root_extent_buffers()
        ->free_extent_buffer()
        ->atomic_read((extent_buffer *)(-E_XXX)->refs);
       and trigger above error.
      
      Fix:
       Set tree_root->node to NULL on fail to make error_handle code
       happy.
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      95ab1f64
  7. 01 7月, 2015 2 次提交
    • F
      Btrfs: fix crash on close_ctree() if cleaner starts new transaction · da288d28
      Filipe Manana 提交于
      Often when running fstests btrfs/079 I was running into the following
      trace during umount on one of my qemu/kvm test vms:
      
      [ 8245.682441] WARNING: CPU: 8 PID: 25064 at fs/btrfs/extent-tree.c:138 btrfs_put_block_group+0x51/0x69 [btrfs]()
      [ 8245.685039] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 acpi_cpufreq processor psmouse i2c_core thermal_sys parport evdev serio_raw button pcspkr microcode ext4 crc16 jbd2 mbcache sg sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring scsi_mod virtio e1000 [last unloaded: btrfs]
      [ 8245.693860] CPU: 8 PID: 25064 Comm: umount Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
      [ 8245.695081] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
      [ 8245.697583]  0000000000000009 ffff88020d047ce8 ffffffff8145eec7 ffffffff81095dce
      [ 8245.699234]  0000000000000000 ffff88020d047d28 ffffffff8104b399 0000000000000028
      [ 8245.700995]  ffffffffa04db07b ffff8801c6036c00 ffff8801c6036d68 ffff880202eb40b0
      [ 8245.702510] Call Trace:
      [ 8245.703006]  [<ffffffff8145eec7>] dump_stack+0x4f/0x7b
      [ 8245.705393]  [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2
      [ 8245.706569]  [<ffffffff8104b399>] warn_slowpath_common+0xa1/0xbb
      [ 8245.707747]  [<ffffffffa04db07b>] ? btrfs_put_block_group+0x51/0x69 [btrfs]
      [ 8245.709101]  [<ffffffff8104b456>] warn_slowpath_null+0x1a/0x1c
      [ 8245.710274]  [<ffffffffa04db07b>] btrfs_put_block_group+0x51/0x69 [btrfs]
      [ 8245.711823]  [<ffffffffa04e3473>] btrfs_free_block_groups+0x145/0x322 [btrfs]
      [ 8245.713251]  [<ffffffffa04ef31a>] close_ctree+0x1ef/0x325 [btrfs]
      [ 8245.714448]  [<ffffffff8117d26e>] ? evict_inodes+0xdc/0xeb
      [ 8245.715539]  [<ffffffffa04cb3ad>] btrfs_put_super+0x19/0x1b [btrfs]
      [ 8245.716835]  [<ffffffff81167607>] generic_shutdown_super+0x73/0xef
      [ 8245.718015]  [<ffffffff81167a3a>] kill_anon_super+0x13/0x1e
      [ 8245.719101]  [<ffffffffa04cb1b6>] btrfs_kill_super+0x17/0x23 [btrfs]
      [ 8245.720316]  [<ffffffff81167544>] deactivate_locked_super+0x3b/0x68
      [ 8245.721517]  [<ffffffff81167dd6>] deactivate_super+0x3f/0x43
      [ 8245.722581]  [<ffffffff8117fbb9>] cleanup_mnt+0x59/0x78
      [ 8245.723538]  [<ffffffff8117fc18>] __cleanup_mnt+0x12/0x14
      [ 8245.724572]  [<ffffffff81065371>] task_work_run+0x8f/0xbc
      [ 8245.725598]  [<ffffffff810028fb>] do_notify_resume+0x45/0x53
      [ 8245.726892]  [<ffffffff814651ac>] int_signal+0x12/0x17
      [ 8245.737887] ---[ end trace a01d038397e99b92 ]---
      [ 8245.769363] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [ 8245.770737] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 acpi_cpufreq processor psmouse i2c_core thermal_sys parport evdev serio_raw button pcspkr microcode ext4 crc16 jbd2 mbcache sg sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring scsi_mod virtio e1000 [last unloaded: btrfs]
      [ 8245.772641] CPU: 2 PID: 25064 Comm: umount Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
      [ 8245.772641] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
      [ 8245.772641] task: ffff880013005810 ti: ffff88020d044000 task.ti: ffff88020d044000
      [ 8245.772641] RIP: 0010:[<ffffffffa051c8e6>]  [<ffffffffa051c8e6>] btrfs_queue_work+0x2c/0x14d [btrfs]
      [ 8245.772641] RSP: 0018:ffff88020d0478b8  EFLAGS: 00010202
      [ 8245.772641] RAX: 0000000000000004 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffffa0581488
      [ 8245.772641] RDX: 0000000000000000 RSI: ffff880194b7bf48 RDI: ffff880144b6a7a0
      [ 8245.772641] RBP: ffff88020d0478d8 R08: 0000000000000000 R09: 000000000000ffff
      [ 8245.772641] R10: 0000000000000004 R11: 0000000000000005 R12: ffff880194b7bf48
      [ 8245.772641] R13: ffff880194b7bf48 R14: 0000000000000410 R15: 0000000000000000
      [ 8245.772641] FS:  00007f991e77d840(0000) GS:ffff88023e280000(0000) knlGS:0000000000000000
      [ 8245.772641] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 8245.772641] CR2: 00007fbbd325ee68 CR3: 000000021de8e000 CR4: 00000000000006e0
      [ 8245.772641] Stack:
      [ 8245.772641]  ffff880194b7bf00 ffff880202eb4000 ffff880194b7bf48 0000000000000410
      [ 8245.772641]  ffff88020d047958 ffffffffa04ec6d5 ffff8801629b2ee8 0000000082987570
      [ 8245.772641]  0000000000a5813f 0000000000000001 ffff880013006100 0000000000000002
      [ 8245.772641] Call Trace:
      [ 8245.772641]  [<ffffffffa04ec6d5>] btrfs_wq_submit_bio+0xe1/0x17b [btrfs]
      [ 8245.772641]  [<ffffffff81086bff>] ? check_irq_usage+0x76/0x87
      [ 8245.772641]  [<ffffffffa04ec825>] btree_submit_bio_hook+0xb6/0xd9 [btrfs]
      [ 8245.772641]  [<ffffffffa04ebb7c>] ? btree_csum_one_bio+0xad/0xad [btrfs]
      [ 8245.772641]  [<ffffffffa04eb1a6>] ? btree_io_failed_hook+0x5e/0x5e [btrfs]
      [ 8245.772641]  [<ffffffffa050a6e7>] submit_one_bio+0x8c/0xc7 [btrfs]
      [ 8245.772641]  [<ffffffffa050d75b>] submit_extent_page.isra.18+0x9d/0x186 [btrfs]
      [ 8245.772641]  [<ffffffffa050d95b>] write_one_eb+0x117/0x1ae [btrfs]
      [ 8245.772641]  [<ffffffffa050a79b>] ? end_extent_buffer_writeback+0x21/0x21 [btrfs]
      [ 8245.772641]  [<ffffffffa0510510>] btree_write_cache_pages+0x2ab/0x385 [btrfs]
      [ 8245.772641]  [<ffffffffa04eb2b8>] btree_writepages+0x23/0x5c [btrfs]
      [ 8245.772641]  [<ffffffff8111c661>] do_writepages+0x23/0x2c
      [ 8245.772641]  [<ffffffff81189cd4>] __writeback_single_inode+0xda/0x5bd
      [ 8245.772641]  [<ffffffff8118aa60>] ? writeback_single_inode+0x2b/0x173
      [ 8245.772641]  [<ffffffff8118aafd>] writeback_single_inode+0xc8/0x173
      [ 8245.772641]  [<ffffffff8118ac95>] write_inode_now+0x8a/0x95
      [ 8245.772641]  [<ffffffff81247bf0>] ? _atomic_dec_and_lock+0x30/0x4e
      [ 8245.772641]  [<ffffffff8117cc5e>] iput+0x17d/0x26a
      [ 8245.772641]  [<ffffffffa04ef355>] close_ctree+0x22a/0x325 [btrfs]
      [ 8245.772641]  [<ffffffff8117d26e>] ? evict_inodes+0xdc/0xeb
      [ 8245.772641]  [<ffffffffa04cb3ad>] btrfs_put_super+0x19/0x1b [btrfs]
      [ 8245.772641]  [<ffffffff81167607>] generic_shutdown_super+0x73/0xef
      [ 8245.772641]  [<ffffffff81167a3a>] kill_anon_super+0x13/0x1e
      [ 8245.772641]  [<ffffffffa04cb1b6>] btrfs_kill_super+0x17/0x23 [btrfs]
      [ 8245.772641]  [<ffffffff81167544>] deactivate_locked_super+0x3b/0x68
      [ 8245.772641]  [<ffffffff81167dd6>] deactivate_super+0x3f/0x43
      [ 8245.772641]  [<ffffffff8117fbb9>] cleanup_mnt+0x59/0x78
      [ 8245.772641]  [<ffffffff8117fc18>] __cleanup_mnt+0x12/0x14
      [ 8245.772641]  [<ffffffff81065371>] task_work_run+0x8f/0xbc
      [ 8245.772641]  [<ffffffff810028fb>] do_notify_resume+0x45/0x53
      [ 8245.772641]  [<ffffffff814651ac>] int_signal+0x12/0x17
      [ 8245.772641] Code: 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 f4 48 8b 46 70 a8 04 74 09 48 8b 5f 08 48 85 db 75 03 48 8b 1f 49 89 5c 24 68 <83> 7b 5c ff 74 04 f0 ff 43 50 49 83 7c 24 08 00 74 2c 4c 8d 6b
      [ 8245.772641] RIP  [<ffffffffa051c8e6>] btrfs_queue_work+0x2c/0x14d [btrfs]
      [ 8245.772641]  RSP <ffff88020d0478b8>
      [ 8245.845040] ---[ end trace a01d038397e99b93 ]---
      
      For logical reasons such as the phase of the moon, this happened more
      often with "-o inode_cache" than without any mount options.
      
      After some debugging it turned out to be simple to understand what was
      happening:
      
      1) close_ctree() is called;
      
      2) It then stops the transaction kthread, which commits the current
         transaction;
      
      3) It asks the cleaner kthread to stop, which is currently running
         btrfs_delete_unused_bgs();
      
      4) btrfs_delete_unused_bgs() finds an unused block group, starts a new
         transaction, deletes the block group, which implies COWing some
         tree nodes and leafs and dirtying their respective pages, and then
         finally it ends the transaction it started, without committing it;
      
      5) The cleaner kthread stops;
      
      6) close_ctree() releases (from memory) the block group objects, which
         produces the warning in the trace pasted above;
      
      7) Then it invalidates all pages of the btree inode, by calling
         invalidate_inode_pages2(), which waits for any pages under writeback,
         and releases any non-dirty pages;
      
      8) All work queues are destroyed (waiting first for their current tasks
         to finish execution);
      
      9) A final iput() is called against the btree inode;
      
      10) This iput triggers a writeback of the btree inode because it still
          has dirty pages;
      
      11) This starts the whole chain of callbacks for the btree inode until
          it eventually reaches btrfs_wq_submit_bio() where it leads to a
          NULL pointer dereference because the work queues were already
          destroyed.
      
      Fix this by making the cleaner commit any transaction that it started
      after the transaction kthread was stopped.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      da288d28
    • F
      Btrfs: fix race between balance and unused block group deletion · 67c5e7d4
      Filipe Manana 提交于
      We have a race between deleting an unused block group and balancing the
      same block group that leads to an assertion failure/BUG(), producing the
      following trace:
      
      [181631.208236] BTRFS: assertion failed: 0, file: fs/btrfs/volumes.c, line: 2622
      [181631.220591] ------------[ cut here ]------------
      [181631.222959] kernel BUG at fs/btrfs/ctree.h:4062!
      [181631.223932] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [181631.224566] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse acpi_cpufreq parpor$
      [181631.224566] CPU: 8 PID: 17451 Comm: btrfs Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
      [181631.224566] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
      [181631.224566] task: ffff880127e09590 ti: ffff8800b5824000 task.ti: ffff8800b5824000
      [181631.224566] RIP: 0010:[<ffffffffa03f19f6>]  [<ffffffffa03f19f6>] assfail.constprop.50+0x1e/0x20 [btrfs]
      [181631.224566] RSP: 0018:ffff8800b5827ae8  EFLAGS: 00010246
      [181631.224566] RAX: 0000000000000040 RBX: ffff8800109fc218 RCX: ffffffff81095dce
      [181631.224566] RDX: 0000000000005124 RSI: ffffffff81464819 RDI: 00000000ffffffff
      [181631.224566] RBP: ffff8800b5827ae8 R08: 0000000000000001 R09: 0000000000000000
      [181631.224566] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800109fc200
      [181631.224566] R13: ffff880020095000 R14: ffff8800b1a13f38 R15: ffff880020095000
      [181631.224566] FS:  00007f70ca0b0c80(0000) GS:ffff88013ec00000(0000) knlGS:0000000000000000
      [181631.224566] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [181631.224566] CR2: 00007f2872ab6e68 CR3: 00000000a717c000 CR4: 00000000000006e0
      [181631.224566] Stack:
      [181631.224566]  ffff8800b5827ba8 ffffffffa03f3916 ffff8800b5827b38 ffffffffa03d080e
      [181631.224566]  ffffffffa03d1423 ffff880020095000 ffff88001233c000 0000000000000001
      [181631.224566]  ffff880020095000 ffff8800b1a13f38 0000000a69c00000 0000000000000000
      [181631.224566] Call Trace:
      [181631.224566]  [<ffffffffa03f3916>] btrfs_remove_chunk+0xa4/0x6bb [btrfs]
      [181631.224566]  [<ffffffffa03d080e>] ? join_transaction.isra.8+0xb9/0x3ba [btrfs]
      [181631.224566]  [<ffffffffa03d1423>] ? wait_current_trans.isra.13+0x22/0xfc [btrfs]
      [181631.224566]  [<ffffffffa03f3fbc>] btrfs_relocate_chunk.isra.29+0x8f/0xa7 [btrfs]
      [181631.224566]  [<ffffffffa03f54df>] btrfs_balance+0xaa4/0xc52 [btrfs]
      [181631.224566]  [<ffffffffa03fd388>] btrfs_ioctl_balance+0x23f/0x2b0 [btrfs]
      [181631.224566]  [<ffffffff810872f9>] ? trace_hardirqs_on+0xd/0xf
      [181631.224566]  [<ffffffffa04019a3>] btrfs_ioctl+0xfe2/0x2220 [btrfs]
      [181631.224566]  [<ffffffff812603ed>] ? __this_cpu_preempt_check+0x13/0x15
      [181631.224566]  [<ffffffff81084669>] ? arch_local_irq_save+0x9/0xc
      [181631.224566]  [<ffffffff81138def>] ? handle_mm_fault+0x834/0xcd2
      [181631.224566]  [<ffffffff81138def>] ? handle_mm_fault+0x834/0xcd2
      [181631.224566]  [<ffffffff8103e48c>] ? __do_page_fault+0x211/0x424
      [181631.224566]  [<ffffffff811755e6>] do_vfs_ioctl+0x3c6/0x479
      (...)
      
      The sequence of steps leading to this are:
      
                 CPU 0                                         CPU 1
      
        btrfs_balance()
          btrfs_relocate_chunk()
      
            btrfs_relocate_block_group(bg X)
              btrfs_lookup_block_group(bg X)
      
                                                     cleaner_kthread
                                                        locks fs_info->cleaner_mutex
      
                                                        btrfs_delete_unused_bgs()
                                                          finds bg X, which became
                                                          unused in the previous
                                                          transaction
      
                                                          checks bg X ->ro == 0,
                                                          so it proceeds
              sets bg X ->ro to 1
              (btrfs_set_block_group_ro(bg X))
      
              blocks on fs_info->cleaner_mutex
                                                          btrfs_remove_chunk(bg X)
                                                        unlocks fs_info->cleaner_mutex
      
              acquires fs_info->cleaner_mutex
              relocate_block_group()
                --> does nothing, no extents found in
                    the extent tree from bg X
              unlocks fs_info->cleaner_mutex
      
            btrfs_relocate_block_group(bg X) returns
      
          btrfs_remove_chunk(bg X)
             extent map not found
                --> ASSERT(0)
      
      Fix this by using a new mutex to make sure these 2 operations, block
      group relocation and removal, are serialized.
      
      This issue is reproducible by running fstests generic/038 (which stresses
      chunk allocation and automatic removal of unused block groups) together
      with the following balance loop:
      
          while true; do btrfs balance start -dusage=0 <mountpoint> ; done
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      67c5e7d4
  8. 13 6月, 2015 1 次提交
  9. 11 6月, 2015 1 次提交
  10. 03 6月, 2015 1 次提交
  11. 27 5月, 2015 2 次提交
  12. 22 5月, 2015 1 次提交
    • M
      block: remove management of bi_remaining when restoring original bi_end_io · 326e1dbb
      Mike Snitzer 提交于
      Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
      non-chains") regressed all existing callers that followed this pattern:
       1) saving a bio's original bi_end_io
       2) wiring up an intermediate bi_end_io
       3) restoring the original bi_end_io from intermediate bi_end_io
       4) calling bio_endio() to execute the restored original bi_end_io
      
      The regression was due to BIO_CHAIN only ever getting set if
      bio_inc_remaining() is called.  For the above pattern it isn't set until
      step 3 above (step 2 would've needed to establish BIO_CHAIN).  As such
      the first bio_endio(), in step 2 above, never decremented __bi_remaining
      before calling the intermediate bi_end_io -- leaving __bi_remaining with
      the value 1 instead of 0.  When bio_inc_remaining() occurred during step
      3 it brought it to a value of 2.  When the second bio_endio() was
      called, in step 4 above, it should've called the original bi_end_io but
      it didn't because there was an extra reference that wasn't dropped (due
      to atomic operations being optimized away since BIO_CHAIN wasn't set
      upfront).
      
      Fix this issue by removing the __bi_remaining management complexity for
      all callers that use the above pattern -- bio_chain() is the only
      interface that _needs_ to be concerned with __bi_remaining.  For the
      above pattern callers just expect the bi_end_io they set to get called!
      Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
      that aren't associated with the bio_chain() interface.
      
      Also, the bio_inc_remaining() interface has been moved local to bio.c.
      
      Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      326e1dbb
  13. 19 5月, 2015 1 次提交
  14. 13 4月, 2015 1 次提交
    • Z
      btrfs: Fix NO_SPACE bug caused by delayed-iput · d7c15171
      Zhao Lei 提交于
      Steps to reproduce:
        while true; do
          dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%]
          rm /btrfs_dir/file
          sync
        done
      
        And we'll see dd failed because btrfs return NO_SPACE.
      
      Reason:
        Normally, btrfs_commit_transaction() call btrfs_run_delayed_iputs()
        in end to free fs space for next write, but sometimes it hadn't
        done work on time, because btrfs-cleaner thread get delayed-iputs
        from list before, but do iput() after next write.
      
        This is log:
        [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin
      
        [ 2569.084280] comm=sync func=btrfs_commit_transaction() call btrfs_run_delayed_iputs()
        [ 2569.085418] comm=sync func=btrfs_commit_transaction() done btrfs_run_delayed_iputs()
        [ 2569.087554] comm=sync func=btrfs_commit_transaction() end
      
        [ 2569.191081] comm=dd begin
        [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28
      
        [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 + 32677888 = 32677888
        [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 + 23834624 = 56512512
        ...
        [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448 + 21762048 = 965738496
        [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end
      
      Fix:
        Make btrfs_commit_transaction() wait current running btrfs-cleaner's
        delayed-iputs() done in end.
      
      Test:
        Use script similar to above(more complex),
        before patch:
          7 failed in 100 * 20 loop.
        after patch:
          0 failed in 100 * 20 loop.
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d7c15171
  15. 11 4月, 2015 2 次提交
    • C
      Btrfs: fix use after free when close_ctree frees the orphan_rsv · cdfb080e
      Chris Mason 提交于
      Near the end of close_ctree, we're calling btrfs_free_block_rsv
      to free up the orphan rsv.  The problem is this call updates the
      space_info, which has already been freed.
      
      This adds a new __ function that directly calls kfree instead of trying
      to update the space infos.
      Signed-off-by: NChris Mason <clm@fb.com>
      cdfb080e
    • C
      Btrfs: allow block group cache writeout outside critical section in commit · 1bbc621e
      Chris Mason 提交于
      We loop through all of the dirty block groups during commit and write
      the free space cache.  In order to make sure the cache is currect, we do
      this while no other writers are allowed in the commit.
      
      If a large number of block groups are dirty, this can introduce long
      stalls during the final stages of the commit, which can block new procs
      trying to change the filesystem.
      
      This commit changes the block group cache writeout to take appropriate
      locks and allow it to run earlier in the commit.  We'll still have to
      redo some of the block groups, but it means we can get most of the work
      out of the way without blocking the entire FS.
      Signed-off-by: NChris Mason <clm@fb.com>
      1bbc621e
  16. 27 3月, 2015 1 次提交
  17. 14 3月, 2015 1 次提交
  18. 04 3月, 2015 2 次提交
  19. 21 2月, 2015 1 次提交
  20. 17 2月, 2015 13 次提交