1. 20 6月, 2017 4 次提交
  2. 16 5月, 2017 1 次提交
    • J
      btrfs: Make flush bios explicitely sync · 8d910125
      Jan Kara 提交于
      Commit b685d3d6 "block: treat REQ_FUA and REQ_PREFLUSH as
      synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
      definitions.  generic_make_request_checks() however strips REQ_FUA and
      REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
      write cache and thus write effectively becomes asynchronous which can
      lead to performance regressions
      
      Fix the problem by making sure all bios which are synchronous are
      properly marked with REQ_SYNC.
      
      CC: David Sterba <dsterba@suse.com>
      CC: linux-btrfs@vger.kernel.org
      Fixes: b685d3d6Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      8d910125
  3. 05 5月, 2017 1 次提交
  4. 21 4月, 2017 1 次提交
  5. 18 4月, 2017 7 次提交
  6. 29 3月, 2017 1 次提交
    • G
      btrfs: Change qgroup_meta_rsv to 64bit · ce0dcee6
      Goldwyn Rodrigues 提交于
      Using an int value is causing qg->reserved to become negative and
      exclusive -EDQUOT to be reached prematurely.
      
      This affects exclusive qgroups only.
      
      TEST CASE:
      
      DEVICE=/dev/vdb
      MOUNTPOINT=/mnt
      SUBVOL=$MOUNTPOINT/tmp
      
      umount $SUBVOL
      umount $MOUNTPOINT
      
      mkfs.btrfs -f $DEVICE
      mount /dev/vdb $MOUNTPOINT
      btrfs quota enable $MOUNTPOINT
      btrfs subvol create $SUBVOL
      umount $MOUNTPOINT
      mount /dev/vdb $MOUNTPOINT
      mount -o subvol=tmp $DEVICE $SUBVOL
      btrfs qgroup limit -e 3G $SUBVOL
      
      btrfs quota rescan /mnt -w
      
      for i in `seq 1 44000`; do
        dd if=/dev/zero of=/mnt/tmp/test_$i bs=10k count=1
        if [[ $? > 0 ]]; then
           btrfs qgroup show -pcref $SUBVOL
           exit 1
        fi
      done
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      [ add reproducer to changelog ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ce0dcee6
  7. 28 2月, 2017 4 次提交
  8. 24 2月, 2017 2 次提交
    • F
      Btrfs: fix use-after-free due to wrong order of destroying work queues · a9b9477d
      Filipe Manana 提交于
      Before we destroy all work queues (and wait for their tasks to complete)
      we were destroying the work queues used for metadata I/O operations, which
      can result in a use-after-free problem because most tasks from all work
      queues do metadata I/O operations. For example, the tasks from the caching
      workers work queue (fs_info->caching_workers), which is destroyed only
      after the work queue used for metadata reads (fs_info->endio_meta_workers)
      is destroyed, do metadata reads, which result in attempts to queue tasks
      into the later work queue, triggering a use-after-free with a trace like
      the following:
      
      [23114.613543] general protection fault: 0000 [#1] PREEMPT SMP
      [23114.614442] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c btrfs xor raid6_pq dm_flakey dm_mod crc32c_generic
      acpi_cpufreq tpm_tis tpm_tis_core tpm ppdev parport_pc parport i2c_piix4 processor sg evdev i2c_core psmouse pcspkr serio_raw button loop autofs4 ext4 crc16
      jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod floppy [last unloaded: scsi_debug]
      [23114.616932] CPU: 9 PID: 4537 Comm: kworker/u32:8 Not tainted 4.9.0-rc7-btrfs-next-36+ #1
      [23114.616932] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
      [23114.616932] Workqueue: btrfs-cache btrfs_cache_helper [btrfs]
      [23114.616932] task: ffff880221d45780 task.stack: ffffc9000bc50000
      [23114.616932] RIP: 0010:[<ffffffffa037c1bf>]  [<ffffffffa037c1bf>] btrfs_queue_work+0x2c/0x190 [btrfs]
      [23114.616932] RSP: 0018:ffff88023f443d60  EFLAGS: 00010246
      [23114.616932] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000102
      [23114.616932] RDX: ffffffffa0419000 RSI: ffff88011df534f0 RDI: ffff880101f01c00
      [23114.616932] RBP: ffff88023f443d80 R08: 00000000000f7000 R09: 000000000000ffff
      [23114.616932] R10: ffff88023f443d48 R11: 0000000000001000 R12: ffff88011df534f0
      [23114.616932] R13: ffff880135963868 R14: 0000000000001000 R15: 0000000000001000
      [23114.616932] FS:  0000000000000000(0000) GS:ffff88023f440000(0000) knlGS:0000000000000000
      [23114.616932] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [23114.616932] CR2: 00007f0fb9f8e520 CR3: 0000000001a0b000 CR4: 00000000000006e0
      [23114.616932] Stack:
      [23114.616932]  ffff880101f01c00 ffff88011df534f0 ffff880135963868 0000000000001000
      [23114.616932]  ffff88023f443da0 ffffffffa03470af ffff880149b37200 ffff880135963868
      [23114.616932]  ffff88023f443db8 ffffffff8125293c ffff880149b37200 ffff88023f443de0
      [23114.616932] Call Trace:
      [23114.616932]  <IRQ> [23114.616932]  [<ffffffffa03470af>] end_workqueue_bio+0xd5/0xda [btrfs]
      [23114.616932]  [<ffffffff8125293c>] bio_endio+0x54/0x57
      [23114.616932]  [<ffffffffa0377929>] btrfs_end_bio+0xf7/0x106 [btrfs]
      [23114.616932]  [<ffffffff8125293c>] bio_endio+0x54/0x57
      [23114.616932]  [<ffffffff8125955f>] blk_update_request+0x21a/0x30f
      [23114.616932]  [<ffffffffa0022316>] scsi_end_request+0x31/0x182 [scsi_mod]
      [23114.616932]  [<ffffffffa00235fc>] scsi_io_completion+0x1ce/0x4c8 [scsi_mod]
      [23114.616932]  [<ffffffffa001ba9d>] scsi_finish_command+0x104/0x10d [scsi_mod]
      [23114.616932]  [<ffffffffa002311f>] scsi_softirq_done+0x101/0x10a [scsi_mod]
      [23114.616932]  [<ffffffff8125fbd9>] blk_done_softirq+0x82/0x8d
      [23114.616932]  [<ffffffff814c8a4b>] __do_softirq+0x1ab/0x412
      [23114.616932]  [<ffffffff8105b01d>] irq_exit+0x49/0x99
      [23114.616932]  [<ffffffff81035135>] smp_call_function_single_interrupt+0x24/0x26
      [23114.616932]  [<ffffffff814c7ec9>] call_function_single_interrupt+0x89/0x90
      [23114.616932]  <EOI> [23114.616932]  [<ffffffffa0023262>] ? scsi_request_fn+0x13a/0x2a1 [scsi_mod]
      [23114.616932]  [<ffffffff814c5966>] ? _raw_spin_unlock_irq+0x2c/0x4a
      [23114.616932]  [<ffffffff814c596c>] ? _raw_spin_unlock_irq+0x32/0x4a
      [23114.616932]  [<ffffffff814c5966>] ? _raw_spin_unlock_irq+0x2c/0x4a
      [23114.616932]  [<ffffffffa0023262>] scsi_request_fn+0x13a/0x2a1 [scsi_mod]
      [23114.616932]  [<ffffffff8125590e>] __blk_run_queue_uncond+0x22/0x2b
      [23114.616932]  [<ffffffff81255930>] __blk_run_queue+0x19/0x1b
      [23114.616932]  [<ffffffff8125ab01>] blk_queue_bio+0x268/0x282
      [23114.616932]  [<ffffffff81258f44>] generic_make_request+0xbd/0x160
      [23114.616932]  [<ffffffff812590e7>] submit_bio+0x100/0x11d
      [23114.616932]  [<ffffffff81298603>] ? __this_cpu_preempt_check+0x13/0x15
      [23114.616932]  [<ffffffff812a1805>] ? __percpu_counter_add+0x8e/0xa7
      [23114.616932]  [<ffffffffa03bfd47>] btrfsic_submit_bio+0x1a/0x1d [btrfs]
      [23114.616932]  [<ffffffffa0377db2>] btrfs_map_bio+0x1f4/0x26d [btrfs]
      [23114.616932]  [<ffffffffa0348a33>] btree_submit_bio_hook+0x74/0xbf [btrfs]
      [23114.616932]  [<ffffffffa03489bf>] ? btrfs_wq_submit_bio+0x160/0x160 [btrfs]
      [23114.616932]  [<ffffffffa03697a9>] submit_one_bio+0x6b/0x89 [btrfs]
      [23114.616932]  [<ffffffffa036f5be>] read_extent_buffer_pages+0x170/0x1ec [btrfs]
      [23114.616932]  [<ffffffffa03471fa>] ? free_root_pointers+0x64/0x64 [btrfs]
      [23114.616932]  [<ffffffffa0348adf>] readahead_tree_block+0x3f/0x4c [btrfs]
      [23114.616932]  [<ffffffffa032e115>] read_block_for_search.isra.20+0x1ce/0x23d [btrfs]
      [23114.616932]  [<ffffffffa032fab8>] btrfs_search_slot+0x65f/0x774 [btrfs]
      [23114.616932]  [<ffffffffa036eff1>] ? free_extent_buffer+0x73/0x7e [btrfs]
      [23114.616932]  [<ffffffffa0331ba4>] btrfs_next_old_leaf+0xa1/0x33c [btrfs]
      [23114.616932]  [<ffffffffa0331e4f>] btrfs_next_leaf+0x10/0x12 [btrfs]
      [23114.616932]  [<ffffffffa0336aa6>] caching_thread+0x22d/0x416 [btrfs]
      [23114.616932]  [<ffffffffa037bce9>] btrfs_scrubparity_helper+0x187/0x3b6 [btrfs]
      [23114.616932]  [<ffffffffa037c036>] btrfs_cache_helper+0xe/0x10 [btrfs]
      [23114.616932]  [<ffffffff8106cf96>] process_one_work+0x273/0x4e4
      [23114.616932]  [<ffffffff8106d6db>] worker_thread+0x1eb/0x2ca
      [23114.616932]  [<ffffffff8106d4f0>] ? rescuer_thread+0x2b6/0x2b6
      [23114.616932]  [<ffffffff81072a81>] kthread+0xd5/0xdd
      [23114.616932]  [<ffffffff810729ac>] ? __kthread_unpark+0x5a/0x5a
      [23114.616932]  [<ffffffff814c6257>] ret_from_fork+0x27/0x40
      [23114.616932] Code: 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 f4 48 8b 46 70 a8 04 74 09 48 8b 5f 08 48 85 db 75 03 48 8b 1f 49 89 5c 24 68 <83> 7b
      64 ff 74 04 f0 ff 43 58 49 83 7c 24 08 00 74 2c 4c 8d 6b
      [23114.616932] RIP  [<ffffffffa037c1bf>] btrfs_queue_work+0x2c/0x190 [btrfs]
      [23114.616932]  RSP <ffff88023f443d60>
      [23114.689493] ---[ end trace 6e48b6bc707ca34b ]---
      [23114.690166] Kernel panic - not syncing: Fatal exception in interrupt
      [23114.691283] Kernel Offset: disabled
      [23114.691918] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
      
      The following diagram shows the sequence of operations that lead to the
      use-after-free problem from the above trace:
      
              CPU 1                               CPU 2                                     CPU 3
      
                                             caching_thread()
       close_ctree()
         btrfs_stop_all_workers()
           btrfs_destroy_workqueue(
            fs_info->endio_meta_workers)
      
                                               btrfs_search_slot()
                                                read_block_for_search()
                                                 readahead_tree_block()
                                                  read_extent_buffer_pages()
                                                   submit_one_bio()
                                                    btree_submit_bio_hook()
                                                     btrfs_bio_wq_end_io()
                                                      --> sets the bio's
                                                          bi_end_io callback
                                                          to end_workqueue_bio()
                                                     --> bio is submitted
                                                                                        bio completes
                                                                                        and its bi_end_io callback
                                                                                        is invoked
                                                                                         --> end_workqueue_bio()
                                                                                             --> attempts to queue
                                                                                                 a task on fs_info->endio_meta_workers
      
           btrfs_destroy_workqueue(
            fs_info->caching_workers)
      
      So fix this by destroying the queues used for metadata I/O tasks only
      after destroying all the other queues.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      a9b9477d
    • F
      Btrfs: fix assertion failure when freeing block groups at close_ctree() · 5cdd7db6
      Filipe Manana 提交于
      At close_ctree() we free the block groups and then only after we wait for
      any running worker kthreads to finish and shutdown the workqueues. This
      behaviour is racy and it triggers an assertion failure when freeing block
      groups because while we are doing it we can have for example a block group
      caching kthread running, and in that case the block group's reference
      count can still be greater than 1 by the time we assert its reference count
      is 1, leading to an assertion failure:
      
      [19041.198004] assertion failed: atomic_read(&block_group->count) == 1, file: fs/btrfs/extent-tree.c, line: 9799
      [19041.200584] ------------[ cut here ]------------
      [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418!
      [19041.202830] invalid opcode: 0000 [#1] PREEMPT SMP
      [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev tpm_tis parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs]
      [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted 4.9.0-rc7-btrfs-next-36+ #1
      [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
      [19041.208082] task: ffff88015f028980 task.stack: ffffc9000ad34000
      [19041.208082] RIP: 0010:[<ffffffffa03e319e>]  [<ffffffffa03e319e>] assfail.constprop.41+0x1c/0x1e [btrfs]
      [19041.208082] RSP: 0018:ffffc9000ad37d60  EFLAGS: 00010286
      [19041.208082] RAX: 0000000000000061 RBX: ffff88015ecb4000 RCX: 0000000000000001
      [19041.208082] RDX: ffff88023f392fb8 RSI: ffffffff817ef7ba RDI: 00000000ffffffff
      [19041.208082] RBP: ffffc9000ad37d60 R08: 0000000000000001 R09: 0000000000000000
      [19041.208082] R10: ffffc9000ad37cb0 R11: ffffffff82f2b66d R12: ffff88023431d170
      [19041.208082] R13: ffff88015ecb40c0 R14: ffff88023431d000 R15: ffff88015ecb4100
      [19041.208082] FS:  00007f44f3d42840(0000) GS:ffff88023f380000(0000) knlGS:0000000000000000
      [19041.208082] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [19041.208082] CR2: 00007f65d623b000 CR3: 00000002166f2000 CR4: 00000000000006e0
      [19041.208082] Stack:
      [19041.208082]  ffffc9000ad37d98 ffffffffa035989f ffff88015ecb4000 ffff88015ecb5630
      [19041.208082]  ffff88014f6be000 0000000000000000 00007ffcf0ba6a10 ffffc9000ad37df8
      [19041.208082]  ffffffffa0368cd4 ffff88014e9658e0 ffffc9000ad37e08 ffffffff811a634d
      [19041.208082] Call Trace:
      [19041.208082]  [<ffffffffa035989f>] btrfs_free_block_groups+0x17f/0x392 [btrfs]
      [19041.208082]  [<ffffffffa0368cd4>] close_ctree+0x1c5/0x2e1 [btrfs]
      [19041.208082]  [<ffffffff811a634d>] ? evict_inodes+0x132/0x141
      [19041.208082]  [<ffffffffa034356d>] btrfs_put_super+0x15/0x17 [btrfs]
      [19041.208082]  [<ffffffff8118fc32>] generic_shutdown_super+0x6a/0xeb
      [19041.208082]  [<ffffffff8119004f>] kill_anon_super+0x12/0x1c
      [19041.208082]  [<ffffffffa0343370>] btrfs_kill_super+0x16/0x21 [btrfs]
      [19041.208082]  [<ffffffff8118fad1>] deactivate_locked_super+0x3b/0x68
      [19041.208082]  [<ffffffff8118fb34>] deactivate_super+0x36/0x39
      [19041.208082]  [<ffffffff811a9946>] cleanup_mnt+0x58/0x76
      [19041.208082]  [<ffffffff811a99a2>] __cleanup_mnt+0x12/0x14
      [19041.208082]  [<ffffffff81071573>] task_work_run+0x6f/0x95
      [19041.208082]  [<ffffffff81001897>] prepare_exit_to_usermode+0xa3/0xc1
      [19041.208082]  [<ffffffff81001a23>] syscall_return_slowpath+0x16e/0x1d2
      [19041.208082]  [<ffffffff814c607d>] entry_SYSCALL_64_fastpath+0xab/0xad
      [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 f1 48 c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 d4 e0 <0f> 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9
      [19041.208082] RIP  [<ffffffffa03e319e>] assfail.constprop.41+0x1c/0x1e [btrfs]
      [19041.208082]  RSP <ffffc9000ad37d60>
      [19041.279264] ---[ end trace 23330586f16f064d ]---
      
      This started happening as of kernel 4.8, since commit f3bca802
      ("Btrfs: add ASSERT for block group's memory leak") introduced these
      assertions.
      
      So fix this by freeing the block groups only after waiting for all
      worker kthreads to complete and shutdown the workqueues.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      5cdd7db6
  9. 17 2月, 2017 5 次提交
  10. 02 2月, 2017 1 次提交
  11. 06 12月, 2016 10 次提交
  12. 30 11月, 2016 3 次提交
    • W
      btrfs: improve delayed refs iterations · 1d57ee94
      Wang Xiaoguang 提交于
      This issue was found when I tried to delete a heavily reflinked file,
      when deleting such files, other transaction operation will not have a
      chance to make progress, for example, start_transaction() will blocked
      in wait_current_trans(root) for long time, sometimes it even triggers
      soft lockups, and the time taken to delete such heavily reflinked file
      is also very large, often hundreds of seconds. Using perf top, it reports
      that:
      
      PerfTop:    7416 irqs/sec  kernel:99.8%  exact:  0.0% [4000Hz cpu-clock],  (all, 4 CPUs)
      ---------------------------------------------------------------------------------------
          84.37%  [btrfs]             [k] __btrfs_run_delayed_refs.constprop.80
          11.02%  [kernel]            [k] delay_tsc
           0.79%  [kernel]            [k] _raw_spin_unlock_irq
           0.78%  [kernel]            [k] _raw_spin_unlock_irqrestore
           0.45%  [kernel]            [k] do_raw_spin_lock
           0.18%  [kernel]            [k] __slab_alloc
      It seems __btrfs_run_delayed_refs() took most cpu time, after some debug
      work, I found it's select_delayed_ref() causing this issue, for a delayed
      head, in our case, it'll be full of BTRFS_DROP_DELAYED_REF nodes, but
      select_delayed_ref() will firstly try to iterate node list to find
      BTRFS_ADD_DELAYED_REF nodes, obviously it's a disaster in this case, and
      waste much time.
      
      To fix this issue, we introduce a new ref_add_list in struct btrfs_delayed_ref_head,
      then in select_delayed_ref(), if this list is not empty, we can directly use
      nodes in this list. With this patch, it just took about 10~15 seconds to
      delte the same file. Now using perf top, it reports that:
      
      PerfTop:    2734 irqs/sec  kernel:99.5%  exact:  0.0% [4000Hz cpu-clock],  (all, 4 CPUs)
      ----------------------------------------------------------------------------------------
      
          20.74%  [kernel]          [k] _raw_spin_unlock_irqrestore
          16.33%  [kernel]          [k] __slab_alloc
           5.41%  [kernel]          [k] lock_acquired
           4.42%  [kernel]          [k] lock_acquire
           4.05%  [kernel]          [k] lock_release
           3.37%  [kernel]          [k] _raw_spin_unlock_irq
      
      For normal files, this patch also gives help, at least we do not need to
      iterate whole list to found BTRFS_ADD_DELAYED_REF nodes.
      Signed-off-by: NWang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Tested-by: NHolger Hoffstätte <holger@applied-asynchrony.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      1d57ee94
    • D
      btrfs: change btrfs_csum_final result param type to u8 · 0b5e3daf
      Domagoj Tršan 提交于
      csum member of struct btrfs_super_block has array type of u8. It makes
      sense that function btrfs_csum_final should be also declared to accept
      u8 *. I changed the declaration of method void btrfs_csum_final(u32 crc,
      char *result); to void btrfs_csum_final(u32 crc, u8 *result);
      Signed-off-by: NDomagoj Tršan <domagoj.trsan@gmail.com>
      [ changed cast to u8 at several call sites ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      0b5e3daf
    • D
      btrfs: remove constant parameter to memset_extent_buffer and rename it · b159fa28
      David Sterba 提交于
      The only memset we do is to 0, so sink the parameter to the function and
      simplify all calls. Rename the function to reflect the behaviour.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      b159fa28