1. 01 Jul 2015 (1 commit)
  2. 10 Jun 2015 (1 commit)
    • btrfs: Fix lockdep warning of wr_ctx->wr_lock in scrub_free_wr_ctx() · 20b2e302
      Authored by Zhao Lei
      lockdep reported the following warning during testing:
       [25176.843958] =================================
       [25176.844519] [ INFO: inconsistent lock state ]
       [25176.845047] 4.1.0-rc3 #22 Tainted: G        W
       [25176.845591] ---------------------------------
       [25176.846153] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
       [25176.846713] fsstress/26661 [HC0[0]:SC1[1]:HE1:SE0] takes:
       [25176.847246]  (&wr_ctx->wr_lock){+.?...}, at: [<ffffffffa04cdc6d>] scrub_free_ctx+0x2d/0xf0 [btrfs]
       [25176.847838] {SOFTIRQ-ON-W} state was registered at:
       [25176.848396]   [<ffffffff810bf460>] __lock_acquire+0x6a0/0xe10
       [25176.848955]   [<ffffffff810bfd1e>] lock_acquire+0xce/0x2c0
       [25176.849491]   [<ffffffff816489af>] mutex_lock_nested+0x7f/0x410
       [25176.850029]   [<ffffffffa04d04ff>] scrub_stripe+0x4df/0x1080 [btrfs]
       [25176.850575]   [<ffffffffa04d11b1>] scrub_chunk.isra.19+0x111/0x130 [btrfs]
       [25176.851110]   [<ffffffffa04d144c>] scrub_enumerate_chunks+0x27c/0x510 [btrfs]
       [25176.851660]   [<ffffffffa04d3b87>] btrfs_scrub_dev+0x1c7/0x6c0 [btrfs]
       [25176.852189]   [<ffffffffa04e918e>] btrfs_dev_replace_start+0x36e/0x450 [btrfs]
       [25176.852771]   [<ffffffffa04a98e0>] btrfs_ioctl+0x1e10/0x2d20 [btrfs]
       [25176.853315]   [<ffffffff8121c5b8>] do_vfs_ioctl+0x318/0x570
       [25176.853868]   [<ffffffff8121c851>] SyS_ioctl+0x41/0x80
       [25176.854406]   [<ffffffff8164da17>] system_call_fastpath+0x12/0x6f
       [25176.854935] irq event stamp: 51506
       [25176.855511] hardirqs last  enabled at (51506): [<ffffffff810d4ce5>] vprintk_emit+0x225/0x5e0
       [25176.856059] hardirqs last disabled at (51505): [<ffffffff810d4b77>] vprintk_emit+0xb7/0x5e0
       [25176.856642] softirqs last  enabled at (50886): [<ffffffff81067a23>] __do_softirq+0x363/0x640
       [25176.857184] softirqs last disabled at (50949): [<ffffffff8106804d>] irq_exit+0x10d/0x120
       [25176.857746]
       other info that might help us debug this:
       [25176.858845]  Possible unsafe locking scenario:
       [25176.859981]        CPU0
       [25176.860537]        ----
       [25176.861059]   lock(&wr_ctx->wr_lock);
       [25176.861705]   <Interrupt>
       [25176.862272]     lock(&wr_ctx->wr_lock);
       [25176.862881]
        *** DEADLOCK ***
      
      Reason:
       The warning above is triggered by the following call chain:
       Interrupt
       -> bio_endio()
       -> ...
       -> scrub_put_ctx()
       -> scrub_free_ctx() *1
       -> ...
       -> mutex_lock(&wr_ctx->wr_lock);
      
       scrub_put_ctx() may be called from end_bio interrupt context, but by
       design it should never end up calling scrub_free_ctx(sctx) there
       (point *1 above): btrfs_scrub_dev() takes an additional reference on
       sctx->refs, which is meant to ensure that scrub_free_ctx() is only
       ever called from within btrfs_scrub_dev().

       In practice, however, the code does not behave as intended, because
       the free sequence in scrub_pending_bio_dec() has a race window.
      
       Current code:
       -----------------------------------+-----------------------------------
       scrub_pending_bio_dec()            |  btrfs_scrub_dev
       -----------------------------------+-----------------------------------
       atomic_dec(&sctx->bios_in_flight); |
       wake_up(&sctx->list_wait);         |
                                          | scrub_put_ctx()
                                          | -> atomic_dec_and_test(&sctx->refs)
       scrub_put_ctx(sctx);               |
       -> atomic_dec_and_test(&sctx->refs)|
       -> scrub_free_ctx()                |
       -----------------------------------+-----------------------------------
      
       We expected:
       -----------------------------------+-----------------------------------
       scrub_pending_bio_dec()            |  btrfs_scrub_dev
       -----------------------------------+-----------------------------------
       atomic_dec(&sctx->bios_in_flight); |
       wake_up(&sctx->list_wait);         |
       scrub_put_ctx(sctx);               |
       -> atomic_dec_and_test(&sctx->refs)|
                                          | scrub_put_ctx()
                                          | -> atomic_dec_and_test(&sctx->refs)
                                          | -> scrub_free_ctx()
       -----------------------------------+-----------------------------------
      
      Fix:
       Move scrub_pending_bio_dec() into a workqueue so that it never runs
       in interrupt context (sketched below).
       Tested by checking the trace log in a debug build.
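       As a rough illustration of the idea (a minimal sketch only, not the
       actual patch: scrub_bio_ctx, its fields, and the use of system_wq
       are assumptions made for this example):

           #include <linux/bio.h>
           #include <linux/workqueue.h>

           /*
            * One work item is embedded per in-flight bio, so the final
            * scrub_put_ctx() always runs in process context and never in
            * the end_bio interrupt handler. INIT_WORK() on sbio->work is
            * assumed to have been done when the bio was set up.
            */
           struct scrub_bio_ctx {
                   struct scrub_ctx *sctx;
                   struct work_struct work;        /* one per in-flight bio */
           };

           static void scrub_bio_done_worker(struct work_struct *work)
           {
                   struct scrub_bio_ctx *sbio =
                           container_of(work, struct scrub_bio_ctx, work);
                   struct scrub_ctx *sctx = sbio->sctx;

                   atomic_dec(&sctx->bios_in_flight);
                   wake_up(&sctx->list_wait);
                   scrub_put_ctx(sctx);    /* may sleep in scrub_free_ctx() */
           }

           static void scrub_bio_end_io(struct bio *bio, int err)
           {
                   struct scrub_bio_ctx *sbio = bio->bi_private;

                   /* interrupt context: defer everything that may sleep */
                   queue_work(system_wq, &sbio->work);
           }

       With one work item per bio, no reference drop can be lost even when
       several completions race; the real patch uses btrfs's own scrub
       workqueues rather than system_wq.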
      
      Changelog v1->v2:
       Use a workqueue instead of adjusting the function call sequence as
       in v1, because v1 would have introduced a bug pointed out by
       Filipe David Manana <fdmanana@gmail.com>.
      Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  3. 04 Mar 2015 (7 commits)
  4. 17 Feb 2015 (1 commit)
  5. 15 Feb 2015 (1 commit)
    • Btrfs: scrub, fix sleep in atomic context · f55985f4
      Authored by Filipe Manana
      My previous patch "Btrfs: fix scrub race leading to use-after-free"
      introduced the possibility of sleeping in atomic context. This
      happens when the scrub_lock mutex is held at the time
      scrub_pending_bio_dec() is called, since that function can be called
      from atomic context. Chris ran into this on a debug kernel, which
      gave the following trace:
      
      [ 1928.950319] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:621
      [ 1928.967334] in_atomic(): 1, irqs_disabled(): 0, pid: 149670, name: fsstress
      [ 1928.981324] INFO: lockdep is turned off.
      [ 1928.989244] CPU: 24 PID: 149670 Comm: fsstress Tainted: G        W     3.19.0-rc7-mason+ #41
      [ 1929.006418] Hardware name: ZTSYSTEMS Echo Ridge T4  /A9DRPF-10D, BIOS 1.07 05/10/2012
      [ 1929.022207]  ffffffff81a22cf8 ffff881076e03b78 ffffffff816b8dd9 ffff881076e03b78
      [ 1929.037267]  ffff880d8e828710 ffff881076e03ba8 ffffffff810856c4 ffff881076e03bc8
      [ 1929.052315]  0000000000000000 000000000000026d ffffffff81a22cf8 ffff881076e03bd8
      [ 1929.067381] Call Trace:
      [ 1929.072344]  <IRQ>  [<ffffffff816b8dd9>] dump_stack+0x4f/0x6e
      [ 1929.083968]  [<ffffffff810856c4>] ___might_sleep+0x174/0x230
      [ 1929.095352]  [<ffffffff810857d2>] __might_sleep+0x52/0x90
      [ 1929.106223]  [<ffffffff816bb68f>] mutex_lock_nested+0x2f/0x3b0
      [ 1929.117951]  [<ffffffff810ab37d>] ? trace_hardirqs_on+0xd/0x10
      [ 1929.129708]  [<ffffffffa05dc838>] scrub_pending_bio_dec+0x38/0x70 [btrfs]
      [ 1929.143370]  [<ffffffffa05dd0e0>] scrub_parity_bio_endio+0x50/0x70 [btrfs]
      [ 1929.157191]  [<ffffffff812fa603>] bio_endio+0x53/0xa0
      [ 1929.167382]  [<ffffffffa05f96bc>] rbio_orig_end_io+0x7c/0xa0 [btrfs]
      [ 1929.180161]  [<ffffffffa05f97ba>] raid_write_parity_end_io+0x5a/0x80 [btrfs]
      [ 1929.194318]  [<ffffffff812fa603>] bio_endio+0x53/0xa0
      [ 1929.204496]  [<ffffffff8130401b>] blk_update_request+0x1eb/0x450
      [ 1929.216569]  [<ffffffff81096e58>] ? trigger_load_balance+0x78/0x500
      [ 1929.229176]  [<ffffffff8144c74d>] scsi_end_request+0x3d/0x1f0
      [ 1929.240740]  [<ffffffff8144ccac>] scsi_io_completion+0xac/0x5b0
      [ 1929.252654]  [<ffffffff81441c50>] scsi_finish_command+0xf0/0x150
      [ 1929.264725]  [<ffffffff8144d317>] scsi_softirq_done+0x147/0x170
      [ 1929.276635]  [<ffffffff8130ace6>] blk_done_softirq+0x86/0xa0
      [ 1929.288014]  [<ffffffff8105d92e>] __do_softirq+0xde/0x600
      [ 1929.298885]  [<ffffffff8105df6d>] irq_exit+0xbd/0xd0
      (...)
      
      Fix this by using a reference count on the scrub context structure
      instead of locking the scrub_lock mutex.
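       A minimal sketch of the refcounting pattern described here (the
       names follow the commit text; the actual patch may differ in
       detail):

           /*
            * Every user of the scrub context holds a reference; the last
            * put frees it. No mutex is taken, so dropping a reference is
            * safe in atomic context.
            */
           static void scrub_get_ctx(struct scrub_ctx *sctx)
           {
                   atomic_inc(&sctx->refs);
           }

           static void scrub_put_ctx(struct scrub_ctx *sctx)
           {
                   if (atomic_dec_and_test(&sctx->refs))
                           scrub_free_ctx(sctx);
           }

       Note that, as the 10 Jun 2015 commit above shows, the final put can
       still reach scrub_free_ctx() from interrupt context, which is what
       the later workqueue fix addresses.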
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  6. 03 Feb 2015 (1 commit)
    • Btrfs: fix scrub race leading to use-after-free · de554a4f
      Authored by Filipe Manana
      While running a scrub on a kernel with CONFIG_DEBUG_PAGEALLOC=y, I got
      the following trace:
      
      [68127.807663] BUG: unable to handle kernel paging request at ffff8803f8947a50
      [68127.807663] IP: [<ffffffff8107da31>] do_raw_spin_lock+0x94/0x122
      [68127.807663] PGD 3003067 PUD 43e1f5067 PMD 43e030067 PTE 80000003f8947060
      [68127.807663] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [68127.807663] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop parport_pc processor parpo
      [68127.807663] CPU: 2 PID: 3081 Comm: kworker/u8:5 Not tainted 3.18.0-rc6-btrfs-next-3+ #4
      [68127.807663] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [68127.807663] Workqueue: btrfs-btrfs-scrub btrfs_scrub_helper [btrfs]
      [68127.807663] task: ffff880101fc5250 ti: ffff8803f097c000 task.ti: ffff8803f097c000
      [68127.807663] RIP: 0010:[<ffffffff8107da31>]  [<ffffffff8107da31>] do_raw_spin_lock+0x94/0x122
      [68127.807663] RSP: 0018:ffff8803f097fbb8  EFLAGS: 00010093
      [68127.807663] RAX: 0000000028dd386c RBX: ffff8803f8947a50 RCX: 0000000028dd3854
      [68127.807663] RDX: 0000000000000018 RSI: 0000000000000002 RDI: 0000000000000001
      [68127.807663] RBP: ffff8803f097fbd8 R08: 0000000000000004 R09: 0000000000000001
      [68127.807663] R10: ffff880102620980 R11: ffff8801f3e8c900 R12: 000000000001d390
      [68127.807663] R13: 00000000cabd13c8 R14: ffff8803f8947800 R15: ffff88037c574f00
      [68127.807663] FS:  0000000000000000(0000) GS:ffff88043dd00000(0000) knlGS:0000000000000000
      [68127.807663] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [68127.807663] CR2: ffff8803f8947a50 CR3: 00000000b6481000 CR4: 00000000000006e0
      [68127.807663] Stack:
      [68127.807663]  ffffffff823942a8 ffff8803f8947a50 ffff8802a3416f80 0000000000000000
      [68127.807663]  ffff8803f097fc18 ffffffff8141e7c0 ffffffff81072948 000000000034f314
      [68127.807663]  ffff8803f097fc08 0000000000000292 ffff8803f097fc48 ffff8803f8947a50
      [68127.807663] Call Trace:
      [68127.807663]  [<ffffffff8141e7c0>] _raw_spin_lock_irqsave+0x4b/0x55
      [68127.807663]  [<ffffffff81072948>] ? __wake_up+0x22/0x4b
      [68127.807663]  [<ffffffff81072948>] __wake_up+0x22/0x4b
      [68127.807663]  [<ffffffffa0392327>] scrub_pending_bio_dec+0x32/0x36 [btrfs]
      [68127.807663]  [<ffffffffa0395e70>] scrub_bio_end_io_worker+0x5a3/0x5c9 [btrfs]
      [68127.807663]  [<ffffffff810e0c7c>] ? time_hardirqs_off+0x15/0x28
      [68127.807663]  [<ffffffff81078106>] ? trace_hardirqs_off_caller+0x4c/0xb9
      [68127.807663]  [<ffffffffa0372a7c>] normal_work_helper+0xf1/0x238 [btrfs]
      [68127.807663]  [<ffffffffa0372d3d>] btrfs_scrub_helper+0x12/0x14 [btrfs]
      [68127.807663]  [<ffffffff810582d2>] process_one_work+0x1e4/0x3b6
      [68127.807663]  [<ffffffff81078180>] ? trace_hardirqs_off+0xd/0xf
      [68127.807663]  [<ffffffff81058dc9>] worker_thread+0x1fb/0x2a8
      [68127.807663]  [<ffffffff81058bce>] ? rescuer_thread+0x219/0x219
      [68127.807663]  [<ffffffff8105cd75>] kthread+0xdb/0xe3
      [68127.807663]  [<ffffffff8105cc9a>] ? __kthread_parkme+0x67/0x67
      [68127.807663]  [<ffffffff8141f1ec>] ret_from_fork+0x7c/0xb0
      [68127.807663]  [<ffffffff8105cc9a>] ? __kthread_parkme+0x67/0x67
      [68127.807663] Code: 39 c2 75 14 8d 8a 00 00 01 00 89 d0 f0 0f b1 0b 39 d0 0f 84 81 00 00 00 4c 69 2d 27 86 99 00 fa 00 00 00 45 31 e4 4d 39 ec 74 2b <8b> 13 89 d0 c1 e8 10 66 39 c2 75
      [68127.807663] RIP  [<ffffffff8107da31>] do_raw_spin_lock+0x94/0x122
      [68127.807663]  RSP <ffff8803f097fbb8>
      [68127.807663] CR2: ffff8803f8947a50
      [68127.807663] ---[ end trace d7045aac00a66cd8 ]---
      
      This is due to a race that can happen in a very tiny time window and is
      illustrated by the following sequence diagram:
      
               CPU 1                                                     CPU 2
      
                                                                      btrfs_scrub_dev()
      scrub_bio_end_io_worker()
         scrub_pending_bio_dec()
             atomic_dec(&sctx->bios_in_flight)
                                                                         wait sctx->bios_in_flight == 0
                                                                         wait sctx->workers_pending == 0
                                                                         mutex_lock(&fs_info->scrub_lock)
                                                                         (...)
                                                                         mutex_lock(&fs_info->scrub_lock)
                                                                         scrub_free_ctx(sctx)
                                                                            kfree(sctx)
             wake_up(&sctx->list_wait)
                __wake_up()
                    spin_lock_irqsave(&sctx->list_wait->lock, flags)
      
      Another variation of this scenario that results in the same use-after-free
      issue is:
      
               CPU 1                                                     CPU 2
      
                                                                      btrfs_scrub_dev()
                                                                         wait sctx->bios_in_flight == 0
      scrub_bio_end_io_worker()
         scrub_pending_bio_dec()
             __wake_up(&sctx->list_wait)
                spin_lock_irqsave(&sctx->list_wait->lock, flags)
                default_wake_function()
                    wake up task at CPU 2
                                                                         wait sctx->workers_pending == 0
                                                                         mutex_lock(&fs_info->scrub_lock)
                                                                         (...)
                                                                         mutex_lock(&fs_info->scrub_lock)
                                                                         scrub_free_ctx(sctx)
                                                                            kfree(sctx)
                spin_unlock_irqrestore(&sctx->list_wait->lock, flags)
      
      Fix this by holding the scrub lock while doing the wakeup.
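       Illustratively (a sketch only; it assumes the wakeup happens in
       scrub_pending_bio_dec() and the field names are approximate):

           static void scrub_pending_bio_dec(struct scrub_ctx *sctx)
           {
                   struct btrfs_fs_info *fs_info = sctx->dev_root->fs_info;

                   /*
                    * Holding scrub_lock guarantees btrfs_scrub_dev()
                    * cannot reach scrub_free_ctx() while the waitqueue
                    * is still in use.
                    */
                   mutex_lock(&fs_info->scrub_lock);
                   atomic_dec(&sctx->bios_in_flight);
                   wake_up(&sctx->list_wait);
                   mutex_unlock(&fs_info->scrub_lock);
           }

       Taking a mutex here is what later turned out to sleep in atomic
       context; the 15 Feb 2015 commit above replaces this approach with a
       reference count.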
      
      This isn't a recent regression; the issue has been around since the
      scrub feature was added (2011, commit a2de733c).
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  7. 28 Jan 2015 (1 commit)
  8. 22 Jan 2015 (12 commits)
  9. 20 Jan 2015 (1 commit)
  10. 15 Jan 2015 (2 commits)
  11. 03 Jan 2015 (1 commit)
  12. 03 Dec 2014 (4 commits)
    • Btrfs, raid56: fix use-after-free problem in the final device replace procedure on raid56 · 4245215d
      Authored by Miao Xie
      The commit c404e0dc (Btrfs: fix use-after-free in the finishing
      procedure of the device replace) fixed a use-after-free problem
      that happened when removing the source device at the end of a
      device replace. At that time btrfs did not support device replace
      on raid56, so the fix did not cover the raid56 profile. Now that
      device replace is implemented for raid56, we need to fix that
      problem before enabling the feature for raid56.
      
      The fix is simple: increase the per-cpu bio counter before we
      submit a raid56 io, and decrease the counter when the raid56 io
      ends.
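      At a raid56 submission site this looks roughly like the following
      (a sketch: the btrfs_bio_counter_* helpers exist in btrfs, but the
      surrounding call is abbreviated from that era's code):

           /* take the per-cpu bio counter before submitting a raid56 io */
           btrfs_bio_counter_inc_blocked(root->fs_info);
           ret = raid56_parity_write(root, bio, bbio, raid_map, stripe_len);
           if (ret)
                   btrfs_bio_counter_dec(root->fs_info);   /* undo on error */

           /* ... and in the raid56 end_io path, when the io ends: */
           btrfs_bio_counter_dec(fs_info);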
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
    • Btrfs, replace: write raid56 parity into the replace target device · 76035976
      Authored by Miao Xie
      This reuses the parity scrub code: we simply write the correct (or
      corrected) parity into the target device before the parity scrub
      finishes.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
    • Btrfs, raid56: support parity scrub on raid56 · 5a6ac9ea
      Authored by Miao Xie
      The implementation is:
      - Read and check all the data with checksums in the same stripe.
        All checksummed data is COW data, so we can be sure it does not
        change even though we do not hold the stripe lock: its space can
        only be reclaimed after the current transaction is committed, and
        since scrub holds the current transaction open, that data cannot
        be overwritten. It is therefore safe to read and check it outside
        the stripe lock.
      - Lock the stripe.
      - Read out all the data without checksums, and the parity.
        Data without checksums and the parity may change while we do not
        hold the stripe lock, so they must be read inside the stripe
        lock.
      - Check the parity.
      - Recalculate the parity and write it back if the old parity is
        wrong.
      - Unlock the stripe.
      
      If we cannot read the data, or the data we read is corrupted, we
      try to repair it. If the repair fails, we mark the horizontal
      sub-stripe (the pages at the same horizontal position) as a
      corrupted sub-stripe and skip the parity check and repair for it.
      
      To skip horizontal sub-stripes that contain no data, we introduce a
      bitmap: if a horizontal sub-stripe holds some data, we set the
      corresponding bit to 1, and when checking and repairing the parity
      we skip the sub-stripes whose bits are 0, as in the sketch below.
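      A minimal sketch of that bitmap (sparity, dbitmap, and the helpers
      below are illustrative names, not necessarily those of the patch):

           #include <linux/bitmap.h>

           /* one bit per horizontal sub-stripe, set while reading data */
           static void mark_substripe_has_data(struct scrub_parity *sparity,
                                               int i)
           {
                   set_bit(i, sparity->dbitmap);
           }

           /* during the parity check, visit only sub-stripes with data */
           static void check_parity_substripes(struct scrub_parity *sparity)
           {
                   int i;

                   for_each_set_bit(i, sparity->dbitmap, sparity->nsectors)
                           check_and_repair_substripe(sparity, i); /* hypothetical */
           }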
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
    • Btrfs, scrub: repair the common data on RAID5/6 if it is corrupted · af8e2d1d
      Authored by Miao Xie
      This patch implements the common data repair function for RAID5/6.
      The implementation is similar to scrub on other RAID profiles such
      as RAID1; the difference is that we don't read the data from a
      mirror, we use the data repair (rebuild) code of RAID5/6 instead.
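      Schematically (a sketch: the raid56_parity_recover() argument list
      is abbreviated from that era's code, and the dispatch around it is
      hypothetical):

           if (is_raid56_profile) {        /* hypothetical profile test */
                   /* rebuild the bad sector from remaining data + parity */
                   ret = raid56_parity_recover(root, bio, bbio, raid_map,
                                               stripe_len, mirror_num);
           } else {
                   /* RAID1 etc.: read the copy from another mirror */
                   ret = read_block_from_mirror(sblock, mirror_num + 1); /* hypothetical */
           }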
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
  13. 21 Nov 2014 (1 commit)
    • btrfs: fix deadlock while running replace and defrag concurrently · 32159242
      Authored by Gui Hecheng
      This can be reproduced by fstests: btrfs/070
      
      The scenario is as follows:
      
      replace worker thread		defrag thread
      ---------------------		-------------
      copy_nocow_pages_worker		btrfs_defrag_file
        copy_nocow_pages_for_inode	    ...
      				  btrfs_writepages
        |A| lock_extent_bits		    extent_write_cache_pages
      				|B|   lock_page
      					__extent_writepage
      		...			  writepage_delalloc
      					    find_lock_delalloc_range
      				|B| 	      lock_extent_bits
        find_or_create_page
          pagecache_get_page
        |A| lock_page
      
      This leads to an ABBA-pattern deadlock. To fix it:
      o Change it to an AABB pattern, i.e. @unlock_extent_bits() before
        we @lock_page(). With that, @extent_read_full_page_nolock() no
        longer runs in a locked context, so change it back to
        @extent_read_full_page() to regain protection.

      o Since we now @unlock_extent_bits() earlier, by the time
        @write_page_nocow() runs the extent may no longer point at the
        physical block we want, so we have to check it again before
        writing. See the sketch below.
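      A sketch of the new ordering (helper names follow the commit text;
      exact signatures vary between kernel versions, and the re-check
      helper is hypothetical):

           /*
            * A = extent bits lock, B = page lock. Take A, drop A, then
            * take B (AABB) instead of holding A while waiting for B
            * (ABBA).
            */
           lock_extent_bits(io_tree, start, end, 0, &cached_state);
           /* ... read and validate what we need under the extent lock ... */
           unlock_extent_bits(io_tree, start, end, &cached_state);

           page = find_or_create_page(mapping, index, GFP_NOFS);
           if (!page)
                   return -ENOMEM;

           /*
            * The extent was unlocked above and may no longer point at
            * the physical block we expect: re-check before
            * write_page_nocow().
            */
           if (!extent_still_points_at(inode, start, physical))    /* hypothetical */
                   goto out_unlock_page;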
      Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
      Tested-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
  14. 02 Oct 2014 (1 commit)
  15. 18 Sep 2014 (5 commits)