1. 22 1月, 2015 9 次提交
  2. 15 1月, 2015 2 次提交
  3. 03 1月, 2015 1 次提交
  4. 03 12月, 2014 4 次提交
    • M
      Btrfs, raid56: fix use-after-free problem in the final device replace procedure on raid56 · 4245215d
      Miao Xie 提交于
      The commit c404e0dc (Btrfs: fix use-after-free in the finishing
      procedure of the device replace) fixed a use-after-free problem
      which happened when removing the source device at the end of device
      replace, but at that time, btrfs didn't support device replace
      on raid56, so we didn't fix the problem on the raid56 profile.
      Currently, we implemented device replace for raid56, so we need
      kick that problem out before we enable that function for raid56.
      
      The fix method is very simple, we just increase the bio per-cpu
      counter before we submit a raid56 io, and decrease the counter
      when the raid56 io ends.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      4245215d
    • M
      Btrfs, replace: write raid56 parity into the replace target device · 76035976
      Miao Xie 提交于
      This function reused the code of parity scrub, and we just write
      the right parity or corrected parity into the target device before
      the parity scrub end.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      76035976
    • M
      Btrfs, raid56: support parity scrub on raid56 · 5a6ac9ea
      Miao Xie 提交于
      The implementation is:
      - Read and check all the data with checksum in the same stripe.
        All the data which has checksum is COW data, and we are sure
        that it is not changed though we don't lock the stripe. because
        the space of that data just can be reclaimed after the current
        transction is committed, and then the fs can use it to store the
        other data, but when doing scrub, we hold the current transaction,
        that is that data can not be recovered, it is safe that read and check
        it out of the stripe lock.
      - Lock the stripe
      - Read out all the data without checksum and parity
        The data without checksum and the parity may be changed if we don't
        lock the stripe, so we need read it in the stripe lock context.
      - Check the parity
      - Re-calculate the new parity and write back it if the old parity
        is not right
      - Unlock the stripe
      
      If we can not read out the data or the data we read is corrupted,
      we will try to repair it. If the repair fails. we will mark the
      horizontal sub-stripe(pages on the same horizontal) as corrupted
      sub-stripe, and we will skip the parity check and repair of that
      horizontal sub-stripe.
      
      And in order to skip the horizontal sub-stripe that has no data, we
      introduce a bitmap. If there is some data on the horizontal sub-stripe,
      we will the relative bit to 1, and when we check and repair the
      parity, we will skip those horizontal sub-stripes that the relative
      bits is 0.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      5a6ac9ea
    • M
      Btrfs, scrub: repair the common data on RAID5/6 if it is corrupted · af8e2d1d
      Miao Xie 提交于
      This patch implement the RAID5/6 common data repair function, the
      implementation is similar to the scrub on the other RAID such as
      RAID1, the differentia is that we don't read the data from the
      mirror, we use the data repair function of RAID5/6.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      af8e2d1d
  5. 21 11月, 2014 1 次提交
    • G
      btrfs: fix dead lock while running replace and defrag concurrently · 32159242
      Gui Hecheng 提交于
      This can be reproduced by fstests: btrfs/070
      
      The scenario is like the following:
      
      replace worker thread		defrag thread
      ---------------------		-------------
      copy_nocow_pages_worker		btrfs_defrag_file
        copy_nocow_pages_for_inode	    ...
      				  btrfs_writepages
        |A| lock_extent_bits		    extent_write_cache_pages
      				|B|   lock_page
      					__extent_writepage
      		...			  writepage_delalloc
      					    find_lock_delalloc_range
      				|B| 	      lock_extent_bits
        find_or_create_page
          pagecache_get_page
        |A| lock_page
      
      This leads to an ABBA pattern deadlock. To fix it,
      o we just change it to an AABB pattern which means to @unlock_extent_bits()
        before we @lock_page(), and in this way the @extent_read_full_page_nolock()
        is no longer in an locked context, so change it back to @extent_read_full_page()
        to regain protection.
      
      o Since we @unlock_extent_bits() earlier, then before @write_page_nocow(),
        the extent may not really point at the physical block we want, so we
        have to check it before write.
      Signed-off-by: NGui Hecheng <guihc.fnst@cn.fujitsu.com>
      Tested-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      32159242
  6. 02 10月, 2014 1 次提交
  7. 18 9月, 2014 7 次提交
  8. 24 8月, 2014 1 次提交
    • L
      Btrfs: fix task hang under heavy compressed write · 9e0af237
      Liu Bo 提交于
      This has been reported and discussed for a long time, and this hang occurs in
      both 3.15 and 3.16.
      
      Btrfs now migrates to use kernel workqueue, but it introduces this hang problem.
      
      Btrfs has a kind of work queued as an ordered way, which means that its
      ordered_func() must be processed in the way of FIFO, so it usually looks like --
      
      normal_work_helper(arg)
          work = container_of(arg, struct btrfs_work, normal_work);
      
          work->func() <---- (we name it work X)
          for ordered_work in wq->ordered_list
                  ordered_work->ordered_func()
                  ordered_work->ordered_free()
      
      The hang is a rare case, first when we find free space, we get an uncached block
      group, then we go to read its free space cache inode for free space information,
      so it will
      
      file a readahead request
          btrfs_readpages()
               for page that is not in page cache
                      __do_readpage()
                           submit_extent_page()
                                 btrfs_submit_bio_hook()
                                       btrfs_bio_wq_end_io()
                                       submit_bio()
                                       end_workqueue_bio() <--(ret by the 1st endio)
                                            queue a work(named work Y) for the 2nd
                                            also the real endio()
      
      So the hang occurs when work Y's work_struct and work X's work_struct happens
      to share the same address.
      
      A bit more explanation,
      
      A,B,C -- struct btrfs_work
      arg   -- struct work_struct
      
      kthread:
      worker_thread()
          pick up a work_struct from @worklist
          process_one_work(arg)
      	worker->current_work = arg;  <-- arg is A->normal_work
      	worker->current_func(arg)
      		normal_work_helper(arg)
      		     A = container_of(arg, struct btrfs_work, normal_work);
      
      		     A->func()
      		     A->ordered_func()
      		     A->ordered_free()  <-- A gets freed
      
      		     B->ordered_func()
      			  submit_compressed_extents()
      			      find_free_extent()
      				  load_free_space_inode()
      				      ...   <-- (the above readhead stack)
      				      end_workqueue_bio()
      					   btrfs_queue_work(work C)
      		     B->ordered_free()
      
      As if work A has a high priority in wq->ordered_list and there are more ordered
      works queued after it, such as B->ordered_func(), its memory could have been
      freed before normal_work_helper() returns, which means that kernel workqueue
      code worker_thread() still has worker->current_work pointer to be work
      A->normal_work's, ie. arg's address.
      
      Meanwhile, work C is allocated after work A is freed, work C->normal_work
      and work A->normal_work are likely to share the same address(I confirmed this
      with ftrace output, so I'm not just guessing, it's rare though).
      
      When another kthread picks up work C->normal_work to process, and finds our
      kthread is processing it(see find_worker_executing_work()), it'll think
      work C as a collision and skip then, which ends up nobody processing work C.
      
      So the situation is that our kthread is waiting forever on work C.
      
      Besides, there're other cases that can lead to deadlock, but the real problem
      is that all btrfs workqueue shares one work->func, -- normal_work_helper,
      so this makes each workqueue to have its own helper function, but only a
      wraper pf normal_work_helper.
      
      With this patch, I no long hit the above hang.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      9e0af237
  9. 19 8月, 2014 1 次提交
  10. 20 6月, 2014 1 次提交
  11. 10 6月, 2014 2 次提交
  12. 11 4月, 2014 1 次提交
  13. 08 4月, 2014 1 次提交
    • W
      Btrfs: scrub raid56 stripes in the right way · 3b080b25
      Wang Shilong 提交于
      Steps to reproduce:
       # mkfs.btrfs -f /dev/sda[8-11] -m raid5 -d raid5
       # mount /dev/sda8 /mnt
       # btrfs scrub start -BR /mnt
       # echo $? <--unverified errors make return value be 3
      
      This is because we don't setup right mapping between physical
      and logical address for raid56, which makes checksum mismatch.
      But we will find everthing is fine later when rechecking using
      btrfs_map_block().
      
      This patch fixed the problem by settuping right mappings and
      we only verify data stripes' checksums.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      3b080b25
  14. 11 3月, 2014 4 次提交
  15. 29 1月, 2014 4 次提交