1. 17 Feb, 2012 2 commits
    • M
      Btrfs: fix deadlock on page lock when doing auto-defragment · 600a45e1
      Miao Xie committed
      When I ran xfstests in a loop on a btrfs filesystem mounted with
      autodefrag, a deadlock occurred.
      
      Steps to reproduce:
      [tty0]
       # export MOUNT_OPTIONS="-o autodefrag"
       # export TEST_DEV=<partition1>
       # export TEST_DIR=<mountpoint1>
       # export SCRATCH_DEV=<partition2>
       # export SCRATCH_MNT=<mountpoint2>
       # while [ 1 ]
       > do
       > ./check 091 127 263
       > sleep 1
       > done
      [tty1]
       # while [ 1 ]
       > do
       > echo 3 > /proc/sys/vm/drop_caches
       > done
      
      Several hours later, the test processes hang, deadlocked on a page
      lock.
      
      The reason is that:
        Auto defrag task		Flush thread			Test task
      				btrfs_writepages()
      				  add ordered extent
      				  (including page 1, 2)
      				  set page 1 writeback
      				  set page 2 writeback
      				endio_fn()
      				  end page 2 writeback
      								release page 2
      lock page 1
      alloc and lock page 2
      page 2 is not uptodate
        btrfs_readpage()
          start ordered extent()
          btrfs_writepages()
            try to lock page 1
      
      so a deadlock happens.
      
      Fix this bug by unlocking the page that is under writeback and re-locking it
      after the writeback ends.
      Signed-off-by: Miao Xie <miax@cn.fujitsu.com>
    • T
      Btrfs: fix return value check of extent_io_ops · 013bd4c3
      Tsutomu Itoh committed
      This patch adds checks on the return values of the extent_io_ops callbacks.
      Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
  2. 16 Feb, 2012 1 commit
  3. 15 Feb, 2012 9 commits
    • D
      btrfs: silence warning in raid array setup · 8a334426
      David Sterba committed
      Raid array setup code creates an extent buffer in an unusual way. When the
      PAGE_CACHE_SIZE is > super block size, the extent pages are not marked
      up-to-date, which triggers a WARN_ON in the following
      write_extent_buffer call. Add an explicit up-to-date call to silence the
      warning.
      Signed-off-by: David Sterba <dsterba@suse.cz>
    • D
      btrfs: fix structs where bitfields and spinlock/atomic share 8B word · c08782da
      David Sterba committed
      On ia64, powerpc64 and sparc64 the bitfield is modified through a RMW cycle and current
      gcc rewrites the adjacent 4B word, which in the case of a spinlock or atomic has
      disastrous effects.
      
      https://lkml.org/lkml/2012/2/1/220
      Signed-off-by: David Sterba <dsterba@suse.cz>
    • J
      btrfs: delalloc for page dirtied out-of-band in fixup worker · 87826df0
      Jeff Mahoney committed
       We encountered an issue that was easily observable on s/390 systems but
       could really happen anywhere. The timing just seemed to hit reliably
       on s/390 with limited memory.
      
       The gist is that when an unexpected set_page_dirty() happened, we'd
       run into the BUG() in btrfs_writepage_fixup_worker since it wasn't
       properly set up for delalloc.
      
       This patch does the following:
       - Performs the missing delalloc in the fixup worker
       - Allows the start hook to return -EBUSY, which informs __extent_writepage
         that it should mark the page skipped and not redirty it. This is
         required since the fixup worker can fail with -ENOSPC and the page
         will have already been redirtied. That causes an Oops in
         drop_outstanding_extents later. Retrying the fixup worker could
         lead to an infinite loop. Deferring the page redirty also saves us
         some cycles since the page would be stuck in a resubmit-redirty loop
         until the fixup worker completes. It's not harmful, just wasteful.
       - If the fixup worker fails, we mark the page and mapping as errored,
         and end the writeback, similar to what we would do had the page
         actually been submitted to writeback.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
    • T
      Btrfs: fix memory leak in load_free_space_cache() · a7e221e9
      Tsutomu Itoh committed
      load_free_space_cache() forgot to free the path.
      Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
    • A
      btrfs: don't check DUP chunks twice · 859acaf1
      Arne Jansen committed
      Because scrub enumerates the dev extent tree to find the chunks to scrub,
      it currently finds each DUP chunk twice and also scrubs it twice. This
      patch makes sure that scrub_chunk only checks that part of the chunk the
      dev extent has been found for. This only changes the behaviour for DUP
      chunks.
      Reported-and-tested-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Arne Jansen <sensille@gmx.net>
    • L
      Btrfs: fix trim 0 bytes after a device delete · 2cac13e4
      Liu Bo committed
      A user reported a bug in btrfs's trim: after a device delete, we trim
      0 bytes.
      
      The reproducer:
      
      $ mkfs.btrfs disk1
      $ mkfs.btrfs disk2
      $ mount disk1 /mnt
      $ fstrim -v /mnt
      $ btrfs device add disk2 /mnt
      $ btrfs device del disk1 /mnt
      $ fstrim -v /mnt
      
      This is because after we delete the device, the block group may start from
      a non-zero offset, which confuses trim into discarding nothing.
      Reported-by: Lutz Euler <lutz.euler@freenet.de>
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
    • J
      Btrfs: return the internal error unchanged if btrfs_get_extent_fiemap() call failed for SEEK_DATA/SEEK_HOLE inquiry · 6af021d8
      Jeff Liu committed
      ENXIO only means "offset beyond EOF" for a SEEK_DATA or SEEK_HOLE inquiry
      in a desired file range, so if the btrfs_get_extent_fiemap() call fails we should
      return the internal error unchanged rather than ENXIO.
      
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
    • J
      Btrfs: avoid positive number with ERR_PTR · 8f24b496
      Jan Schmidt committed
      inode_ref_info() returns 1 when the element wasn't found and < 0 on error,
      just like btrfs_search_slot(). In iref_to_path() it's an error when the
      inode ref can't be found, thus we return ERR_PTR(ret) in that case. In order
      to avoid ERR_PTR(1), we now set ret to -ENOENT in that case.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
    • K
      btrfs: Sector Size check during Mount · 941b2ddf
      Keith Mannthey committed
      Gracefully fail when trying to mount a BTRFS file system that has a
      sectorsize smaller than PAGE_SIZE.
      
      On PPC it is possible to build a FS while using a 4k PAGE_SIZE kernel
      then boot into a 64K PAGE_SIZE kernel.  Presently open_ctree fails in an
      endless loop and hangs the machine in this situation.
      
      My debugging has shown this sector size < page size case to be a non-trivial
      situation, and a graceful exit from it would be nice for the
      time being.
      Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
  4. 01 Feb, 2012 1 commit
  5. 27 Jan, 2012 11 commits
    • C
      Btrfs: fix reservations in btrfs_page_mkwrite · 9998eb70
      Chris Mason committed
      Josef fixed btrfs_page_mkwrite to properly release reserved
      extents if there was an error.  But if we fail to get a reservation
      and we fail to dirty the inode (for ENOSPC reasons), we'll end up
      trying to release a reservation we never had.
      
      This makes sure we only release if we were able to reserve.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • J
      Btrfs: advance window_start if we're using a bitmap · 9b230628
      Josef Bacik committed
      If we span a long area in a bitmap we could end up taking a lot of time
      searching to the next free area if we're searching from the original
      window_start, so advance window_start in order to make sure we don't do any
      superfluous searching.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • D
      btrfs: mask out gfp flags in releasepage · 0c4e538b
      David Sterba committed
      btree_releasepage is a callback and can be passed unknown gfp flags, which
      may then end up in kmem_cache_alloc called from alloc_extent_state. The slab
      allocator will BUG_ON when the HIGHMEM or DMA32 flag is set.
      
      This may happen when btrfs is mounted from a loop device, which masks out the
      __GFP_IO flag. The check in try_release_extent_state
      
                      if ((mask & GFP_NOFS) == GFP_NOFS)
                              mask = GFP_NOFS;
      
      will not work and passes the unfiltered flags on, resulting in a crash at
      mm/slab.c:2963
      
       [<000000000024ae4c>] cache_alloc_refill+0x3b4/0x5c8
       [<000000000024c810>] kmem_cache_alloc+0x204/0x294
       [<00000000001fd3c2>] mempool_alloc+0x52/0x170
       [<000003c000ced0b0>] alloc_extent_state+0x40/0xd4 [btrfs]
       [<000003c000cee5ae>] __clear_extent_bit+0x38a/0x4cc [btrfs]
       [<000003c000cee78c>] try_release_extent_state+0x9c/0xd4 [btrfs]
       [<000003c000cc4c66>] btree_releasepage+0x7e/0xd0 [btrfs]
       [<0000000000210d84>] shrink_page_list+0x6a0/0x724
       [<0000000000211394>] shrink_inactive_list+0x230/0x578
       [<0000000000211bb8>] shrink_list+0x6c/0x120
       [<0000000000211e4e>] shrink_zone+0x1e2/0x228
       [<0000000000211f24>] shrink_zones+0x90/0x254
       [<0000000000213410>] do_try_to_free_pages+0xac/0x420
       [<0000000000213ae0>] try_to_free_pages+0x13c/0x1b0
       [<0000000000204e6c>] __alloc_pages_nodemask+0x5b4/0x9a8
       [<00000000001fb04a>] grab_cache_page_write_begin+0x7e/0xe8
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • M
      Btrfs: fix enospc error caused by wrong checks of the chunk · 9e622d6b
      Miao Xie committed
      When we ran a sysbench test on inline files, enospc errors happened easily even though
      there was plenty of free disk space that could be allocated for new chunks.
      
      Steps to reproduce:
       # mkfs.btrfs -b $((2 * 1024 * 1024 * 1024)) <test partition>
       # mount <test partition> /mnt
       # ulimit -n 102400
       # cd /mnt
       # sysbench --num-threads=1 --test=fileio --file-num=81920 \
       > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
       > --file-test-mode=seqwr prepare
       # sysbench --num-threads=1 --test=fileio --file-num=81920 \
       > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
       > --file-test-mode=seqwr run
       <soon afterwards, BUG_ON() was triggered by the enospc error>
      
      The cause of this bug is:
      We can now reserve more space than is free in the existing chunks, as long as
      there is enough free disk space for new chunks. In that case,
      the space allocator should force the allocation of a new chunk when there is no free
      space in the free space cache. But two wrong checks break this
      operation.
      
      One is
      	if (ret == -ENOSPC && num_bytes > min_alloc_size)
      in btrfs_reserve_extent(). It is wrong: we should try to allocate a new chunk
      even if we fail to allocate free space of the minimum allocable size.
      
      The other is
      	if (space_info->force_alloc)
      		force = space_info->force_alloc;
      in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE if someone
      has set ->force_alloc to CHUNK_ALLOC_LIMITED, which causes the enospc error.
      
      Fix these two wrong checks. For the second one, we change
      the values of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE so that
      CHUNK_ALLOC_FORCE is greater than CHUNK_ALLOC_LIMITED, since CHUNK_ALLOC_FORCE has
      higher priority. If the value passed in by the caller is greater
      than ->force_alloc, we use the passed value.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • L
      Btrfs: do not defrag a file partially · 7ec31b54
      Liu Bo committed
      xfstests 218 complains that btrfs defrags a file partially:
       After: 1
       Write backwards sync, but contiguous - should defrag to 1 extent
       Before: 10
      -After: 1
      +After: 2
      
      To fix this, we need to set max_to_defrag count properly.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • S
      Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c · 0b485143
      Stefan Behrens committed
      The 32-bit build produced 4 warnings; this patch fixes them.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • J
      Btrfs: use cluster->window_start when allocating from a cluster bitmap · 0b4a9d24
      Josef Bacik committed
      We specifically set window_start in the cluster struct to indicate where the
      cluster starts in a bitmap, but we've been using min_start to indicate where
      we're searching from.  This is usually the start of the blockgroup, so it
      essentially means we're constantly searching from the start of any bitmap we
      find, which completely negates all the trouble we go to in order to set up a
      cluster.  So start using window_start to make sure we actually use the area we
      found.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • M
      Btrfs: Check for NULL page in extent_range_uptodate · 8bedd51b
      Mitch Harder committed
      A user has encountered a NULL pointer kernel oops in btrfs when
      encountering media errors.  The problem has been identified
      as an unhandled NULL pointer returned from find_get_page().
      This modification simply checks for a NULL page, and returns
      with an error if found (the extent_range_uptodate() function
      returns 1 on errors).
      
      After testing this patch, the user reported that the error with
      the NULL pointer oops was solved.  However, there is still a
      remaining problem with a thread becoming stuck in
      wait_on_page_locked(page) in the read_extent_buffer_pages(...)
      function in extent_io.c
      
             for (i = start_i; i < num_pages; i++) {
                     page = extent_buffer_page(eb, i);
                     wait_on_page_locked(page);
                     if (!PageUptodate(page))
                             ret = -EIO;
             }
      
      This patch leaves the issue with the locked page yet to be resolved.
      Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • J
      btrfs: Fix busyloops in transaction waiting code · 6dd70ce4
      Jan Kara committed
      wait_log_commit() and wait_for_writer() were using slightly different
      conditions for deciding whether they should call schedule() and whether they
      should continue in the wait loop. Thus it could happen that we busylooped when
      the first condition was not true while the second one was. That is burning CPU
      cycles needlessly and is deadly on UP machines...
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • J
      Btrfs: make sure a bitmap has enough bytes · 357b9784
      Josef Bacik committed
      We have only been checking for min_bytes available in bitmap entries, but we
      won't successfully set up a bitmap cluster unless the bitmap holds at least
      bytes (the size requested). In the common case min_bytes is 4k and we want
      something like 2MB, so if there are a bunch of bitmap entries with less than
      2MB in them, we'll search all of them anyway, which is suboptimal.  Fix this
      check.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • J
      Btrfs: fix uninit warning in backref.c · b1375d64
      Jan Schmidt committed
      Added initialization with the declaration of ret. It isn't set later on the
      switch-default branch (which should never be taken).
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  6. 17 Jan, 2012 16 commits