1. 29 Jan, 2010 6 commits
  2. 18 Jan, 2010 7 commits
    • J
      Btrfs: fix possible panic on unmount · 11dfe35a
      Josef Bacik committed
      We can race with the unmount of an fs and the stopping of a kthread, where we
      free the block group before we're done using it.  This happens because we do
      not hold a reference on the block group while it's caching, since the
      allocator drops its reference once it exits or moves on to the next block
      group.  This patch fixes the problem by taking a reference to the block group
      before we start caching and dropping it when we're done, to make sure all
      accesses to the block group are safe.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
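The reference-holding idea in the commit above can be sketched in plain C. This is a single-threaded illustration only: `demo_block_group`, `demo_bg_get`, `demo_bg_put`, `demo_caching_start` and `demo_caching_done` are hypothetical names, not the kernel's, and real code would need an atomic counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a block group; not the kernel structure. */
struct demo_block_group {
	int refs;     /* number of current holders */
	bool *freed;  /* set to true when the last reference is dropped */
};

static struct demo_block_group *demo_bg_alloc(bool *freed_flag)
{
	struct demo_block_group *bg = malloc(sizeof(*bg));

	if (!bg)
		return NULL;
	bg->refs = 1;            /* the creator (allocator) holds one reference */
	bg->freed = freed_flag;
	*freed_flag = false;
	return bg;
}

static void demo_bg_get(struct demo_block_group *bg)
{
	assert(bg->refs > 0);
	bg->refs++;
}

static void demo_bg_put(struct demo_block_group *bg)
{
	assert(bg->refs > 0);
	if (--bg->refs == 0) {
		*bg->freed = true;
		free(bg);
	}
}

/* Caching pins the block group for as long as it runs. */
static void demo_caching_start(struct demo_block_group *bg)
{
	demo_bg_get(bg);
}

static void demo_caching_done(struct demo_block_group *bg)
{
	demo_bg_put(bg);
}
```

With this shape, the allocator dropping its own reference while caching is still in progress no longer frees the structure; the free happens only when the caching side drops the last reference.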
    • C
      Btrfs: deal with NULL acl sent to btrfs_set_acl · a9cc71a6
      Chris Mason committed
      It is legal for btrfs_set_acl to be sent a NULL acl.  This
      makes sure we don't dereference it.  A similar patch was sent by
      Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • J
      Btrfs: fix regression in orphan cleanup · 6c090a11
      Josef Bacik committed
      Currently orphan cleanup only ever gets triggered if we cross subvolumes during
      a lookup, which means that if we just mount a plain jane fs that has orphans in
      it, they will never get cleaned up.  This results in panics like this one
      
      http://www.kerneloops.org/oops.php?number=1109085
      
      where adding an orphan entry results in -EEXIST being returned and we panic.  In
      order to fix this, we check to see on lookup if our root has had the orphan
      cleanup done, and if not go ahead and do it.  This is easily reproducible by
      running this testcase
      
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <string.h>
      #include <unistd.h>
      #include <stdio.h>
      
      int main(int argc, char **argv)
      {
      	char data[4096];
      	char newdata[4096];
      	int fd1, fd2;
      
      	memset(data, 'a', 4096);
      	memset(newdata, 'b', 4096);
      
      	while (1) {
      		int i;
      
      		fd1 = creat("file1", 0666);
      		if (fd1 < 0)
      			break;
      
      		for (i = 0; i < 512; i++)
      			write(fd1, data, 4096);
      
      		fsync(fd1);
      		close(fd1);
      
      		fd2 = creat("file2", 0666);
      		if (fd2 < 0)
      			break;
      
      		ftruncate(fd2, 4096 * 512);
      
      		for (i = 0; i < 512; i++)
      			write(fd2, newdata, 4096);
      		close(fd2);
      
      		i = rename("file2", "file1");
      		unlink("file1");
      	}
      
      	return 0;
      }
      
      and then pulling the power on the box, and then trying to run that test again
      when the box comes back up.  I've tested this locally and it fixes the problem.
      Thanks to Tomas Carnecky for helping me track this down initially.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
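The "check on lookup, clean up once per root" behaviour described in the commit above could look roughly like the following. All names here (`demo_root`, `demo_orphan_cleanup`, `demo_lookup`) are made up for illustration, and the cleanup body is a placeholder rather than the real orphan processing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-root state; not the kernel's struct btrfs_root. */
struct demo_root {
	bool orphan_cleanup_done;  /* has cleanup already run for this root? */
	int orphans;               /* pending orphan entries left by a crash */
	int cleanup_runs;          /* how many times cleanup actually ran */
};

static void demo_orphan_cleanup(struct demo_root *root)
{
	root->orphans = 0;         /* placeholder for the real cleanup work */
	root->cleanup_runs++;
	root->orphan_cleanup_done = true;
}

static void demo_lookup(struct demo_root *root)
{
	assert(root != 0);
	/* Run cleanup lazily, on the first lookup in any root. */
	if (!root->orphan_cleanup_done)
		demo_orphan_cleanup(root);
	/* ... the actual name lookup would happen here ... */
}
```

Because the flag lives on the root, repeated lookups pay only for one boolean test after the first call.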
    • Y
      Btrfs: Fix race in btrfs_mark_extent_written · 6c7d54ac
      Yan, Zheng committed
      Fix bug reported by Johannes Hirte. The cause of the bug is that
      btrfs_del_items is called after btrfs_duplicate_item, and
      btrfs_del_items triggers a tree balance. The fix is to check for that
      case and call btrfs_search_slot when needed.
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • J
      Btrfs, fix memory leaks in error paths · 2423fdfb
      Jiri Slaby committed
      Stanse found 2 memory leaks in relocate_block_group and
      __btrfs_map_block. cluster and multi are not freed/assigned on all
      paths. Fix that.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: linux-btrfs@vger.kernel.org
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
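Leaks of this kind are commonly avoided in C by funnelling every exit through a single cleanup label. The sketch below shows that pattern in general terms; it is not the actual patch, and `demo_relocate`, `demo_alloc`, `demo_free` and the allocation counter are hypothetical helpers added so the behaviour can be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int demo_live_allocs;  /* allocations not yet freed */

static void *demo_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live_allocs--;
		free(p);
	}
}

/* Every path after the allocation leaves through "out", so the buffer
 * is released on success and on error alike. */
static int demo_relocate(int fail_midway)
{
	int ret = 0;
	char *cluster = demo_alloc(64);

	if (!cluster)
		return -ENOMEM;

	if (fail_midway) {
		ret = -EIO;
		goto out;
	}

	/* ... the real work would use "cluster" here ... */
out:
	demo_free(cluster);
	return ret;
}
```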
    • Y
      Btrfs: align offsets for btrfs_ordered_update_i_size · a038fab0
      Yan, Zheng committed
      Some callers of btrfs_ordered_update_i_size can now pass in
      a NULL for the ordered extent to update against.  This makes
      sure we properly align the offset they pass in when deciding
      how much to bump the on disk i_size.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
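As an illustration of what aligning an offset means here, the helper below rounds a byte offset up to a power-of-two sector size. The commit message only says the offset is aligned; rounding up (rather than down) and the name `demo_align_up` are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Round offset up to the next multiple of sectorsize.
 * Assumes sectorsize is a non-zero power of two. */
static uint64_t demo_align_up(uint64_t offset, uint64_t sectorsize)
{
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);
	return (offset + sectorsize - 1) & ~(sectorsize - 1);
}
```

With a 4096-byte sector, an offset of 1 becomes 4096, while an offset that is already a multiple of 4096 is returned unchanged.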
    • J
      btrfs: fix missing last-entry in readdir(3) · 406266ab
      Jan Engelhardt committed
      parent 49313cdac7b34c9f7ecbb1780cfc648b1c082cd7 (v2.6.32-1-g49313cd)
      commit ff48c08e1c05c67e8348ab6f8a24de8034e0e34d
      Author: Jan Engelhardt <jengelh@medozas.de>
      Date:   Wed Dec 9 22:57:36 2009 +0100
      
      Btrfs: fix missing last-entry in readdir(3)
      
      When one does a 32-bit readdir(3), the last entry of a directory is
      missing. This is however not due to passing a large value to filldir,
      but it seems to have to do with glibc doing telldir or something
      quirky. In any case, this patch fixes it in practice.
      Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  3. 18 Dec, 2009 15 commits
  4. 16 Dec, 2009 3 commits
  5. 12 Nov, 2009 9 commits
    • J
      Btrfs: fix panic when trying to destroy a newly allocated · a6dbd429
      Josef Bacik committed
      There is a problem where iget5_locked will look for an inode, not find it, and
      then subsequently try to allocate it.  Another CPU will have raced in and
      allocated the inode instead, so when iget5_locked gets the inode spin lock again
      and does a search, it finds the new inode.  So it goes ahead and calls
      destroy_inode on the inode it just allocated.  The problem is we don't set
      BTRFS_I(inode)->root until the new inode is completely initialized.  This patch
      makes us set root to NULL when alloc'ing a new inode, so when we get to
      btrfs_destroy_inode and we see that root is NULL we can just free up the memory
      and continue on.  This fixes the panic
      
      http://www.kerneloops.org/submitresult.php?number=812690
      
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
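The approach in the commit above, marking a freshly allocated inode with a NULL root so the destroy path can tell it was never fully set up, can be sketched as follows. The types and functions (`demo_fs_root`, `demo_inode`, `demo_alloc_inode`, `demo_init_inode`, `demo_destroy_inode`) are hypothetical and much simpler than the kernel's.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_fs_root {
	int live_inodes;  /* fully initialized inodes attached to this root */
};

struct demo_inode {
	struct demo_fs_root *root;  /* NULL until initialization completes */
};

static struct demo_inode *demo_alloc_inode(void)
{
	struct demo_inode *inode = malloc(sizeof(*inode));

	if (inode)
		inode->root = NULL;  /* mark as "not yet initialized" */
	return inode;
}

static void demo_init_inode(struct demo_inode *inode, struct demo_fs_root *root)
{
	inode->root = root;
	root->live_inodes++;
}

static void demo_destroy_inode(struct demo_inode *inode)
{
	assert(inode != NULL);
	/* Only touch per-root state if the inode was ever attached to one;
	 * otherwise just release the memory. */
	if (inode->root)
		inode->root->live_inodes--;
	free(inode);
}
```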
    • C
      Btrfs: allow more metadata chunk preallocation · 33b25808
      Chris Mason committed
      On an FS where all of the space has not been allocated into chunks yet,
      the enospc code can return ENOSPC just because the existing metadata chunks
      are full.
      
      We get around this by allowing more metadata chunks to be allocated up
      to a certain limit, and finding the right limit is a little fuzzy.  The
      problem is the reservations for delalloc would preallocate way too much
      of the FS as metadata.  We need to start saying no and just force some
      IO to happen.
      
      But we also need to let a reasonable amount of the FS become metadata.
      This bumps the hard limit up; later releases will have a better system.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • J
      Btrfs: fallback on uncompressed io if compressed io fails · f5a84ee3
      Josef Bacik committed
      Currently compressed IO does not deal with not being able to allocate its
      entire extent.  So if we have enough free space to allocate for the extent, but
      it's not contiguous, it will fail spectacularly.  This patch fixes this by
      falling back on uncompressed IO, which lets us spread the delalloc extent across
      multiple extents.  I tested this by making us randomly think the reservation had
      failed, to make it fall back on the uncompressed IO path, and it seemed to work
      fine.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
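The fallback described above can be modelled with a toy free-space array: try to reserve one contiguous extent first, and only if that fails spread the reservation across several extents. This is a simplified sketch under stated assumptions. The function names and the `demo_io_mode` values are hypothetical, and it ignores the fact that compressed data would normally need less space than the uncompressed length.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum demo_io_mode { DEMO_IO_COMPRESSED = 1, DEMO_IO_UNCOMPRESSED = 2 };

/* Compressed path: needs a single free extent of at least len units. */
static int demo_reserve_contiguous(uint64_t *free_extents, size_t n, uint64_t len)
{
	for (size_t i = 0; i < n; i++) {
		if (free_extents[i] >= len) {
			free_extents[i] -= len;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Uncompressed path: may split the reservation across several extents. */
static int demo_reserve_split(uint64_t *free_extents, size_t n, uint64_t len)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += free_extents[i];
	if (total < len)
		return -ENOSPC;

	for (size_t i = 0; i < n && len > 0; i++) {
		uint64_t take = free_extents[i] < len ? free_extents[i] : len;

		free_extents[i] -= take;
		len -= take;
	}
	return 0;
}

/* Returns the IO mode used, or -ENOSPC if neither path can be satisfied. */
static int demo_write_extent(uint64_t *free_extents, size_t n, uint64_t len)
{
	assert(free_extents != NULL);
	if (demo_reserve_contiguous(free_extents, n, len) == 0)
		return DEMO_IO_COMPRESSED;
	if (demo_reserve_split(free_extents, n, len) == 0)
		return DEMO_IO_UNCOMPRESSED;
	return -ENOSPC;
}
```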
    • J
      Btrfs: find ideal block group for caching · ccf0e725
      Josef Bacik committed
      This patch changes a few things.  Hopefully the comments are helpful, but
      I'll try to be just as verbose here.
      
      Problem:
      
      My fedora box was taking 1 minute and 21 seconds to boot with btrfs as root.
      Part of this problem was we pick the first block group we can find and start
      caching it, even if it may not have enough free space.  The other problem is
      we only search for cached block groups the first time around, and we won't
      find any because this is a newly mounted fs, so we end up caching several
      block groups during bootup, which with a lot of fragmentation takes around
      30-45 seconds to complete and bogs down the system.
      
      Solution:
      
      1) Don't cache block groups willy-nilly at first.  Instead try and figure out
      which block group has the most free space, and therefore will take the least
      amount of time to cache.
      
      2) Don't be so picky about cached block groups.  The other problem is once
      we've filled up a cluster, if the block group isn't finished caching the next
      time we try and do the allocation we'll completely ignore the cluster and
      start searching from the beginning of the space, which makes us cache more
      block groups, which slows us down even more.  So instead of skipping block
      groups that are not finished caching when we have a hint, only skip the block
      group if it hasn't started caching yet.
      
      There is one other tweak in here.  Before if we allocated a chunk and still
      couldn't find new space, we'd end up switching the space info to force another
      chunk allocation.  This could make us end up with way too many chunks, so keep
      track of this particular case.
      
      With this patch and my previous cluster fixes my fedora box now boots in 43
      seconds, and according to the bootchart is not held up by our block group
      caching at all.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
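Point 1 of the solution above, choosing the block group with the most free space instead of the first one found, amounts to a simple maximum search. The sketch below assumes a flat array of groups with a known free-byte count; `demo_group` and `demo_pick_ideal` are hypothetical names, and the real allocator works on different data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a block group. */
struct demo_group {
	uint64_t free_bytes;
};

/* Return the index of the group with the most free space,
 * or -1 if there are no groups. Ties go to the earliest group. */
static int demo_pick_ideal(const struct demo_group *groups, size_t n)
{
	int best = -1;
	uint64_t best_free = 0;

	assert(groups != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (best < 0 || groups[i].free_bytes > best_free) {
			best = (int)i;
			best_free = groups[i].free_bytes;
		}
	}
	return best;
}
```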
    • D
      Btrfs: avoid null deref in unpin_extent_cache() · 4eb3991c
      Dan Carpenter committed
      I reordered the checks to avoid dereferencing "em" if it was null.
      
      Found by smatch static checker.
      Signed-off-by: Dan Carpenter <error27@gmail.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
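The general shape of such a fix is to test the pointer before any field access, relying on the short-circuit evaluation of `||`. The snippet below is an illustrative sketch, not the kernel function: `demo_extent_map` and `demo_unpin` are hypothetical, and returning -1 on a mismatch is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent map entry. */
struct demo_extent_map {
	uint64_t start;
};

/* Returns 0 if em exists and begins at start, -1 otherwise. */
static int demo_unpin(const struct demo_extent_map *em, uint64_t start)
{
	/* The NULL test comes first, so em->start is only evaluated
	 * when em is a valid pointer. */
	if (!em || em->start != start)
		return -1;
	return 0;
}
```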
    • L
      Btrfs: skip btrfs_release_path in btrfs_update_root and btrfs_del_root · df66916e
      Li Dongyang committed
      We don't need to call btrfs_release_path because btrfs_free_path will do
      that for us.
      Signed-off-by: Li Dongyang <Jerry87905@gmail.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • J
      Btrfs: fix some metadata enospc issues · 5df6a9f6
      Josef Bacik committed
      We weren't reserving metadata space for rename, rmdir and unlink, which could
      cause problems.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • J
      Btrfs: fix how we set max_size for free space clusters · 01dea1ef
      Josef Bacik committed
      This patch fixes a problem where max_size can be set to 0 even though we
      filled the cluster properly.  We set max_size to 0 if we restart the cluster
      window, but if the new start entry is big enough to be our new cluster then we
      could return with a max_size set to 0, which will mean the next time we try to
      allocate from this cluster it will fail.  So set max_extent to the entry's
      size.  Tested this on my box and now we actually allocate from the cluster
      after we fill it.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • J
      Btrfs: cleanup transaction starting and fix journal_info usage · 249ac1e5
      Josef Bacik committed
      We use journal_info to tell if we're in a nested transaction to make sure we
      don't commit the transaction within a nested transaction.  We use another
      method to see if there are any outstanding ioctl trans handles, so if we're
      starting one do not set current->journal_info, since it will screw with other
      filesystems.  This patch also cleans up the starting stuff so there aren't any
      magic numbers.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>