1. 24 May, 2011 12 commits
    • J
      Btrfs: don't try to allocate from a block group that doesn't have enough space · cca1c81f
      Josef Bacik committed
      If we have a very large filesystem, we can spend a lot of time in
      find_free_extent just trying to allocate from empty block groups.  So instead
      check to see if the block group even has enough space for the allocation, and if
      not, go on to the next block group.
      Signed-off-by: Josef Bacik <josef@redhat.com>
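The early-out described above can be sketched in plain C. This is only an illustration under stated assumptions: `struct block_group`, its fields, and `find_candidate_group` are hypothetical stand-ins, not the kernel's structures or the actual `find_free_extent` logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a block group (not the kernel struct). */
struct block_group {
    uint64_t total_bytes; /* size of the group */
    uint64_t used_bytes;  /* bytes already allocated or reserved */
};

/* Return the index of the first group that could hold num_bytes, or -1. */
static int find_candidate_group(const struct block_group *groups, size_t count,
                                uint64_t num_bytes)
{
    for (size_t i = 0; i < count; i++) {
        assert(groups[i].used_bytes <= groups[i].total_bytes);
        uint64_t free_bytes = groups[i].total_bytes - groups[i].used_bytes;

        /* Cheap check first: skip groups that cannot satisfy the request. */
        if (free_bytes < num_bytes)
            continue;

        /* A real allocator would start its expensive free-space search here. */
        return (int)i;
    }
    return -1;
}
```

The point of the sketch is only the ordering: the inexpensive size comparison runs before any costly search inside the group.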
    • J
      Btrfs: don't always do readahead · 026fd317
      Josef Bacik committed
      Our readahead is sort of sloppy, and really isn't always needed.  For example, if
      ls is doing a stat'ing ls (which is the default) it's going to stat in non-disk
      order, so if, say, you have a directory with a stupid amount of files, readahead
      is going to do nothing but waste time in the case of doing the stat.  Taking the
      unconditional readahead out made my test go from 57 minutes to 36 minutes.  This
      means that everywhere we do loop through the tree we want to make sure we do set
      path->reada properly, so I went through and found all of the places where we
      loop through the path and set reada to 1.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: try not to sleep as much when doing slow caching · 589d8ade
      Josef Bacik committed
      When the fs is super full and we unmount the fs, we could get stuck in this
      thing where unmount is waiting for the caching kthread to make progress and the
      caching kthread keeps scheduling because we're in the middle of a commit.  So
      instead just let the caching kthread keep going and only yield if
      need_resched().  This makes my horrible umount case go from taking up to 10
      minutes to taking less than 20 seconds.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: kill BTRFS_I(inode)->block_group · d82a6f1d
      Josef Bacik committed
      Originally this was going to be used as a way to give hints to the allocator,
      but frankly we can get much better hints elsewhere and it's not even used at all
      for anything useful.  In addition to being completely useless, when we initialize
      an inode we try and find a freeish block group to set as the inode's block group,
      and with a completely full 40gb fs this takes _forever_, so I imagine with, say,
      a 1tb fs this is just unbearable.  So just axe the thing altogether; we don't
      need it and it saves us 8 bytes in the inode and saves us 500 microseconds per
      inode lookup in my testcase.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: don't look at the extent buffer level 3 times in a row · 7e2355ba
      Josef Bacik committed
      We have a bit of debugging in btrfs_search_slot to make sure the level of the
      cow block is the same as the original block we were cow'ing.  I don't think I've
      ever seen this tripped, so kill it.  This saves us 2 kmap's per level in our
      search.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: map the node block when looking for readahead targets · cb25c2ea
      Josef Bacik committed
      If we have particularly full nodes, we could call btrfs_node_blockptr up to 32
      times, which is 32 pairs of kmap/kunmap, which _sucks_.  So go ahead and map the
      extent buffer while we look for readahead targets.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: set range_start to the right start in count_range_bits · af60bed2
      Josef Bacik committed
      In count_range_bits we are adjusting total_bytes based on the range we are
      searching for, but we don't adjust the range start according to the range we are
      searching for, which makes for weird results.  For example, if the range
      
      [0-8192]
      
      is set DELALLOC, but I search for 4096-8192, I will get back 4096 for the number
      of bytes found, but the range_start will be 0, which makes it look like the
      range is [0-4096].  So instead set range_start = max(cur_start, state->start).
      This makes everything come out right.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
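The range_start fix above can be worked through with a small C sketch. It is an illustration only: `struct extent_state` here is a two-field stand-in with inclusive ends (so the commit's [0-8192] example is modelled as bytes 0 through 8191), and `count_overlap` is a hypothetical helper, not the kernel's `count_range_bits`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: one state covering bytes start..end, inclusive. */
struct extent_state {
    uint64_t start;
    uint64_t end;
};

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/*
 * Count the bytes of `state` that fall inside [cur_start, search_end] and
 * report where the counted range begins through *range_start.
 */
static uint64_t count_overlap(const struct extent_state *state,
                              uint64_t cur_start, uint64_t search_end,
                              uint64_t *range_start)
{
    assert(cur_start <= search_end);

    if (state->end < cur_start || state->start > search_end)
        return 0; /* no overlap, *range_start is left untouched */

    /* The fix: clamp the reported start to the searched range. */
    *range_start = max_u64(cur_start, state->start);

    uint64_t last = min_u64(search_end, state->end);
    return last - *range_start + 1;
}
```

With a state covering bytes 0 through 8191 and a search starting at 4096, the sketch reports 4096 bytes beginning at offset 4096, rather than at offset 0.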
    • J
      Btrfs: fix how we do space reservation for truncate · fcb80c2a
      Josef Bacik committed
      The ceph guys keep running into problems where we have space reserved in our
      orphan block rsv when freeing it up.  This is because they tend to do snapshots
      a lot, so their truncates tend to use a bunch of space, so when we go to do
      things like update the inode we have to steal reservation space in order to make
      the reservation happen.  This happens because truncate can use as much space as
      it freaking feels like, but we still have to hold space for removing the orphan
      item and updating the inode, which will definitely always happen.  So in order
      to fix this we need to split all of the reservation stuff up.  So with this patch
      we have
      
      1) The orphan block reserve which only holds the space for deleting our orphan
      item when everything is over.
      
      2) The truncate block reserve which gets allocated and used specifically for the
      space that the truncate will use on a per truncate basis.
      
      3) The transaction will always have 1 item's worth of data reserved so we can
      update the inode normally.
      
      Hopefully this will make the ceph problem go away.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: kill trans_mutex · a4abeea4
      Josef Bacik committed
      We use trans_mutex for lots of things, here's a basic list
      
      1) To serialize trans_handles joining the currently running transaction
      2) To make sure that no new trans handles are started while we are committing
      3) To protect the dead_roots list and the transaction lists
      
      Really the serializing trans_handles joining is not too hard, and can really get
      bogged down in acquiring a reference to the transaction.  So replace the
      trans_mutex with a trans_lock spinlock and use it to do the following
      
      1) Protect fs_info->running_transaction.  All trans handles have to do is check
      this, and then take a reference of the transaction and keep on going.
      2) Protect the fs_info->trans_list.  This doesn't get used too much, basically
      it just holds the current transactions, which will usually just be the currently
      committing transaction and the currently running transaction at most.
      3) Protect the dead roots list.  This is only ever processed by splicing the
      list so this is relatively simple.
      4) Protect the fs_info->reloc_ctl stuff.  This is very lightweight and was using
      the trans_mutex before, so this is a pretty straightforward change.
      5) Protect fs_info->no_trans_join.  Because we don't hold the trans_lock over
      the entirety of the commit we need to have a way to block new people from
      creating a new transaction while we're doing our work.  So we set no_trans_join
      and in join_transaction we test to see if that is set, and if it is we do a
      wait_on_commit.
      6) Make the transaction use count atomic so we don't need to take locks to
      modify it when we're dropping references.
      7) Add a commit_lock to the transaction to make sure multiple people trying to
      commit the same transaction don't race and commit at the same time.
      8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl
      trans.
      
      I have tested this with xfstests, but obviously it is a pretty hairy change so
      lots of testing is greatly appreciated.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: if we've already started a trans handle, use that one · 2a1eb461
      Josef Bacik committed
      We currently track trans handles in current->journal_info, but we don't actually
      use it.  This patch fixes it.  This will cover the case where we have multiple
      people starting transactions down the call chain.  This keeps us from having to
      allocate a new handle and all of that; we just increase the use count of the
      current handle, save the old block_rsv, and return.  I tested this with xfstests
      and it worked out fine.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
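The handle reuse described above can be sketched in user-space C. Everything here is an assumption made for illustration: a thread-local pointer stands in for current->journal_info, and `start_handle`/`end_handle` are hypothetical names, not the kernel functions. The single `orig_rsv` slot means this sketch restores the reserve correctly for only one level of nesting.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, pared-down handle; not the kernel's btrfs_trans_handle. */
struct trans_handle {
    int use_count;   /* how many callers share this handle */
    void *block_rsv; /* reserve currently in use */
    void *orig_rsv;  /* reserve saved when a nested caller joined */
};

/* Stand-in for current->journal_info: one slot per thread. */
static _Thread_local struct trans_handle *journal_info;

static struct trans_handle *start_handle(void)
{
    struct trans_handle *h = journal_info;

    if (h) {
        /* Already inside a transaction: reuse the handle, no allocation. */
        assert(h->use_count > 0);
        h->use_count++;
        h->orig_rsv = h->block_rsv; /* save the old reserve */
        return h;
    }

    h = calloc(1, sizeof(*h));
    if (!h)
        return NULL;
    h->use_count = 1;
    journal_info = h;
    return h;
}

static void end_handle(struct trans_handle *h)
{
    assert(h && h->use_count > 0);

    if (--h->use_count > 0) {
        h->block_rsv = h->orig_rsv; /* nested caller done: restore reserve */
        return;
    }

    journal_info = NULL;
    free(h);
}
```

A nested caller gets back the same pointer as the outer caller, and only the outermost `end_handle` frees it.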
    • J
      Btrfs: take away the num_items argument from btrfs_join_transaction · 7a7eaa40
      Josef Bacik committed
      I keep forgetting that btrfs_join_transaction() just ignores the num_items
      argument, which leads me to sending pointless patches and looking stupid :).  So
      just kill the num_items argument from btrfs_join_transaction and
      btrfs_start_ioctl_transaction, since neither of them uses it.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: make sure to use the delalloc reserve when filling delalloc · 74b21075
      Josef Bacik committed
      In the prealloc filling code and compressed code we don't set trans->block_rsv
      to the delalloc block reserve properly, which is going to make us use metadata
      from the wrong pool.  This patch fixes that.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
  2. 15 May, 2011 5 commits
  3. 27 April, 2011 1 commit
  4. 26 April, 2011 8 commits
  5. 20 April, 2011 1 commit
    • C
      Btrfs: do some plugging in the submit_bio threads · 211588ad
      Chris Mason committed
      The Btrfs submit bio threads have a small number of
      threads responsible for pushing down bios we've collected
      for a large number of devices.

      Since we do all the bios for a single device at once,
      we want to make sure we unplug and send down the bios
      for each device as we're done processing them.

      The new plugging API removed the btrfs code to
      unplug while processing bios.  This adds it back with
      the new API.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  6. 18 April, 2011 1 commit
    • C
      Btrfs: fix free space cache leak · f65647c2
      Chris Mason committed
      The free space caching code was recently reworked to
      cache all the pages it needed instead of using find_get_page everywhere.

      One loop was missed though, so it ended up leaking pages.  This fixes
      it to use our page array instead of find_get_page.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  7. 16 April, 2011 3 commits
    • J
      Btrfs: avoid taking the chunk_mutex in do_chunk_alloc · 6d74119f
      Josef Bacik committed
      Every time we try to allocate disk space we try and see if we can pre-emptively
      allocate a chunk, but in the common case we don't allocate anything, so there is
      no sense in taking the chunk_mutex at all.  So instead if we are allocating a
      chunk, mark it in the space_info so we don't get two people trying to allocate
      at the same time.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Reviewed-by: Liu Bo <liubo2009@cn.fujitsu.com>
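The idea of marking an in-progress allocation in the space_info, rather than taking a mutex on every attempt, can be sketched as follows. This is a user-space approximation under stated assumptions: it swaps in a C11 atomic compare-and-exchange for whatever locking the kernel patch actually uses, and `struct space_info`, its fields, and `maybe_alloc_chunk` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-type space accounting structure. */
struct space_info {
    atomic_bool chunk_alloc; /* true while someone is allocating a chunk */
    int chunks;              /* number of chunks allocated so far */
};

/*
 * Allocate a chunk only if one is needed and nobody else is already doing so.
 * Returns true if this caller performed the allocation.
 */
static bool maybe_alloc_chunk(struct space_info *info, bool needed)
{
    assert(info != NULL);

    /* Common case: nothing to allocate, so no lock or flag is touched. */
    if (!needed)
        return false;

    /* Claim the allocation; back off if another caller already holds it. */
    bool expected = false;
    if (!atomic_compare_exchange_strong(&info->chunk_alloc, &expected, true))
        return false;

    info->chunks++; /* placeholder for the real chunk allocation work */

    atomic_store(&info->chunk_alloc, false);
    return true;
}
```

In this sketch a second caller simply returns instead of waiting; whether the real code waits for the first allocator to finish is not stated in the commit message.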
    • C
      Btrfs end_bio_extent_readpage should look for locked bits · 0d399205
      Chris Mason committed
      A recent commit caches the extent state in end_bio_extent_readpage,
      but the search it does should look for locked extents.  This
      fixes things to make it more effective.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      Btrfs: don't force chunk allocation in find_free_extent · 0e4f8f88
      Chris Mason committed
      find_free_extent likes to allocate in contiguous clusters,
      which makes writeback faster, especially on SSD storage.  As
      the FS fragments, these clusters become harder to find and we have
      to decide between allocating a new chunk to make more clusters
      or giving up on the cluster to allocate from the free space
      we have.
      
      Right now it creates too many chunks, and you can end up with
      a whole FS that is mostly empty metadata chunks.  This commit
      changes the allocation code to be more strict and only
      allocate new chunks when we've made good use of the chunks we
      already have.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  8. 13 April, 2011 5 commits
  9. 12 April, 2011 4 commits
    • A
      btrfs: using cached extent_state in set/unlock combinations · 507903b8
      Arne Jansen committed
      In several places the sequence (set_extent_uptodate, unlock_extent) is used.
      This leads to a duplicate lookup of the extent state. This patch lets
      set_extent_uptodate return a cached extent_state which can be passed to
      unlock_extent_cached.
      The occurrences of the above sequence are updated to use the cache. Only
      end_bio_extent_readpage is changed so that it first gets a cached state,
      passes it to the readpage_end_io_hook as the prototype requires, and later
      uses it for set/unlock.
      Signed-off-by: Arne Jansen <sensille@gmx.net>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • J
      Btrfs: avoid taking the trans_mutex in btrfs_end_transaction · 13c5a93e
      Josef Bacik committed
      I've been working on making our O_DIRECT latency not suck and I noticed we were
      taking the trans_mutex in btrfs_end_transaction.  So to do this we convert
      num_writers and use_count to atomic_t's and just decrement them in
      btrfs_end_transaction.  Instead of deleting the transaction from the trans list
      in put_transaction we do that in btrfs_commit_transaction() since that's the
      only time it actually needs to be removed from the list.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
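Converting the counters to atomics, as described above, lets the end-of-transaction path drop its references without a mutex. A minimal sketch with C11 atomics, assuming a hypothetical `struct transaction` and helper names (`txn_join`, `txn_end`) that are not the kernel's:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, pared-down transaction; not the kernel's struct. */
struct transaction {
    atomic_int use_count;   /* references held on the transaction */
    atomic_int num_writers; /* handles currently attached */
};

/* Attach a writer: take a reference and bump the writer count. */
static void txn_join(struct transaction *t)
{
    atomic_fetch_add(&t->use_count, 1);
    atomic_fetch_add(&t->num_writers, 1);
}

/*
 * Detach a writer without taking any lock.
 * Returns true when this call dropped the last reference.
 */
static bool txn_end(struct transaction *t)
{
    int writers = atomic_fetch_sub(&t->num_writers, 1);
    assert(writers > 0);

    int refs = atomic_fetch_sub(&t->use_count, 1);
    assert(refs > 0);

    return refs == 1;
}
```

Because `atomic_fetch_sub` returns the previous value, exactly one caller observes the drop to zero, which is what makes lock-free teardown decisions safe in this sketch.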
    • X
      Btrfs: fix subvolume mount by name problem when default mount subvolume is set · e15d0542
      Xin Zhong committed
      We create two subvolumes (meego_root and meego_home) in the
      btrfs root directory and set meego_root as the default mount
      subvolume. After we remount btrfs, meego_root is mounted
      to the top directory by default. Then when we try to mount
      meego_home (subvol=meego_home) to a subdirectory, it fails.
      The problem is that when the default mount subvolume is set to
      meego_root, we search for meego_home in meego_root but cannot find
      it. So the solution is to add a new mount option (subvolrootid)
      to specify the subvol id of the root and search for the subvol name in it. For
      our case, we can now use "-o subvolrootid=0,subvol=meego_home"
      to mount meego_home.

      Detailed information can be found in the meego bugzilla:
      https://bugs.meego.com/show_bug.cgi?id=15055
      Signed-off-by: Zhong, Xin <xin.zhong@intel.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • D
      fix user annotation in ioctl.c · 13f2696f
      Daniel J Blueman committed
      Fix the address space annotation in ioctl.c.
      Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
      
       		       BTRFS_BLOCK_GROUP_SYSTEM,
      @@ -2387,7 +2387,7 @@ long btrfs_ioctl_space_info(struct btrfs_root
      *root, void __user *arg)
       		up_read(&info->groups_sem);
       	}
      
      -	user_dest = (struct btrfs_ioctl_space_info *)
      +	user_dest = (struct btrfs_ioctl_space_info __user *)
       		(arg + sizeof(struct btrfs_ioctl_space_args));
      
       	if (copy_to_user(user_dest, dest_orig, alloc_size))
      Reviewed-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>