1. 12 May, 2011 2 commits
    • btrfs: new ioctls for scrub · 475f6387
      Committed by Jan Schmidt
      Adds the ioctls necessary to start and cancel scrubs, to get current
      progress and to get info about the devices to be scrubbed.
      Note that the scrub is done per device and that the ioctl only
      returns after the scrub for this device has finished or has been
      canceled.
      Signed-off-by: Arne Jansen <sensille@gmx.net>
    • btrfs: scrub · a2de733c
      Committed by Arne Jansen
      This adds an initial implementation for scrub. It works in a quite
      straightforward way. User mode issues an ioctl for each device in the
      fs. For each device, it enumerates the allocated device chunks. For
      each chunk, the contained extents are enumerated and the data checksums
      fetched. The extents are read sequentially and the checksums verified.
      If an error occurs (checksum or EIO), a good copy is searched for. If
      one is found, the bad copy will be rewritten.
      All enumerations happen from the commit roots. During a transaction
      commit, the scrubs get paused and afterwards continue from the new
      roots.
      
      This commit is based on the series originally posted to linux-btrfs
      with some improvements that resulted from comments from David Sterba,
      Ilya Dryomov and Jan Schmidt.
      Signed-off-by: Arne Jansen <sensille@gmx.net>
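
A minimal sketch of the verify-and-repair step described in the scrub commit above. It is not the kernel implementation: `toy_csum`, `scrub_block`, and `SKETCH_BLOCK_SIZE` are hypothetical names, the checksum is a toy stand-in for the real data checksum, and the mirrors are plain in-memory buffers rather than device copies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 8 /* hypothetical tiny block size for illustration */

/* Toy checksum standing in for the real data checksum (assumption). */
static uint32_t toy_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = sum * 31 + data[i];
    return sum;
}

/*
 * Verify the block on the copy being scrubbed. On a mismatch, search the
 * other copies for one that matches the expected checksum and rewrite the
 * bad copy from it.
 * Returns 0 if the block was fine, 1 if it was repaired, -1 if no good
 * copy could be found.
 */
static int scrub_block(uint8_t *copies[], size_t ncopies, size_t scrubbed,
                       uint32_t expected)
{
    if (toy_csum(copies[scrubbed], SKETCH_BLOCK_SIZE) == expected)
        return 0;

    for (size_t i = 0; i < ncopies; i++) {
        if (i == scrubbed)
            continue;
        if (toy_csum(copies[i], SKETCH_BLOCK_SIZE) == expected) {
            memcpy(copies[scrubbed], copies[i], SKETCH_BLOCK_SIZE);
            return 1;
        }
    }
    return -1;
}
```

In this sketch a caller would loop over every block of every extent in a chunk and call `scrub_block` with the checksum fetched for that block.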
  2. 26 Apr, 2011 8 commits
  3. 18 Apr, 2011 1 commit
    • Btrfs: fix free space cache leak · f65647c2
      Committed by Chris Mason
      The free space caching code was recently reworked to
      cache all the pages it needed instead of using find_get_page everywhere.
      
      One loop was missed though, so it ended up leaking pages.  This fixes
      it to use our page array instead of find_get_page.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  4. 16 Apr, 2011 3 commits
    • Btrfs: avoid taking the chunk_mutex in do_chunk_alloc · 6d74119f
      Committed by Josef Bacik
      Every time we try to allocate disk space we check whether we can pre-emptively
      allocate a chunk, but in the common case we don't allocate anything, so there is
      no sense in taking the chunk_mutex at all.  So instead, if we are allocating a
      chunk, mark it in the space_info so we don't get two people trying to allocate
      at the same time.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Reviewed-by: Liu Bo <liubo2009@cn.fujitsu.com>
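
The idea of marking an in-progress allocation instead of taking a mutex can be sketched roughly as follows. This is an illustration only: `space_info_sketch` and `try_chunk_alloc` are hypothetical names, and C11 atomics are swapped in for whatever locking the kernel code actually uses around the flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-space_info state. */
struct space_info_sketch {
    atomic_bool chunk_alloc; /* true while an allocation is in flight */
    int chunks;              /* number of chunks allocated so far */
};

/*
 * Try to become the single allocator. Returns true if this caller
 * allocated a chunk, false if another allocation was already in flight.
 * Nobody takes a mutex on the common "nothing to do" path.
 */
static bool try_chunk_alloc(struct space_info_sketch *si)
{
    bool expected = false;

    /* Atomically flip the flag from false to true; fail if already set. */
    if (!atomic_compare_exchange_strong(&si->chunk_alloc, &expected, true))
        return false;

    si->chunks++; /* stand-in for the real allocation work */

    atomic_store(&si->chunk_alloc, false);
    return true;
}
```

What a caller should do when `try_chunk_alloc` returns false (wait, retry, or carry on without a new chunk) is a design choice this sketch leaves open.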
    • Btrfs end_bio_extent_readpage should look for locked bits · 0d399205
      Committed by Chris Mason
      A recent commit caches the extent state in end_bio_extent_readpage,
      but the search it does should look for locked extents.  This
      fixes things to make it more effective.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: don't force chunk allocation in find_free_extent · 0e4f8f88
      Committed by Chris Mason
      find_free_extent likes to allocate in contiguous clusters,
      which makes writeback faster, especially on SSD storage.  As
      the FS fragments, these clusters become harder to find and we have
      to decide between allocating a new chunk to make more clusters
      or giving up on the cluster to allocate from the free space
      we have.
      
      Right now it creates too many chunks, and you can end up with
      a whole FS that is mostly empty metadata chunks.  This commit
      changes the allocation code to be more strict and only
      allocate new chunks when we've made good use of the chunks we
      already have.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  5. 13 Apr, 2011 5 commits
  6. 12 Apr, 2011 7 commits
    • btrfs: using cached extent_state in set/unlock combinations · 507903b8
      Committed by Arne Jansen
      In several places the sequence (set_extent_uptodate, unlock_extent) is used.
      This leads to a duplicate lookup of the extent state. This patch lets
      set_extent_uptodate return a cached extent_state which can be passed to
      unlock_extent_cached.
      The occurrences of the above sequence are updated to use the cache. Only
      end_bio_extent_readpage is updated differently: it first gets a cached state
      to pass to the readpage_end_io_hook, as the prototype requires, and later
      uses that state for set/unlock.
      Signed-off-by: Arne Jansen <sensille@gmx.net>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: avoid taking the trans_mutex in btrfs_end_transaction · 13c5a93e
      Committed by Josef Bacik
      I've been working on making our O_DIRECT latency not suck and I noticed we were
      taking the trans_mutex in btrfs_end_transaction.  To avoid this we convert
      num_writers and use_count to atomic_t's and just decrement them in
      btrfs_end_transaction.  Instead of deleting the transaction from the trans list
      in put_transaction we do that in btrfs_commit_transaction() since that's the
      only time it actually needs to be removed from the list.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: fix subvolume mount by name problem when default mount subvolume is set · e15d0542
      Committed by Xin Zhong
      We create two subvolumes (meego_root and meego_home) in the
      btrfs root directory and set meego_root as the default mount
      subvolume. After we remount btrfs, meego_root is mounted
      at the top directory by default. When we then try to mount
      meego_home (subvol=meego_home) on a subdirectory, it fails.
      The problem is that when the default mount subvolume is set to
      meego_root, we search for meego_home in meego_root but cannot find
      it. The solution is to add a new mount option (subvolrootid)
      to specify the subvol id of the root and search for the subvol name
      in it. For our case, we can now use "-o subvolrootid=0,subvol=meego_home"
      to mount meego_home.
      
      Detailed information can be found in the MeeGo bugzilla:
      https://bugs.meego.com/show_bug.cgi?id=15055
      Signed-off-by: Zhong, Xin <xin.zhong@intel.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • fix user annotation in ioctl.c · 13f2696f
      Committed by Daniel J Blueman
      Fix the address space annotation in ioctl.c.
      Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
      
       		       BTRFS_BLOCK_GROUP_SYSTEM,
      @@ -2387,7 +2387,7 @@ long btrfs_ioctl_space_info(struct btrfs_root
      *root, void __user *arg)
       		up_read(&info->groups_sem);
       	}
      
      -	user_dest = (struct btrfs_ioctl_space_info *)
      +	user_dest = (struct btrfs_ioctl_space_info __user *)
       		(arg + sizeof(struct btrfs_ioctl_space_args));
      
       	if (copy_to_user(user_dest, dest_orig, alloc_size))
      Reviewed-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: check for duplicate iov_base's when doing dio reads · a1b75f7d
      Committed by Josef Bacik
      Apparently it is ok to submit a read to an IDE device with the same target page
      for different offsets.  This is what Windows does under qemu.  The problem is
      under DIO we expect them to be different buffers for checksumming reasons, and
      so this sort of thing will result in checksum errors, when in reality the file
      is fine.  So when reading, check to make sure that all iov bases are different,
      and if they aren't fall back to buffered mode, since that will work out right.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
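
The duplicate-base check described above can be sketched in user-space C. This is an assumption-laden illustration, not the kernel code: `has_duplicate_iov_base` is a hypothetical helper, and it uses the POSIX `struct iovec` from `<sys/uio.h>`. A caller would fall back to buffered reads when it returns true.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Return true if any two segments share the same base address.
 * A simple O(n^2) comparison; segment counts are assumed to be small.
 */
static bool has_duplicate_iov_base(const struct iovec *iov, size_t nr_segs)
{
    for (size_t i = 0; i < nr_segs; i++) {
        for (size_t j = i + 1; j < nr_segs; j++) {
            if (iov[i].iov_base == iov[j].iov_base)
                return true;
        }
    }
    return false;
}
```

Note that this only compares base addresses, as the commit message describes; partially overlapping buffers with different bases would not be detected by this sketch.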
    • btrfs: properly handle overlapping areas in memmove_extent_buffer · 3387206f
      Committed by Sergei Trofimovich
      Fix data corruption caused by memcpy() usage on overlapping data.
      I first observed it while investigating a user-mode Linux crash on btrfs.
      
      The call chain is the following:
      ------------[ cut here ]------------
      WARNING: at /home/slyfox/linux-2.6/fs/btrfs/extent_io.c:3900 memcpy_extent_buffer+0x1a5/0x219()
      Call Trace:
      6fa39a58:  [<601b495e>] _raw_spin_unlock_irqrestore+0x18/0x1c
      6fa39a68:  [<60029ad9>] warn_slowpath_common+0x59/0x70
      6fa39aa8:  [<60029b05>] warn_slowpath_null+0x15/0x17
      6fa39ab8:  [<600efc97>] memcpy_extent_buffer+0x1a5/0x219
      6fa39b48:  [<600efd9f>] memmove_extent_buffer+0x94/0x208
      6fa39bc8:  [<600becbf>] btrfs_del_items+0x214/0x473
      6fa39c78:  [<600ce1b0>] btrfs_delete_one_dir_name+0x7c/0xda
      6fa39cc8:  [<600dad6b>] __btrfs_unlink_inode+0xad/0x25d
      6fa39d08:  [<600d7864>] btrfs_start_transaction+0xe/0x10
      6fa39d48:  [<600dc9ff>] btrfs_unlink_inode+0x1b/0x3b
      6fa39d78:  [<600e04bc>] btrfs_unlink+0x70/0xef
      6fa39dc8:  [<6007f0d0>] vfs_unlink+0x58/0xa3
      6fa39df8:  [<60080278>] do_unlinkat+0xd4/0x162
      6fa39e48:  [<600517db>] call_rcu_sched+0xe/0x10
      6fa39e58:  [<600452a8>] __put_cred+0x58/0x5a
      6fa39e78:  [<6007446c>] sys_faccessat+0x154/0x166
      6fa39ed8:  [<60080317>] sys_unlink+0x11/0x13
      6fa39ee8:  [<60016b80>] handle_syscall+0x58/0x70
      6fa39f08:  [<60021377>] userspace+0x2d4/0x381
      6fa39fc8:  [<60014507>] fork_handler+0x62/0x69
      ---[ end trace 70b0ca2ef0266b93 ]---
      
      http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg09302.html
      Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
      Reviewed-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
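
Why memcpy() is unsafe on overlapping ranges can be illustrated with a direction-aware copy. This is a generic sketch of the memmove technique, not the code from the commit: `overlap_safe_copy` is a hypothetical name, and both pointers are assumed to point into the same buffer so that comparing them is well defined.

```c
#include <stddef.h>

/*
 * Copy len bytes from src to dst, where the two ranges may overlap.
 * When dst lies before src, copy forwards; when dst lies after src, copy
 * backwards, so that no source byte is overwritten before it is read.
 * Assumes both pointers refer to the same underlying buffer.
 */
static void overlap_safe_copy(unsigned char *dst, const unsigned char *src,
                              size_t len)
{
    if (dst < src) {
        for (size_t i = 0; i < len; i++)
            dst[i] = src[i];
    } else if (dst > src) {
        for (size_t i = len; i > 0; i--)
            dst[i - 1] = src[i - 1];
    }
    /* dst == src: nothing to do */
}
```

A plain forward copy with dst after src would read bytes it had already overwritten, which is the kind of corruption the commit message describes.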
    • Btrfs: fix memory leaks in btrfs_new_inode() · 8fb27640
      Committed by Yoshinori Sano
      This patch fixes memory leaks in btrfs_new_inode().
      Signed-off-by: Yoshinori Sano <yoshinori.sano@gmail.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  7. 09 Apr, 2011 8 commits
    • Btrfs: check for duplicate iov_base's when doing dio reads · 93a54bc4
      Committed by Josef Bacik
      Apparently it is ok to submit a read to an IDE device with the same target page
      for different offsets.  This is what Windows does under qemu.  The problem is
      under DIO we expect them to be different buffers for checksumming reasons, and
      so this sort of thing will result in checksum errors, when in reality the file
      is fine.  So when reading, check to make sure that all iov bases are different,
      and if they aren't fall back to buffered mode, since that will work out right.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: reuse the extent_map we found when calling btrfs_get_extent · 16d299ac
      Committed by Josef Bacik
      In btrfs_get_block_direct we call btrfs_get_extent to lookup the extent for the
      range that we are looking for.  If we don't find an extent, btrfs_get_extent
      will insert an extent_map for that area and mark it as a hole.  So it does the
      job of allocating a new extent map and inserting it into the io tree.  But if
      we're creating a new extent we free it up and redo all of that work.  So instead
      pass the em to btrfs_new_extent_direct(), and if it will work just allocate the
      disk space and set it up properly and bypass the freeing/allocating of a new
      extent map and the expensive operation of inserting the thing into the io_tree.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: do not use async submit for small DIO io's · 1ae39938
      Committed by Josef Bacik
      When looking at our DIO performance Chris said that for small IO's doing the
      async submit stuff tends to be more overhead than it's worth.  With this on top
      of my other fixes I get about a 17-20% speedup doing a sequential dd with 4k
      IO's.  Basically if we don't have to split the bio for the map length it's small
      enough to be directly submitted, otherwise go back to the async submit.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: don't split dio bios if we don't have to · 02f57c7a
      Committed by Josef Bacik
      We have been unconditionally allocating a new bio and re-adding all pages from
      our original bio to the new bio.  This is needed if our original bio is larger
      than our stripe size, but if it is smaller than the stripe size then there is no
      need to do this.  So check the map length and if we are under that then go ahead
      and submit the original bio.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
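
The split decision can be sketched with simple stripe arithmetic. This is an assumption about how the map-length check relates to stripe boundaries, not the kernel's code: `stripes_spanned` and `needs_split` are hypothetical helpers, and offsets are treated as plain byte offsets.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Number of stripes touched by the byte range [start, start + len).
 * stripe_len must be non-zero.
 */
static uint64_t stripes_spanned(uint64_t start, uint64_t len,
                                uint64_t stripe_len)
{
    if (len == 0)
        return 0;
    return (start + len - 1) / stripe_len - start / stripe_len + 1;
}

/*
 * A request needs to be split only when it crosses a stripe boundary;
 * otherwise the original request can be submitted as is.
 */
static bool needs_split(uint64_t start, uint64_t len, uint64_t stripe_len)
{
    return stripes_spanned(start, len, stripe_len) > 1;
}
```

With a 64 KiB stripe, for example, a 4 KiB request at offset 0 stays in one stripe and needs no split, while an 8 KiB request starting at 60 KiB crosses a boundary.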
    • Btrfs: do not call btrfs_update_inode in endio if nothing changed · 1ef30be1
      Committed by Josef Bacik
      In the DIO code we often don't update the i_disk_size because the i_size isn't
      updated until after the DIO is completed, so basically we are allocating a path,
      doing a search, and updating the inode item for no reason since nothing changed.
      btrfs_ordered_update_i_size will return 1 if it didn't update i_disk_size, so
      only run btrfs_update_inode if btrfs_ordered_update_i_size returns 0.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: map the inode item when doing fill_inode_item · 12ddb96c
      Committed by Josef Bacik
      Instead of calling kmap_atomic for everything we set in the inode item, map the
      entire inode item at the start and unmap it at the end.  This makes a sequential
      dd of 400MB O_DIRECT something like 1% faster.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: only retry transaction reservation once · 06d5a589
      Committed by Josef Bacik
      I saw a lockup where we kept getting into this start transaction->commit
      transaction loop because of ENOSPC.  The fact is if we fail to make our
      reservation, we've tried _everything_ several times, so we only need to try and
      commit the transaction once, and if that doesn't work then we really are out of
      space and need to just exit.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
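
The "commit once, then give up" policy can be sketched as a bounded retry loop. Everything here is hypothetical scaffolding for illustration: `fake_fs`, `fake_reserve`, and `fake_commit` stand in for the real reservation and commit paths, and only `reserve_with_one_retry` shows the policy itself.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for filesystem space accounting. */
struct fake_fs {
    int free_now;          /* units that can be reserved right now */
    int free_after_commit; /* units a commit would release */
    int commits;           /* how many commits were performed */
};

/* Reserve one unit; returns 0 on success or -ENOSPC when none is left. */
static int fake_reserve(struct fake_fs *fs)
{
    if (fs->free_now > 0) {
        fs->free_now--;
        return 0;
    }
    return -ENOSPC;
}

/* Committing releases whatever space was pinned by the transaction. */
static void fake_commit(struct fake_fs *fs)
{
    fs->commits++;
    fs->free_now += fs->free_after_commit;
    fs->free_after_commit = 0;
}

/*
 * Try to reserve; on failure commit exactly once and retry. If the
 * reservation still fails after that commit, return the error instead
 * of looping forever.
 */
static int reserve_with_one_retry(struct fake_fs *fs)
{
    bool committed = false;

    for (;;) {
        int ret = fake_reserve(fs);
        if (ret == 0)
            return 0;
        if (committed)
            return ret;
        fake_commit(fs);
        committed = true;
    }
}
```

The `committed` flag is what bounds the loop: without it, a filesystem that is truly full would cycle between reservation and commit indefinitely, which matches the lockup described above.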
    • Btrfs: deal with the case that we run out of space in the cache · be1a12a0
      Committed by Josef Bacik
      Currently we don't handle running out of space in the cache, so to fix this we
      keep track of how far in the cache we are.  Then we only dirty the pages if we
      successfully modify all of them, otherwise if we have an error or run out of
      space we can just drop them and not worry about the vm writing them out.
      Thanks,
      
      Tested-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Signed-off-by: Josef Bacik <josef@redhat.com>
  8. 05 Apr, 2011 6 commits