1. 07 5月, 2013 2 次提交
  2. 13 4月, 2013 1 次提交
    • J
      Btrfs: make sure nbytes are right after log replay · 4bc4bee4
      Josef Bacik 提交于
      While trying to track down a tree log replay bug I noticed that fsck was always
      complaining about nbytes not being right for our fsynced file.  That is because
      the new fsync stuff doesn't wait for ordered extents to complete, so the inodes
      nbytes are not necessarily updated properly when we log it.  So to fix this we
      need to set nbytes to whatever it is on the inode that is on disk, so when we
      replay the extents we can just add the bytes that are being added as we replay
      the extent.  This makes it work for the case that we have the wrong nbytes or
      the case that we logged everything and nbytes is actually correct.  With this
      I'm no longer getting nbytes errors out of btrfsck.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      4bc4bee4
  3. 05 3月, 2013 1 次提交
    • J
      Btrfs: use set_nlink if our i_nlink is 0 · 9bf7a489
      Josef Bacik 提交于
      We need to inc the nlink of deleted entries when running replay so we can do the
      unlink on the fs_root and get everything cleaned up and then have the orphan
      cleanup do the right thing.  The problem is inc_nlink complains about this, even
      thought it still does the right thing.  So use set_nlink() if our i_nlink is 0
      to keep users from seeing the warnings during log replay.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      9bf7a489
  4. 02 3月, 2013 1 次提交
    • J
      Btrfs: delete inline extents when we find them during logging · 124fe663
      Josef Bacik 提交于
      Apparently when we do inline extents we allow the data to overlap the last chunk
      of the btrfs_file_extent_item, which means that we can possibly have a
      btrfs_file_extent_item that isn't actually as large as a btrfs_file_extent_item.
      This messes with us when we try to overwrite the extent when logging new extents
      since we expect for it to be the right size.  To fix this just delete the item
      and try to do the insert again which will give us the proper sized
      btrfs_file_extent_item.  This fixes a panic where map_private_extent_buffer
      would blow up because we're trying to write past the end of the leaf.  Thanks,
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      124fe663
  5. 01 3月, 2013 1 次提交
  6. 27 2月, 2013 1 次提交
    • Q
      btrfs: cleanup for open-coded alignment · fda2832f
      Qu Wenruo 提交于
      Though most of the btrfs codes are using ALIGN macro for page alignment,
      there are still some codes using open-coded alignment like the
      following:
      ------
              u64 mask = ((u64)root->stripesize - 1);
              u64 ret = (val + mask) & ~mask;
      ------
      Or even hidden one:
      ------
              num_bytes = (end - start + blocksize) & ~(blocksize - 1);
      ------
      
      Sometimes these open-coded alignment is not so easy to understand for
      newbie like me.
      
      This commit changes the open-coded alignment to the ALIGN macro for a
      better readability.
      
      Also there is a previous patch from David Sterba with similar changes,
      but the patch is for 3.2 kernel and seems not merged.
      http://www.spinics.net/lists/linux-btrfs/msg12747.html
      
      Cc: David Sterba <dave@jikos.cz>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      fda2832f
  7. 21 2月, 2013 1 次提交
  8. 20 2月, 2013 2 次提交
  9. 25 1月, 2013 2 次提交
  10. 17 12月, 2012 4 次提交
  11. 13 12月, 2012 4 次提交
  12. 13 10月, 2012 1 次提交
    • E
      btrfs: Fix compilation with user namespace support enabled · e9069f47
      Eric W. Biederman 提交于
      When compiling with user namespace support btrfs fails like:
      
      fs/btrfs/tree-log.c: In function ‘fill_inode_item’:
      fs/btrfs/tree-log.c:2955:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_uid’
      fs/btrfs/ctree.h:2026:1: note: expected ‘u32’ but argument is of type ‘kuid_t’
      fs/btrfs/tree-log.c:2956:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_gid’
      fs/btrfs/ctree.h:2027:1: note: expected ‘u32’ but argument is of type ‘kgid_t’
      
      Fix this by using i_uid_read and i_gid_read in
      
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      e9069f47
  13. 09 10月, 2012 9 次提交
  14. 04 10月, 2012 1 次提交
    • J
      Btrfs: do not hold the write_lock on the extent tree while logging · ff44c6e3
      Josef Bacik 提交于
      Dave Sterba pointed out a sleeping while atomic bug while doing fsync.  This
      is because I'm an idiot and didn't realize that rwlock's were spin locks, so
      we've been holding this thing while doing allocations and such which is not
      good.  This patch fixes this by dropping the write lock before we do
      anything heavy and re-acquire it when it is done.  We also need to take a
      ref on the em's in case their corresponding pages are evicted and mark them
      as being logged so that releasepage does not remove them and doesn't remove
      them from our local list.  Thanks,
      Reported-by: NDave Sterba <dave@jikos.cz>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      ff44c6e3
  15. 02 10月, 2012 9 次提交
    • M
      Btrfs: fix unprotected ->log_batch · 2ecb7923
      Miao Xie 提交于
      We forget to protect ->log_batch when syncing a file, this patch fix
      this problem by atomic operation. And ->log_batch is used to check
      if there are parallel sync operations or not, so it is unnecessary to
      reset it to 0 after the sync operation of the current log tree complete.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      2ecb7923
    • J
      Btrfs: add hole punching · 2aaa6655
      Josef Bacik 提交于
      This patch adds hole punching via fallocate.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      2aaa6655
    • J
      Btrfs: remove unused hint byte argument for btrfs_drop_extents · 2671485d
      Josef Bacik 提交于
      I audited all users of btrfs_drop_extents and found that nobody actually uses
      the hint_byte argument.  I'm sure it was used for something at some point but
      it's not used now, and the way the pinning works the disk bytenr would never be
      immediately useful anyway so lets just remove it.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      2671485d
    • L
      Btrfs: check if an inode has no checksum when logging it · d2794405
      Liu Bo 提交于
      This is based on Josef's "Btrfs: turbo charge fsync".
      
      If an inode is a BTRFS_INODE_NODATASUM one, we don't need to look for csum
      items any more.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      d2794405
    • L
      Btrfs: fix a bug in checking whether a inode is already in log · 46d8bc34
      Liu Bo 提交于
      This is based on Josef's "Btrfs: turbo charge fsync".
      
      The current btrfs checks if an inode is in log by comparing
      root's last_log_commit to inode's last_sub_trans[2].
      
      But the problem is that this root->last_log_commit is shared among
      inodes.
      
      Say we have N inodes to be logged, after the first inode,
      root's last_log_commit is updated and the N-1 remained files will
      be skipped.
      
      This fixes the bug by keeping a local copy of root's last_log_commit
      inside each inode and this local copy will be maintained itself.
      
      [1]: we regard each log transaction as a subset of btrfs's transaction,
      i.e. sub_trans
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      46d8bc34
    • L
      Btrfs: improve fsync by filtering extents that we want · 4e2f84e6
      Liu Bo 提交于
      This is based on Josef's "Btrfs: turbo charge fsync".
      
      The above Josef's patch performs very good in random sync write test,
      because we won't have too much extents to merge.
      
      However, it does not performs good on the test:
      dd if=/dev/zero of=foobar bs=4k count=12500 oflag=sync
      
      The reason is when we do sequencial sync write, we need to merge the
      current extent just with the previous one, so that we can get accumulated
      extents to log:
      
      A(4k) --> AA(8k) --> AAA(12k) --> AAAA(16k) ...
      
      So we'll have to flush more and more checksum into log tree, which is the
      bottleneck according to my tests.
      
      But we can avoid this by telling fsync the real extents that are needed
      to be logged.
      
      With this, I did the above dd sync write test (size=50m),
      
               w/o (orig)   w/ (josef's)   w/ (this)
      SATA      104KB/s       109KB/s       121KB/s
      ramdisk   1.5MB/s       1.5MB/s       10.7MB/s (613%)
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      4e2f84e6
    • L
      Btrfs: cleanup extents after we finish logging inode · 06d3d22b
      Liu Bo 提交于
      This is based on Josef's "Btrfs: turbo charge fsync".
      
      We should cleanup those extents after we've finished logging inode,
      otherwise we may do redundant work on them.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      06d3d22b
    • J
      Btrfs: only warn if we hit an error when doing the tree logging · 0fa83cdb
      Josef Bacik 提交于
      I hit this a couple times while working on my fsync patch (all my bugs, not
      normal operation), but with my new stuff we could have new errors from cases
      I have not encountered, so instead of BUG()'ing we should be WARN()'ing so
      that we are notified there is a problem but the user doesn't lose their
      data.  We can easily commit the transaction in the case that the tree
      logging fails and still be fine, so let's try and be as nice to the user as
      possible.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      0fa83cdb
    • J
      Btrfs: turbo charge fsync · 5dc562c5
      Josef Bacik 提交于
      At least for the vm workload.  Currently on fsync we will
      
      1) Truncate all items in the log tree for the given inode if they exist
      
      and
      
      2) Copy all items for a given inode into the log
      
      The problem with this is that for things like VMs you can have lots of
      extents from the fragmented writing behavior, and worst yet you may have
      only modified a few extents, not the entire thing.  This patch fixes this
      problem by tracking which transid modified our extent, and then when we do
      the tree logging we find all of the extents we've modified in our current
      transaction, sort them and commit them.  We also only truncate up to the
      xattrs of the inode and copy that stuff in normally, and then just drop any
      extents in the range we have that exist in the log already.  Here are some
      numbers of a 50 meg fio job that does random writes and fsync()s after every
      write
      
      		Original	Patched
      SATA drive	82KB/s		140KB/s
      Fusion drive	431KB/s		2532KB/s
      
      So around 2-6 times faster depending on your hardware.  There are a few
      corner cases, for example if you truncate at all we have to do it the old
      way since there is no way to be sure what is in the log is ok.  This
      probably could be done smarter, but if you write-fsync-truncate-write-fsync
      you deserve what you get.  All this work is in RAM of course so if your
      inode gets evicted from cache and you read it in and fsync it we'll do it
      the slow way if we are still in the same transaction that we last modified
      the inode in.
      
      The biggest cool part of this is that it requires no changes to the recovery
      code, so if you fsync with this patch and crash and load an old kernel, it
      will run the recovery and be a-ok.  I have tested this pretty thoroughly
      with an fsync tester and everything comes back fine, as well as xfstests.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      5dc562c5