1. 02 3月, 2013 1 次提交
    • J
      Btrfs: delete inline extents when we find them during logging · 124fe663
      Josef Bacik 提交于
      Apparently when we do inline extents we allow the data to overlap the last chunk
      of the btrfs_file_extent_item, which means that we can possibly have a
      btrfs_file_extent_item that isn't actually as large as a btrfs_file_extent_item.
      This messes with us when we try to overwrite the extent when logging new extents
      since we expect for it to be the right size.  To fix this just delete the item
      and try to do the insert again which will give us the proper sized
      btrfs_file_extent_item.  This fixes a panic where map_private_extent_buffer
      would blow up because we're trying to write past the end of the leaf.  Thanks,
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      124fe663
  2. 01 3月, 2013 1 次提交
  3. 27 2月, 2013 1 次提交
    • Q
      btrfs: cleanup for open-coded alignment · fda2832f
      Qu Wenruo 提交于
      Though most of the btrfs codes are using ALIGN macro for page alignment,
      there are still some codes using open-coded alignment like the
      following:
      ------
              u64 mask = ((u64)root->stripesize - 1);
              u64 ret = (val + mask) & ~mask;
      ------
      Or even hidden one:
      ------
              num_bytes = (end - start + blocksize) & ~(blocksize - 1);
      ------
      
      Sometimes these open-coded alignment is not so easy to understand for
      newbie like me.
      
      This commit changes the open-coded alignment to the ALIGN macro for a
      better readability.
      
      Also there is a previous patch from David Sterba with similar changes,
      but the patch is for 3.2 kernel and seems not merged.
      http://www.spinics.net/lists/linux-btrfs/msg12747.html
      
      Cc: David Sterba <dave@jikos.cz>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      fda2832f
  4. 21 2月, 2013 1 次提交
  5. 20 2月, 2013 2 次提交
  6. 25 1月, 2013 2 次提交
  7. 17 12月, 2012 4 次提交
  8. 13 12月, 2012 4 次提交
  9. 13 10月, 2012 1 次提交
    • E
      btrfs: Fix compilation with user namespace support enabled · e9069f47
      Eric W. Biederman 提交于
      When compiling with user namespace support btrfs fails like:
      
      fs/btrfs/tree-log.c: In function ‘fill_inode_item’:
      fs/btrfs/tree-log.c:2955:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_uid’
      fs/btrfs/ctree.h:2026:1: note: expected ‘u32’ but argument is of type ‘kuid_t’
      fs/btrfs/tree-log.c:2956:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_gid’
      fs/btrfs/ctree.h:2027:1: note: expected ‘u32’ but argument is of type ‘kgid_t’
      
      Fix this by using i_uid_read and i_gid_read in
      
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      e9069f47
  10. 09 10月, 2012 9 次提交
  11. 04 10月, 2012 1 次提交
    • J
      Btrfs: do not hold the write_lock on the extent tree while logging · ff44c6e3
      Josef Bacik 提交于
      Dave Sterba pointed out a sleeping while atomic bug while doing fsync.  This
      is because I'm an idiot and didn't realize that rwlock's were spin locks, so
      we've been holding this thing while doing allocations and such which is not
      good.  This patch fixes this by dropping the write lock before we do
      anything heavy and re-acquire it when it is done.  We also need to take a
      ref on the em's in case their corresponding pages are evicted and mark them
      as being logged so that releasepage does not remove them and doesn't remove
      them from our local list.  Thanks,
      Reported-by: NDave Sterba <dave@jikos.cz>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      ff44c6e3
  12. 02 10月, 2012 9 次提交
    • M
      Btrfs: fix unprotected ->log_batch · 2ecb7923
      Miao Xie 提交于
      We forget to protect ->log_batch when syncing a file, this patch fix
      this problem by atomic operation. And ->log_batch is used to check
      if there are parallel sync operations or not, so it is unnecessary to
      reset it to 0 after the sync operation of the current log tree complete.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      2ecb7923
    • J
      Btrfs: add hole punching · 2aaa6655
      Josef Bacik 提交于
      This patch adds hole punching via fallocate.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      2aaa6655
    • J
      Btrfs: remove unused hint byte argument for btrfs_drop_extents · 2671485d
      Josef Bacik 提交于
      I audited all users of btrfs_drop_extents and found that nobody actually uses
      the hint_byte argument.  I'm sure it was used for something at some point but
      it's not used now, and the way the pinning works the disk bytenr would never be
      immediately useful anyway so lets just remove it.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      2671485d
    • L
      Btrfs: check if an inode has no checksum when logging it · d2794405
      Liu Bo 提交于
      This is based on Josef's "Btrfs: turbo charge fsync".
      
      If an inode is a BTRFS_INODE_NODATASUM one, we don't need to look for csum
      items any more.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      d2794405
    • L
      Btrfs: fix a bug in checking whether a inode is already in log · 46d8bc34
      Liu Bo 提交于
      This is based on Josef's "Btrfs: turbo charge fsync".
      
      The current btrfs checks if an inode is in log by comparing
      root's last_log_commit to inode's last_sub_trans[2].
      
      But the problem is that this root->last_log_commit is shared among
      inodes.
      
      Say we have N inodes to be logged, after the first inode,
      root's last_log_commit is updated and the N-1 remained files will
      be skipped.
      
      This fixes the bug by keeping a local copy of root's last_log_commit
      inside each inode and this local copy will be maintained itself.
      
      [1]: we regard each log transaction as a subset of btrfs's transaction,
      i.e. sub_trans
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      46d8bc34
    • L
      Btrfs: improve fsync by filtering extents that we want · 4e2f84e6
      Liu Bo 提交于
      This is based on Josef's "Btrfs: turbo charge fsync".
      
      The above Josef's patch performs very good in random sync write test,
      because we won't have too much extents to merge.
      
      However, it does not performs good on the test:
      dd if=/dev/zero of=foobar bs=4k count=12500 oflag=sync
      
      The reason is when we do sequencial sync write, we need to merge the
      current extent just with the previous one, so that we can get accumulated
      extents to log:
      
      A(4k) --> AA(8k) --> AAA(12k) --> AAAA(16k) ...
      
      So we'll have to flush more and more checksum into log tree, which is the
      bottleneck according to my tests.
      
      But we can avoid this by telling fsync the real extents that are needed
      to be logged.
      
      With this, I did the above dd sync write test (size=50m),
      
               w/o (orig)   w/ (josef's)   w/ (this)
      SATA      104KB/s       109KB/s       121KB/s
      ramdisk   1.5MB/s       1.5MB/s       10.7MB/s (613%)
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      4e2f84e6
    • L
      Btrfs: cleanup extents after we finish logging inode · 06d3d22b
      Liu Bo 提交于
      This is based on Josef's "Btrfs: turbo charge fsync".
      
      We should cleanup those extents after we've finished logging inode,
      otherwise we may do redundant work on them.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      06d3d22b
    • J
      Btrfs: only warn if we hit an error when doing the tree logging · 0fa83cdb
      Josef Bacik 提交于
      I hit this a couple times while working on my fsync patch (all my bugs, not
      normal operation), but with my new stuff we could have new errors from cases
      I have not encountered, so instead of BUG()'ing we should be WARN()'ing so
      that we are notified there is a problem but the user doesn't lose their
      data.  We can easily commit the transaction in the case that the tree
      logging fails and still be fine, so let's try and be as nice to the user as
      possible.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      0fa83cdb
    • J
      Btrfs: turbo charge fsync · 5dc562c5
      Josef Bacik 提交于
      At least for the vm workload.  Currently on fsync we will
      
      1) Truncate all items in the log tree for the given inode if they exist
      
      and
      
      2) Copy all items for a given inode into the log
      
      The problem with this is that for things like VMs you can have lots of
      extents from the fragmented writing behavior, and worst yet you may have
      only modified a few extents, not the entire thing.  This patch fixes this
      problem by tracking which transid modified our extent, and then when we do
      the tree logging we find all of the extents we've modified in our current
      transaction, sort them and commit them.  We also only truncate up to the
      xattrs of the inode and copy that stuff in normally, and then just drop any
      extents in the range we have that exist in the log already.  Here are some
      numbers of a 50 meg fio job that does random writes and fsync()s after every
      write
      
      		Original	Patched
      SATA drive	82KB/s		140KB/s
      Fusion drive	431KB/s		2532KB/s
      
      So around 2-6 times faster depending on your hardware.  There are a few
      corner cases, for example if you truncate at all we have to do it the old
      way since there is no way to be sure what is in the log is ok.  This
      probably could be done smarter, but if you write-fsync-truncate-write-fsync
      you deserve what you get.  All this work is in RAM of course so if your
      inode gets evicted from cache and you read it in and fsync it we'll do it
      the slow way if we are still in the same transaction that we last modified
      the inode in.
      
      The biggest cool part of this is that it requires no changes to the recovery
      code, so if you fsync with this patch and crash and load an old kernel, it
      will run the recovery and be a-ok.  I have tested this pretty thoroughly
      with an fsync tester and everything comes back fine, as well as xfstests.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      5dc562c5
  13. 24 7月, 2012 1 次提交
  14. 03 7月, 2012 1 次提交
    • C
      Btrfs: run delayed directory updates during log replay · b6305567
      Chris Mason 提交于
      While we are resolving directory modifications in the
      tree log, we are triggering delayed metadata updates to
      the filesystem btrees.
      
      This commit forces the delayed updates to run so the
      replay code can find any modifications done.  It stops
      us from crashing because the directory deleltion replay
      expects items to be removed immediately from the tree.
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      cc: stable@kernel.org
      b6305567
  15. 30 5月, 2012 2 次提交
    • J
      Btrfs: fix return code in drop_objectid_items · 5bdbeb21
      Josef Bacik 提交于
      So dpkg fsync()'s the file and the directory containing the file whenever it
      writes to a file which is really slow in btrfs.  This is partly because
      fsync()'ing a directory _always_ committed the transaction instead of just
      going to the tree log.  This is because drop_objectid_items() would return 1
      since it does a btrfs_search_slot() which returns 1.  In tree-log jargon
      this means that we have to commit the transaction to be safe.  So just check
      if ret is greater than 0 and set it to 0 if it does.  With this patch we now
      use the tree-log instead of committing the entire transaction, which is
      twice as fast on my box.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      5bdbeb21
    • J
      Btrfs: check to see if the inode is in the log before fsyncing · 22ee6985
      Josef Bacik 提交于
      We have this check down in the actual logging code, but this is after we
      start a transaction and all that good stuff.  So move the helper
      inode_in_log() out so we can call it in fsync() and avoid starting a
      transaction altogether and just exit if we've already fsync()'ed this file
      recently.  You would notice this issue if you fsync()'ed a file over and
      over again until the transaction committed.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      22ee6985