1. 07 5月, 2013 4 次提交
    • T
      Btrfs: cleanup of function where fixup_low_keys() is called · afe5fea7
      Tsutomu Itoh 提交于
      If argument 'trans' is unnecessary in the function where
      fixup_low_keys() is called, 'trans' is deleted.
      Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      afe5fea7
    • J
      Btrfs: fix bad extent logging · 09a2a8f9
      Josef Bacik 提交于
      A user sent me a btrfs-image of a file system that was panicing on mount during
      the log recovery.  I had originally thought these problems were from a bug in
      the free space cache code, but that was just a symptom of the problem.  The
      problem is if your application does something like this
      
      [prealloc][prealloc][prealloc]
      
      the internal extent maps will merge those all together into one extent map, even
      though on disk they are 3 separate extents.  So if you go to write into one of
      these ranges the extent map will be right since we use the physical extent when
      doing the write, but when we log the extents they will use the wrong sizes for
      the remainder prealloc space.  If this doesn't happen to trip up the free space
      cache (which it won't in a lot of cases) then you will get bogus entries in your
      extent tree which will screw stuff up later.  The data and such will still work,
      but everything else is broken.  This patch fixes this by not allowing extents
      that are on the modified list to be merged.  This has the side effect that we
      are no longer adding everything to the modified list all the time, which means
      we now have to call btrfs_drop_extents every time we log an extent into the
      tree.  So this allows me to drop all this speciality code I was using to get
      around calling btrfs_drop_extents.  With this patch the testcase I've created no
      longer creates a bogus file system after replaying the log.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      09a2a8f9
    • J
      Btrfs: log ram bytes properly · cc95bef6
      Josef Bacik 提交于
      When logging changed extents I was logging ram_bytes as the current length,
      which isn't correct, it's supposed to be the ram bytes of the original extent.
      This is for compression where even if we split the extent we need to know the
      ram bytes so when we uncompress the extent we know how big it will be.  This was
      still working out right with compression for some reason but I think we were
      getting lucky.  It was definitely off for prealloc which is why I noticed it,
      btrfsck was complaining about it.  With this patch btrfsck no longer complains
      after a log replay.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      cc95bef6
    • Z
      6841ebee
  2. 13 4月, 2013 1 次提交
    • J
      Btrfs: make sure nbytes are right after log replay · 4bc4bee4
      Josef Bacik 提交于
      While trying to track down a tree log replay bug I noticed that fsck was always
      complaining about nbytes not being right for our fsynced file.  That is because
      the new fsync stuff doesn't wait for ordered extents to complete, so the inodes
      nbytes are not necessarily updated properly when we log it.  So to fix this we
      need to set nbytes to whatever it is on the inode that is on disk, so when we
      replay the extents we can just add the bytes that are being added as we replay
      the extent.  This makes it work for the case that we have the wrong nbytes or
      the case that we logged everything and nbytes is actually correct.  With this
      I'm no longer getting nbytes errors out of btrfsck.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      4bc4bee4
  3. 05 3月, 2013 1 次提交
    • J
      Btrfs: use set_nlink if our i_nlink is 0 · 9bf7a489
      Josef Bacik 提交于
      We need to inc the nlink of deleted entries when running replay so we can do the
      unlink on the fs_root and get everything cleaned up and then have the orphan
      cleanup do the right thing.  The problem is inc_nlink complains about this, even
      thought it still does the right thing.  So use set_nlink() if our i_nlink is 0
      to keep users from seeing the warnings during log replay.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      9bf7a489
  4. 02 3月, 2013 1 次提交
    • J
      Btrfs: delete inline extents when we find them during logging · 124fe663
      Josef Bacik 提交于
      Apparently when we do inline extents we allow the data to overlap the last chunk
      of the btrfs_file_extent_item, which means that we can possibly have a
      btrfs_file_extent_item that isn't actually as large as a btrfs_file_extent_item.
      This messes with us when we try to overwrite the extent when logging new extents
      since we expect for it to be the right size.  To fix this just delete the item
      and try to do the insert again which will give us the proper sized
      btrfs_file_extent_item.  This fixes a panic where map_private_extent_buffer
      would blow up because we're trying to write past the end of the leaf.  Thanks,
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      124fe663
  5. 01 3月, 2013 1 次提交
  6. 27 2月, 2013 1 次提交
    • Q
      btrfs: cleanup for open-coded alignment · fda2832f
      Qu Wenruo 提交于
      Though most of the btrfs codes are using ALIGN macro for page alignment,
      there are still some codes using open-coded alignment like the
      following:
      ------
              u64 mask = ((u64)root->stripesize - 1);
              u64 ret = (val + mask) & ~mask;
      ------
      Or even hidden one:
      ------
              num_bytes = (end - start + blocksize) & ~(blocksize - 1);
      ------
      
      Sometimes these open-coded alignment is not so easy to understand for
      newbie like me.
      
      This commit changes the open-coded alignment to the ALIGN macro for a
      better readability.
      
      Also there is a previous patch from David Sterba with similar changes,
      but the patch is for 3.2 kernel and seems not merged.
      http://www.spinics.net/lists/linux-btrfs/msg12747.html
      
      Cc: David Sterba <dave@jikos.cz>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      fda2832f
  7. 21 2月, 2013 1 次提交
  8. 20 2月, 2013 2 次提交
  9. 25 1月, 2013 2 次提交
  10. 17 12月, 2012 4 次提交
  11. 13 12月, 2012 4 次提交
  12. 13 10月, 2012 1 次提交
    • E
      btrfs: Fix compilation with user namespace support enabled · e9069f47
      Eric W. Biederman 提交于
      When compiling with user namespace support btrfs fails like:
      
      fs/btrfs/tree-log.c: In function ‘fill_inode_item’:
      fs/btrfs/tree-log.c:2955:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_uid’
      fs/btrfs/ctree.h:2026:1: note: expected ‘u32’ but argument is of type ‘kuid_t’
      fs/btrfs/tree-log.c:2956:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_gid’
      fs/btrfs/ctree.h:2027:1: note: expected ‘u32’ but argument is of type ‘kgid_t’
      
      Fix this by using i_uid_read and i_gid_read in
      
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      e9069f47
  13. 09 10月, 2012 9 次提交
  14. 04 10月, 2012 1 次提交
    • J
      Btrfs: do not hold the write_lock on the extent tree while logging · ff44c6e3
      Josef Bacik 提交于
      Dave Sterba pointed out a sleeping while atomic bug while doing fsync.  This
      is because I'm an idiot and didn't realize that rwlock's were spin locks, so
      we've been holding this thing while doing allocations and such which is not
      good.  This patch fixes this by dropping the write lock before we do
      anything heavy and re-acquire it when it is done.  We also need to take a
      ref on the em's in case their corresponding pages are evicted and mark them
      as being logged so that releasepage does not remove them and doesn't remove
      them from our local list.  Thanks,
      Reported-by: NDave Sterba <dave@jikos.cz>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      ff44c6e3
  15. 02 10月, 2012 7 次提交
    • M
      Btrfs: fix unprotected ->log_batch · 2ecb7923
      Miao Xie 提交于
      We forget to protect ->log_batch when syncing a file, this patch fix
      this problem by atomic operation. And ->log_batch is used to check
      if there are parallel sync operations or not, so it is unnecessary to
      reset it to 0 after the sync operation of the current log tree complete.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      2ecb7923
    • J
      Btrfs: add hole punching · 2aaa6655
      Josef Bacik 提交于
      This patch adds hole punching via fallocate.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      2aaa6655
    • J
      Btrfs: remove unused hint byte argument for btrfs_drop_extents · 2671485d
      Josef Bacik 提交于
      I audited all users of btrfs_drop_extents and found that nobody actually uses
      the hint_byte argument.  I'm sure it was used for something at some point but
      it's not used now, and the way the pinning works the disk bytenr would never be
      immediately useful anyway so lets just remove it.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      2671485d
    • L
      Btrfs: check if an inode has no checksum when logging it · d2794405
      Liu Bo 提交于
      This is based on Josef's "Btrfs: turbo charge fsync".
      
      If an inode is a BTRFS_INODE_NODATASUM one, we don't need to look for csum
      items any more.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      d2794405
    • L
      Btrfs: fix a bug in checking whether a inode is already in log · 46d8bc34
      Liu Bo 提交于
      This is based on Josef's "Btrfs: turbo charge fsync".
      
      The current btrfs checks if an inode is in log by comparing
      root's last_log_commit to inode's last_sub_trans[2].
      
      But the problem is that this root->last_log_commit is shared among
      inodes.
      
      Say we have N inodes to be logged, after the first inode,
      root's last_log_commit is updated and the N-1 remained files will
      be skipped.
      
      This fixes the bug by keeping a local copy of root's last_log_commit
      inside each inode and this local copy will be maintained itself.
      
      [1]: we regard each log transaction as a subset of btrfs's transaction,
      i.e. sub_trans
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      46d8bc34
    • L
      Btrfs: improve fsync by filtering extents that we want · 4e2f84e6
      Liu Bo 提交于
      This is based on Josef's "Btrfs: turbo charge fsync".
      
      The above Josef's patch performs very good in random sync write test,
      because we won't have too much extents to merge.
      
      However, it does not performs good on the test:
      dd if=/dev/zero of=foobar bs=4k count=12500 oflag=sync
      
      The reason is when we do sequencial sync write, we need to merge the
      current extent just with the previous one, so that we can get accumulated
      extents to log:
      
      A(4k) --> AA(8k) --> AAA(12k) --> AAAA(16k) ...
      
      So we'll have to flush more and more checksum into log tree, which is the
      bottleneck according to my tests.
      
      But we can avoid this by telling fsync the real extents that are needed
      to be logged.
      
      With this, I did the above dd sync write test (size=50m),
      
               w/o (orig)   w/ (josef's)   w/ (this)
      SATA      104KB/s       109KB/s       121KB/s
      ramdisk   1.5MB/s       1.5MB/s       10.7MB/s (613%)
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      4e2f84e6
    • L
      Btrfs: cleanup extents after we finish logging inode · 06d3d22b
      Liu Bo 提交于
      This is based on Josef's "Btrfs: turbo charge fsync".
      
      We should cleanup those extents after we've finished logging inode,
      otherwise we may do redundant work on them.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      06d3d22b