1. 21 9月, 2013 2 次提交
    • J
      Btrfs: replay dir_index items before other items · dd8e7217
      Josef Bacik 提交于
      A user reported a bug where his log would not replay because he was getting
      -EEXIST back.  This was because he had a file moved into a directory that was
      logged.  What happens is the file had a lower inode number, and so it is
      processed first when replaying the log, and so we add the inode ref in for the
      directory it was moved to.  But then we process the directories DIR_INDEX item
      and try to add the inode ref for that inode and it fails because we already
      added it when we replayed the inode.  To solve this problem we need to just
      process any DIR_INDEX items we have in the log first so this all is taken care
      of, and then we can replay the rest of the items.  With this patch my reproducer
      can remount the file system properly instead of erroring out.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      dd8e7217
    • J
      Btrfs: actually log directory we are fsync()'ing · de2b530b
      Josef Bacik 提交于
      If you just create a directory and then fsync that directory and then pull the
      power plug you will come back up and the directory will not be there.  That is
      because we won't actually create directories if we've logged files inside of
      them since they will be created on replay, but in this check we will set our
      logged_trans of our current directory if it happens to be a directory, making us
      think it doesn't need to be logged.  Fix the logic to only do this to parent
      directories.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      de2b530b
  2. 01 9月, 2013 2 次提交
  3. 10 8月, 2013 1 次提交
  4. 14 6月, 2013 4 次提交
    • J
      Btrfs: exclude logged extents before replying when we are mixed · 8c2a1a30
      Josef Bacik 提交于
      With non-mixed block groups we replay the logs before we're allowed to do any
      writes, so we get away with not pinning/removing the data extents until right
      when we replay them.  However with mixed block groups we allocate out of the
      same pool, so we could easily allocate a metadata block that was logged in our
      tree log.  To deal with this we just need to notice that we have mixed block
      groups and do the normal excluding/removal dance during the pin stage of the log
      replay and that way we don't allocate metadata blocks from areas we have logged
      data extents.  With this patch we now pass xfstests generic/311 with mixed
      block groups turned on.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      8c2a1a30
    • M
      Btrfs: merge pending IO for tree log write back · c6adc9cc
      Miao Xie 提交于
      Before applying this patch, we flushed the log tree of the fs/file
      tree firstly, and then flushed the log root tree. It is ineffective,
      especially on the hard disk. This patch improved this problem by wrapping
      the above two flushes by the same blk_plug.
      
      By test, the performance of the sync write went up ~60%(2.9MB/s -> 4.6MB/s)
      on my scsi disk whose disk buffer was enabled.
      
      Test step:
       # mkfs.btrfs -f -m single <disk>
       # mount <disk> <mnt>
       # dd if=/dev/zero of=<mnt>/file0 bs=32K count=1024 oflag=sync
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      c6adc9cc
    • L
      Btrfs: kill replicate code in replay_one_buffer · 2da1c669
      Liu Bo 提交于
      EXTREF is treated same as REF, so we can make the code tidy.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      2da1c669
    • M
      Btrfs: cleanup the similar code of the fs root read · cb517eab
      Miao Xie 提交于
      There are several functions whose code is similar, such as
        btrfs_find_last_root()
        btrfs_read_fs_root_no_radix()
      
      Besides that, some functions are invoked twice, it is unnecessary,
      for example, we are sure that all roots which is found in
        btrfs_find_orphan_roots()
      have their orphan items, so it is unnecessary to check the orphan
      item again.
      
      So cleanup it.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      cb517eab
  5. 07 5月, 2013 10 次提交
    • E
      btrfs: make static code static & remove dead code · 48a3b636
      Eric Sandeen 提交于
      Big patch, but all it does is add statics to functions which
      are in fact static, then remove the associated dead-code fallout.
      
      removed functions:
      
      btrfs_iref_to_path()
      __btrfs_lookup_delayed_deletion_item()
      __btrfs_search_delayed_insertion_item()
      __btrfs_search_delayed_deletion_item()
      find_eb_for_page()
      btrfs_find_block_group()
      range_straddles_pages()
      extent_range_uptodate()
      btrfs_file_extent_length()
      btrfs_scrub_cancel_devid()
      btrfs_start_transaction_lflush()
      
      btrfs_print_tree() is left because it is used for debugging.
      btrfs_start_transaction_lflush() and btrfs_reada_detach() are
      left for symmetry.
      
      ulist.c functions are left, another patch will take care of those.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      48a3b636
    • J
      Btrfs: remove almost all of the BUG()'s from tree-log.c · 3650860b
      Josef Bacik 提交于
      There were a whole bunch and I was doing it for other things.  I haven't tested
      these error paths but at the very least this is better than panicing.  I've only
      left 2 BUG_ON()'s since they are logic errors and I want to replace them with a
      ASSERT framework that we can compile out for production users.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      3650860b
    • J
      Btrfs: deal with free space cache errors while replaying log · b50c6e25
      Josef Bacik 提交于
      So everybody who got hit by my fsync bug will still continue to hit this
      BUG_ON() in the free space cache, which is pretty heavy handed.  So I took a
      file system that had this bug and fixed up all the BUG_ON()'s and leaks that
      popped up when I tried to mount a broken file system like this.  With this patch
      we just fail to mount instead of panicing.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      b50c6e25
    • J
      Btrfs: check return value of commit when recovering log · abefa55a
      Josef Bacik 提交于
      We need to check the return value of the commit in case something goes wrong,
      otherwise we could end up going down the line and doing more stuff (like orphan
      cleanup) before we notice we should have errored out.  We need to do this before
      we free up the log_tree_root since the caller will handle all of that.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      abefa55a
    • J
      Btrfs: don't try and free ebs twice in log replay · 5ec8dca7
      Josef Bacik 提交于
      This work is done by btrfs_free_path() anyway so there's no need for this
      duplicate work.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      5ec8dca7
    • T
      Btrfs: remove unused argument of btrfs_extend_item() · 4b90c680
      Tsutomu Itoh 提交于
      Argument 'trans' is not used in btrfs_extend_item().
      Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      4b90c680
    • T
      Btrfs: cleanup of function where fixup_low_keys() is called · afe5fea7
      Tsutomu Itoh 提交于
      If argument 'trans' is unnecessary in the function where
      fixup_low_keys() is called, 'trans' is deleted.
      Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      afe5fea7
    • J
      Btrfs: fix bad extent logging · 09a2a8f9
      Josef Bacik 提交于
      A user sent me a btrfs-image of a file system that was panicing on mount during
      the log recovery.  I had originally thought these problems were from a bug in
      the free space cache code, but that was just a symptom of the problem.  The
      problem is if your application does something like this
      
      [prealloc][prealloc][prealloc]
      
      the internal extent maps will merge those all together into one extent map, even
      though on disk they are 3 separate extents.  So if you go to write into one of
      these ranges the extent map will be right since we use the physical extent when
      doing the write, but when we log the extents they will use the wrong sizes for
      the remainder prealloc space.  If this doesn't happen to trip up the free space
      cache (which it won't in a lot of cases) then you will get bogus entries in your
      extent tree which will screw stuff up later.  The data and such will still work,
      but everything else is broken.  This patch fixes this by not allowing extents
      that are on the modified list to be merged.  This has the side effect that we
      are no longer adding everything to the modified list all the time, which means
      we now have to call btrfs_drop_extents every time we log an extent into the
      tree.  So this allows me to drop all this speciality code I was using to get
      around calling btrfs_drop_extents.  With this patch the testcase I've created no
      longer creates a bogus file system after replaying the log.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      09a2a8f9
    • J
      Btrfs: log ram bytes properly · cc95bef6
      Josef Bacik 提交于
      When logging changed extents I was logging ram_bytes as the current length,
      which isn't correct, it's supposed to be the ram bytes of the original extent.
      This is for compression where even if we split the extent we need to know the
      ram bytes so when we uncompress the extent we know how big it will be.  This was
      still working out right with compression for some reason but I think we were
      getting lucky.  It was definitely off for prealloc which is why I noticed it,
      btrfsck was complaining about it.  With this patch btrfsck no longer complains
      after a log replay.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      cc95bef6
    • Z
      6841ebee
  6. 13 4月, 2013 1 次提交
    • J
      Btrfs: make sure nbytes are right after log replay · 4bc4bee4
      Josef Bacik 提交于
      While trying to track down a tree log replay bug I noticed that fsck was always
      complaining about nbytes not being right for our fsynced file.  That is because
      the new fsync stuff doesn't wait for ordered extents to complete, so the inodes
      nbytes are not necessarily updated properly when we log it.  So to fix this we
      need to set nbytes to whatever it is on the inode that is on disk, so when we
      replay the extents we can just add the bytes that are being added as we replay
      the extent.  This makes it work for the case that we have the wrong nbytes or
      the case that we logged everything and nbytes is actually correct.  With this
      I'm no longer getting nbytes errors out of btrfsck.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      4bc4bee4
  7. 05 3月, 2013 1 次提交
    • J
      Btrfs: use set_nlink if our i_nlink is 0 · 9bf7a489
      Josef Bacik 提交于
      We need to inc the nlink of deleted entries when running replay so we can do the
      unlink on the fs_root and get everything cleaned up and then have the orphan
      cleanup do the right thing.  The problem is inc_nlink complains about this, even
      thought it still does the right thing.  So use set_nlink() if our i_nlink is 0
      to keep users from seeing the warnings during log replay.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      9bf7a489
  8. 02 3月, 2013 1 次提交
    • J
      Btrfs: delete inline extents when we find them during logging · 124fe663
      Josef Bacik 提交于
      Apparently when we do inline extents we allow the data to overlap the last chunk
      of the btrfs_file_extent_item, which means that we can possibly have a
      btrfs_file_extent_item that isn't actually as large as a btrfs_file_extent_item.
      This messes with us when we try to overwrite the extent when logging new extents
      since we expect for it to be the right size.  To fix this just delete the item
      and try to do the insert again which will give us the proper sized
      btrfs_file_extent_item.  This fixes a panic where map_private_extent_buffer
      would blow up because we're trying to write past the end of the leaf.  Thanks,
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      124fe663
  9. 01 3月, 2013 1 次提交
  10. 27 2月, 2013 1 次提交
    • Q
      btrfs: cleanup for open-coded alignment · fda2832f
      Qu Wenruo 提交于
      Though most of the btrfs codes are using ALIGN macro for page alignment,
      there are still some codes using open-coded alignment like the
      following:
      ------
              u64 mask = ((u64)root->stripesize - 1);
              u64 ret = (val + mask) & ~mask;
      ------
      Or even hidden one:
      ------
              num_bytes = (end - start + blocksize) & ~(blocksize - 1);
      ------
      
      Sometimes these open-coded alignment is not so easy to understand for
      newbie like me.
      
      This commit changes the open-coded alignment to the ALIGN macro for a
      better readability.
      
      Also there is a previous patch from David Sterba with similar changes,
      but the patch is for 3.2 kernel and seems not merged.
      http://www.spinics.net/lists/linux-btrfs/msg12747.html
      
      Cc: David Sterba <dave@jikos.cz>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      fda2832f
  11. 21 2月, 2013 1 次提交
  12. 20 2月, 2013 2 次提交
  13. 25 1月, 2013 2 次提交
  14. 17 12月, 2012 4 次提交
  15. 13 12月, 2012 4 次提交
  16. 13 10月, 2012 1 次提交
    • E
      btrfs: Fix compilation with user namespace support enabled · e9069f47
      Eric W. Biederman 提交于
      When compiling with user namespace support btrfs fails like:
      
      fs/btrfs/tree-log.c: In function ‘fill_inode_item’:
      fs/btrfs/tree-log.c:2955:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_uid’
      fs/btrfs/ctree.h:2026:1: note: expected ‘u32’ but argument is of type ‘kuid_t’
      fs/btrfs/tree-log.c:2956:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_gid’
      fs/btrfs/ctree.h:2027:1: note: expected ‘u32’ but argument is of type ‘kgid_t’
      
      Fix this by using i_uid_read and i_gid_read in
      
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      e9069f47
  17. 09 10月, 2012 2 次提交
    • C
      btrfs: init ref_index to zero in add_inode_ref · f46dbe3d
      Chris Mason 提交于
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      f46dbe3d
    • S
      Btrfs: make filesystem read-only when submitting barrier fails · 5af3e8cc
      Stefan Behrens 提交于
      So far the return code of barrier_all_devices() is ignored, which
      means that errors are ignored. The result can be a corrupt
      filesystem which is not consistent.
      This commit adds code to evaluate the return code of
      barrier_all_devices(). The normal btrfs_error() mechanism is used to
      switch the filesystem into read-only mode when errors are detected.
      
      In order to decide whether barrier_all_devices() should return
      error or success, the number of disks that are allowed to fail the
      barrier submission is calculated. This calculation accounts for the
      worst RAID level of metadata, system and data. If single, dup or
      RAID0 is in use, a single disk error is already considered to be
      fatal. Otherwise a single disk error is tolerated.
      
      The calculation of the number of disks that are tolerated to fail
      the barrier operation is performed when the filesystem gets mounted,
      when a balance operation is started and finished, and when devices
      are added or removed.
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      5af3e8cc