1. 12 11月, 2013 4 次提交
  2. 21 9月, 2013 1 次提交
  3. 04 9月, 2013 1 次提交
  4. 01 9月, 2013 3 次提交
  5. 10 8月, 2013 1 次提交
    • J
      Btrfs: allow splitting of hole em's when dropping extent cache · ee20a983
      Josef Bacik 提交于
      I noticed while running multi-threaded fsync tests that sometimes fsck would
      complain about an improper gap.  This happens because we fail to add a hole
      extent to the file, which was happening when we'd split a hole EM because
      btrfs_drop_extent_cache was just discarding the whole em instead of splitting
      it.  So this patch fixes this by allowing us to split a hole em properly, which
      means that added holes actually get logged properly and we no longer see this
      fsck error.  Thankfully we're tolerant of these sort of problems so a user would
      not see any adverse effects of this bug, other than fsck complaining.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      ee20a983
  6. 03 7月, 2013 1 次提交
    • J
      vfs: export lseek_execute() to modules · 46a1c2c7
      Jie Liu 提交于
      For those file systems(btrfs/ext4/ocfs2/tmpfs) that support
      SEEK_DATA/SEEK_HOLE functions, we end up handling the similar
      matter in lseek_execute() to update the current file offset
      to the desired offset if it is valid, ceph also does the
      simliar things at ceph_llseek().
      
      To reduce the duplications, this patch make lseek_execute()
      public accessible so that we can call it directly from the
      underlying file systems.
      
      Thanks Dave Chinner for this suggestion.
      
      [AV: call it vfs_setpos(), don't bring the removed 'inode' argument back]
      
      v2->v1:
      - Add kernel-doc comments for lseek_execute()
      - Call lseek_execute() in ceph->llseek()
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: Ted Tso <tytso@mit.edu>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Sage Weil <sage@inktank.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      46a1c2c7
  7. 02 7月, 2013 1 次提交
    • J
      Btrfs: check if we can nocow if we don't have data space · 7ee9e440
      Josef Bacik 提交于
      We always just try and reserve data space when we write, but if we are out of
      space but have prealloc'ed extents we should still successfully write.  This
      patch will try and see if we can write to prealloc'ed space and if we can go
      ahead and allow the write to continue.  With this patch we now pass xfstests
      generic/274.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      7ee9e440
  8. 01 7月, 2013 1 次提交
    • J
      Btrfs: move btrfs_truncate_page to btrfs_cont_expand instead of btrfs_truncate · a71754fc
      Josef Bacik 提交于
      This has plagued us forever and I'm so over working around it.  When we truncate
      down to a non-page aligned offset we will call btrfs_truncate_page to zero out
      the end of the page and write it back to disk, this will keep us from exposing
      stale data if we truncate back up from that point.  The problem with this is it
      requires data space to do this, and people don't really expect to get ENOSPC
      from truncate() for these sort of things.  This also tends to bite the orphan
      cleanup stuff too which keeps people from mounting.  To get around this we can
      just move this into btrfs_cont_expand() to make sure if we are truncating up
      from a non-page size aligned i_size we will zero out the rest of this page so
      that we don't expose stale data.  This will give ENOSPC if you try to truncate()
      up or if you try to write past the end of isize, which is much more reasonable.
      This fixes xfstests generic/083 failing to mount because of the orphan cleanup
      failing.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      a71754fc
  9. 14 6月, 2013 1 次提交
  10. 08 5月, 2013 1 次提交
  11. 07 5月, 2013 4 次提交
    • E
      btrfs: make static code static & remove dead code · 48a3b636
      Eric Sandeen 提交于
      Big patch, but all it does is add statics to functions which
      are in fact static, then remove the associated dead-code fallout.
      
      removed functions:
      
      btrfs_iref_to_path()
      __btrfs_lookup_delayed_deletion_item()
      __btrfs_search_delayed_insertion_item()
      __btrfs_search_delayed_deletion_item()
      find_eb_for_page()
      btrfs_find_block_group()
      range_straddles_pages()
      extent_range_uptodate()
      btrfs_file_extent_length()
      btrfs_scrub_cancel_devid()
      btrfs_start_transaction_lflush()
      
      btrfs_print_tree() is left because it is used for debugging.
      btrfs_start_transaction_lflush() and btrfs_reada_detach() are
      left for symmetry.
      
      ulist.c functions are left, another patch will take care of those.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      48a3b636
    • T
      Btrfs: cleanup of function where fixup_low_keys() is called · afe5fea7
      Tsutomu Itoh 提交于
      If argument 'trans' is unnecessary in the function where
      fixup_low_keys() is called, 'trans' is deleted.
      Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      afe5fea7
    • J
      Btrfs: fix bad extent logging · 09a2a8f9
      Josef Bacik 提交于
      A user sent me a btrfs-image of a file system that was panicing on mount during
      the log recovery.  I had originally thought these problems were from a bug in
      the free space cache code, but that was just a symptom of the problem.  The
      problem is if your application does something like this
      
      [prealloc][prealloc][prealloc]
      
      the internal extent maps will merge those all together into one extent map, even
      though on disk they are 3 separate extents.  So if you go to write into one of
      these ranges the extent map will be right since we use the physical extent when
      doing the write, but when we log the extents they will use the wrong sizes for
      the remainder prealloc space.  If this doesn't happen to trip up the free space
      cache (which it won't in a lot of cases) then you will get bogus entries in your
      extent tree which will screw stuff up later.  The data and such will still work,
      but everything else is broken.  This patch fixes this by not allowing extents
      that are on the modified list to be merged.  This has the side effect that we
      are no longer adding everything to the modified list all the time, which means
      we now have to call btrfs_drop_extents every time we log an extent into the
      tree.  So this allows me to drop all this speciality code I was using to get
      around calling btrfs_drop_extents.  With this patch the testcase I've created no
      longer creates a bogus file system after replaying the log.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      09a2a8f9
    • J
      Btrfs: log ram bytes properly · cc95bef6
      Josef Bacik 提交于
      When logging changed extents I was logging ram_bytes as the current length,
      which isn't correct, it's supposed to be the ram bytes of the original extent.
      This is for compression where even if we split the extent we need to know the
      ram bytes so when we uncompress the extent we know how big it will be.  This was
      still working out right with compression for some reason but I think we were
      getting lucky.  It was definitely off for prealloc which is why I noticed it,
      btrfsck was complaining about it.  With this patch btrfsck no longer complains
      after a log replay.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      cc95bef6
  12. 10 4月, 2013 1 次提交
  13. 22 3月, 2013 1 次提交
  14. 16 3月, 2013 1 次提交
    • L
      Btrfs: fix warning of free_extent_map · 3b277594
      Liu Bo 提交于
      Users report that an extent map's list is still linked when it's actually
      going to be freed from cache.
      
      The story is that
      
      a) when we're going to drop an extent map and may split this large one into
      smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means
      that it's on the list to be logged, then the smaller ems split from it will also
      be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected.
      
      b) we'll keep ems from unlinking the list and freeing when they are flagged with
      EXTENT_FLAG_LOGGING, because the log code holds one reference.
      
      The end result is the warning, but the truth is that we set the flag
      EXTENT_FLAG_LOGGING only during fsync.
      
      So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one.
      Reported-by: NJohannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Reported-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      3b277594
  15. 27 2月, 2013 1 次提交
    • Q
      btrfs: cleanup for open-coded alignment · fda2832f
      Qu Wenruo 提交于
      Though most of the btrfs codes are using ALIGN macro for page alignment,
      there are still some codes using open-coded alignment like the
      following:
      ------
              u64 mask = ((u64)root->stripesize - 1);
              u64 ret = (val + mask) & ~mask;
      ------
      Or even hidden one:
      ------
              num_bytes = (end - start + blocksize) & ~(blocksize - 1);
      ------
      
      Sometimes these open-coded alignment is not so easy to understand for
      newbie like me.
      
      This commit changes the open-coded alignment to the ALIGN macro for a
      better readability.
      
      Also there is a previous patch from David Sterba with similar changes,
      but the patch is for 3.2 kernel and seems not merged.
      http://www.spinics.net/lists/linux-btrfs/msg12747.html
      
      Cc: David Sterba <dave@jikos.cz>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      fda2832f
  16. 23 2月, 2013 1 次提交
  17. 21 2月, 2013 3 次提交
    • M
      Btrfs: fix remount vs autodefrag · dc81cdc5
      Miao Xie 提交于
      If we remount the fs to close the auto defragment or make the fs R/O,
      we should stop the auto defragment.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      dc81cdc5
    • J
      Btrfs: place ordered operations on a per transaction list · 569e0f35
      Josef Bacik 提交于
      Miao made the ordered operations stuff run async, which introduced a
      deadlock where we could get somebody (sync) racing in and committing the
      transaction while a commit was already happening.  The new committer would
      try and flush ordered operations which would hang waiting for the commit to
      finish because it is done asynchronously and no longer inherits the callers
      trans handle.  To fix this we need to make the ordered operations list a per
      transaction list.  We can get new inodes added to the ordered operation list
      by truncating them and then having another process writing to them, so this
      makes it so that anybody trying to add an ordered operation _must_ start a
      transaction in order to add itself to the list, which will keep new inodes
      from getting added to the ordered operations list after we start committing.
      This should fix the deadlock and also keeps us from doing a lot more work
      than we need to during commit.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      569e0f35
    • M
      Btrfs: use bit operation for ->fs_state · 87533c47
      Miao Xie 提交于
      There is no lock to protect fs_info->fs_state, it will introduce
      some problems, such as the value may be covered by the other task
      when several tasks modify it. For example:
      	Task0 - CPU0		Task1 - CPU1
      	mov %fs_state rax
      	or $0x1 rax
      				mov %fs_state rax
      				or $0x2 rax
      	mov rax %fs_state
      				mov rax %fs_state
      The expected value is 3, but in fact, it is 2.
      
      Though this problem doesn't happen now (because there is only one
      flag currently), the code is error prone, if we add other flags,
      the above problem will happen to a certainty.
      
      Now we use bit operation for it to fix the above problem.
      In this way, we can make the code more robust and be easy to
      add new flags.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      87533c47
  18. 20 2月, 2013 2 次提交
  19. 06 2月, 2013 2 次提交
    • L
      Btrfs: fix race between snapshot deletion and getting inode · 6f1c3605
      Liu Bo 提交于
      While running snapshot testscript created by Mitch and David,
      the race between autodefrag and snapshot deletion can lead to
      corruption of dead_root list so that we can get crash on
      btrfs_clean_old_snapshots().
      
      And besides autodefrag, scrub also does the same thing, ie. read
      root first and get inode.
      
      Here is the story(take autodefrag as an example):
      (1) when we delete a snapshot or subvolume, it will set its root's
      refs to zero and do a iput() on its own inode, and if this inode happens
      to be the only active in-meory one in root's inode rbtree, it will add
      itself to the global dead_roots list for later cleanup.
      
      (2) after (1), the autodefrag thread may read another inode for defrag
      and the inode is just in the deleted snapshot/subvolume, but all of these
      are without checking if the root is still valid(refs > 0).  So the end up
      result is adding the deleted snapshot/subvolume's root to the global
      dead_roots list AGAIN.
      
      Fortunately, we already have a srcu lock to avoid the race, ie. subvol_srcu.
      
      So all we need to do is to take the lock to protect 'read root and get inode',
      since we synchronize to wait for the rcu grace period before adding something
      to the global dead_roots list.
      Reported-by: NMitch Harder <mitch.harder@sabayonlinux.org>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      6f1c3605
    • M
      Btrfs: fix wrong sync_writers decrement in btrfs_file_aio_write() · 0a3404dc
      Miao Xie 提交于
      If the checks at the beginning of btrfs_file_aio_write() fail, we needn't
      decrease ->sync_writers, because we have not increased it. Fix it.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      0a3404dc
  20. 15 1月, 2013 2 次提交
  21. 18 12月, 2012 1 次提交
  22. 17 12月, 2012 6 次提交
    • J
      Btrfs: do not call file_update_time in aio_write · 6c760c07
      Josef Bacik 提交于
      This starts a transaction and dirties the inode everytime we call it, which
      is super expensive if you have a write heavy workload.  We will be updating
      the inode when the IO completes and we reserve the space for the inode
      update when we reserve space for the write, so there is no chance of loss of
      information or enospc issues.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      6c760c07
    • J
      Btrfs: log changed inodes based on the extent map tree · 70c8a91c
      Josef Bacik 提交于
      We don't really need to copy extents from the source tree since we have all
      of the information already available to us in the extent_map tree.  So
      instead just write the extents straight to the log tree and don't bother to
      copy the extent items from the source tree.
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      70c8a91c
    • J
      Btrfs: keep track of the extents original block length · b4939680
      Josef Bacik 提交于
      If we've written to a prealloc extent we need to know the original block len
      for the extent.  We can't figure this out currently since ->block_len is
      just set to the extent length.  So introduce ->orig_block_len so that we
      know how many bytes were in the original extent for proper extent logging
      that future patches will need.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      b4939680
    • J
      Btrfs: inline csums if we're fsyncing · b812ce28
      Josef Bacik 提交于
      The tree logging stuff needs the csums to be on the ordered extents in order
      to log them properly, so mark that we're sync and inline the csum creation
      so we don't have to wait on the csumming to be done when logging extents
      that are still in flight.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      b812ce28
    • M
      Btrfs: punch hole past the end of the file · 7426cc04
      Miao Xie 提交于
      Since we can pre-allocate the space past EOF, we should be able to reclaim
      that space if we need. This patch implements it by removing the EOF check.
      
      Though the manual of fallocate command says we can use truncate command to
      reclaim the pre-allocated space which past EOF, but because truncate command
      changes the file size, we must run several commands to reclaim the space if we
      don't want to change the file size, so it is not a good choice.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      7426cc04
    • M
      Btrfs: fix the page that is beyond EOF · 0061280d
      Miao Xie 提交于
      Steps to reproduce:
       # mkfs.btrfs <disk>
       # mount <disk> <mnt>
       # dd if=/dev/zero of=<mnt>/<file> bs=512 seek=5 count=8
       # fallocate -p -o 2048 -l 16384 <mnt>/<file>
       # dd if=/dev/zero of=<mnt>/<file> bs=4096 seek=3 count=8 conv=notrunc,nocreat
       # umount <mnt>
       # dmesg
       WARNING: at fs/btrfs/inode.c:7140 btrfs_destroy_inode+0x2eb/0x330
      
      The reason is that we inputed a range which is beyond the end of the file. And
      because the end of this range was not page-aligned, we had to truncate the last
      page in this range, this operation is similar to a buffered file write. In other
      words, we reserved enough space and clear the data which was in the hole range
      on that page. But when we expanded that test file, write the data into the same
      page, we forgot that we have reserved enough space for the buffered write of
      that page because in most cases there is no page that is beyond the end of
      the file. As a result, we reserved the space twice.
      
      In fact, we needn't truncate the page if it is beyond the end of the file, just
      release the allocated space in that range. Fix the above problem by this way.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      0061280d