1. 12 Nov, 2013 21 commits
    • M
      Btrfs: improve jitter performance of the sequential buffered write · fa7c1494
      Miao Xie committed
      Performance sometimes dropped when we ran sysbench to measure
      sequential buffered writes with 2 or more threads.
      
      This happened because the task scheduler could reorder the writes of
      the test threads, so an incoming write could land beyond the end of
      the file. In that case we need to insert dummy file extents and create
      a hole for the area we skip, and to avoid the ongoing ordered extents
      in that area we need to wait for them. Unfortunately, the current code
      doesn't check whether there are ordered extents in the area; it tries
      to find and flush the dirty pages directly. In fact there are no dirty
      pages in that area, so this step is unnecessary and just wastes time.
      Sometimes it also increases contention on some locks and makes
      performance drop suddenly.
      
      So we remove the ordered extent flush function before the check, and flush
      the dirty pages and wait for the ordered extents only when we find them.
      
      According to my test, we hit the performance regression 1-2 times when
      we ran the test 10 times before applying this patch. After applying
      this patch, the regression went away.
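
The idea of "flush and wait only when ordered extents are found" can be sketched as a small user-space model. This is not the kernel code: `OrderedExtent`, `prepare_hole`, and the `flush` callback are hypothetical names chosen for illustration, and byte ranges are treated as half-open intervals.

```python
from dataclasses import dataclass


@dataclass
class OrderedExtent:
    start: int   # file offset where the in-flight extent begins
    length: int  # number of bytes it covers


def overlaps(ext, start, end):
    """True if the extent intersects the half-open byte range [start, end)."""
    return ext.start < end and ext.start + ext.length > start


def prepare_hole(ordered_extents, hole_start, hole_end, flush):
    """Flush and wait only for ordered extents that overlap the hole.

    `flush` is a hypothetical callback standing in for the expensive
    "write back dirty pages and wait" step. Returns the number of flushes.
    """
    flushes = 0
    for ext in list(ordered_extents):
        if overlaps(ext, hole_start, hole_end):
            flush(ext)
            ordered_extents.remove(ext)
            flushes += 1
    return flushes
```

When no ordered extent overlaps the skipped area, the model does no flushing at all, which corresponds to the wasted work the patch removes.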
      
      Test Environment:
       CPU:		1CPU * 4Cores
       Memory:	6GB
       Partition:	20GB
      
      Test Command:
       # sysbench --test=fileio --file-total-size=16G --file-test-mode=seqwr \
       > --num-threads=512 --file-block-size=16384 --max-time=60 --max-requests=0 run
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • M
      Btrfs: fix BUG_ON() casued by the reserved space migration · 20dd2cbf
      Miao Xie committed
      When we did space balance and snapshot creation at the same time, we might
      hit the following oops:
       kernel BUG at fs/btrfs/inode.c:3038!
       [SNIP]
       Call Trace:
       [<ffffffffa0411ec7>] btrfs_orphan_cleanup+0x293/0x407 [btrfs]
       [<ffffffffa042dc45>] btrfs_mksubvol.isra.28+0x259/0x373 [btrfs]
       [<ffffffffa042de85>] btrfs_ioctl_snap_create_transid+0x126/0x156 [btrfs]
       [<ffffffffa042dff1>] btrfs_ioctl_snap_create_v2+0xd0/0x121 [btrfs]
       [<ffffffffa0430b2c>] btrfs_ioctl+0x414/0x1854 [btrfs]
       [<ffffffff813b60b7>] ? __do_page_fault+0x305/0x379
       [<ffffffff811215a9>] vfs_ioctl+0x1d/0x39
       [<ffffffff81121d7c>] do_vfs_ioctl+0x32d/0x3e2
       [<ffffffff81057fe7>] ? finish_task_switch+0x80/0xb8
       [<ffffffff81121e88>] SyS_ioctl+0x57/0x83
       [<ffffffff813b39ff>] ? do_device_not_available+0x12/0x14
       [<ffffffff813b99c2>] system_call_fastpath+0x16/0x1b
       [SNIP]
       RIP  [<ffffffffa040da40>] btrfs_orphan_add+0xc3/0x126 [btrfs]
      
      The cause of the problem is that the relocation root creation stole
      the reserved space, which was reserved for orphan item deletion.
      
      There are several ways to fix this problem. One is to increase
      the reserved space size of the space balance, and then use
      that space to create the relocation tree for each fs/file tree.
      But it is hard to calculate a suitable size because we don't
      know how many fs/file trees we need to relocate.
      
      We fixed this problem by actively reserving the space for relocation root
      creation, since the space it needs is very small (one tree block, used for
      the root node copy), and then using that reserved space to create the
      relocation tree. If we don't reserve space for relocation tree creation,
      we will use the reserved space of the balance.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • R
      btrfs: remove unused parameter from btrfs_header_fsid · 0a4e5586
      Ross Kirk committed
      Remove unused parameter, 'eb'. Unused since introduction in
      5f39d397
      
      Updated to be rebased against current upstream and correct diff supplied this time!
      Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • J
      Btrfs: fix two use-after-free bugs with transaction cleanup · 724e2315
      Josef Bacik committed
      I was noticing the slab redzone stuff going off every once in a while during
      transaction aborts.  This was caused by two things:
      
      1) We would walk the pending snapshots and set their error to -ECANCELED.  We
      don't need to do this, the snapshot stuff waits for a transaction commit and if
      there is a problem we just free our pending snapshot object and exit.  Doing
      this was causing us to touch the pending snapshot object after the thing had
      already been freed.
      
      2) We were freeing the transaction manually with wanton disregard for its
      use_count reference counter.  To fix this I cleaned up the transaction freeing
      loop to either wait for the transaction commit to finish if it was in the middle
      of that (since it will be cleaned and freed up there) or to do the cleanup
      ourselves.
      
      I also moved the global "kill all things dirty everywhere" stuff outside of the
      transaction cleanup loop since that only needs to be done once.  With this patch
      I'm no longer seeing slab corruption because of use after frees.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • J
      Btrfs: remove all BUG_ON()'s from commit_cowonly_roots · c16ce190
      Josef Bacik committed
      Noticed this when forcing errors to happen during delayed ref running.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • J
      Btrfs: don't delete ordered roots from list during cleanup · 1de2cfde
      Josef Bacik committed
      During transaction cleanup after an abort we are just removing roots from the
      ordered roots list which is incorrect.  We have a BUG_ON() to make sure that the
      root is still part of the ordered roots list when we put our ordered extent
      which we were tripping in this case.  So do like we do everywhere else and just
      move it to the tail of the ordered roots list and allow the normal cleanup to
      take care of stuff.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • J
      Btrfs: cleanup transaction on abort · 4e121c06
      Josef Bacik committed
      If we abort outside of a transaction commit we won't clean up anything until we
      unmount.  Unfortunately if we abort in the middle of writing out an ordered
      extent we won't clean it up and if somebody is waiting on that ordered extent
      they will wait forever.  To fix this just make the transaction kthread call the
      cleanup transaction stuff if it notices there's an error, and make
      btrfs_end_transaction wake up the transaction kthread if there is an error.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • J
      Btrfs: do not release metadata for space cache inodes · b6d08f06
      Josef Bacik committed
      I've been testing our error paths and I was tripping the BUG_ON() in
      drop_outstanding_extent because our outstanding_extents is 0 for space cache
      inodes.  This is because we don't reserve metadata space for these inodes since
      we depend on the global block reserve for our space.  To fix this we need to
      make sure the DO_ACCOUNTING stuff doesn't actually call release_metadata for
      space cache inodes.  With this patch I'm no longer panicking.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • J
      Btrfs: reset intwrite on transaction abort · e0228285
      Josef Bacik committed
      If we abort a transaction in the middle of a commit we weren't undoing the
      intwrite locking.  This patch fixes that problem.
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • J
      Btrfs: relocate csums properly with prealloc extents · 4577b014
      Josef Bacik committed
      A user reported a problem where they were getting csum errors when running a
      balance and running systemd's journal.  This is because systemd is awesome and
      fallocate()'s its log space and writes into it.  Unfortunately we assume that
      when we read in all the csums for an extent that they are sequential starting at
      the bytenr we care about.  This obviously isn't the case for prealloc extents,
      where we could have written to the middle of the prealloc extent only, which
      means the csum would be for the bytenr in the middle of our range and not the
      front of our range.  Fix this by offsetting the new bytenr we are logging to
      based on the original bytenr the csum was for.  With this patch I no longer see
      the csum errors I was seeing.  Thanks,
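
The offsetting described above can be illustrated with a minimal model. It assumes checksums are `(bytenr, checksum)` pairs and that an extent is moved from `old_start` to `new_start`; the function name `relocate_csums` is hypothetical and does not correspond to a kernel symbol.

```python
def relocate_csums(csums, old_start, new_start):
    """Map checksum records from an old extent location to a new one.

    `csums` is a list of (bytenr, checksum) pairs that may begin anywhere
    inside the old extent, for example when only the middle of a
    preallocated extent was ever written.
    """
    relocated = []
    for bytenr, checksum in csums:
        offset = bytenr - old_start  # position inside the old extent
        relocated.append((new_start + offset, checksum))
    return relocated
```

A naive version that numbers the checksums sequentially from `new_start` would place a checksum for the middle of the extent at its front, which is the mismatch the commit message describes.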
      
      Cc: stable@vger.kernel.org
      Reported-by: Chris Murphy <lists@colorremedies.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • F
      Btrfs: don't leak block group on error · e84cc142
      Filipe David Borba Manana committed
      In extent-tree.c:btrfs_write_dirty_block_groups(), if the call to
      write_one_cache_group() failed, we would return without putting
      the block group first.
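
A minimal sketch of the pattern, assuming a simple reference-counted object: the reference taken before the write is dropped on both the success and the failure path. `BlockGroup` and `write_dirty_block_group` are hypothetical stand-ins; only `write_one_cache_group` is a name taken from the message, and here it is just a callback returning 0 or a negative error code.

```python
class BlockGroup:
    """Toy reference-counted object standing in for a block group."""

    def __init__(self):
        self.refs = 1

    def get(self):
        self.refs += 1
        return self

    def put(self):
        self.refs -= 1


def write_dirty_block_group(group, write_one_cache_group):
    """Write one block group, never leaking the reference taken here."""
    group.get()
    err = write_one_cache_group(group)
    group.put()  # dropped before the error check, so failures do not leak
    return err
```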
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • F
      Btrfs: fix sync fs to actually wait for all data to be persisted · 9b199859
      Filipe David Borba Manana committed
      Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
      wait for delayed work to finish before returning success to the
      caller. This change fixes this, ensuring that there's no data loss
      if a power failure happens right after fs sync returns success to
      the caller and before the next commit happens.
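
From an application's point of view, the guarantee at stake looks roughly like the sketch below: once the sync call returns, the written bytes are expected to survive a power failure. This uses Python's `os.fsync()` and `os.sync()` as the nearest portable analogue; it is an assumption that this mirrors `btrfs fi sync`, which goes through a Btrfs-specific ioctl. The helper names are hypothetical.

```python
import os
import tempfile


def write_and_sync(path, data):
    """Write data, then ask the kernel to persist it before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())  # persist this one file
    os.sync()                 # sync(2): request writeback for all filesystems
    return os.path.getsize(path)


def demo():
    # Same payload as the reproducer: 6001 bytes of 0x41.
    with tempfile.TemporaryDirectory() as d:
        return write_and_sync(os.path.join(d, "foobar"), b"\x41" * 6001)
```

The sketch cannot demonstrate the power-failure case itself; it only shows the sequence of calls after which a caller is entitled to assume durability.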
      
      Steps to reproduce the data loss issue:
      
      $ mkfs.btrfs -f /dev/sdb3
      $ mount /dev/sdb3 /mnt/btrfs
      $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
      
      Right after the btrfs fi sync command (a second or two later, for example), power
      off the machine and reboot it. The file will be empty, as can be verified
      after mounting the filesystem and through btrfs-debug-tree:
      
      $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
              item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
                      location key (257 INODE_ITEM 0) type FILE
                      namelen 6 datalen 0 name: foobar
              item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
                      inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
              item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
                      inode ref index 2 namelen 6 name: foobar
      checksum tree key (CSUM_TREE ROOT_ITEM 0)
      leaf 29429760 items 0 free space 3995 generation 7 owner 7
      fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
      chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
      uuid tree key (UUID_TREE ROOT_ITEM 0)
      
      After this patch, the data loss no longer happens after a power failure and
      btrfs-debug-tree shows:
      
      $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
      	item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
      		location key (257 INODE_ITEM 0) type FILE
      		namelen 6 datalen 0 name: foobar
      	item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
      		inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
      	item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
      		inode ref index 2 namelen 6 name: foobar
      	item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
      		extent data disk byte 12845056 nr 8192
      		extent data offset 0 nr 8192 ram 8192
      		extent compression 0
      checksum tree key (CSUM_TREE ROOT_ITEM 0)
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • F
      Btrfs: fix tracking of orphan inode count · 703c88e0
      Filipe David Borba Manana committed
      In inode.c:btrfs_orphan_add() if we failed to insert the orphan
      item, we would return without decrementing the orphan count that
      we just incremented before attempting the insertion, leaving the
      orphan inode count wrong.
      
      In inode.c:btrfs_orphan_del(), we were decrementing the inode
      orphan count if the bit BTRFS_INODE_ORPHAN_META_RESERVED was set,
      which is logically wrong because it should be decremented if the
      bit BTRFS_INODE_HAS_ORPHAN_ITEM was set - after all we increment
      the count when we set the bit BTRFS_INODE_HAS_ORPHAN_ITEM elsewhere.
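
The accounting rule can be sketched as a toy model in which the counter is incremented and decremented under the same flag. The classes and the `insert_item` callback are hypothetical, and clearing the flag on every failure is a simplification of whatever the kernel does for specific error codes.

```python
class Inode:
    def __init__(self):
        self.has_orphan_item = False  # models BTRFS_INODE_HAS_ORPHAN_ITEM
        self.meta_reserved = False    # models BTRFS_INODE_ORPHAN_META_RESERVED


class Root:
    def __init__(self):
        self.orphan_inodes = 0


def orphan_add(root, inode, insert_item):
    """insert_item is a hypothetical callback returning 0 or a negative errno."""
    if not inode.has_orphan_item:
        inode.has_orphan_item = True
        root.orphan_inodes += 1
        ret = insert_item(inode)
        if ret:
            # Undo the increment so the count stays accurate on failure.
            inode.has_orphan_item = False
            root.orphan_inodes -= 1
            return ret
    return 0


def orphan_del(root, inode):
    # Decrement keyed on the same flag that guarded the increment,
    # not on the unrelated metadata-reservation flag.
    if inode.has_orphan_item:
        inode.has_orphan_item = False
        root.orphan_inodes -= 1
```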
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • L
      Btrfs: export btrfs space shared info to userspace · fe09e16c
      Liu Bo committed
      Like ocfs2, btrfs supports extents that are shared by
      different inodes, and some userspace tools request
      this kind of 'space shared information'.[1]
      
      ocfs2 uses flag FIEMAP_EXTENT_SHARED, so does btrfs.
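
A userspace tool consuming this flag might total the shared bytes of a file along the lines of the sketch below. The flag values are the ones defined in `linux/fiemap.h`; the input format, a list of `(length, flags)` pairs already decoded from a FIEMAP result, and the function name `shared_bytes` are assumptions made for illustration.

```python
FIEMAP_EXTENT_LAST = 0x0001    # last extent in the file
FIEMAP_EXTENT_SHARED = 0x2000  # extent space is shared with other files


def shared_bytes(extents):
    """Sum the lengths of extents whose flags include FIEMAP_EXTENT_SHARED.

    `extents` is an iterable of (length, flags) pairs.
    """
    return sum(length for length, flags in extents
               if flags & FIEMAP_EXTENT_SHARED)
```

For example, `shared_bytes([(4096, 0), (8192, FIEMAP_EXTENT_SHARED)])` counts only the second extent.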
      
      [1]: http://thr3ads.net/ocfs2-devel/2010/09/489052-PATCH-3-3-shared-du-using-fiemap-to-figure-up-the-shared-extents-per-file-and-the-footprint-in
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • F
      Btrfs: remove path arg from btrfs_truncate_free_space_cache · 74514323
      Filipe David Borba Manana committed
      Not used for anything, and removing it avoids caller's need to
      allocate a path structure.
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • F
      Btrfs: remove duplicated ino cache's inode lookup · 53645a91
      Filipe David Borba Manana committed
      We're doing an unnecessary extra lookup of the ino cache's
      inode when we already have it (and hold a reference to it)
      during the process of saving the ino cache contents to disk.
      Therefore remove this extra lookup.
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • J
      Btrfs: do a full search everytime in btrfs_search_old_slot · d4b4087c
      Josef Bacik committed
      While running some snapshot-aware defrag tests I noticed I was panicking every
      once in a while in key_search.  This is because of the optimization that says
      if we find a key at slot 0 it will be at slot 0 all the way down the rest of the
      tree.  This isn't the case for btrfs_search_old_slot since it will likely replay
      changes to a buffer if something has changed since we took our sequence number.
      So short circuit this optimization by setting prev_cmp to -1 every time we call
      key_search so we will do our normal binary search.  With this patch I am no
      longer seeing the panics I was seeing before.  Thanks,
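
The shortcut and its failure mode can be modeled in a few lines. This is a simplified sketch, assuming a node is just a sorted list of integer keys and that `prev_cmp == 0` means "the previous level matched exactly"; it keeps the name `key_search` from the message but is not the kernel function.

```python
from bisect import bisect_left


def key_search(keys, key, prev_cmp):
    """Return (cmp, slot): cmp is 0 on an exact match, 1 otherwise."""
    if prev_cmp == 0:
        # Shortcut: trust that the key sits at slot 0 and skip the search.
        return 0, 0
    slot = bisect_left(keys, key)  # normal binary search
    found = slot < len(keys) and keys[slot] == key
    return (0 if found else 1), slot
```

If the node was rewound to an older state and the key is no longer at slot 0, the shortcut returns the wrong slot, while passing `prev_cmp=-1` forces the binary search and finds the right one.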
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • J
      Btrfs: add a sanity test for btrfs_split_item · 06ea65a3
      Josef Bacik committed
      While looking at somebody's corruption I became completely convinced that
      btrfs_split_item was broken, so I wrote this test to verify that it was working
      as it was supposed to.  Thankfully it appears to be working as intended, so just
      add this test to make sure nobody breaks it in the future.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • R
      btrfs: drop unused parameter from btrfs_item_nr · dd3cc16b
      Ross Kirk committed
      Remove unused eb parameter from btrfs_item_nr
      Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • F
      Btrfs: don't store NULL byte in symlink extents · f06becc4
      Filipe David Borba Manana committed
      It is not necessary to store the NULL byte in a symlink inline file
      extent. There's currently no code that requires the NULL byte to be
      present in the extent. This change also doesn't break file format
      compatibility nor the send/receive feature.
      
      The VFS also doesn't need the NULL byte to be present in the extent,
      as it reads up to inode->i_size bytes (which already excludes the NULL
      byte) and sets the NULL byte for us (in fs/namei.c:page_getlink()).
      
      So with this change we save 1 byte per symlink file extent (which is
      always inlined in the btree leaf) without losing backward and forward
      compatibility.
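
The userspace-visible side of this can be checked with a short sketch: the size reported for a symlink equals the length of its target, with no terminator counted. It assumes a Linux filesystem where `lstat()` reports the target length as `st_size`; the function name is hypothetical.

```python
import os
import tempfile


def symlink_size_matches_target(target="foobar"):
    """Create a symlink and compare its reported size to the target length."""
    with tempfile.TemporaryDirectory() as d:
        link = os.path.join(d, "link")
        os.symlink(target, link)  # a dangling target is fine here
        size = os.lstat(link).st_size
        same_length = size == len(os.fsencode(target))
        return same_length and os.readlink(link) == target
```

This only observes what the VFS returns; it says nothing about how many bytes a given filesystem stores on disk for the link.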
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • S
      Btrfs: eliminate the exceptional root_tree refs=0 · 69e9c6c6
      Stefan Behrens committed
      The fact that btrfs_root_refs() returned 0 for the tree_root caused
      bugs in the past, therefore it is set to 1 with this patch and
      (hopefully) all affected code is adapted to this change.
      
      I verified this change by temporarily adding WARN_ON() checks
      everywhere where btrfs_root_refs() is used, checking whether the
      logic of the code is changed by btrfs_root_refs() returning 1
      instead of 0 for root->root_key.objectid == BTRFS_ROOT_TREE_OBJECTID.
      With these added checks, I ran the xfstests './check -g auto'.
      
      The two roots chunk_root and log_root_tree that are only referenced
      by the superblock and the log_roots below the log_root_tree still
      have btrfs_root_refs() == 0, only the tree_root is changed.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  2. 04 Nov, 2013 3 commits
  3. 03 Nov, 2013 2 commits
  4. 02 Nov, 2013 14 commits