1. 02 Jul, 2013 10 commits
    • M
      Btrfs: fix oops when recovering the file data by scrub function · 26b25891
      Miao Xie committed
      We get an oops while running the btrfs replace start test:
      ------------[ cut here ]------------
      kernel BUG at mm/filemap.c:608!
      [SNIP]
      Call Trace:
        [<ffffffffa04b36c7>] copy_nocow_pages_for_inode+0x217/0x3f0 [btrfs]
        [<ffffffffa04b34b0>] ? scrub_print_warning_inode+0x230/0x230 [btrfs]
        [<ffffffffa04b34b0>] ? scrub_print_warning_inode+0x230/0x230 [btrfs]
        [<ffffffffa04bb8ce>] iterate_extent_inodes+0x1ae/0x300 [btrfs]
        [<ffffffffa04bbab2>] iterate_inodes_from_logical+0x92/0xb0 [btrfs]
        [<ffffffffa04b34b0>] ? scrub_print_warning_inode+0x230/0x230 [btrfs]
        [<ffffffffa04b3b07>] copy_nocow_pages_worker+0x97/0x150 [btrfs]
        [<ffffffffa048eed4>] worker_loop+0x134/0x540 [btrfs]
        [<ffffffff816274ea>] ? __schedule+0x3ca/0x7f0
        [<ffffffffa048eda0>] ? btrfs_queue_worker+0x300/0x300 [btrfs]
        [<ffffffff8106f2f0>] kthread+0xc0/0xd0
        [<ffffffff8106f230>] ? flush_kthread_worker+0x80/0x80
        [<ffffffff8163181c>] ret_from_fork+0x7c/0xb0
        [<ffffffff8106f230>] ? flush_kthread_worker+0x80/0x80
      [SNIP]
       RIP  [<ffffffff8111f4c5>] unlock_page+0x35/0x40
        RSP <ffff88010316bb98>
       ---[ end trace 421e79ad0dd72c7d ]---
      
      This happens because we forgot to lock the page again after reading data
      into it. Fix it.
      Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      26b25891
    • J
      Btrfs: make the chunk allocator completely tree lockless · 6df9a95e
      Josef Bacik committed
      When adjusting the enospc rules for relocation I ran into a deadlock because we
      were relocating the only system chunk and that forced us to try and allocate a
      new system chunk while holding locks in the chunk tree, which caused us to
      deadlock.  To fix this I've moved all of the dev extent addition and chunk
      addition out to the delayed chunk completion stuff.  We still keep the in-memory
      stuff which makes sure everything is consistent.
      
      One change I had to make was to search the commit root of the device tree to
      find a free dev extent, and hold onto any chunk em's that we allocated in that
      transaction so we do not allocate the same dev extent twice.  This has the side
      effect of fixing a bug with balance that has been there ever since balance
      existed.  Basically you can free a block group and its dev extent and then
      immediately allocate that dev extent for a new block group and write stuff to
      that dev extent, all within the same transaction.  So if you happen to crash
      during a balance you could come back to a completely broken file system.  This
      patch should keep these sorts of things from happening in the future since we
      won't be able to allocate free'd dev extents until after the transaction
      commits.  This has passed all of the xfstests and my super annoying stress test
      followed by a balance.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      6df9a95e
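The rule described in this commit message, that a freed dev extent must not be handed out again until the transaction commits, can be sketched in userspace C. This is a toy model only: `toy_device`, `toy_alloc_extent`, `toy_free_extent` and `toy_commit_transaction` are hypothetical names chosen for illustration and are not btrfs functions.

```c
#include <stddef.h>

/* Hypothetical toy model, not btrfs code: each slot stands for one dev extent. */
enum toy_extent_state { TOY_EXTENT_FREE, TOY_EXTENT_USED, TOY_EXTENT_PENDING_FREE };

#define TOY_NR_EXTENTS 4

struct toy_device {
	enum toy_extent_state state[TOY_NR_EXTENTS];
};

/* Mark the first free extent as used and return its index, or -1 if none is free. */
static int toy_alloc_extent(struct toy_device *dev)
{
	for (int i = 0; i < TOY_NR_EXTENTS; i++) {
		if (dev->state[i] == TOY_EXTENT_FREE) {
			dev->state[i] = TOY_EXTENT_USED;
			return i;
		}
	}
	return -1;
}

/* Freeing inside a transaction only marks the extent as pending, never as free. */
static void toy_free_extent(struct toy_device *dev, int idx)
{
	dev->state[idx] = TOY_EXTENT_PENDING_FREE;
}

/* Only at commit time do pending extents become allocatable again. */
static void toy_commit_transaction(struct toy_device *dev)
{
	for (int i = 0; i < TOY_NR_EXTENTS; i++) {
		if (dev->state[i] == TOY_EXTENT_PENDING_FREE)
			dev->state[i] = TOY_EXTENT_FREE;
	}
}
```

In this model an extent freed mid-transaction is skipped by the allocator, so a crash before commit cannot leave new data written over an extent that the last committed state still references.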
    • J
      Btrfs: cleanup orphaned root orphan item · 68a7342c
      Josef Bacik committed
      I hit a weird problem where my root item had been deleted but the orphan item had
      not.  This isn't necessarily a problem, but it keeps the file system from being
      mounted.  To fix this we just need to axe the orphan item if we can't find the
      fs root when we're putting them all together.  With this patch I was able to
      successfully mount my file system.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      68a7342c
    • M
      Btrfs: fix wrong mirror number tuning · a70c6172
      Miao Xie committed
      Reading data from the target device of a replace operation is now allowed,
      so a mirror number greater than the number of stripes in a chunk is valid.
      We will tune it later, when we find there is no target device. Fix it.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      a70c6172
    • M
      e6da5d2e
    • M
      Btrfs: remove btrfs_sector_sum structure · f51a4a18
      Miao Xie committed
      Using the btrfs_sector_sum structure to keep the checksum value is
      unnecessary. Because the extents that btrfs_sector_sum points to are
      contiguous, we can find the expected checksums from btrfs_ordered_sum's
      bytenr and the offset, so we can remove btrfs_sector_sum's bytenr. After
      removing bytenr there is only one member left in the structure, so it makes
      no sense to keep it; just remove it and use a u32 array to store the
      checksum values.
      
      With this change we no longer use a while loop to get the checksums one by
      one. We can now get several checksum values at a time, which improved
      performance by ~74% on my SSD (31MB/s -> 54MB/s).
      
      test command:
       # dd if=/dev/zero of=/mnt/btrfs/file0 bs=1M count=1024 oflag=sync
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      f51a4a18
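The idea behind dropping the per-sector bytenr can be shown with a small sketch: when the checksummed range is contiguous, the checksum for any sector is found by index arithmetic on a plain array. The struct and function below (`toy_ordered_sum`, `toy_find_sum`) are hypothetical simplifications and do not mirror the real btrfs_ordered_sum layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ordered sum that covers one contiguous range. */
struct toy_ordered_sum {
	uint64_t bytenr;      /* start of the contiguous range on disk */
	uint32_t len;         /* length of the range in bytes */
	uint32_t sectorsize;  /* bytes covered by each checksum */
	const uint32_t *sums; /* one checksum per sector, in disk order */
};

/*
 * Look up the checksum for the sector containing disk_bytenr.
 * Returns 0 and stores the checksum in *sum, or -1 if the address
 * lies outside the range.
 */
static int toy_find_sum(const struct toy_ordered_sum *os,
			uint64_t disk_bytenr, uint32_t *sum)
{
	if (disk_bytenr < os->bytenr || disk_bytenr >= os->bytenr + os->len)
		return -1;

	/* Contiguity lets us compute the slot directly instead of scanning. */
	size_t index = (size_t)((disk_bytenr - os->bytenr) / os->sectorsize);

	*sum = os->sums[index];
	return 0;
}
```

Because the slot is computed rather than searched for, a caller can also copy a run of adjacent checksums in one step, which is the kind of batching the commit message credits for the throughput gain.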
    • J
      Btrfs: check if we can nocow if we don't have data space · 7ee9e440
      Josef Bacik committed
      We always just try and reserve data space when we write, but if we are out of
      space but have prealloc'ed extents we should still successfully write.  This
      patch will try and see if we can write to prealloc'ed space and if we can go
      ahead and allow the write to continue.  With this patch we now pass xfstests
      generic/274.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      7ee9e440
    • J
      Btrfs: stop using try_to_writeback_inodes_sb_nr to flush delalloc · 925a6efb
      Josef Bacik committed
      try_to_writeback_inodes_sb_nr returns 1 if writeback is already underway, which
      is completely fraking useless for us as we need to make sure pages are actually
      written before we go and check if there are ordered extents.  So replace this
      with an open coding of try_to_writeback_inodes_sb_nr minus the writeback
      underway check so that we are sure to actually have flushed some dirty pages out
      and will have ordered extents to use.  With this patch xfstests generic/273 now
      passes.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      925a6efb
    • J
      Btrfs: use a percpu to keep track of possibly pinned bytes · b150a4f1
      Josef Bacik committed
      There are all of these checks in the ENOSPC code to see if committing the
      transaction would free up enough space to make the allocation.  This is because
      early on we just committed the transaction and hoped and prayed, which resulted
      in cases where it took _forever_ to get an ENOSPC when we really were out of
      space.  So we check space_info->bytes_pinned, except this isn't completely true
      because it doesn't account for space we may free that is stuck in delayed refs.
      So tests like xfstests 226 would fail because we wouldn't commit the transaction
      to free up the data space.  So instead add a percpu counter that will be a
      little fuzzier: it will add bytes as soon as we try to free up the space, and
      remove any space it doesn't actually free up when we get around to doing the
      actual free.  We then 0 out this counter every transaction period so we have a
      better idea of how much space we will actually free up by committing this
      transaction.  With this patch we now pass xfstests 226.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      b150a4f1
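The accounting scheme this message describes (count bytes optimistically when a free is queued, back out whatever was not really freed, reset at each transaction) can be sketched as follows. A plain integer stands in for the kernel's percpu counter, and every name here (`toy_space_info`, `toy_queue_free`, and so on) is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: a plain counter stands in for the kernel percpu counter. */
struct toy_space_info {
	int64_t total_bytes_pinned; /* optimistic estimate for the current transaction */
};

/* As soon as a free is queued, count its bytes as "possibly pinned". */
static void toy_queue_free(struct toy_space_info *si, int64_t bytes)
{
	si->total_bytes_pinned += bytes;
}

/* When the queued free is finally processed, back out the bytes it did not free. */
static void toy_complete_free(struct toy_space_info *si,
			      int64_t queued, int64_t actually_freed)
{
	si->total_bytes_pinned -= queued - actually_freed;
}

/* Would committing the transaction plausibly free at least `needed` bytes? */
static int toy_commit_may_help(const struct toy_space_info *si, int64_t needed)
{
	return si->total_bytes_pinned >= needed;
}

/* The estimate is zeroed at every transaction boundary. */
static void toy_transaction_reset(struct toy_space_info *si)
{
	si->total_bytes_pinned = 0;
}
```

The estimate is deliberately fuzzy: it may briefly overstate what a commit will free, but it includes space still waiting in queued work, which is the case a strict pinned-bytes check misses.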
    • J
      Btrfs: check for actual acls rather than just xattrs when caching no acl · f23b5a59
      Josef Bacik committed
      We have an optimization that will go ahead and cache no acls on an inode if
      there are no xattrs on the inode.  This saves us a lookup later to check the
      acls for writes or any other access.  The problem is I use selinux so I always
      have an xattr on inodes, so make this test a little smarter and check for the
      actual acl hash on the key and if it isn't there then we still get to cache no
      acl which makes everybody who uses selinux a little happier.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      f23b5a59
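A rough sketch of the smarter test: rather than asking "does the inode have any xattr?", ask "is any of its xattrs an ACL?". The commit message says the real check looks for the ACL name hash in the key; the version below compares the standard Linux ACL xattr names directly as a simplification, and `toy_can_cache_no_acl` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Is this one of the two POSIX ACL xattr names used on Linux? */
static int toy_is_acl_xattr(const char *name)
{
	return strcmp(name, "system.posix_acl_access") == 0 ||
	       strcmp(name, "system.posix_acl_default") == 0;
}

/*
 * Return 1 if "no ACL" can be cached for an inode with the given xattr
 * names, i.e. none of them is an ACL. An inode that carries only a label
 * such as security.selinux still qualifies.
 */
static int toy_can_cache_no_acl(const char *const *names, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (toy_is_acl_xattr(names[i]))
			return 0;
	}
	return 1;
}
```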
  2. 01 Jul, 2013 13 commits
    • J
      Btrfs: move btrfs_truncate_page to btrfs_cont_expand instead of btrfs_truncate · a71754fc
      Josef Bacik committed
      This has plagued us forever and I'm so over working around it.  When we truncate
      down to a non-page aligned offset we will call btrfs_truncate_page to zero out
      the end of the page and write it back to disk; this will keep us from exposing
      stale data if we truncate back up from that point.  The problem with this is it
      requires data space to do this, and people don't really expect to get ENOSPC
      from truncate() for these sorts of things.  This also tends to bite the orphan
      cleanup stuff too which keeps people from mounting.  To get around this we can
      just move this into btrfs_cont_expand() to make sure if we are truncating up
      from a non-page size aligned i_size we will zero out the rest of this page so
      that we don't expose stale data.  This will give ENOSPC if you try to truncate()
      up or if you try to write past the end of isize, which is much more reasonable.
      This fixes xfstests generic/083 failing to mount because of the orphan cleanup
      failing.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      a71754fc
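The zeroing rule for growing a file from a non-page-aligned size comes down to simple arithmetic, sketched below. The 4096-byte page size and the helper name `toy_tail_bytes_to_zero` are assumptions made for illustration; the real logic lives in btrfs_cont_expand() and btrfs_truncate_page().

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size for this sketch */

/*
 * When a file grows from old_size to new_size, return how many bytes after
 * old_size (up to the end of its page) must be zeroed so that stale data is
 * not exposed. Returns 0 if the file is not growing or old_size is already
 * page aligned.
 */
static uint32_t toy_tail_bytes_to_zero(uint64_t old_size, uint64_t new_size)
{
	uint32_t offset_in_page = (uint32_t)(old_size % TOY_PAGE_SIZE);

	if (new_size <= old_size || offset_in_page == 0)
		return 0;
	return TOY_PAGE_SIZE - offset_in_page;
}
```

Shrinking never needs zeroing in this model, which matches the point of the commit: the operation that can fail with ENOSPC moves from truncating down to expanding.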
    • J
      Btrfs: optimize reada_for_balance · 0b08851f
      Josef Bacik committed
      This patch does two things.  First we no longer explicitly read in the blocks
      we're trying to readahead.  For things like balance_level we may never actually
      use the blocks so this just adds unneeded latency, and balance_level and
      split_node will both read in the blocks they care about explicitly so if the
      blocks need to be waited on it will be done there.  Secondly we no longer drop
      the path if we do readahead, we just set the path blocking before we call
      reada_for_balance() and then we're good to go.  Hopefully this will cut down on
      the number of re-searches.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      0b08851f
    • J
      Btrfs: optimize read_block_for_search · bdf7c00e
      Josef Bacik committed
      This patch does two things.  First, it only does one call to
      btrfs_buffer_uptodate() with the gen specified instead of once with 0 and then
      again with gen specified.  The other thing is to call btrfs_read_buffer() on the
      buffer we've found instead of dropping it and then calling read_tree_block().
      This will keep us from doing yet another radix tree lookup for a buffer we've
      already found.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      bdf7c00e
    • J
      Btrfs: unlock extent range on enospc in compressed submit · fdf8e2ea
      Josef Bacik committed
      A user reported a deadlock where the async submit thread was blocked on the
      lock_extent() lock, and then everybody behind him was locked on the page lock
      for the page he was holding.  Looking at the code I noticed we do not unlock the
      extent range when we get ENOSPC and goto retry.  This is bad because we
      immediately try to lock that range again to do the cow, which will cause a
      deadlock.  Fix this by unlocking the range.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      fdf8e2ea
    • W
      Btrfs: fix the comment typo for btrfs_attach_transaction_barrier · 90b6d283
      Wang Sheng-Hui committed
      The comment is for btrfs_attach_transaction_barrier, not for
      btrfs_attach_transaction. Fix the typo.
      Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
      Acked-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      90b6d283
    • J
      Btrfs: fix not being able to find skinny extents during relocate · aee68ee5
      Josef Bacik committed
      We unconditionally search for the EXTENT_ITEM_KEY for metadata during balance,
      and then check the key that we found to see if it is actually a
      METADATA_ITEM_KEY, but this doesn't work right because METADATA is a higher key
      value, so if what we are looking for happens to be the first item in the leaf
      the search will dump us out at the previous leaf, and we won't find our item.
      So instead do what we do everywhere else, search for the skinny extent first and
      if we don't find it go back and re-search for the extent item.  This patch fixes
      the panic I was hitting when balancing a large file system with skinny extents.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      aee68ee5
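The lookup order this message settles on (try the skinny metadata key first, then fall back to the regular extent item) can be sketched over a flat key array. The key type values 168 and 169 match BTRFS_EXTENT_ITEM_KEY and BTRFS_METADATA_ITEM_KEY; everything else (`toy_key`, `toy_lookup`, `toy_find_extent`, the linear scan, ignoring the key offset) is a hypothetical simplification and does not model the leaf-boundary behaviour of the real tree search.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_EXTENT_ITEM_KEY   168 /* same value as BTRFS_EXTENT_ITEM_KEY */
#define TOY_METADATA_ITEM_KEY 169 /* same value as BTRFS_METADATA_ITEM_KEY */

/* Simplified key: the real btrfs key also carries an offset field. */
struct toy_key {
	uint64_t objectid;
	uint8_t type;
};

/* Exact-match lookup of (objectid, type); returns the slot or -1. */
static int toy_lookup(const struct toy_key *keys, size_t n,
		      uint64_t objectid, uint8_t type)
{
	for (size_t i = 0; i < n; i++) {
		if (keys[i].objectid == objectid && keys[i].type == type)
			return (int)i;
	}
	return -1;
}

/* Search for the skinny metadata item first, then re-search for the extent item. */
static int toy_find_extent(const struct toy_key *keys, size_t n, uint64_t bytenr)
{
	int slot = toy_lookup(keys, n, bytenr, TOY_METADATA_ITEM_KEY);

	if (slot < 0)
		slot = toy_lookup(keys, n, bytenr, TOY_EXTENT_ITEM_KEY);
	return slot;
}
```

Searching for the higher key type first avoids depending on where a single search for the lower key happens to land, which is the failure the commit message describes.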
    • J
      Btrfs: cleanup backref search commit root flag stuff · da61d31a
      Josef Bacik committed
      Looking into this backref problem I noticed we're using a macro for what turns
      out to be essentially a NULL check to see if we need to search the commit root.
      I'm killing this; let's just do what everybody else does and check if trans ==
      NULL.  I've also made it so we pass in the path to __resolve_indirect_refs which
      will have the search_commit_root flag set properly already and that way we can
      avoid allocating another path when we have a perfectly good one to use.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      da61d31a
    • J
      Btrfs: free csums when we're done scrubbing an extent · d88d46c6
      Josef Bacik committed
      A user reported scrub taking up an unreasonable amount of ram as it ran.  This
      is because we look up the csums for the extent we're scrubbing but don't free them
      up until after we're done with the scrub, which means we can take up a whole lot
      of ram.  This patch fixes this by dropping the csums once we're done with the
      extent we've scrubbed.  The user reported this to fix their problem.  Thanks,
      Reported-and-tested-by: Remco Hosman <remco@hosman.xs4all.nl>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      d88d46c6
    • J
      Btrfs: fix transaction throttling for delayed refs · 1be41b78
      Josef Bacik committed
      Dave has this fs_mark script that can make btrfs abort with a sufficient amount of
      ram.  This is because with more ram we can keep more dirty metadata in cache
      which in a roundabout way makes for many more pending delayed refs.  What
      happens is we end up not throttling the transaction enough so when we go to
      commit the transaction when we've completely filled the file system we'll
      abort() because we use all of the space in the global reserve and we still have
      delayed refs to run.  To fix this we need to make the delayed ref flushing and
      the transaction throttling dependent upon the number of delayed refs that we
      have instead of how much reserved space is left in the global reserve.  With
      this patch we not only stop aborting transactions but we also get a smoother run
      speed with fs_mark and it makes us about 10% faster.  Thanks,
      Reported-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      1be41b78
    • J
      Btrfs: stop waiting on current trans if we aborted · 501407aa
      Josef Bacik committed
      I hit a hang when run_delayed_refs returned an error in the beginning of
      btrfs_commit_transaction.  If we decide we need to commit the transaction in
      btrfs_end_transaction we'll set BLOCKED and start to commit, but if we get an
      error this early on we'll just exit without committing.  This is fine, except
      that anybody else who tried to start a transaction will sit in
      wait_current_trans() since we're set to BLOCKED and we never set it to something
      else and woke people up.  To fix this we want to check for trans->aborted
      everywhere we wait for the transaction state to change, and make
      btrfs_abort_transaction() wake up any waiters there may be.  All the callers
      will notice that the transaction has aborted and exit out properly.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      501407aa
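The fix described here amounts to making the wait condition include the abort flag. A single-threaded sketch of that condition follows; the types and helpers (`toy_trans`, `toy_should_keep_waiting`, `toy_abort`) are hypothetical, and the real code additionally has to wake the sleeping waiters, which this sketch leaves out.

```c
/* Hypothetical transaction states for this sketch. */
enum toy_trans_state { TOY_TRANS_RUNNING, TOY_TRANS_BLOCKED, TOY_TRANS_COMMITTED };

struct toy_trans {
	enum toy_trans_state state;
	int aborted; /* 0, or a negative errno-style value once aborted */
};

/*
 * The condition a waiter re-checks every time it is woken up. Without the
 * aborted test, a transaction stuck in BLOCKED after an early error would
 * keep waiters asleep forever.
 */
static int toy_should_keep_waiting(const struct toy_trans *t)
{
	return t->state == TOY_TRANS_BLOCKED && !t->aborted;
}

/* Record the error; real code would also wake up all waiters at this point. */
static void toy_abort(struct toy_trans *t, int err)
{
	t->aborted = err;
}
```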
    • J
      Btrfs: wake up delayed ref flushing waiters on abort · f971fe29
      Josef Bacik committed
      I hit a deadlock because we aborted when flushing delayed refs but didn't wake
      any of the other flushers up and so everybody was just sleeping forever.  This
      should fix the problem.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      f971fe29
    • J
      btrfs: fix the code comments for LZO compression workspace · 3fb40375
      Jie Liu committed
      Fix the code comments for lzo compression workspace.
      The buf item is used to store the decompressed data
      and cbuf is used to store the compressed data.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      3fb40375
    • M
      Btrfs: fix broken nocow after balance · 5bc7247a
      Miao Xie committed
      Balance will create reloc_root for each fs root, and it's going to
      record last_snapshot to filter shared blocks.  The side effect of
      setting last_snapshot is to break nocow attributes of files.
      
      Since the extents are not shared by the relocation tree after the balance,
      we can recover the old last_snapshot safely if no one snapshotted the
      source tree. We fix the above problem this way.
      Reported-by: Kyle Gates <kylegates@hotmail.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      5bc7247a
  3. 14 Jun, 2013 17 commits