1. 15 2月, 2014 1 次提交
    • L
      Btrfs: fix a lockdep warning when cleaning up aborted transaction · a9d2d4ad
      Liu Bo 提交于
      Given now we have 2 spinlock for management of delayed refs,
      CONFIG_DEBUG_SPINLOCK=y helped me find this,
      
      [ 4723.413809] BUG: spinlock wrong CPU on CPU#1, btrfs-transacti/2258
      [ 4723.414882]  lock: 0xffff880048377670, .magic: dead4ead, .owner: btrfs-transacti/2258, .owner_cpu: 2
      [ 4723.417146] CPU: 1 PID: 2258 Comm: btrfs-transacti Tainted: G        W  O 3.12.0+ #4
      [ 4723.421321] Call Trace:
      [ 4723.421872]  [<ffffffff81680fe7>] dump_stack+0x54/0x74
      [ 4723.422753]  [<ffffffff81681093>] spin_dump+0x8c/0x91
      [ 4723.424979]  [<ffffffff816810b9>] spin_bug+0x21/0x26
      [ 4723.425846]  [<ffffffff81323956>] do_raw_spin_unlock+0x66/0x90
      [ 4723.434424]  [<ffffffff81689bf7>] _raw_spin_unlock+0x27/0x40
      [ 4723.438747]  [<ffffffffa015da9e>] btrfs_cleanup_one_transaction+0x35e/0x710 [btrfs]
      [ 4723.443321]  [<ffffffffa015df54>] btrfs_cleanup_transaction+0x104/0x570 [btrfs]
      [ 4723.444692]  [<ffffffff810c1b5d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
      [ 4723.450336]  [<ffffffff810c1c2d>] ? trace_hardirqs_on+0xd/0x10
      [ 4723.451332]  [<ffffffffa015e5ee>] transaction_kthread+0x22e/0x270 [btrfs]
      [ 4723.452543]  [<ffffffffa015e3c0>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
      [ 4723.457833]  [<ffffffff81079efa>] kthread+0xea/0xf0
      [ 4723.458990]  [<ffffffff81079e10>] ? kthread_create_on_node+0x140/0x140
      [ 4723.460133]  [<ffffffff81692aac>] ret_from_fork+0x7c/0xb0
      [ 4723.460865]  [<ffffffff81079e10>] ? kthread_create_on_node+0x140/0x140
      [ 4723.496521] ------------[ cut here ]------------
      
      ----------------------------------------------------------------------
      
      The reason is that we get to call cond_resched_lock(&head_ref->lock) while
      still holding @delayed_refs->lock.
      
      So it's different with __btrfs_run_delayed_refs(), where we do drop-acquire
      dance before and after actually processing delayed refs.
      
      Here we don't drop the lock, others are not able to add new delayed refs to
      head_ref, so cond_resched_lock(&head_ref->lock) is not necessary here.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      a9d2d4ad
  2. 04 2月, 2014 1 次提交
  3. 29 1月, 2014 12 次提交
    • A
      btrfs: undo sysfs when open_ctree() fails · 2365dd3c
      Anand Jain 提交于
      reproducer:
      mkfs.btrfs -f /dev/sdb &&\
      mount /dev/sdb /btrfs &&\
      btrfs dev add -f /dev/sdc /btrfs &&\
      umount /btrfs &&\
      wipefs -a /dev/sdc &&\
      mount -o degraded /dev/sdb /btrfs
      //above mount fails so try with RO
      mount -o degraded,ro /dev/sdb /btrfs
      
      ------
      sysfs: cannot create duplicate filename '/fs/btrfs/3f48c79e-5ed0-4e87-b189-86e749e503f4'
      ::
      
      dump_stack+0x49/0x5e
      warn_slowpath_common+0x87/0xb0
      warn_slowpath_fmt+0x41/0x50
      strlcat+0x69/0x80
      sysfs_warn_dup+0x87/0xa0
      sysfs_add_one+0x40/0x50
      create_dir+0x76/0xc0
      sysfs_create_dir_ns+0x7a/0xc0
      kobject_add_internal+0xad/0x220
      kobject_add_varg+0x38/0x60
      kobject_init_and_add+0x53/0x70
      mutex_lock+0x11/0x40
      __free_pages+0x25/0x30
      free_pages+0x41/0x50
      selinux_sb_copy_data+0x14e/0x1e0
      mount_fs+0x3e/0x1a0
      vfs_kern_mount+0x71/0x120
      do_mount+0x3f7/0x980
      SyS_mount+0x8b/0xe0
      system_call_fastpath+0x16/0x1b
      ------
      
      further 'modprobe -r btrfs' fails as well
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      2365dd3c
    • L
      Btrfs: fix extent state leak on transaction abortion · 1a4319cc
      Liu Bo 提交于
      When transaction is aborted, we fail to commit transaction, instead we do
      cleanup work.  After that when we umount btrfs, we get to free fs roots' log
      trees respectively, but that happens after we unpin extents, so those extents
      pinned by freeing log trees will remain in memory and lead to the leak.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      1a4319cc
    • Q
      btrfs: Add noinode_cache mount option · 3818aea2
      Qu Wenruo 提交于
      Add noinode_cache mount option for btrfs.
      
      Since inode map cache involves all the btrfs_find_free_ino/return_ino
      things and if just trigger the mount_opt,
      an inode number get from inode map cache will not returned to inode map
      cache.
      
      To keep the find and return inode both in the same behavior,
      a new bit in mount_opt, CHANGE_INODE_CACHE, is introduced for this idea.
      CHANGE_INODE_CACHE is set/cleared in remounting, and the original
      INODE_MAP_CACHE is set/cleared according to CHANGE_INODE_CACHE after a
      success transaction.
      Since find/return inode is all done between btrfs_start_transaction and
      btrfs_commit_transaction, this will keep consistent behavior.
      
      Also noinode_cache mount option will not stop the caching_kthread.
      
      Cc: David Sterba <dsterba@suse.cz>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      3818aea2
    • J
      Btrfs: throttle delayed refs better · 0a2b2a84
      Josef Bacik 提交于
      On one of our gluster clusters we noticed some pretty big lag spikes.  This
      turned out to be because our transaction commit was taking like 3 minutes to
      complete.  This is because we have like 30 gigs of metadata, so our global
      reserve would end up being the max which is like 512 mb.  So our throttling code
      would allow a ridiculous amount of delayed refs to build up and then they'd all
      get run at transaction commit time, and for a cold mounted file system that
      could take up to 3 minutes to run.  So fix the throttling to be based on both
      the size of the global reserve and how long it takes us to run delayed refs.
      This patch tracks the time it takes to run delayed refs and then only allows 1
      seconds worth of outstanding delayed refs at a time.  This way it will auto-tune
      itself from cold cache up to when everything is in memory and it no longer has
      to go to disk.  This makes our transaction commits take much less time to run.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      0a2b2a84
    • J
      Btrfs: attach delayed ref updates to delayed ref heads · d7df2c79
      Josef Bacik 提交于
      Currently we have two rb-trees, one for delayed ref heads and one for all of the
      delayed refs, including the delayed ref heads.  When we process the delayed refs
      we have to hold onto the delayed ref lock for all of the selecting and merging
      and such, which results in quite a bit of lock contention.  This was solved by
      having a waitqueue and only one flusher at a time, however this hurts if we get
      a lot of delayed refs queued up.
      
      So instead just have an rb tree for the delayed ref heads, and then attach the
      delayed ref updates to an rb tree that is per delayed ref head.  Then we only
      need to take the delayed ref lock when adding new delayed refs and when
      selecting a delayed ref head to process, all the rest of the time we deal with a
      per delayed ref head lock which will be much less contentious.
      
      The locking rules for this get a little more complicated since we have to lock
      up to 3 things to properly process delayed refs, but I will address that problem
      later.  For now this passes all of xfstests and my overnight stress tests.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d7df2c79
    • W
      Btrfs: only fua the first superblock when writting supers · e8117c26
      Wang Shilong 提交于
      We only intent to fua the first superblock in every device from
      comments, fix it.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      e8117c26
    • F
      Btrfs: convert printk to btrfs_ and fix BTRFS prefix · efe120a0
      Frank Holton 提交于
      Convert all applicable cases of printk and pr_* to the btrfs_* macros.
      
      Fix all uses of the BTRFS prefix.
      Signed-off-by: NFrank Holton <fholton@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      efe120a0
    • J
      Btrfs: move the extent buffer radix tree into the fs_info · f28491e0
      Josef Bacik 提交于
      I need to create a fake tree to test qgroups and I don't want to have to setup a
      fake btree_inode.  The fact is we only use the radix tree for the fs_info, so
      everybody else who allocates an extent_io_tree is just wasting the space anyway.
      This patch moves the radix tree and its lock into btrfs_fs_info so there is less
      stuff I have to fake to do qgroup sanity tests.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      f28491e0
    • K
      btrfs: expand btrfs_find_item() to include find_orphan_item functionality · 3f870c28
      Kelley Nielsen 提交于
      This is the third step in bootstrapping the btrfs_find_item interface.
      The function find_orphan_item(), in orphan.c, is similar to the two
      functions already replaced by the new interface. It uses two parameters,
      which are already present in the interface, and is nearly identical to
      the function brought in in the previous patch.
      
      Replace the two calls to find_orphan_item() with calls to
      btrfs_find_item(), with the defined objectid and type that was used
      internally by find_orphan_item(), a null path, and a null key. Add a
      test for a null path to btrfs_find_item, and if it passes, allocate and
      free the path. Finally, remove find_orphan_item().
      Signed-off-by: NKelley Nielsen <kelleynnn@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      3f870c28
    • V
      btrfs: remove unused variables from disk-io.c · 71db2a77
      Valentina Giusti 提交于
      Remove unused variables:
      * tree from csum_dirty_buffer,
      * tree from btree_readpage_end_io_hook,
      * tree from btree_writepages,
      * bytenr from btrfs_create_tree,
      * fs_info from end_workqueue_fn.
      Signed-off-by: NValentina Giusti <valentina.giusti@microon.de>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      71db2a77
    • J
      btrfs: publish per-super attributes in sysfs · 5ac1d209
      Jeff Mahoney 提交于
      This patch adds per-super attributes to sysfs.
      
      It doesn't publish any attributes yet, but does the proper lifetime
      handling as well as the basic infrastructure to add new attributes.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      5ac1d209
    • L
      Btrfs: introduce a head ref rbtree · c46effa6
      Liu Bo 提交于
      The way how we process delayed refs is
      1) get a bunch of head refs,
      2) pick up one head ref,
      3) go one node back for any delayed ref updates.
      
      The head ref is also linked in the same rbtree as the delayed ref is,
      so in 1) stage, we have to walk one by one including not only head refs, but
      delayed refs.
      
      When we have a great number of delayed refs pending to process,
      this'll cost time a lot.
      
      Here we introduce a head ref specific rbtree, it only has head refs, so troubles
      go away.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c46effa6
  4. 04 12月, 2013 1 次提交
  5. 24 11月, 2013 1 次提交
    • K
      block: Convert various code to bio_for_each_segment() · 2c30c71b
      Kent Overstreet 提交于
      With immutable biovecs we don't want code accessing bi_io_vec directly -
      the uses this patch changes weren't incorrect since they all own the
      bio, but it makes the code harder to audit for no good reason - also,
      this will help with multipage bvecs later.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      2c30c71b
  6. 21 11月, 2013 1 次提交
    • L
      Btrfs: avoid heavy operations in btrfs_commit_super · d52c1bcc
      Liu Bo 提交于
      The 'git blame' history shows that, the old transaction commit code has to do
      twice to ensure roots are updated and we have to flush metadata and super block
      manually, however, right now all of these can be handled well inside
      the transaction commit code without extra efforts.
      
      And the error handling part remains same with the current code, -- 'return to
      caller once we get error'.
      
      This saves us a transaction commit and a flush of super block, which are both
      heavy operations according to ftrace output analysis.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      d52c1bcc
  7. 12 11月, 2013 16 次提交
  8. 11 10月, 2013 1 次提交
    • M
      Btrfs: fix oops caused by the space balance and dead roots · c00869f1
      Miao Xie 提交于
      When doing space balance and subvolume destroy at the same time, we met
      the following oops:
      
      kernel BUG at fs/btrfs/relocation.c:2247!
      RIP: 0010: [<ffffffffa04cec16>] prepare_to_merge+0x154/0x1f0 [btrfs]
      Call Trace:
       [<ffffffffa04b5ab7>] relocate_block_group+0x466/0x4e6 [btrfs]
       [<ffffffffa04b5c7a>] btrfs_relocate_block_group+0x143/0x275 [btrfs]
       [<ffffffffa0495c56>] btrfs_relocate_chunk.isra.27+0x5c/0x5a2 [btrfs]
       [<ffffffffa0459871>] ? btrfs_item_key_to_cpu+0x15/0x31 [btrfs]
       [<ffffffffa048b46a>] ? btrfs_get_token_64+0x7e/0xcd [btrfs]
       [<ffffffffa04a3467>] ? btrfs_tree_read_unlock_blocking+0xb2/0xb7 [btrfs]
       [<ffffffffa049907d>] btrfs_balance+0x9c7/0xb6f [btrfs]
       [<ffffffffa049ef84>] btrfs_ioctl_balance+0x234/0x2ac [btrfs]
       [<ffffffffa04a1e8e>] btrfs_ioctl+0xd87/0x1ef9 [btrfs]
       [<ffffffff81122f53>] ? path_openat+0x234/0x4db
       [<ffffffff813c3b78>] ? __do_page_fault+0x31d/0x391
       [<ffffffff810f8ab6>] ? vma_link+0x74/0x94
       [<ffffffff811250f5>] vfs_ioctl+0x1d/0x39
       [<ffffffff811258c8>] do_vfs_ioctl+0x32d/0x3e2
       [<ffffffff811259d4>] SyS_ioctl+0x57/0x83
       [<ffffffff813c3bfa>] ? do_page_fault+0xe/0x10
       [<ffffffff813c73c2>] system_call_fastpath+0x16/0x1b
      
      It is because we returned the error number if the reference of the root was 0
      when doing space relocation. It was not right here, because though the root
      was dead(refs == 0), but the space it held still need be relocated, or we
      could not remove the block group. So in this case, we should return the root
      no matter it is dead or not.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      c00869f1
  9. 21 9月, 2013 2 次提交
  10. 01 9月, 2013 4 次提交
    • J
      Btrfs: don't use an async starter for most of our workers · 45d5fd14
      Josef Bacik 提交于
      We only need an async starter if we can't make a GFP_NOFS allocation in our
      current path.  This is the case for the endio stuff since it happens in IRQ
      context, but things like the caching thread workers and the delalloc flushers we
      can easily make this allocation and start threads right away.  Also change the
      worker count for the caching thread pool.  Traditionally we limited this to 2
      since we took read locks while caching, but nowadays we do this lockless so
      there's no reason to limit the number of caching threads.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      45d5fd14
    • S
      Btrfs: fix for patch "cleanup: don't check the same thing twice" · 48475471
      Stefan Behrens 提交于
      Mitch Harder noticed that the patch 3c64a1ab mentioned in the subject
      line was causing a kernel BUG() on snapshot deletion.
      
      The patch was wrong. It did not handle cached roots correctly. The
      check for root_refs == 0 was removed everywhere where
      btrfs_read_fs_root_no_name() had been used to retrieve the root,
      because this check was already dealt with in
      btrfs_read_fs_root_no_name(). But in the case when the root was
      found in the cache, there was no such check.
      
      This patch adds the missing check in the case where the root is
      found in the cache.
      Reported-by: NMitch Harder <mitch.harder@sabayonlinux.org>
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      Reviewed-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      48475471
    • S
      Btrfs: get rid of one BUG() in write_all_supers() · 9d565ba4
      Stefan Behrens 提交于
      The second round uses btrfs_error() and return -EIO, the first round
      can handle write errors the same way.
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      9d565ba4
    • F
      Btrfs: fix race between removing a dev and writing sbs · d7306801
      Filipe David Borba Manana 提交于
      This change fixes an issue when removing a device and writing
      all super blocks run simultaneously. Here's the steps necessary
      for the issue to happen:
      
      1) disk-io.c:write_all_supers() gets a number of N devices from the
         super_copy, so it will not panic if it fails to write super blocks
         for N - 1 devices;
      
      2) Then it tries to acquire the device_list_mutex, but blocks because
         volumes.c:btrfs_rm_device() got it first;
      
      3) btrfs_rm_device() removes the device from the list, then unlocks the
         mutex and after the unlock it updates the number of devices in
         super_copy to N - 1.
      
      4) write_all_supers() finally acquires the mutex, iterates over all the
         devices in the list and gets N - 1 errors, that is, it failed to write
         super blocks to all the devices;
      
      5) Because write_all_supers() thinks there are a total of N devices, it
         considers N - 1 errors to be ok, and therefore won't panic.
      
      So this change just makes sure that write_all_supers() reads the number
      of devices from super_copy after it acquires the device_list_mutex.
      Conversely, it changes btrfs_rm_device() to update the number of devices
      in super_copy before it releases the device list mutex.
      
      The code path to add a new device (volumes.c:btrfs_init_new_device),
      already has the right behaviour: it updates the number of devices in
      super_copy while holding the device_list_mutex.
      
      The only code path that doesn't lock the device list mutex
      before updating the number of devices in the super copy is
      disk-io.c:next_root_backup(), called by open_ctree() during
      mount time where concurrency issues can't happen.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      d7306801