1. 12 11月, 2013 3 次提交
  2. 01 9月, 2013 3 次提交
  3. 29 6月, 2013 1 次提交
  4. 14 6月, 2013 1 次提交
  5. 07 5月, 2013 2 次提交
  6. 07 3月, 2013 1 次提交
    • C
      Btrfs: improve the delayed inode throttling · de3cb945
      Chris Mason 提交于
      The delayed inode code batches up changes to the btree in hopes of doing
      them in bulk.  As the changes build up, processes kick off worker
      threads and wait for them to make progress.
      
      The current code kicks off an async work queue item for each delayed
      node, which creates a lot of churn.  It also uses a fixed 1 HZ waiting
      period for the throttle, which allows us to build a lot of pending
      work and can slow down the commit.
      
      This changes us to watch a sequence counter as it is bumped during the
      operations.  We kick off fewer work items and have each work item do
      more work.
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      de3cb945
  7. 21 2月, 2013 1 次提交
  8. 20 2月, 2013 2 次提交
  9. 13 12月, 2012 1 次提交
  10. 12 12月, 2012 1 次提交
    • M
      Btrfs: improve the noflush reservation · 08e007d2
      Miao Xie 提交于
      In some places(such as: evicting inode), we just can not flush the reserved
      space of delalloc, flushing the delayed directory index and delayed inode
      is OK, but we don't try to flush those things and just go back when there is
      no enough space to be reserved. This patch fixes this problem.
      
      We defined 3 types of the flush operations: NO_FLUSH, FLUSH_LIMIT and FLUSH_ALL.
      If we can in the transaction, we should not flush anything, or the deadlock
      would happen, so use NO_FLUSH. If we flushing the reserved space of delalloc
      would cause deadlock, use FLUSH_LIMIT. In the other cases, FLUSH_ALL is used,
      and we will flush all things.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      08e007d2
  11. 02 10月, 2012 2 次提交
  12. 21 9月, 2012 1 次提交
  13. 29 8月, 2012 2 次提交
  14. 24 7月, 2012 2 次提交
    • L
      Btrfs: zero unused bytes in inode item · 293f7e07
      Li Zefan 提交于
      The otime field is not zeroed, so users will see random otime in an old
      filesystem with a new kernel which has otime support in the future.
      
      The reserved bytes are also not zeroed, and we'll have compatibility
      issue if we make use of those bytes.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      293f7e07
    • J
      Btrfs: flush delayed inodes if we're short on space · 96c3f433
      Josef Bacik 提交于
      Those crazy gentoo guys have been complaining about ENOSPC errors on their
      portage volumes.  This is because doing things like untar tends to create
      lots of new files which will soak up all the reservation space in the
      delayed inodes.  Usually this gets papered over by the fact that we will try
      and commit the transaction, however if this happens in the wrong spot or we
      choose not to commit the transaction you will be screwed.  So add the
      ability to expclitly flush delayed inodes to free up space.  Please test
      this out guys to make sure it works since as usual I cannot reproduce.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      96c3f433
  15. 15 6月, 2012 1 次提交
  16. 30 5月, 2012 2 次提交
    • J
      Btrfs: convert the inode bit field to use the actual bit operations · 72ac3c0d
      Josef Bacik 提交于
      Miao pointed this out while I was working on an orphan problem that messing
      with a bitfield where different ranges are protected by different locks
      doesn't work out right.  Turns out we've been doing this forever where we
      have different parts of the bit field protected by either no lock at all or
      different locks which could cause all sorts of weird problems including the
      issue I was hitting.  So instead make a runtime_flags thing that we use the
      normal bit operations on that are all atomic so we can keep having our
      no/different locking for the different flags and then make force_compress
      it's own thing so it can be treated normally.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      72ac3c0d
    • J
      Btrfs: use i_version instead of our own sequence · 0c4d2d95
      Josef Bacik 提交于
      We've been keeping around the inode sequence number in hopes that somebody
      would use it, but nobody uses it and people actually use i_version which
      serves the same purpose, so use i_version where we used the incore inode's
      sequence number and that way the sequence is updated properly across the
      board, and not just in file write.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      0c4d2d95
  17. 22 3月, 2012 2 次提交
  18. 17 1月, 2012 1 次提交
  19. 16 12月, 2011 1 次提交
  20. 11 11月, 2011 1 次提交
    • C
      Btrfs: tweak the delayed inode reservations again · 2115133f
      Chris Mason 提交于
      Josef sent along an incremental to the inode reservation
      code to make sure we try and fall back to directly updating
      the inode item if things go horribly wrong.
      
      This reworks that patch slightly, adding a fallback function
      that will always try to update the inode item directly without
      going through the delayed_inode code.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      2115133f
  21. 09 11月, 2011 1 次提交
    • J
      Btrfs: fix our reservations for updating an inode when completing io · 7fd2ae21
      Josef Bacik 提交于
      People have been reporting ENOSPC crashes in finish_ordered_io.  This is because
      we try to steal from the delalloc block rsv to satisfy a reservation to update
      the inode.  The problem with this is we don't explicitly save space for updating
      the inode when doing delalloc.  This is kind of a problem and we've gotten away
      with this because way back when we just stole from the delalloc reserve without
      any questions, and this worked out fine because generally speaking the leaf had
      been modified either by the mtime update when we did the original write or
      because we just updated the leaf when we inserted the file extent item, only on
      rare occasions had the leaf not actually been modified, and that was still ok
      because we'd just use a block or two out of the over-reservation that is
      delalloc.
      
      Then came the delayed inode stuff.  This is amazing, except it wants a full
      reservation for updating the inode since it may do it at some point down the
      road after we've written the blocks and we have to recow everything again.  This
      worked out because the delayed inode stuff just stole from the global reserve,
      that is until recently when I changed that because it caused other problems.
      
      So here we are, we're doing everything right and being screwed for it.  So take
      an extra reservation for the inode at delalloc reservation time and carry it
      through the life of the delalloc reservation.  If we need it we can steal it in
      the delayed inode stuff.  If we have already stolen it try and do a normal
      metadata reservation.  If that fails try to steal from the delalloc reservation.
      If _that_ fails we'll get a WARN_ON() so I can start thinking of a better way to
      solve this and in the meantime we'll steal from the global reserve.
      
      With this patch I ran xfstests 13 in a loop for a couple of hours and didn't see
      any problems.
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      7fd2ae21
  22. 06 11月, 2011 2 次提交
    • J
      Btrfs: fix delayed insertion reservation · c06a0e12
      Josef Bacik 提交于
      We all keep getting those stupid warnings from use_block_rsv when running
      stress.sh, and it's because the delayed insertion stuff is being stupid.  It's
      not the delayed insertion stuffs fault, it's all just stupid.  When marking an
      inode dirty for oh say updating the time on it, we just do a
      btrfs_join_transaction, which doesn't reserve any space.  This is stupid because
      we're going to have to have space reserve to make this change, but we do it
      because it's fast because chances are we're going to call it over and over again
      and it doesn't matter.  Well thanks to the delayed insertion stuff this is
      mostly the case, so we do actually need to make this reservation.  So if
      trans->bytes_reserved is 0 then try to do a normal reservation.  If not return
      ENOSPC which will make the btrfs_dirty_inode start a proper transaction which
      will let it do the whole ENOSPC dance and reserve enough space for the delayed
      insertion to steal the reservation from the transaction.
      
      The other stupid thing we do is not reserve space for the inode when writing to
      the thing.  Usually this is ok since we have to update the time so we'd have
      already done all this work before we get to the endio stuff, so it doesn't
      matter.  But this is stupid because we could write the data after the
      transaction commits where we changed the mtime of the inode so we have to cow
      all the way down to the inode anyway.  This used to be masked by the delalloc
      reservation stuff, but because we delay the update it doesn't get masked in this
      case.  So again the delayed insertion stuff bites us in the ass.  So if our
      trans->block_rsv is delalloc, just steal the reservation from the delalloc
      reserve.  Hopefully this won't bite us in the ass, but I've said that before.
      
      With this patch stress.sh no longer spits out those stupid warnings (famous last
      words).  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      c06a0e12
    • J
      Btrfs: make a delayed_block_rsv for the delayed item insertion · 6d668dda
      Josef Bacik 提交于
      I've been hitting warnings in use_block_rsv when running the delayed insertion
      stuff.  It's because we will readjust global block rsv based on what is in use,
      which means we could end up discarding reservations that are for the delayed
      insertion stuff.  So instead create a seperate block rsv for the delayed
      insertion stuff.  This will also make it easier to debug problems with the
      delayed insertion reservations since we will know that only the delayed
      insertion code touches this block_rsv.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      6d668dda
  23. 02 11月, 2011 1 次提交
  24. 28 7月, 2011 1 次提交
    • C
      Btrfs: switch the btrfs tree locks to reader/writer · bd681513
      Chris Mason 提交于
      The btrfs metadata btree is the source of significant
      lock contention, especially in the root node.   This
      commit changes our locking to use a reader/writer
      lock.
      
      The lock is built on top of rw spinlocks, and it
      extends the lock tracking to remember if we have a
      read lock or a write lock when we go to blocking.  Atomics
      count the number of blocking readers or writers at any
      given time.
      
      It removes all of the adaptive spinning from the old code
      and uses only the spinning/blocking hints inside of btrfs
      to decide when it should continue spinning.
      
      In read heavy workloads this is dramatically faster.  In write
      heavy workloads we're still faster because of less contention
      on the root node lock.
      
      We suffer slightly in dbench because we schedule more often
      during write locks, but all other benchmarks so far are improved.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      bd681513
  25. 27 6月, 2011 1 次提交
    • M
      btrfs: fix inconsonant inode information · 2f7e33d4
      Miao Xie 提交于
      When iputting the inode, We may leave the delayed nodes if they have some
      delayed items that have not been dealt with. So when the inode is read again,
      we must look up the relative delayed node, and use the information in it to
      initialize the inode. Or we will get inconsonant inode information, it may
      cause that the same directory index number is allocated again, and hit the
      following oops:
      
      [ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the
      insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17)
      [ 5447.569766] ------------[ cut here ]------------
      [ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301!
      [SNIP]
      [ 5447.790721] Call Trace:
      [ 5447.793191]  [<ffffffffa0641c4e>] btrfs_insert_dir_item+0x189/0x1bb [btrfs]
      [ 5447.800156]  [<ffffffffa0651a45>] btrfs_add_link+0x12b/0x191 [btrfs]
      [ 5447.806517]  [<ffffffffa0651adc>] btrfs_add_nondir+0x31/0x58 [btrfs]
      [ 5447.812876]  [<ffffffffa0651d6a>] btrfs_create+0xf9/0x197 [btrfs]
      [ 5447.818961]  [<ffffffff8111f840>] vfs_create+0x72/0x92
      [ 5447.824090]  [<ffffffff8111fa8c>] do_last+0x22c/0x40b
      [ 5447.829133]  [<ffffffff8112076a>] path_openat+0xc0/0x2ef
      [ 5447.834438]  [<ffffffff810c58e2>] ? __perf_event_task_sched_out+0x24/0x44
      [ 5447.841216]  [<ffffffff8103ecdd>] ? perf_event_task_sched_out+0x59/0x67
      [ 5447.847846]  [<ffffffff81121a79>] do_filp_open+0x3d/0x87
      [ 5447.853156]  [<ffffffff811e126c>] ? strncpy_from_user+0x43/0x4d
      [ 5447.859072]  [<ffffffff8111f1f5>] ? getname_flags+0x2e/0x80
      [ 5447.864636]  [<ffffffff8111f179>] ? do_getname+0x14b/0x173
      [ 5447.870112]  [<ffffffff8111f1b7>] ? audit_getname+0x16/0x26
      [ 5447.875682]  [<ffffffff8112b1ab>] ? spin_lock+0xe/0x10
      [ 5447.880882]  [<ffffffff81112d39>] do_sys_open+0x69/0xae
      [ 5447.886153]  [<ffffffff81112db1>] sys_open+0x20/0x22
      [ 5447.891114]  [<ffffffff813b9aab>] system_call_fastpath+0x16/0x1b
      
      Fix it by reusing the old delayed node.
      Reported-by: NJim Schutt <jaschut@sandia.gov>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Tested-by: NJim Schutt <jaschut@sandia.gov>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      2f7e33d4
  26. 18 6月, 2011 2 次提交
    • C
      Btrfs: avoid delayed metadata items during commits · e999376f
      Chris Mason 提交于
      Snapshot creation has two phases.  One is the initial snapshot setup,
      and the second is done during commit, while nobody is allowed to modify
      the root we are snapshotting.
      
      The delayed metadata insertion code can break that rule, it does a
      delayed inode update on the inode of the parent of the snapshot,
      and delayed directory item insertion.
      
      This makes sure to run the pending delayed operations before we
      record the snapshot root, which avoids corruptions.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      e999376f
    • M
      btrfs: fix wrong reservation when doing delayed inode operations · 19fd2949
      Miao Xie 提交于
      We have migrated the space for the delayed inode items from
      trans_block_rsv to global_block_rsv, but we forgot to set trans->block_rsv to
      global_block_rsv when we doing delayed inode operations, and the following Oops
      happened:
      
      [ 9792.654889] ------------[ cut here ]------------
      [ 9792.654898] WARNING: at fs/btrfs/extent-tree.c:5681
      btrfs_alloc_free_block+0xca/0x27c [btrfs]()
      [ 9792.654899] Hardware name: To Be Filled By O.E.M.
      [ 9792.654900] Modules linked in: btrfs zlib_deflate libcrc32c
      ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
      arc4 rt61pci rt2x00pci rt2x00lib snd_hda_codec_hdmi mac80211
      snd_hda_codec_realtek cfg80211 snd_hda_intel edac_core snd_seq rfkill
      pcspkr serio_raw snd_hda_codec eeprom_93cx6 edac_mce_amd sp5100_tco
      i2c_piix4 k10temp snd_hwdep snd_seq_device snd_pcm floppy r8169 xhci_hcd
      mii snd_timer snd soundcore snd_page_alloc ipv6 firewire_ohci pata_acpi
      ata_generic firewire_core pata_via crc_itu_t radeon ttm drm_kms_helper
      drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
      [ 9792.654919] Pid: 2762, comm: rm Tainted: G        W   2.6.39+ #1
      [ 9792.654920] Call Trace:
      [ 9792.654922]  [<ffffffff81053c4a>] warn_slowpath_common+0x83/0x9b
      [ 9792.654925]  [<ffffffff81053c7c>] warn_slowpath_null+0x1a/0x1c
      [ 9792.654933]  [<ffffffffa038e747>] btrfs_alloc_free_block+0xca/0x27c [btrfs]
      [ 9792.654945]  [<ffffffffa03b8562>] ? map_extent_buffer+0x6e/0xa8 [btrfs]
      [ 9792.654953]  [<ffffffffa038189b>] __btrfs_cow_block+0xfc/0x30c [btrfs]
      [ 9792.654963]  [<ffffffffa0396aa6>] ? btrfs_buffer_uptodate+0x47/0x58 [btrfs]
      [ 9792.654970]  [<ffffffffa0382e48>] ? read_block_for_search+0x94/0x368 [btrfs]
      [ 9792.654978]  [<ffffffffa0381ba9>] btrfs_cow_block+0xfe/0x146 [btrfs]
      [ 9792.654986]  [<ffffffffa03848b0>] btrfs_search_slot+0x14d/0x4b6 [btrfs]
      [ 9792.654997]  [<ffffffffa03b8562>] ? map_extent_buffer+0x6e/0xa8 [btrfs]
      [ 9792.655022]  [<ffffffffa03938e8>] btrfs_lookup_inode+0x2f/0x8f [btrfs]
      [ 9792.655025]  [<ffffffff8147afac>] ? _cond_resched+0xe/0x22
      [ 9792.655027]  [<ffffffff8147b892>] ? mutex_lock+0x29/0x50
      [ 9792.655039]  [<ffffffffa03d41b1>] btrfs_update_delayed_inode+0x72/0x137 [btrfs]
      [ 9792.655051]  [<ffffffffa03d4ea2>] btrfs_run_delayed_items+0x90/0xdb [btrfs]
      [ 9792.655062]  [<ffffffffa039a69b>] btrfs_commit_transaction+0x228/0x654 [btrfs]
      [ 9792.655064]  [<ffffffff8106e8da>] ? remove_wait_queue+0x3a/0x3a
      [ 9792.655075]  [<ffffffffa03a2fa5>] btrfs_evict_inode+0x14d/0x202 [btrfs]
      [ 9792.655077]  [<ffffffff81132bd6>] evict+0x71/0x111
      [ 9792.655079]  [<ffffffff81132de0>] iput+0x12a/0x132
      [ 9792.655081]  [<ffffffff8112aa3a>] do_unlinkat+0x106/0x155
      [ 9792.655083]  [<ffffffff81127b83>] ? path_put+0x1f/0x23
      [ 9792.655085]  [<ffffffff8109c53c>] ? audit_syscall_entry+0x145/0x171
      [ 9792.655087]  [<ffffffff81128410>] ? putname+0x34/0x36
      [ 9792.655090]  [<ffffffff8112b441>] sys_unlinkat+0x29/0x2b
      [ 9792.655092]  [<ffffffff81482c42>] system_call_fastpath+0x16/0x1b
      [ 9792.655093] ---[ end trace 02b696eb02b3f768 ]---
      
      This patch fix it by setting the reservation of the transaction handle to the
      correct one.
      Reported-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      19fd2949
  27. 04 6月, 2011 1 次提交
    • D
      btrfs: fix uninitialized variable warning · aa0467d8
      David Sterba 提交于
      With Linus' tree, today's linux-next build (powercp ppc64_defconfig)
      produced this warning:
      
      fs/btrfs/delayed-inode.c: In function 'btrfs_delayed_update_inode':
      fs/btrfs/delayed-inode.c:1598:6: warning: 'ret' may be used
      uninitialized in this function
      
      Introduced by commit 16cdcec7 ("btrfs: implement delayed inode items
      operation").
      
      This fixes a bug in btrfs_update_inode(): if the returned value from
      btrfs_delayed_update_inode is a nonzero garbage, inode stat data are not
      updated and several call paths may hit a BUG_ON or fail with strange
      code.
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      aa0467d8