1. 02 8月, 2011 3 次提交
  2. 01 8月, 2011 1 次提交
    • J
      Btrfs: load the key from the dir item in readdir into a fake dentry · b4aff1f8
      Josef Bacik 提交于
      In btrfs we have 2 indexes for inodes.  One is for readdir, it's in this nice
      sequential order and works out brilliantly for readdir.  However if you use ls,
      it usually stat's each file it gets from readdir.  This is where the second
      index comes in, which is based on a hash of the name of the file.  So then the
      lookup has to lookup this index, and then lookup the inode.  The index lookup is
      going to be in random order (since its based on the name hash), which gives us
      less than stellar performance.  Since we know the inode location from the
      readdir index, I create a dummy dentry and copy the location key into
      dentry->d_fsdata.  Then on lookup if we have d_fsdata we use that location to
      lookup the inode, avoiding looking up the other directory index.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b4aff1f8
  3. 28 7月, 2011 4 次提交
    • C
      Btrfs: use the commit_root for reading free_space_inode crcs · 2cf8572d
      Chris Mason 提交于
      Now that we are using regular file crcs for the free space cache,
      we can deadlock if we try to read the free_space_inode while we are
      updating the crc tree.
      
      This commit fixes things by using the commit_root to read the crcs.  This is
      safe because we the free space cache file would already be loaded if
      that block group had been changed in the current transaction.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      2cf8572d
    • C
      Btrfs: stop using highmem for extent_buffers · a6591715
      Chris Mason 提交于
      The extent_buffers have a very complex interface where
      we use HIGHMEM for metadata and try to cache a kmap mapping
      to access the memory.
      
      The next commit adds reader/writer locks, and concurrent use
      of this kmap cache would make it even more complex.
      
      This commit drops the ability to use HIGHMEM with extent buffers,
      and rips out all of the related code.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      a6591715
    • J
      Btrfs: fix enospc problems with delalloc · 9e0baf60
      Josef Bacik 提交于
      So I had this brilliant idea to use atomic counters for outstanding and reserved
      extents, but this turned out to be a bad idea.  Consider this where we have 1
      outstanding extent and 1 reserved extent
      
      Reserver				Releaser
      					atomic_dec(outstanding) now 0
      atomic_read(outstanding)+1 get 1
      atomic_read(reserved) get 1
      don't actually reserve anything because
      they are the same
      					atomic_cmpxchg(reserved, 1, 0)
      atomic_inc(outstanding)
      atomic_add(0, reserved)
      					free reserved space for 1 extent
      
      Then the reserver now has no actual space reserved for it, and when it goes to
      finish the ordered IO it won't have enough space to do it's allocation and you
      get those lovely warnings.
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      9e0baf60
    • J
      Btrfs: use find_or_create_page instead of grab_cache_page · a94733d0
      Josef Bacik 提交于
      grab_cache_page will use mapping_gfp_mask(), which for all inodes is set to
      GFP_HIGHUSER_MOVABLE.  So instead use find_or_create_page in all cases where we
      need GFP_NOFS so we don't deadlock.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      a94733d0
  4. 27 7月, 2011 1 次提交
  5. 26 7月, 2011 1 次提交
  6. 21 7月, 2011 1 次提交
  7. 20 7月, 2011 5 次提交
  8. 15 7月, 2011 3 次提交
    • M
      btrfs: Don't BUG_ON alloc_path errors in btrfs_read_locked_inode · 1748f843
      Mark Fasheh 提交于
      btrfs_iget() also needed an update so that errors from btrfs_locked_inode()
      are caught and bubbled back up.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      1748f843
    • M
      btrfs: Don't BUG_ON alloc_path errors in btrfs_truncate_inode_items · 0eb0e19c
      Mark Fasheh 提交于
      I moved the path allocation up a few lines to the top of the function so
      that we couldn't get into the state where we've dropped delayed items and
      the extent cache but fail due to -ENOMEM.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      0eb0e19c
    • M
      btrfs: don't BUG_ON btrfs_alloc_path() errors · d8926bb3
      Mark Fasheh 提交于
      This patch fixes many callers of btrfs_alloc_path() which BUG_ON allocation
      failure. All the sites that are fixed in this patch were checked by me to
      be fairly trivial to fix because of at least one of two criteria:
      
       - Callers of the function catch errors from it already so bubbling the
         error up will be handled.
       - Callers of the function might BUG_ON any nonzero return code in which
         case there is no behavior changed (but we still got to remove a BUG_ON)
      
      The following functions were updated:
      
      btrfs_lookup_extent, alloc_reserved_tree_block, btrfs_remove_block_group,
      btrfs_lookup_csums_range, btrfs_csum_file_blocks, btrfs_mark_extent_written,
      btrfs_inode_by_name, btrfs_new_inode, btrfs_symlink,
      insert_reserved_file_extent, and run_delalloc_nocow
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      d8926bb3
  9. 07 7月, 2011 1 次提交
    • M
      btrfs: fix oops when doing space balance · 149e2d76
      Miao Xie 提交于
      We need to make sure the data relocation inode doesn't go through
      the delayed metadata updates, otherwise we get an oops during balance:
      
      kernel BUG at fs/btrfs/relocation.c:4303!
      [SNIP]
      Call Trace:
       [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs]
       [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs]
       [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs]
       [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs]
       [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs]
       [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
       [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50
       [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs]
       [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs]
       [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs]
       [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs]
       [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130
       [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs]
       [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40
       [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs]
       [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs]
       [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs]
       [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
       [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
       [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
       [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
       [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
       [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs]
       [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
       [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs]
       [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs]
       [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270
       [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0
       [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540
       [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0
       [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b
      [SNIP]
      RIP  [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs]
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      149e2d76
  10. 27 6月, 2011 1 次提交
    • M
      btrfs: fix inconsonant inode information · 2f7e33d4
      Miao Xie 提交于
      When iputting the inode, We may leave the delayed nodes if they have some
      delayed items that have not been dealt with. So when the inode is read again,
      we must look up the relative delayed node, and use the information in it to
      initialize the inode. Or we will get inconsonant inode information, it may
      cause that the same directory index number is allocated again, and hit the
      following oops:
      
      [ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the
      insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17)
      [ 5447.569766] ------------[ cut here ]------------
      [ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301!
      [SNIP]
      [ 5447.790721] Call Trace:
      [ 5447.793191]  [<ffffffffa0641c4e>] btrfs_insert_dir_item+0x189/0x1bb [btrfs]
      [ 5447.800156]  [<ffffffffa0651a45>] btrfs_add_link+0x12b/0x191 [btrfs]
      [ 5447.806517]  [<ffffffffa0651adc>] btrfs_add_nondir+0x31/0x58 [btrfs]
      [ 5447.812876]  [<ffffffffa0651d6a>] btrfs_create+0xf9/0x197 [btrfs]
      [ 5447.818961]  [<ffffffff8111f840>] vfs_create+0x72/0x92
      [ 5447.824090]  [<ffffffff8111fa8c>] do_last+0x22c/0x40b
      [ 5447.829133]  [<ffffffff8112076a>] path_openat+0xc0/0x2ef
      [ 5447.834438]  [<ffffffff810c58e2>] ? __perf_event_task_sched_out+0x24/0x44
      [ 5447.841216]  [<ffffffff8103ecdd>] ? perf_event_task_sched_out+0x59/0x67
      [ 5447.847846]  [<ffffffff81121a79>] do_filp_open+0x3d/0x87
      [ 5447.853156]  [<ffffffff811e126c>] ? strncpy_from_user+0x43/0x4d
      [ 5447.859072]  [<ffffffff8111f1f5>] ? getname_flags+0x2e/0x80
      [ 5447.864636]  [<ffffffff8111f179>] ? do_getname+0x14b/0x173
      [ 5447.870112]  [<ffffffff8111f1b7>] ? audit_getname+0x16/0x26
      [ 5447.875682]  [<ffffffff8112b1ab>] ? spin_lock+0xe/0x10
      [ 5447.880882]  [<ffffffff81112d39>] do_sys_open+0x69/0xae
      [ 5447.886153]  [<ffffffff81112db1>] sys_open+0x20/0x22
      [ 5447.891114]  [<ffffffff813b9aab>] system_call_fastpath+0x16/0x1b
      
      Fix it by reusing the old delayed node.
      Reported-by: NJim Schutt <jaschut@sandia.gov>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Tested-by: NJim Schutt <jaschut@sandia.gov>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      2f7e33d4
  11. 25 6月, 2011 1 次提交
  12. 16 6月, 2011 1 次提交
  13. 11 6月, 2011 2 次提交
  14. 04 6月, 2011 1 次提交
  15. 27 5月, 2011 2 次提交
    • C
      fs: pass exact type of data dirties to ->dirty_inode · aa385729
      Christoph Hellwig 提交于
      Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
      anything else, so that the filesystem can track internally if it
      needs to push out a transaction for fdatasync or not.
      
      This is just the prototype change with no user for it yet.  I plan
      to push large XFS changes for the next merge window, and getting
      this trivial infrastructure in this window would help a lot to avoid
      tree interdependencies.
      
      Also remove incorrect comments that ->dirty_inode can't block.  That
      has been changed a long time ago, and many implementations rely on it.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      aa385729
    • C
      Btrfs: add mount -o auto_defrag · 4cb5300b
      Chris Mason 提交于
      This will detect small random writes into files and
      queue the up for an auto defrag process.  It isn't well suited to
      database workloads yet, but works for smaller files such as rpm, sqlite
      or bdb databases.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      4cb5300b
  16. 26 5月, 2011 3 次提交
  17. 24 5月, 2011 9 次提交
    • J
      fs/btrfs: Add missing btrfs_free_path · b0839166
      Julia Lawall 提交于
      Btrfs_alloc_path should be matched with btrfs_free_path in error-handling code.
      
      A simplified version of the semantic match that finds this problem is as
      follows: (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r exists@
      local idexpression struct btrfs_path * x;
      expression ra,rb;
      position p1,p2;
      @@
      
      x = btrfs_alloc_path@p1(...)
      ...  when != btrfs_free_path(x,...)
           when != if (...) { ... btrfs_free_path(x,...) ...}
           when != x = ra
      if(...) { ... when != x = rb
           when forall
           when != btrfs_free_path(x,...)
       \(return <+...x...+>; \| return@p2...; \) }
      
      @script:python@
      p1 << r.p1;
      p2 << r.p2;
      @@
      
      cocci.print_main("alloc",p1)
      cocci.print_secs("return",p2)
      // </smpl>
      Signed-off-by: NJulia Lawall <julia@diku.dk>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      b0839166
    • T
      Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & btrfs_extend_item · 1cd30799
      Tsutomu Itoh 提交于
      Currently, btrfs_truncate_item and btrfs_extend_item returns only 0.
      So, the check by BUG_ON in the caller is unnecessary.
      Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      1cd30799
    • S
      27160b6b
    • J
      Btrfs: leave spinning on lookup and map the leaf · d90c7321
      Josef Bacik 提交于
      On lookup we only want to read the inode item, so leave the path spinning.  Also
      we're just wholesale reading the leaf off, so map the leaf so we don't do a
      bunch of kmap/kunmaps.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      d90c7321
    • J
      Btrfs: don't always do readahead · 026fd317
      Josef Bacik 提交于
      Our readahead is sort of sloppy, and really isn't always needed.  For example if
      ls is doing a stating ls (which is the default) it's going to stat in non-disk
      order, so if say you have a directory with a stupid amount of files, readahead
      is going to do nothing but waste time in the case of doing the stat.  Taking the
      unconditional readahead out made my test go from 57 minutes to 36 minutes.  This
      means that everywhere we do loop through the tree we want to make sure we do set
      path->reada properly, so I went through and found all of the places where we
      loop through the path and set reada to 1.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      026fd317
    • J
      Btrfs: kill BTRFS_I(inode)->block_group · d82a6f1d
      Josef Bacik 提交于
      Originally this was going to be used as a way to give hints to the allocator,
      but frankly we can get much better hints elsewhere and it's not even used at all
      for anything usefull.  In addition to be completely useless, when we initialize
      an inode we try and find a freeish block group to set as the inodes block group,
      and with a completely full 40gb fs this takes _forever_, so I imagine with say
      1tb fs this is just unbearable.  So just axe the thing altoghether, we don't
      need it and it saves us 8 bytes in the inode and saves us 500 microseconds per
      inode lookup in my testcase.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      d82a6f1d
    • J
      Btrfs: fix how we do space reservation for truncate · fcb80c2a
      Josef Bacik 提交于
      The ceph guys keep running into problems where we have space reserved in our
      orphan block rsv when freeing it up.  This is because they tend to do snapshots
      alot, so their truncates tend to use a bunch of space, so when we go to do
      things like update the inode we have to steal reservation space in order to make
      the reservation happen.  This happens because truncate can use as much space as
      it freaking feels like, but we still have to hold space for removing the orphan
      item and updating the inode, which will definitely always happen.  So in order
      to fix this we need to split all of the reservation stuf up.  So with this patch
      we have
      
      1) The orphan block reserve which only holds the space for deleting our orphan
      item when everything is over.
      
      2) The truncate block reserve which gets allocated and used specifically for the
      space that the truncate will use on a per truncate basis.
      
      3) The transaction will always have 1 item's worth of data reserved so we can
      update the inode normally.
      
      Hopefully this will make the ceph problem go away.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      fcb80c2a
    • J
      Btrfs: take away the num_items argument from btrfs_join_transaction · 7a7eaa40
      Josef Bacik 提交于
      I keep forgetting that btrfs_join_transaction() just ignores the num_items
      argument, which leads me to sending pointless patches and looking stupid :).  So
      just kill the num_items argument from btrfs_join_transaction and
      btrfs_start_ioctl_transaction, since neither of them use it.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      7a7eaa40
    • J
      Btrfs: make sure to use the delalloc reserve when filling delalloc · 74b21075
      Josef Bacik 提交于
      In the prealloc filling code and compressed code we don't set trans->block_rsv
      to the delalloc block reserve properly, which is going to make us use metadata
      from the wrong pool, this patch fixes that.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      74b21075
反馈
建议
客服 返回
顶部